Tagged articles
60 articles
DataFunTalk
May 18, 2026 · Artificial Intelligence

Google Gemini 3.2 Flash Leaks: Generates 2200 Lines of Code in One Prompt, Outpacing Claude and GPT

Google’s Gemini 3.2 Flash model quietly appeared before the I/O event, letting a single prompt produce over 2,200 lines of sophisticated code—including interactive 3D scenes and a functional Windows 98—while claiming near‑GPT‑5.5 performance with dramatically lower inference cost and new integrations for Canva, Instacart and OpenTable.

AI integration · Code Generation · Flash model
8 min read
PaperAgent
May 9, 2026 · Artificial Intelligence

How ActDistill Slashes Deployment Costs of VLA Large Models

ActDistill, proposed by Tongji University and collaborators, reduces the inference latency and compute consumption of Vision‑Language‑Action (VLA) models and speeds up their action loop by selectively distilling action‑relevant knowledge. It achieves up to a 1.67× speedup while preserving control quality on real robot hardware.

ActDistill · Robotics · VLA
13 min read
Old Zhang's AI Learning
Apr 26, 2026 · Artificial Intelligence

Distilling Claude Opus into Qwen3.6-27B – GGUF Lets You Run Locally on Consumer GPUs

The preview model Qwopus3.6-27B‑v1 was distilled from Claude Opus onto Qwen3.6‑27B via SFT with the Unsloth stack, using a curated set of 12K high‑quality reasoning samples. The article evaluates it on agentic reasoning, front‑end design, and Canvas/WebGL tasks on an RTX 5090. It also shows how to deploy the model locally with llama.cpp GGUF quantizations and gives detailed memory guidelines.

Apache 2.0 · Claude Opus · GGUF
7 min read
Alibaba Cloud Big Data AI Platform
Apr 10, 2026 · Artificial Intelligence

How to Supercharge Small LLM Agents with ReAct Data Construction and EasyDistill

This guide explains how to build high‑quality agent training data using ReAct trajectories, synthesize difficult samples with a data‑flywheel, and distill the knowledge into small LLMs on Alibaba Cloud PAI, covering teacher model deployment, EasyDistill installation, data generation, task solving, rubric filtering, and final model deployment.

Agent · Data Generation · EasyDistill
14 min read
Model Perspective
Apr 8, 2026 · Artificial Intelligence

Distilling Your Own Thinking from AI Chat Logs

The article explores how AI model "distillation" can turn personal chat histories into a digital twin that reveals explicit knowledge, thinking patterns, and cognitive blind spots, while outlining practical steps to extract skill lists, mental models, and boundaries from one’s own AI conversations.

AI · RAG · knowledge extraction
11 min read
Coder Circle
Apr 7, 2026 · Industry Insights

AI Industry Highlights: OpenAI Shake‑up, China’s Model Surge, Gemma 4 Open‑Source, and Cursor 3

The April 7 AI briefing covers OpenAI’s leadership turnover and bold economic reform proposals, China’s AI model usage overtaking the United States, Google’s Gemma 4 achieving 85% of larger models’ scores with a 256K context, Cursor 3 ushering in an agent‑based coding era, and a joint effort by OpenAI, Anthropic and Google to combat model distillation.

AI policy · China AI models · Cursor 3
9 min read
Old Zhang's AI Learning
Mar 12, 2026 · Artificial Intelligence

Distilling Claude Opus 4.6 into Qwen3.5‑27B: High‑Quality Reasoning on a Single RTX 3090

The article details how Claude Opus 4.6's chain‑of‑thought data were used to distill the 27‑billion‑parameter Qwen3.5‑27B model with Unsloth and LoRA, achieving full‑context inference on a single RTX 3090/4090, while outlining performance numbers, hyper‑parameter tips, benchmark gains and the trade‑offs of losing multimodal abilities.

Claude Opus 4.6 · GPU inference · LoRA
7 min read
Black & White Path
Feb 28, 2026 · Industry Insights

Anthropic Accuses Chinese AI Labs of ‘Distillation Attacks’ – Musk Mockingly Highlights Double Standards

Anthropic alleges that DeepSeek, Moonshot AI and MiniMax used about 24,000 fake accounts to conduct over 16 million API interactions with Claude. Elon Musk mocked the company's double standards, and the episode sparked a broader debate over the legality of model distillation, API‑use contracts, and the shifting competitive dynamics of the global AI industry.

AI competition · API abuse · Anthropic
23 min read
Top Architect
Feb 14, 2026 · Artificial Intelligence

Why Test‑Time Compute Is the Next Breakthrough for Large Language Models

The article explains how reasoning‑oriented large language models shift the focus from training‑time resources to test‑time computation. It covers scaling laws, verification techniques, and reinforcement‑learning pipelines such as DeepSeek‑R1's. It also describes methods for distilling reasoning abilities into smaller models that run on consumer hardware.

Prompt engineering · inference compute · large language models
19 min read
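Two of the simplest ways to spend extra test-time compute are majority voting over several sampled answers (self-consistency) and best-of-N selection with a verifier. The sketch below is illustrative only. It assumes the N candidate answers have already been sampled from a model. `toy_verifier` is a hypothetical stand-in for a learned or rule-based verifier and is not taken from the article.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Self-consistency: return the most frequent final answer among N samples."""
    return Counter(answers).most_common(1)[0][0]


def best_of_n(answers: List[str], verifier: Callable[[str], float]) -> str:
    """Best-of-N: score every candidate with a verifier and keep the top-scoring one."""
    return max(answers, key=verifier)


def toy_verifier(answer: str) -> float:
    """Hypothetical verifier for the question "which x satisfies 6 * x == 252?"."""
    return 1.0 if answer.isdigit() and 6 * int(answer) == 252 else 0.0


# Five sampled final answers for the same question (illustrative data).
samples = ["42", "41", "42", "40", "42"]
print(majority_vote(samples))
print(best_of_n(samples, toy_verifier))
```

Voting needs no extra model, but it only works when answers can be compared exactly. A verifier can rank free-form outputs at the cost of one extra scoring pass per candidate.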
Old Zhang's AI Learning
Feb 5, 2026 · Artificial Intelligence

Distilling GLM‑4.7‑Flash with Claude‑Opus‑4.5 for Easy Consumer‑GPU Deployment

The article explains how TeichAI used Claude‑Opus‑4.5 to generate a high‑quality 250‑sample reasoning dataset and distilled that reasoning into the GLM‑4.7‑Flash model. The result was released as compact GGUF builds that run on a single consumer‑grade GPU via llama.cpp. The article details the workflow, quantization options, and practical considerations.

AI datasets · GGUF · Unsloth
6 min read
360 Smart Cloud
Dec 3, 2025 · Artificial Intelligence

How Model Distillation Enhances LLM Performance on the TLM Platform

This article explains the TLM large‑model development platform and details how knowledge distillation—using soft labels, temperature scaling, and combined loss functions—compresses teacher models into efficient student models, with practical steps and evaluation on the platform.

AI · LLM · TLM platform
5 min read
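Soft labels, temperature scaling and a combined loss are the ingredients of the classic knowledge-distillation recipe from Hinton et al. (2015). Below is a minimal plain-Python sketch of the per-example loss. It assumes that classic formulation and is not meant to reflect anything specific to the TLM platform. The function names and the `alpha` and `temperature` values are illustrative.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a larger temperature yields a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: List[float], teacher_logits: List[float], label: int,
            temperature: float = 2.0, alpha: float = 0.5) -> float:
    """Combined loss: alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    p_teacher = softmax(teacher_logits, temperature)  # soft labels from the teacher
    p_student_t = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student_t) if pt > 0)
    # The T^2 factor keeps soft-label gradients on the same scale as the hard-label term.
    ce = -math.log(softmax(student_logits)[label])  # ordinary hard-label cross-entropy
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


teacher = [4.0, 1.0, -2.0]
student = [2.0, 1.5, -1.0]
print(softmax(teacher, 1.0))  # most of the mass on class 0
print(softmax(teacher, 4.0))  # softer: the other classes get visible probability
print(kd_loss(student, teacher, label=0))
```

In a real training loop the same computation would run batched on tensors in a framework such as PyTorch, and only the student's parameters would be updated.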
DeWu Technology
Nov 3, 2025 · Artificial Intelligence

How Large Language Models Boost Search Relevance: A Real‑World Case Study

This article explains how a leading e‑commerce platform leveraged large language models to overcome traditional search relevance challenges, detailing the iterative workflow, model distillation, performance gains, deployment results, and future directions for smarter, more accurate product search.

AI · e‑commerce · large language models
10 min read
DataFunTalk
Sep 29, 2025 · Artificial Intelligence

How Glint-MVT Powers City‑Scale Multimodal AI: Insights from a Tech VP

In an interview before the DACon conference, Dr. Feng Ziyong explains how Glint‑MVT and novel data‑synthesis techniques close distribution gaps and improve compositional understanding. These techniques enable retrieval over billions of items within seconds for city‑scale surveillance while balancing model efficiency and effectiveness.

Embedding Retrieval · Multimodal AI · city surveillance
11 min read
Data STUDIO
Sep 8, 2025 · Industry Insights

Claude Completely Banned for Chinese Companies – No Workarounds Anywhere

Anthropic announced an immediate, worldwide ban on Claude for any entity controlled by Chinese capital, citing legal, regulatory and security risks, and warned that continued access could enable military use or model‑stealing, urging firms to adopt domestic alternatives.

AI Safety · AI policy · Anthropic
3 min read
AI Algorithm Path
Aug 16, 2025 · Artificial Intelligence

Meta Unveils DINOv3: A Universal Self‑Supervised Visual AI for All Image Tasks

Meta's DINOv3 is a 7‑billion‑parameter self‑supervised visual foundation model trained without any labels on 1.7 billion Instagram images. It introduces dense feature extraction, Gram Anchoring to prevent feature collapse, high‑resolution adaptation, and multi‑student distillation. Together these enable out‑of‑the‑box performance on segmentation, depth estimation, 3D matching, and tracking, surpassing prior models such as DINOv2, CLIP, and SAM.

Computer Vision · DINOv3 · Gram Anchoring
8 min read
Alibaba Cloud Big Data AI Platform
Jul 23, 2025 · Artificial Intelligence

How to Distill Large Language Models for Efficient Text Generation with EasyDistill

This guide explains how to use the EasyDistill framework and Alibaba Cloud PAI to distill large language models for high‑quality text generation, covering model deployment, SFT and DPO training data construction, code examples, configuration files, and best practices for achieving resource‑efficient, high‑performance student models.

DPO · EasyDistill · PAI
14 min read
AsiaInfo Technology: New Tech Exploration
Jun 23, 2025 · Artificial Intelligence

How Generative Data‑Driven Model Distillation Boosts Large‑Model Performance and Cuts Compute

This article examines generative data‑driven model distillation as a technique that not only compresses large language models but also improves their accuracy, addresses data‑privacy constraints, and reduces computational costs, offering a practical roadmap and real‑world results from a corporate AI platform.

AI Optimization · Knowledge Transfer · MaaS platform
22 min read
JD Retail Technology
Jun 18, 2025 · Artificial Intelligence

How JD’s Tech Teams Power 618: AI, Logistics, and Voice Innovations

The article explores how JD’s engineers across retail, logistics, and AI divisions use model distillation, data selection, intelligent routing, and advanced voice recognition to improve the 618 shopping festival experience, highlighting real‑world technical challenges, solutions, and the company’s talent development programs.

AI · Logistics · data engineering
16 min read
JD Tech Talk
May 22, 2025 · Artificial Intelligence

From Academic Research to Industrial Anti‑Fraud: Leveraging LLMs, Reinforcement Learning, and Model Distillation for Advertising Risk Detection

The article recounts Xiaoting’s journey from a PhD research background to leading JD.com’s ad‑fraud detection, detailing how large language models, reinforcement learning, and model distillation were applied to identify hidden address codes, reduce false‑positive rates to 0.3%, and balance accuracy with real‑time performance in a high‑traffic e‑commerce environment.

AI · Ad Fraud · Advertising
11 min read
JD Retail Technology
May 22, 2025 · Industry Insights

Cracking Hidden Ad Fraud: JD’s AI‑Driven Anti‑Cheat System Explained

This article recounts the journey of a JD PhD trainee who transformed academic research on anomaly detection into a production‑grade, LLM‑enhanced anti‑fraud system that identifies concealed address codes in CPS ads, detailing model design, LoRA fine‑tuning, reinforcement learning, distillation, cost‑aware deployment, and lessons learned for scalable ad risk management.

ad fraud detection · industry AI · large language model
12 min read
Data Thinking Notes
May 19, 2025 · Artificial Intelligence

How Model Distillation Shrinks Giant AI Models Without Losing Performance

This article explains model distillation—a technique that transfers knowledge from large teacher models to compact student models—covering its motivation, core principles, key steps, practical applications, and both its advantages and limitations, illustrating how AI can be made efficient without sacrificing performance.

AI compression · Knowledge Transfer · model distillation
10 min read
JD Retail Technology
May 19, 2025 · Artificial Intelligence

How JD’s Omniforce Boosts Large Model Efficiency with Cloud‑Edge Collaboration

The JD Exploration Institute paper introduces Omniforce, a human‑centered, cloud‑edge collaborative AutoML system that uses model distillation, dynamic data governance, Bayesian‑optimized training, and edge deployment to cut large‑model training costs by 70% and improve inference speed by 30%, powering the JoyBuild platform for broader AI adoption.

AI efficiency · AutoML · JoyBuild
6 min read
JD Tech
May 15, 2025 · Artificial Intelligence

How JD’s Omniforce Cuts Large‑Model Training Cost by 70% and Boosts Inference Speed 30%

The paper "Omniforce" from JD Exploration Research Institute presents a cloud‑edge collaborative AutoML system that uses model distillation, data governance, Bayesian training optimization, and cloud‑edge cooperation to reduce large‑model training costs by 70% and improve inference efficiency by an average of 30%, offering a reusable technical paradigm for scalable AI deployment.

AI efficiency · JoyBuild · Large Model
6 min read
Architect's Guide
May 13, 2025 · Artificial Intelligence

DeepSeek Model Distillation Technology: Overview, Innovations, Architecture, Training, Performance, and Challenges

This article provides a comprehensive overview of DeepSeek's model distillation technology, detailing its definition, key innovations, architecture, training methods, performance gains, and the remaining challenges such as the implicit performance ceiling and multimodal data distillation.

AI Optimization · DeepSeek · Knowledge Transfer
14 min read
Alibaba Cloud Big Data AI Platform
Apr 22, 2025 · Artificial Intelligence

How DistilQwen2.5-DS3-0324 Achieves Fast, Accurate Reasoning via Quick‑Think Distillation

This article introduces DistilQwen2.5-DS3-0324, a distilled language model series that balances rapid inference with strong reasoning by applying a fast‑thinking chain‑of‑thought strategy, details its two‑stage distillation framework, evaluation on diverse benchmarks, and provides code for downloading and using the models.

Deep Learning · chain-of-thought · fast inference
17 min read
Baidu Geek Talk
Apr 9, 2025 · Artificial Intelligence

Baidu's Wenxin X1 Large Model Officially Launches on Qianfan Platform

On April 2, Baidu released its Wenxin X1 large model on the Qianfan platform, offering enterprise users and developers a multimodal, deep‑thinking AI with superior math, coding, and reasoning scores, low token‑price API access, batch inference, one‑click distillation, and rapid RAG/Agent application building.

AI · API Service · Baidu
4 min read
Architect
Mar 3, 2025 · Artificial Intelligence

Unlocking Reasoning LLMs: Methods, DeepSeek R1 Insights, and Cost‑Effective Strategies

This article examines how to build and improve reasoning‑capable large language models, explains the definition and use‑cases of reasoning models, details DeepSeek‑R1’s training pipeline, compares four key enhancement methods—including inference‑time scaling, pure RL, SFT + RL, and distillation—and offers budget‑friendly advice.

AI research · DeepSeek · Inference Scaling
27 min read
Tencent Technical Engineering
Feb 21, 2025 · Artificial Intelligence

DeepSeek-R1: Enhancing Reasoning Capabilities in LLMs via Reinforcement Learning

DeepSeek‑R1 demonstrates that large‑scale reinforcement learning, especially with the novel Group Relative Policy Optimization and a rule‑based reward scheme, can markedly boost reasoning in LLMs without heavy supervised fine‑tuning, while a brief cold‑start SFT phase, two‑stage alignment, and knowledge distillation further improve performance and efficiency, despite remaining challenges such as language mixing.

DeepSeek-R1 · GRPO · LLM Reasoning
21 min read
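GRPO needs no learned value function. For each prompt it samples a group of responses, scores them, and normalizes each reward against the group's mean and standard deviation. Below is a minimal sketch of that advantage computation under these assumptions:
- The reward rule (an exact-match correctness check plus a small bonus for `<think>` tags) is a hypothetical simplification of the rule-based accuracy and format rewards mentioned above.
- The population standard deviation is used; implementations differ on this choice.
- The clipped policy-gradient objective and the KL penalty are omitted.

```python
import statistics
from typing import List


def rule_based_reward(response: str, reference: str) -> float:
    """Hypothetical rule-based reward: 1.0 for a correct final answer plus a 0.1 format bonus."""
    reward = 0.0
    if "<think>" in response and "</think>" in response:
        reward += 0.1  # format reward: reasoning is wrapped in <think>...</think>
        answer = response.rsplit("</think>", 1)[1].strip()  # text after the reasoning
    else:
        answer = response.strip()
    if answer == reference:
        reward += 1.0  # accuracy reward: final answer matches the reference exactly
    return reward


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage within one group: A_i = (r_i - mean(r)) / (std(r) + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# A group of G = 4 responses sampled for the prompt "What is 2 + 2?" (illustrative data).
group = [
    "<think>2 + 2 = 4</think>4",
    "<think>a guess</think>5",
    "4",
    "five",
]
rewards = [rule_based_reward(r, "4") for r in group]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 2) for a in advantages])
```

Responses that beat their own group's average get a positive advantage and are reinforced. Because the normalization is per group, a prompt on which every sample earns the same reward yields zero advantage for all of them.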
Architects' Tech Alliance
Feb 18, 2025 · Artificial Intelligence

How DeepSeek’s Latest Models Redefine AI Performance and Industry Adoption

The DeepSeek report details the company's rapid model releases from 2024 onward. It highlights innovations such as model distillation, a 671B‑parameter MoE architecture, FP8 mixed‑precision training, and the Janus‑Pro multimodal framework. It also documents how major cloud and chip providers have integrated these models into their services.

AI industry adoption · DeepSeek · MoE architecture
10 min read
Fun with Large Models
Feb 16, 2025 · Artificial Intelligence

Can You Claim to Know Large Models? Guide to Distillation, Quantization & Fine‑Tuning

This article explains why the massive DeepSeek V3/R1 model (671B parameters) is hard to deploy. It introduces three key techniques that can shrink, accelerate, or specialize large models: model distillation, quantization, and fine‑tuning. It also outlines their trade‑offs and practical steps.

AI model compression · DeepSeek · large language models
10 min read
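The deployment problem is easy to quantify. At 16 bits per weight, 671B parameters need roughly 1,342 GB for the weights alone, before activations or KV cache. The sketch below does that arithmetic and shows the simplest form of quantization, symmetric per-tensor INT8, in plain Python. The function names are hypothetical. Real toolchains are more elaborate, for example using per-channel or group-wise scales and 4-bit formats.

```python
from typing import List, Tuple


def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8: scale = max|w| / 127, q = clamp(round(w / scale))."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map the integers back to approximate floating-point weights."""
    return [v * scale for v in q]


for bits in (16, 8, 4):
    print(f"671B params @ {bits}-bit: {weight_memory_gb(671e9, bits):.1f} GB")

weights = [0.42, -1.27, 0.003, 0.9]  # illustrative values
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, f"max error = {max_error:.4f} (bounded by scale / 2 = {scale / 2:.4f})")
```

Going from 16-bit to 8-bit halves weight memory, and 4-bit quarters it. The price is a rounding error of up to half a quantization step per weight. This is why layers with large outliers are often kept at higher precision.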
DataFunTalk
Feb 16, 2025 · Artificial Intelligence

Understanding Reasoning LLMs: DeepSeek R1 Variants, Inference‑Time Scaling, and Training Strategies

This article explains what reasoning language models are, outlines their strengths and weaknesses, details DeepSeek R1's three variants and their training pipelines—including pure reinforcement learning, SFT + RL, and distillation—while also discussing inference‑time scaling techniques and related research such as Sky‑T1 and TinyZero.

DeepSeek · Inference Scaling · model distillation
16 min read
Architects' Tech Alliance
Feb 16, 2025 · Artificial Intelligence

How DeepSeek’s Distillation Breaks Bottlenecks and Boosts Multimodal AI Performance

This article provides an in‑depth technical analysis of DeepSeek’s model distillation technology, covering its core principles, innovative data‑model fusion strategies, architecture design, training optimizations, performance benchmarks, and the remaining challenges of scaling distillation to multimodal tasks.

AI Optimization · DeepSeek · large language models
16 min read
Top Architect
Feb 14, 2025 · Artificial Intelligence

DeepSeek Model Distillation: Principles, Innovations, Architecture, and Performance

This article provides an in‑depth overview of DeepSeek’s model distillation technology, covering its definition, core principles, innovative data‑model distillation integration, architecture design, training strategies, performance gains, and the challenges of scaling to multimodal data.

AI Optimization · DeepSeek · Knowledge Transfer
16 min read
IT Architects Alliance
Feb 10, 2025 · Artificial Intelligence

DeepSeek Distillation Technology: Principles, Innovations, Performance, and Future Outlook

The article explains DeepSeek's model distillation technique, covering its fundamental knowledge‑transfer principles, unique innovations such as data‑model fusion and task‑specific strategies, impressive benchmark results, practical applications in edge and online inference, existing challenges, and future research directions.

AI Optimization · Deep Learning · Edge Computing
15 min read
Architect
Feb 9, 2025 · Artificial Intelligence

How DeepSeek’s Model Distillation Boosts AI Efficiency and Performance

This article provides an in‑depth analysis of DeepSeek’s model distillation technology, covering its definition, core principles, innovative strategies, architecture design, training optimizations, benchmark results, efficiency gains, and the remaining challenges of applying distillation to large language models and multimodal data.

AI efficiency · DeepSeek · Knowledge Transfer
16 min read
Architecture Digest
Feb 7, 2025 · Artificial Intelligence

Open-Source Replication of OpenAI’s o1 Model Achieves Superior Performance with Minimal Cost

A recent study by Fei‑Fei Li’s team shows that using supervised fine‑tuning on the open‑source Qwen2.5‑32B‑Instruct model can replicate and even surpass the reasoning abilities of OpenAI’s o1‑preview at a fraction of the computational cost, demonstrating a cheap yet powerful approach to large‑language‑model development.

Supervised Fine‑Tuning · budget-forcing · cost-effective-ai
6 min read
Architect
Feb 3, 2025 · Artificial Intelligence

How DeepSeek‑R1 Uses Pure Reinforcement Learning to Match OpenAI’s o1

This article presents two LLMs: DeepSeek‑R1‑Zero, trained with pure reinforcement learning, and DeepSeek‑R1, trained with a multi‑stage pipeline that combines supervised fine‑tuning with reinforcement learning. It details their GRPO training framework, model‑distillation pipeline, open‑source release, and evaluation results that rival OpenAI’s o1‑1217 on reasoning, knowledge, and coding benchmarks.

DeepSeek · LLM evaluation · OpenAI o1
10 min read
IT Services Circle
Feb 2, 2025 · Artificial Intelligence

OpenAI and Anthropic Accuse DeepSeek of Model Distillation and IP Infringement: Industry Reactions and Technical Overview

OpenAI and Anthropic allege that DeepSeek has illegally distilled their large language models, prompting investigations, industry satire, and a detailed look at model distillation technology, its legal implications, and the broader trends shaping AI cost, scaling laws, and market dynamics.

AI ethics · DeepSeek · OpenAI
10 min read
Baobao Algorithm Notes
Jan 22, 2025 · Artificial Intelligence

Can RL‑Only Training Make LLMs Beat OpenAI‑o1? Inside DeepSeek‑R1’s Architecture and Results

DeepSeek‑R1’s open‑source series demonstrates that reinforcement‑learning‑only training can match top‑tier models like OpenAI‑o1, while a small amount of SFT further improves readability; the article dissects its technical report, training pipeline, reward design, distillation strategy, benchmark outcomes, and remaining challenges.

DeepSeek · Supervised Fine‑Tuning · large language model
11 min read
Baobao Algorithm Notes
Sep 5, 2024 · Artificial Intelligence

Why Small LLMs Are the Secret Weapon for Scaling Large Model Research

The article explains how homologous small language models—trained on the same tokenizer and data as their large counterparts—serve as cheap, fast experimental platforms that can predict large‑model performance, guide pre‑training decisions, and support techniques like distillation and reward modeling.

AI research · LLM scaling · Qwen2
13 min read
Baobao Algorithm Notes
Jul 25, 2024 · Artificial Intelligence

Why LLaMA 3 405B Matches GPT‑4o: Architecture, Training, and Industry Impact

The article provides an in‑depth analysis of LLaMA 3 405B. It covers the dense Transformer architecture, three‑stage pre‑training (initial, long‑context, annealing), iterative post‑training with RM‑guided rejection sampling, and the decision against MoE. It also discusses the broader implications for both large and small model development.

405B · Model architecture · model distillation
17 min read
DataFunSummit
Jul 10, 2024 · Artificial Intelligence

Applying Large Language Models to Recommendation Systems at Ant Group

The article presents Ant Group's research on integrating large language models into recommendation pipelines, covering background challenges, knowledge extraction, teacher‑model distillation, efficient deployment, experimental results, and future directions to improve accuracy and reduce bias.

AI · LLM · Recommendation Systems
13 min read
Rare Earth Juejin Tech Community
May 1, 2024 · Artificial Intelligence

Hyper‑SD: Trajectory‑Segmented Consistency Model for Accelerating Diffusion Image Generation

Hyper‑SD introduces a trajectory‑segmented consistency distillation framework that combines trajectory‑preserving and trajectory‑reconstruction strategies, integrates human‑feedback learning and score distillation, and achieves state‑of‑the‑art low‑step image generation performance on both SD1.5 and SDXL models.

AI acceleration · RLHF · diffusion models
10 min read
Volcano Engine Developer Services
Mar 7, 2024 · Artificial Intelligence

How SDXL‑Lightning Generates High‑Quality Images in Just 2 Steps

SDXL‑Lightning, a new diffusion‑based text‑to‑image model from ByteDance, uses Progressive Adversarial Distillation to cut inference steps to as few as 2 while maintaining high resolution and fidelity, offering ten‑fold speed gains, open‑source access, and compatibility with SDXL, ControlNet, and ComfyUI.

AI acceleration · diffusion · model distillation
8 min read
Baidu Tech Salon
Nov 10, 2023 · Artificial Intelligence

Baidu Search Deep Learning Model Architecture and Optimization Practices

Baidu's Search Architecture team details how its deep‑learning models have evolved to deliver direct answer results via semantic embeddings, describes a massive online inference pipeline that rewrites queries, ranks relevance, and classifies types, and outlines optimization techniques—including data I/O, CPU/GPU balancing, pruning, quantization, and distillation—to achieve high‑throughput, low‑latency search.

Baidu · GPU Optimization · Inference System
13 min read
Baidu Geek Talk
Nov 9, 2023 · Artificial Intelligence

Deep Learning Model Architecture Evolution in Baidu Search

The article chronicles Baidu Search’s Model Architecture Group’s evolution of deep‑learning‑driven search, detailing the shift from inverted‑index to semantic vector indexing, the use of transformer‑based models for text and image queries, large‑scale offline/online pipelines, and extensive GPU‑centric optimizations such as pruning, quantization and distillation, all aimed at delivering precise, cost‑effective results to hundreds of millions of users.

Ernie · GPU inference · Model Optimization
14 min read
DataFunTalk
Oct 2, 2023 · Artificial Intelligence

DAMO-YOLO: A High‑Efficiency, High‑Accuracy Object Detection Framework

DAMO‑YOLO is an open‑source, high‑speed and high‑precision object detection framework that leverages MAE‑NAS for low‑cost model customization, Efficient RepGFPN and HeavyNeck for enhanced multi‑scale detection, and a universal distillation technique to boost performance across model scales.

Efficient RepGFPN · MAE-NAS · YOLO
15 min read
DataFunTalk
Jan 18, 2023 · Artificial Intelligence

Search Relevance System Architecture and Practices in QQ Browser

This article presents the QQ Browser search relevance team's experience integrating the QQ Browser and Sogou search systems. It covers the business overview, the evolution of the relevance system, the algorithm architecture, evaluation metrics, deep semantic matching, relevance calibration, and the model distillation techniques used to improve search relevance.

Evaluation Metrics · information retrieval · model distillation
31 min read
Ctrip Technology
Nov 10, 2022 · Artificial Intelligence

Improving Search Intent Recognition and Term Weighting with Deep Learning and Model Distillation at Ctrip

This article describes how Ctrip's R&D team applied deep‑learning models, BERT‑based embeddings, knowledge distillation, and term‑weighting techniques to enhance e‑commerce search intent recognition and term importance estimation, achieving high accuracy while meeting sub‑10 ms latency requirements.

BERT · Deep Learning · Search
12 min read
Tencent Advertising Technology
Aug 16, 2022 · Artificial Intelligence

CONFLUX: A Request-level Fusion Framework for Impression Allocation via Cascade Distillation

The paper presents CONFLUX, a request-level fusion ranking framework that uses linear programming and cascade distillation to allocate ad impressions between contract and real-time bidding ads, improving platform revenue and ad effectiveness while addressing offline training, latency, and model drift challenges.

CONFLUX · KDD 2022 · Linear Programming
14 min read
Code DAO
Apr 24, 2022 · Artificial Intelligence

How Transfer Learning Accelerates Deep Learning Across Vision, NLP, and Reinforcement Learning

The article explains how transfer learning reduces data and time requirements in deep learning by reusing pretrained models for vision, natural language processing, and reinforcement learning, while discussing challenges such as overfitting, the need for progressive networks, entropy regularization, domain adaptation, multi‑task learning, and model distillation.

Deep Learning · domain adaptation · model distillation
10 min read
JD Cloud Developers
Feb 19, 2021 · Artificial Intelligence

How FastReID V1.0 Revolutionizes General Object Re‑Identification

FastReID, an open‑source PyTorch library from JD AI Research, offers a modular architecture, model distillation, automatic hyper‑parameter search, and multi‑task support, enabling efficient large‑scale object re‑identification across diverse applications such as security, retail, and smart infrastructure.

Re-identification · hyperparameter optimization · model distillation
12 min read
Meituan Technology Team
Jul 23, 2020 · Artificial Intelligence

Named Entity Recognition in O2O Search: Background, Technical Choices, and Practice

Meituan’s O2O search relies on a hybrid NER system. It combines high‑precision domain dictionaries, built through multi‑source offline mining, with BERT‑based models that use a CRF output layer. The models are accelerated through operator fusion, batching and mixed precision. They are further enhanced with Lattice‑LSTM, knowledge‑infused stages and weak supervision, delivering millisecond‑level latency and over 90% recall.

Dictionary Matching · Knowledge-Enhanced · NER
30 min read