Tag: Model Distillation

Articles collected around this technical topic.

JD Tech Talk
May 22, 2025 · Artificial Intelligence

From Academic Research to Industrial Anti‑Fraud: Leveraging LLMs, Reinforcement Learning, and Model Distillation for Advertising Risk Detection

The article recounts Xiaoting’s journey from a PhD research background to leading JD.com’s ad‑fraud detection, detailing how large language models, reinforcement learning, and model distillation were applied to identify hidden address codes, reduce false‑positive rates to 0.3%, and balance accuracy with real‑time performance in a high‑traffic e‑commerce environment.

AI · LLM · Model Distillation
11 min read
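The hidden-address-code screening this article describes can be pictured as a two-stage check: cheap pattern rules flag obvious candidates, and an LLM adjudicates the ambiguous rest. The sketch below is an illustrative assumption, not JD's pipeline; the regex patterns and the `llm` callable are hypothetical stand-ins.

```python
# Toy two-stage screen for obfuscated contact info ("hidden address codes")
# in ad text. Patterns and the `llm` callable are illustrative assumptions.
import re

# Illustrative patterns: disguised "WeChat"/"QQ" handles followed by digits.
CODE_PATTERN = re.compile(r"[微wW]\s*[信xX]|[QqＱ]{1,2}\s*\d{5,}")

def flag_hidden_contact(ad_text: str, llm) -> bool:
    # Stage 1: cheap rule-based match catches the blatant cases.
    if CODE_PATTERN.search(ad_text):
        return True
    # Stage 2: a hypothetical LLM call adjudicates the subtle cases.
    verdict = llm("Does this ad text hide contact info or address codes? "
                  "Answer yes or no.\n\n" + ad_text)
    return verdict.strip().lower().startswith("yes")
```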
Architect's Guide
May 13, 2025 · Artificial Intelligence

DeepSeek Model Distillation Technology: Overview, Innovations, Architecture, Training, Performance, and Challenges

This article provides a comprehensive overview of DeepSeek's model distillation technology, detailing its definition, key innovations, architecture, training methods, and performance gains, along with remaining challenges such as the implicit performance ceiling (a student is ultimately bounded by its teacher) and multimodal data distillation.

AI optimization · DeepSeek · Model Distillation
14 min read
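For readers new to the topic, the objective most of the distillation articles on this page build on is the classic soft-target loss: the student matches the teacher's temperature-softened output distribution while still fitting the hard labels. A minimal PyTorch sketch follows; the temperature and mixing weight are illustrative defaults, not DeepSeek's training configuration.

```python
# Minimal soft-target knowledge distillation loss (Hinton-style).
# Temperature and mixing weight are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with softened teacher targets."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # T^2 rescales gradients so the soft term stays comparable to the hard term.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```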
Baidu Geek Talk
Apr 9, 2025 · Artificial Intelligence

Baidu's Wenxin X1 Large Model Officially Launches on Qianfan Platform

On April 2, Baidu released its Wenxin X1 large model on the Qianfan platform, offering enterprise users and developers a multimodal, deep‑thinking AI with strong math, coding, and reasoning scores, low‑cost token‑based API access, batch inference, one‑click distillation, and rapid RAG/Agent application building.

AI · API Service · Baidu
4 min read
DataFunTalk
Feb 28, 2025 · Artificial Intelligence

DeepSeek LLM Series (V1‑V3) and R1: Architecture, Training Strategies, Evaluation, and Distillation

An in‑depth overview of the DeepSeek LLM series (V1‑V3) and the R1 models, covering their architectures, scaling‑law experiments, data pipelines, and training strategies (including MoE, MLA, FP8 training, multi‑step learning‑rate scheduling, and reinforcement learning), along with extensive evaluation results and knowledge‑distillation techniques.

AI research · Mixture of Experts · Model Distillation
36 min read
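To make the MoE ingredient concrete, here is a generic top-k routed MoE layer: a gate scores experts per token, only the top-k experts run, and their outputs are combined by the gate weights. This is an illustration of the general technique, not DeepSeek's DeepSeekMoE variant, which additionally uses fine-grained and shared experts plus load-balancing terms; all sizes are assumptions.

```python
# Generic top-k routed mixture-of-experts layer (illustrative sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=8, k=2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x):  # x: (num_tokens, d_model)
        weights = F.softmax(self.gate(x), dim=-1)   # (tokens, n_experts)
        topw, topi = weights.topk(self.k, dim=-1)   # routed experts per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = topi[:, slot] == e           # tokens routed to expert e
                if mask.any():
                    out[mask] += topw[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```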
Architect
Feb 25, 2025 · Artificial Intelligence

DeepSeek R1: Multi‑Stage Reinforcement Learning, Reward Modeling, and Distillation for a High‑Performance LLM

DeepSeek R1 builds on the DeepSeek V3 base model with a multi‑stage reinforcement learning pipeline (GRPO, rule‑based reward modeling, supervised fine‑tuning, language‑consistency rewards, rejection sampling, and distillation) to produce a high‑performing, aligned LLM capable of accurate reasoning.

DeepSeek · LLM Training · Model Distillation
24 min read
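One stage named above, rejection sampling, is simple enough to sketch in miniature: over-generate candidate responses, keep only those that pass a correctness check, and recycle the survivors as supervised fine-tuning data. The `generate` and `is_correct` callables are hypothetical stand-ins, not DeepSeek's actual code.

```python
# Toy rejection-sampling step for building SFT data from RL checkpoints.
# `generate` and `is_correct` are hypothetical stand-ins.
def rejection_sample(prompts, generate, is_correct, n_samples=8):
    sft_pairs = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(n_samples)]
        accepted = [c for c in candidates if is_correct(prompt, c)]
        if accepted:
            # Keep the shortest accepted answer as a simple tie-breaker.
            sft_pairs.append((prompt, min(accepted, key=len)))
    return sft_pairs
```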
DataFunSummit
Feb 25, 2025 · Artificial Intelligence

Tiny‑R1‑32B‑Preview: Matching DeepSeek‑R1‑671B Performance with 5% of the Parameters

On February 24, 2025, 360 and Peking University unveiled Tiny‑R1‑32B‑Preview, a mid‑sized reasoning model that uses only about 5% of the parameters (32B versus 671B, roughly 4.8%) yet achieves performance comparable to DeepSeek‑R1, with leading results on math, programming, and scientific benchmarks.

AI model · Benchmarking · Model Distillation
7 min read
Tencent Technical Engineering
Feb 21, 2025 · Artificial Intelligence

DeepSeek-R1: Enhancing Reasoning Capabilities in LLMs via Reinforcement Learning

DeepSeek‑R1 demonstrates that large‑scale reinforcement learning, especially with the novel Group Relative Policy Optimization and a rule‑based reward scheme, can markedly boost reasoning in LLMs without heavy supervised fine‑tuning. A brief cold‑start SFT phase, two‑stage alignment, and knowledge distillation further improve performance and efficiency, though challenges such as language mixing remain.

Cold Start · DeepSeek-R1 · GRPO
21 min read
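GRPO's key trick is that it needs no value network: each prompt gets a group of sampled responses, each response is scored by the rule-based reward, and its advantage is that reward standardized within its own group. The sketch below shows just that normalization step; the clipped policy objective and KL penalty of the full algorithm are omitted.

```python
# Group-relative advantage computation at the heart of GRPO: standardize
# each response's reward within its sampling group for the same prompt.
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6):
    """rewards: (num_prompts, group_size) scalar rewards per sampled response."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)
```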
DataFunTalk
Feb 16, 2025 · Artificial Intelligence

Understanding Reasoning LLMs: DeepSeek R1 Variants, Inference‑Time Scaling, and Training Strategies

This article explains what reasoning language models are, outlines their strengths and weaknesses, details DeepSeek R1's three variants and their training pipelines—including pure reinforcement learning, SFT + RL, and distillation—while also discussing inference‑time scaling techniques and related research such as Sky‑T1 and TinyZero.

DeepSeek · Model Distillation · Supervised Fine-tuning
16 min read
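Among the inference-time scaling techniques the article discusses, the simplest to demonstrate is self-consistency (majority voting): sample several reasoning chains and return the most common final answer. In this sketch, `generate` and `extract_answer` are hypothetical callables, not a specific library's API.

```python
# Self-consistency / majority voting over sampled reasoning chains.
# `generate` and `extract_answer` are hypothetical callables.
from collections import Counter

def majority_vote(generate, extract_answer, prompt, n=8):
    answers = [extract_answer(generate(prompt)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```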
Top Architect
Feb 14, 2025 · Artificial Intelligence

DeepSeek Model Distillation: Principles, Innovations, Architecture, and Performance

This article provides an in‑depth overview of DeepSeek’s model distillation technology, covering its definition, core principles, innovative data‑model distillation integration, architecture design, training strategies, performance gains, and the challenges of scaling to multimodal data.

AI optimization · DeepSeek · Model Distillation
16 min read
IT Architects Alliance
Feb 10, 2025 · Artificial Intelligence

DeepSeek Distillation Technology: Principles, Innovations, Performance, and Future Outlook

The article explains DeepSeek's model distillation technique, covering its fundamental knowledge‑transfer principles, unique innovations such as data‑model fusion and task‑specific strategies, impressive benchmark results, practical applications in edge and online inference, existing challenges, and future research directions.

AI optimization · Edge Computing · Model Distillation
15 min read
Architecture Digest
Feb 7, 2025 · Artificial Intelligence

Open-Source Replication of OpenAI’s o1 Model Achieves Superior Performance with Minimal Cost

A recent study by Fei‑Fei Li’s team shows that using supervised fine‑tuning on the open‑source Qwen2.5‑32B‑Instruct model can replicate and even surpass the reasoning abilities of OpenAI’s o1‑preview at a fraction of the computational cost, demonstrating a cheap yet powerful approach to large‑language‑model development.

Model Distillation · Supervised Fine-tuning · budget-forcing
6 min read
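The "budget-forcing" tag refers to that study's decoding trick: if the model ends its reasoning before a minimum token budget, append "Wait" and let it continue; once the budget is met, force the answer by inserting the end-of-thinking delimiter. The sketch below is a toy rendition under assumed interfaces: `generate` is a hypothetical callable, the delimiter is an assumption, and whitespace splitting is a crude stand-in for a real tokenizer.

```python
# Toy "budget forcing": extend reasoning with "Wait" until a minimum
# budget is spent, then force the answer phase. All interfaces assumed.
END_THINK = "</think>"  # assumed end-of-reasoning delimiter

def budget_forced_reasoning(generate, prompt, min_think_tokens=512):
    thought = generate(prompt, stop=END_THINK)
    while len(thought.split()) < min_think_tokens:
        # The model stopped early: suppress the stop, nudge it onward.
        thought += " Wait " + generate(prompt + thought + " Wait ",
                                       stop=END_THINK)
    # Budget satisfied: close the reasoning block and request the answer.
    return generate(prompt + thought + END_THINK)
```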
IT Services Circle
Feb 2, 2025 · Artificial Intelligence

OpenAI and Anthropic Accuse DeepSeek of Model Distillation and IP Infringement: Industry Reactions and Technical Overview

OpenAI and Anthropic allege that DeepSeek has illegally distilled their large language models, prompting investigations, industry satire, and a detailed look at model distillation technology, its legal implications, and the broader trends shaping AI cost, scaling laws, and market dynamics.

AI ethics · Artificial Intelligence · DeepSeek
10 min read
DataFunSummit
Jul 10, 2024 · Artificial Intelligence

Applying Large Language Models to Recommendation Systems at Ant Group

The article presents Ant Group's research on integrating large language models into recommendation pipelines, covering background challenges, knowledge extraction, teacher‑student distillation, efficient deployment, experimental results, and future directions to improve accuracy and reduce bias.

AI · LLM · Model Distillation
13 min read
DataFunSummit
Jul 9, 2024 · Artificial Intelligence

Applying Large Language Models to Recommendation Systems at Ant Group

This article details Ant Group's research on integrating large language models into recommendation pipelines, covering background challenges, knowledge extraction, teacher‑student distillation, experimental results, and practical Q&A for improving bias, efficiency, and cold‑start performance.

AI research · Ant Group · Model Distillation
14 min read
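A common shape for the teacher-student setup both Ant Group talks describe: an LLM teacher scores (user, item) pairs offline, and a small, cheap student regresses those scores for online serving. The sketch below is a minimal illustration under that assumption; the tiny MLP and feature sizes are invented for the example, not Ant Group's production design.

```python
# Minimal teacher-student score distillation for ranking: a small student
# regresses LLM-assigned relevance scores. Shapes/architecture are assumed.
import torch
import torch.nn as nn

student = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def distill_step(features, teacher_scores):
    """features: (batch, 64) pair features; teacher_scores: (batch,) LLM scores."""
    pred = student(features).squeeze(-1)
    loss = nn.functional.mse_loss(pred, teacher_scores)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```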
Rare Earth Juejin Tech Community
May 1, 2024 · Artificial Intelligence

Hyper‑SD: Trajectory‑Segmented Consistency Model for Accelerating Diffusion Image Generation

Hyper‑SD introduces a trajectory‑segmented consistency distillation framework that combines trajectory‑preserving and trajectory‑reconstruction strategies, integrates human‑feedback learning and score distillation, and achieves state‑of‑the‑art low‑step image generation performance on both SD1.5 and SDXL models.

AI acceleration · Image Generation · Model Distillation
10 min read
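Hyper-SD's exact objective is not reproduced here; as a heavily reduced illustration of the trajectory-segmented consistency idea, the sketch below enforces agreement between two adjacent trajectory points mapped to the same segment boundary (rather than all the way to t=0). The `student` and `ode_step` callables and all shapes are hypothetical.

```python
# Reduced sketch of segmented consistency distillation: adjacent points on
# the same ODE trajectory should map to the same segment-boundary state.
# `student` and `ode_step` are hypothetical callables.
import torch
import torch.nn.functional as F

def segment_consistency_loss(student, ode_step, x_t, t, t_prev, seg_end):
    with torch.no_grad():
        # A frozen teacher ODE solver walks one step back along the trajectory.
        x_prev = ode_step(x_t, t, t_prev)
        # Target prediction (an EMA copy of the student in practice).
        target = student(x_prev, t_prev, seg_end)
    pred = student(x_t, t, seg_end)
    return F.mse_loss(pred, target)
```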
Baidu Tech Salon
Nov 10, 2023 · Artificial Intelligence

Baidu Search Deep Learning Model Architecture and Optimization Practices

Baidu's Search Architecture team details how its deep‑learning models have evolved to deliver direct answer results via semantic embeddings, describes a massive online inference pipeline that rewrites queries, ranks relevance, and classifies query types, and outlines optimization techniques, including data I/O, CPU/GPU balancing, pruning, quantization, and distillation, to achieve high‑throughput, low‑latency search.

Baidu · GPU optimization · Inference System
13 min read
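Of the optimizations listed, quantization is the easiest to demonstrate end to end. The sketch below uses PyTorch's built-in post-training dynamic quantization, which stores Linear weights in int8 and quantizes activations on the fly; the toy model is an invented stand-in, not Baidu's actual stack.

```python
# Post-training dynamic quantization with PyTorch's stock API.
# The toy ranking head is illustrative, not Baidu's production model.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 2))

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(quantized)  # Linear layers replaced by dynamically quantized versions
```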
Baidu Geek Talk
Nov 9, 2023 · Artificial Intelligence

Deep Learning Model Architecture Evolution in Baidu Search

The article chronicles how Baidu Search's Model Architecture Group evolved its deep‑learning‑driven search, detailing the shift from inverted‑index to semantic vector indexing, the use of transformer‑based models for text and image queries, large‑scale offline/online pipelines, and extensive GPU‑centric optimizations such as pruning, quantization, and distillation, all aimed at delivering precise, cost‑effective results to hundreds of millions of users.

ERNIE · GPU inference · Model Distillation
14 min read
DataFunTalk
Oct 2, 2023 · Artificial Intelligence

DAMO-YOLO: A High‑Efficiency, High‑Accuracy Object Detection Framework

DAMO‑YOLO is an open‑source, high‑speed and high‑precision object detection framework that leverages MAE‑NAS for low‑cost model customization, Efficient RepGFPN and HeavyNeck for enhanced multi‑scale detection, and a universal distillation technique to boost performance across model scales.

Computer Vision · Efficient RepGFPN · MAE-NAS
15 min read
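The article calls the distillation "universal" across model scales without pinning it to one loss; as a representative recipe for detectors, the sketch below shows channel-wise feature-map distillation (soften each channel's spatial activation map with a temperature, then match student to teacher via KL). Treat it as an illustration of the technique family, not necessarily DAMO-YOLO's exact objective.

```python
# Channel-wise feature-map distillation for detectors: per-channel spatial
# softmax at temperature tau, matched via KL divergence. Illustrative only.
import torch
import torch.nn.functional as F

def channel_wise_distill(feat_s, feat_t, tau=4.0):
    """feat_s, feat_t: (N, C, H, W) student/teacher feature maps, equal shape."""
    n, c, h, w = feat_t.shape
    t = F.softmax(feat_t.reshape(n, c, h * w) / tau, dim=-1)
    log_s = F.log_softmax(feat_s.reshape(n, c, h * w) / tau, dim=-1)
    return F.kl_div(log_s, t, reduction="batchmean") * tau ** 2
```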
Baidu Geek Talk
Mar 16, 2023 · Artificial Intelligence

PaddleDetection v2.6 Release: PP-YOLOE Family Expansion and Advanced Detection Algorithms

PaddleDetection v2.6 expands the PP‑YOLOE family with rotating, small‑object, dense‑object, and ultra‑lightweight edge‑GPU models, upgrades the PP‑Human and PP‑Vehicle toolboxes, releases semi‑supervised, few‑shot, and distillation learning methods, adds numerous state‑of‑the‑art algorithms, and improves infrastructure with Python 3.10, EMA filtering, and AdamW support.

Baidu · Computer Vision · Model Distillation
14 min read
DataFunTalk
Jan 18, 2023 · Artificial Intelligence

Search Relevance System Architecture and Practices in QQ Browser

This article presents the QQ Browser search relevance team's experience integrating the QQ Browser and Sogou search systems, covering the business context, the evolution of the relevance system, algorithm architecture, evaluation metrics, deep semantic matching, relevance calibration, and the model distillation techniques used to improve relevance performance.

Model Distillation · Search Relevance · deep learning
31 min read
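A common shape for relevance distillation in search, offered here as an illustration rather than the team's exact method: an expensive cross-encoder teacher scores query-document pairs offline, and a fast bi-encoder student learns to reproduce those scores through its embedding similarity. The encoders and the [0, 1] teacher-score scale below are assumptions.

```python
# Distilling a cross-encoder relevance teacher into a bi-encoder student:
# the student's query/doc similarity regresses the teacher's pair score.
# Encoders and the teacher-score scale are illustrative assumptions.
import torch
import torch.nn.functional as F

def biencoder_distill_loss(q_vec, d_vec, teacher_score):
    """q_vec, d_vec: (batch, dim) embeddings; teacher_score: (batch,) in [0, 1]."""
    sim = F.cosine_similarity(q_vec, d_vec)          # student relevance in [-1, 1]
    return F.mse_loss((sim + 1) / 2, teacher_score)  # map to [0, 1], match teacher
```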