Collection size: 100 articles · Page 1 of 5
360 Smart Cloud
Dec 3, 2025 · Artificial Intelligence

How Model Distillation Enhances LLM Performance on the TLM Platform

This article explains the TLM large‑model development platform and details how knowledge distillation—using soft labels, temperature scaling, and combined loss functions—compresses teacher models into efficient student models, with practical steps and evaluation on the platform.

AI · LLM · TLM platform
0 likes · 5 min read
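The summary above names the standard ingredients of soft-label distillation: soft labels from the teacher, temperature scaling, and a combined loss. Below is a minimal PyTorch sketch of such a loss, not code from the TLM article itself; the temperature T and weight alpha are illustrative values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-label term: KL divergence between temperature-scaled distributions.
    # Scaling by T*T keeps its gradient magnitude comparable to the hard-label term.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # Combined loss: weighted sum of the two terms.
    return alpha * soft + (1 - alpha) * hard
```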
Old Zhang's AI Learning
Mar 12, 2026 · Artificial Intelligence

Distilling Claude Opus 4.6 into Qwen3.5‑27B: High‑Quality Reasoning on a Single RTX 3090

The article details how Claude Opus 4.6's chain‑of‑thought data were used to distill the 27‑billion‑parameter Qwen3.5‑27B model with Unsloth and LoRA, achieving full‑context inference on a single RTX 3090/4090, while outlining performance numbers, hyper‑parameter tips, benchmark gains and the trade‑offs of losing multimodal abilities.

Claude Opus 4.6 · GPU inference · LoRA
0 likes · 7 min read
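The recipe summarized above is LoRA-based SFT on teacher-generated chain-of-thought traces. A minimal Hugging Face peft sketch of the adapter setup follows; the base checkpoint is a stand-in, and the article's actual run uses Qwen3.5‑27B via Unsloth, which wraps the same idea.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Stand-in base checkpoint; substitute the model you are actually distilling onto.
base_id = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")

# LoRA freezes the base weights and trains small low-rank adapters on the
# attention projections, which is what keeps single-GPU fine-tuning feasible.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# Training then proceeds as ordinary SFT on the teacher's chain-of-thought data,
# e.g. with trl's SFTTrainer.
```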
Alibaba Cloud Big Data AI Platform
Feb 25, 2025 · Artificial Intelligence

How DistilQwen2.5 Boosts LLM Efficiency with Dual‑Stage Knowledge Distillation

This article introduces DistilQwen2.5, a lightweight LLM series built on Qwen2.5 that uses a novel two‑layer distillation framework, instruction‑data optimization, and parameter‑fusion techniques to achieve higher performance while drastically reducing computational cost and deployment overhead.

Efficient Inference · LLM · knowledge distillation
0 likes · 26 min read
Alibaba Cloud Big Data AI Platform
Apr 22, 2025 · Artificial Intelligence

How DistilQwen2.5-DS3-0324 Achieves Fast, Accurate Reasoning via Quick‑Think Distillation

This article introduces DistilQwen2.5-DS3-0324, a distilled language model series that balances rapid inference with strong reasoning by applying a fast‑thinking chain‑of‑thought strategy, details its two‑stage distillation framework and its evaluation on diverse benchmarks, and provides code for downloading and using the models.

chain of thought · deep learning · fast inference
0 likes · 17 min read
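For readers who just want to try a distilled checkpoint, the standard transformers loading pattern looks like the sketch below; the repo id is a placeholder, so check the article for the exact released model names.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- see the article for the exact published checkpoints.
model_id = "alibaba-pai/DistilQwen2.5-DS3-0324-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```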
Alibaba Cloud Big Data AI Platform
Jun 30, 2025 · Artificial Intelligence

Unlocking Small LLM Power: Variable‑Length Chain Distillation with DistillQwen‑ThoughtY

This article introduces a variable‑length chain‑of‑thought distillation technique built on Alibaba Cloud PAI’s EasyDistill toolkit, presents the high‑quality OmniThought‑0528 dataset, details the training of the DistillQwen‑ThoughtY 4B/8B/32B models, and provides code and usage examples for researchers and practitioners.

Distillation · LLM · chain of thought
0 likes · 15 min read
Baobao Algorithm Notes
Jun 3, 2025 · Artificial Intelligence

Can 1K Fine‑Tuning Replace 100K RL Steps? Insights from Re‑distillation Research

An extensive analysis shows that a 1K‑sample fine‑tuning stage can replicate the generalization gains of thousands of reinforcement‑learning steps, explains the compressibility of RL, introduces a sample‑effect theory, and demonstrates that re‑distillation and small‑scale SFT dramatically improve LLM performance.

Re-distillation · Sample Effect · large language models
0 likes · 23 min read
Alibaba Cloud Big Data AI Platform
May 28, 2025 · Artificial Intelligence

How EasyDistill Simplifies LLM Knowledge Distillation for Faster, Smaller Models

EasyDistill, an open‑source toolkit from Alibaba Cloud AI Platform, streamlines knowledge distillation of large language models by offering modular data synthesis, black‑box and white‑box training, reinforcement‑learning and preference‑optimization techniques, enabling the creation of compact, high‑performance DistilQwen models and accompanying datasets.

DistilQwen · EasyDistill · knowledge distillation
0 likes · 17 min read
Meituan Technology Team
Jan 8, 2026 · Artificial Intelligence

Must‑Read AAAI 2026 Papers: Efficient Reasoning, Annealing, Multimodal Diffusion & More

This article curates eight AAAI 2026 papers authored by the Meituan research team, covering verifiable stepwise rewards for LLM reasoning, annealing strategies in large‑scale training, process reward models, competence‑difficulty sampling, high‑fidelity visual text rendering, counterfactual fusion, compress‑then‑rank reranking, and cross‑modal quantization for generative recommendation, with direct PDF links for each work.

AAAI2026 · Counterfactual · LLM
0 likes · 14 min read
Baobao Algorithm Notes
Apr 27, 2025 · Artificial Intelligence

How DeepSeek R1T‑Chimera Cuts Tokens by 40% Without Fine‑Tuning

The DeepSeek‑R1T‑Chimera model merges DeepSeek‑R1 reasoning with the V3‑0324 architecture, reusing most V3 weights and swapping in only R1's routed expert weights, achieving R1‑level intelligence while reducing output tokens by about 40% and running faster, all without any fine‑tuning or distillation.

Artificial Intelligence · DeepSeek · LLM
0 likes · 5 min read
Xiaohongshu Tech REDtech
Nov 4, 2025 · Artificial Intelligence

Unveiling the Law of Capacity Gap: Boosting Language Model Distillation Efficiency

At ACL 2025, a collaborative paper introduced the Law of Capacity Gap, revealing a linear 2.5× optimal teacher‑student size relationship in language model distillation, dramatically cutting compute costs and achieving Pareto‑optimal efficiency, with the MiniMA model as a successful demonstration.

Distillation · MiniMA · artificial-intelligence
0 likes · 7 min read
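Stated as a formula, the relationship described above says the compute-optimal teacher size grows linearly with the student size, with a slope of roughly 2.5. This is a paraphrase of the summary, not the paper's exact notation:

```latex
N_{\text{teacher}}^{\ast} \;\approx\; 2.5 \cdot N_{\text{student}}
```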
AntTech
Jun 21, 2025 · Artificial Intelligence

Ring-lite: Open‑Source Lightweight MoE Model Sets SOTA on AIME and LiveCodeBench

Ring-lite, an open‑source lightweight Mixture‑of‑Experts reasoning model built on Ling‑lite‑1.5, introduces the C3PO reinforcement‑learning training method and achieves state‑of‑the‑art results on benchmarks such as AIME24/25, LiveCodeBench, Codeforces, and GPQA‑Diamond, while offering full transparency of weights, code, and data.

AI inference · C3PO · benchmark
0 likes · 11 min read
Didi Tech
Mar 12, 2026 · Artificial Intelligence

How STAPO Improves Large‑Model Fine‑Tuning by Silencing Spurious Tokens

The STAPO (Spurious‑Token‑Aware Policy Optimization) algorithm, introduced by Tsinghua University's iDLab and Didi's Deep Sea Lab, tackles policy‑entropy instability and performance oscillation in reinforcement‑learning fine‑tuning of large models by mathematically analyzing token collision probability, defining spurious tokens, and applying a Silencing Spurious Tokens mechanism that yields state‑of‑the‑art results on multiple math‑reasoning benchmarks.

AI safety · Fine-tuning · STAPO
0 likes · 7 min read
Old Zhang's AI Learning
Apr 26, 2026 · Artificial Intelligence

Distilling Claude Opus into Qwen3.6-27B – GGUF Lets You Run Locally on Consumer GPUs

The preview model Qwopus3.6-27B‑v1, distilled from Claude Opus onto Qwen3.6‑27B via SFT on the Unsloth stack with a curated set of 12K high‑quality reasoning samples, is evaluated on agentic reasoning, front‑end design, and Canvas/WebGL tasks on an RTX 5090, and can be deployed locally via llama.cpp GGUF quantizations following detailed memory guidelines.

Apache 2.0 · Claude Opus · GGUF
0 likes · 7 min read
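Local GGUF deployment typically goes through llama.cpp or its Python binding. Here is a minimal llama-cpp-python sketch; the file path and prompt are placeholders, and the quantization choice and context length should follow the memory guidelines in the article.

```python
from llama_cpp import Llama

# Placeholder path to a downloaded GGUF quantization of the distilled model.
llm = Llama(
    model_path="./Qwopus3.6-27B-v1.Q4_K_M.gguf",
    n_ctx=8192,        # context window; raise it if VRAM allows
    n_gpu_layers=-1,   # offload all layers to the GPU
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Sketch a minimal WebGL triangle demo."}],
    max_tokens=512,
)
print(result["choices"][0]["message"]["content"])
```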
Alibaba Cloud Big Data AI Platform
Mar 29, 2025 · Artificial Intelligence

How DistilQwen2.5‑R1 Boosts Small‑Model Reasoning with Innovative Knowledge Distillation

The article introduces the DistilQwen2.5‑R1 series, which leverages a novel knowledge‑distillation pipeline—including CoT data evaluation, improvement, and validation—to transfer deep reasoning abilities from large models like DeepSeek‑R1 to compact models, achieving superior performance across math, code, and scientific benchmarks and providing open‑source checkpoints and deployment guides for practical use.

AI inference · benchmark evaluation · knowledge distillation
0 likes · 17 min read
AntTech
Apr 23, 2026 · Artificial Intelligence

Ling-2.6-flash: Faster Response, Stronger Execution, and Higher Token Efficiency for Agent Workloads

Ling-2.6-flash is a 104B‑parameter Instruct model that uses a mixed‑linear architecture and token‑efficiency optimizations to achieve up to 340 tokens/s inference speed, 4× higher throughput than comparable models, and ten‑fold lower token consumption on Agent benchmarks, while maintaining SOTA performance.

Agent Optimization · Inference Efficiency · LLM
0 likes · 15 min read
DataFunTalk
Oct 30, 2025 · Artificial Intelligence

How On-Policy Distillation Cuts LLM Training Cost by 90%

Thinking Machines Lab introduces On-Policy Distillation, a post‑training technique that matches reinforcement‑learning performance while reducing compute cost by up to tenfold, and demonstrates its effectiveness through extensive experiments on reasoning, personalization, and catastrophic‑forgetting mitigation.

Model Efficiency · On-Policy Distillation · knowledge distillation
0 likes · 15 min read
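The core of on-policy distillation is that the student generates its own rollouts and the teacher only scores them, typically with a per-token reverse KL. Below is a minimal sketch of that objective, assuming you already have teacher and student logits over the same student-sampled sequence; it illustrates the idea and is not Thinking Machines Lab's code.

```python
import torch
import torch.nn.functional as F

def on_policy_distill_loss(student_logits: torch.Tensor,
                           teacher_logits: torch.Tensor) -> torch.Tensor:
    """Per-token reverse KL(student || teacher) on a student-generated sequence.

    Both tensors have shape [seq_len, vocab_size] and are scored on the same
    tokens that the student itself sampled (that is what makes it on-policy).
    """
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    # Reverse KL penalizes the student for putting mass where the teacher would not,
    # giving dense per-token feedback instead of a single RL reward per episode.
    reverse_kl = (student_logp.exp() * (student_logp - teacher_logp)).sum(dim=-1)
    return reverse_kl.mean()
```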
Top Architect
Feb 14, 2025 · Artificial Intelligence

DeepSeek Model Distillation: Principles, Innovations, Architecture, and Performance

This article provides an in‑depth overview of DeepSeek’s model distillation technology, covering its definition, core principles, innovative data‑model distillation integration, architecture design, training strategies, performance gains, and the challenges of scaling to multimodal data.

AI Optimization · DeepSeek · Knowledge Transfer
0 likes · 16 min read
Baobao Algorithm Notes
Jun 28, 2024 · Artificial Intelligence

What Makes Gemma 2 a Competitive Open‑Source LLM? Architecture, Training, and Evaluation Insights

The article provides a detailed technical overview of Gemma 2, covering its decoder‑only transformer design, novel attention mechanisms, logit soft‑capping, RMSNorm, knowledge‑distillation training on trillions of tokens, extensive pre‑training infrastructure, and benchmark evaluations that demonstrate its competitiveness against larger proprietary models.

AI · Gemma 2 · benchmark evaluation
0 likes · 14 min read
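Among the mechanisms listed above, logit soft-capping is easy to show concretely: instead of hard clipping, logits are squashed smoothly with a tanh. A small sketch follows; the cap values are the ones reported for Gemma 2 in its technical report, not numbers from this summary.

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Smoothly bounds logits to the open interval (-cap, cap) while staying
    # differentiable, unlike a hard clamp.
    return cap * torch.tanh(logits / cap)

# Gemma 2 reportedly applies this with cap=50.0 on attention logits and
# cap=30.0 on the final output logits.
attn_scores = torch.randn(4, 4) * 100
print(soft_cap(attn_scores, cap=50.0))
```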