Tagged articles
40 articles
Machine Learning Algorithms & Natural Language Processing
May 17, 2026 · Artificial Intelligence

How to Build Agentic Factual SFT and Mid‑Train Datasets: Query Selection, Trajectory Generation, and Tool Usage

This article outlines a systematic approach for creating agentic factual SFT and Mid‑train data, covering the definition of training goals, query filtering, two‑layer classification and labeling, trajectory format, differences between Mid‑train and SFT, a practical synthesis pipeline, and common pitfalls to avoid.

Agentic AI · SFT · data synthesis
0 likes · 11 min read
SuanNi
May 5, 2026 · Artificial Intelligence

Why Making AI Warm Leads to More Hallucinations – Insights from a Nature Study

A systematic experiment by the Oxford Internet Institute shows that adding a friendly, empathetic personality to large language models via supervised fine‑tuning dramatically raises factual error rates—especially under emotional prompts—while cold, concise tuning leaves accuracy intact.

AI Safety · Nature study · SFT
0 likes · 9 min read
360 Tech Engineering
Apr 28, 2026 · Artificial Intelligence

How 360 AI Institute Boosted Airline Translation Accuracy from 70% to 96%

The 360 AI Research Institute tackled the zero‑tolerance translation demands of airline maintenance by building a specialized parallel corpus and applying RAG‑enhanced, SFT‑fine‑tuned, and RL‑reinforced models, raising Chinese‑to‑English translation accuracy from 70% to 96% and enabling a one‑month rollout.

AI translation · RAG · SFT
0 likes · 5 min read
Old Zhang's AI Learning
Apr 26, 2026 · Artificial Intelligence

Distilling Claude Opus into Qwen3.6-27B – GGUF Lets You Run Locally on Consumer GPUs

The preview model Qwopus3.6-27B‑v1, distilled from Claude Opus onto Qwen3.6‑27B using SFT with the Unsloth stack and a curated set of 12K high‑quality inference samples, is evaluated on agentic reasoning, front‑end design, and Canvas/WebGL tasks with an RTX 5090, and can be deployed locally via llama.cpp GGUF quantizations with detailed memory guidelines.

Apache 2.0 · Claude Opus · GGUF
0 likes · 7 min read
CodeTrend
Apr 24, 2026 · Artificial Intelligence

How Large Language Models Acquire Tool‑Calling Ability: SFT, RLHF & LoRA Explained

The article explains why pretrained LLMs cannot call tools, then breaks down the three‑stage training pipeline—Supervised Fine‑Tuning, Reinforcement Learning from Human Feedback, and knowledge distillation—showing how each step teaches models to read tool schemas, decide when to invoke a tool, generate JSON calls, and finally transfer the capability to smaller models with LoRA.

AI training · Function Calling · LLM
0 likes · 19 min read
Machine Heart
Apr 22, 2026 · Artificial Intelligence

Can LLMs Boost Reasoning Alone? Introducing SePT’s Simple Online Self‑Training

SePT (Self‑evolving Post‑Training) shows that a large language model can improve its mathematical reasoning ability by about ten percentage points using a reward‑free online self‑training loop that decouples generation temperature from standard SFT, matching or surpassing RL‑based methods without harming general performance.

LLM · Mathematical Reasoning · Online Learning
0 likes · 9 min read
Data Party THU
Apr 12, 2026 · Artificial Intelligence

What’s Driving the Next Wave of LLM Post‑Training? A Deep Dive into SFT, RLHF, GRPO and Emerging Trends

This article systematically reviews the core post‑training techniques for large language models—including supervised fine‑tuning, RLHF, PPO, GRPO, DPO, RLVR and Agentic RL—explains their evolution, compares their trade‑offs, and highlights the most promising research directions for 2025‑2026.

AI Alignment · GRPO · LLM
0 likes · 20 min read
Machine Heart
Apr 4, 2026 · Artificial Intelligence

SFT Scores Don’t Predict RL Potential: Adaptive Early‑Stop Loss for LLMs

The authors show that high SFT accuracy does not guarantee strong RL performance because over‑fitting reduces output diversity, and they propose Adaptive Early‑Stop Loss (AESL), a diversity‑aware early‑stopping objective that dynamically weights token and subsequence losses, yielding consistently better RL results on multiple LLMs and math benchmarks.

AESL · Diversity · LLM
0 likes · 11 min read
AI Engineer Programming
Mar 28, 2026 · Artificial Intelligence

How to Start Training Your Own AI Model: A Complete Roadmap

This guide maps the end-to-end process for building a small AI model—from leveraging open-source base models and applying SFT with LoRA/QLoRA, through alignment techniques like DPO or ORPO, to low-cost distillation and final quantization for local deployment, while recommending free GPU resources and essential tooling.

AI · Alignment · Distillation
0 likes · 12 min read
Baobao Algorithm Notes
Dec 7, 2025 · Artificial Intelligence

Can RL Really Boost LLM Reasoning? A Critical Review of Recent Findings

This article critically examines recent RL‑for‑LLM studies, revealing that reinforcement learning improves search efficiency but does not extend the intrinsic reasoning capabilities of base models, and explores the underlying model‑conditioned optimization bias, comparisons with SFT distillation, and the trade‑off with catastrophic forgetting.

Catastrophic Forgetting · LLM · Model Optimization
0 likes · 11 min read
Amap Tech
Nov 19, 2025 · Artificial Intelligence

How Gaode’s Spacetime‑GR Model Boosts POI Recommendation with AI‑Powered SFT and DPO

Gaode transforms its map app into a dynamic, AI‑driven “living map” by fine‑tuning the large Spacetime‑GR model through embedding‑based and generative ranking SFT, DPO alignment, and multimodal augmentation, achieving significant offline CTR‑AUC improvements and online CTR gains in POI recommendation.

AI recommendation · DPO · SFT
0 likes · 12 min read
Baobao Algorithm Notes
Oct 30, 2025 · Artificial Intelligence

Why LLM RL Training Crashes While SFT Stays Stable: Insights & Tricks

The article examines the fundamental similarity between SFT and RL loss functions for large language models, explains why RL training is prone to instability, discusses infrastructure and data quality challenges, and reviews practical tricks and reward‑model considerations for more reliable RL fine‑tuning.

AI · LLM · Reward Modeling
0 likes · 11 min read
Data Party THU
Sep 15, 2025 · Artificial Intelligence

Why Merge SFT and RL? Exploring Unified Fine‑Tuning Strategies for LLMs

This article examines the necessity of integrating Supervised Fine‑Tuning (SFT) with Reinforcement Learning (RL) for large language models, surveys alternating, sample‑reuse, simultaneous, and hint‑guided fusion methods, presents the underlying loss functions, and discusses practical trade‑offs such as entropy collapse and importance‑sampling corrections.

AI · LLM · RL
0 likes · 14 min read
Alibaba Cloud Big Data AI Platform
Jul 23, 2025 · Artificial Intelligence

How to Distill Large Language Models for Efficient Text Generation with EasyDistill

This guide explains how to use the EasyDistill framework and Alibaba Cloud PAI to distill large language models for high‑quality text generation, covering model deployment, SFT and DPO training data construction, code examples, configuration files, and best practices for achieving resource‑efficient, high‑performance student models.

DPO · EasyDistill · PAI
0 likes · 14 min read
DataFunTalk
Jun 29, 2025 · Artificial Intelligence

Large Models Boost Douyin User Experience: Expert Insights

In an interview at the DA Digital Intelligence Conference, ByteDance AI specialist Cai Conghuai explains how large language models, combined with techniques like SFT, DPO, and RAG, are reshaping Douyin's user‑experience signal detection, root‑cause analysis, and evaluation, while outlining future AI‑agent breakthroughs.

AI · DPO · RAG
0 likes · 12 min read
DataFunSummit
Jun 22, 2025 · Artificial Intelligence

How Vivo’s BlueHeart AI Assistant Optimizes Post‑Conversation Recommendations with LLMs

In a detailed interview, Vivo AI engineer Liang Tianan explains how the BlueHeart Small V assistant leverages large language models, multi‑stage recall, ranking, and reward‑model fine‑tuning (SFT/DPO) to generate high‑quality, diverse post‑dialogue recommendation items while balancing latency, cost, and evaluation challenges.

DPO · LLM · SFT
0 likes · 15 min read
Bilibili Tech
Jan 14, 2025 · Artificial Intelligence

Technical Practices and Productization of Intelligent Advertising Title Generation for Bilibili

We built an LLM‑powered system for Bilibili that automatically creates ad titles from user keywords, employing fluency, style, and quality classifiers, mixed domain data cleaning, and alignment methods such as SFT, DPO and KTO, resulting in a product that now generates about ten percent of daily titles and drives significant ad spend.

AI Alignment · Ad Title Generation · Bilibili
0 likes · 24 min read
Baobao Algorithm Notes
Nov 13, 2024 · Artificial Intelligence

Why Cleaning SFT Data Is a Nightmare: Hidden JSON Formatting Pitfalls

Cleaning SFT data for LLMs is surprisingly complex, as subtle JSON formatting variations, inconsistent markdown wrappers, intent settings, and unit handling can cause model inconsistencies, requiring unified standards, careful prompt design, and extensive manual review to ensure reliable training outputs.

JSON formatting · LLM data cleaning · Model Training
0 likes · 8 min read
Baobao Algorithm Notes
Oct 7, 2024 · Artificial Intelligence

Mastering LLM Supervised Fine‑Tuning: Practical Tips, Data Strategies, and Debugging

This article provides a comprehensive, experience‑driven guide to supervised fine‑tuning (SFT) of large language models, covering special tokens, latency considerations, data diversity and production, training frameworks and hyper‑parameters, over‑/under‑fitting diagnostics, and evaluation metrics such as helpfulness, honesty, and harmlessness.

AI · LLM · SFT
0 likes · 40 min read
NewBeeNLP
Sep 5, 2024 · Artificial Intelligence

Why RLHF Is Irreplaceable: Uncovering the Limits of SFT

The article analyzes why supervised fine‑tuning (SFT) cannot replace reinforcement learning from human feedback (RLHF), highlighting SFT's lack of negative feedback and backward‑looking capability, and explains how RLHF’s reward model addresses these fundamental shortcomings.

RLHF · Reward Modeling · SFT
0 likes · 7 min read
NewBeeNLP
Sep 3, 2024 · Industry Insights

Why Pre‑training Teams Boost New Engineers’ Skills Faster Than SFT Teams

The answer explains that joining a pre‑training team accelerates a newcomer’s engineering abilities through hands‑on work with large‑scale data pipelines, distributed training code, and debugging, while SFT teams focus mainly on data labeling, making pre‑training the more effective path for rapid skill growth.

AI · Engineering Skills · SFT
0 likes · 6 min read
Baobao Algorithm Notes
Aug 29, 2024 · Artificial Intelligence

Why RLHF Is Essential: The Limits of SFT and the Power of Reward Modeling

The article analyzes why Reinforcement Learning from Human Feedback (RLHF) cannot be replaced by Supervised Fine‑Tuning (SFT), highlighting SFT's lack of negative feedback, its one‑directional attention limitation, and how RLHF's reward models provide crucial safety and performance improvements for large language models.

AI Alignment · RLHF · SFT
0 likes · 9 min read
Baobao Algorithm Notes
Aug 29, 2024 · Industry Insights

Why Pretraining Boosts New Engineers More Than SFT: A Practical Guide

The answer argues that fresh graduates should join pre‑training teams because the required engineering tasks—large‑scale data crawling, Hadoop/Spark pipelines, PyTorch and CUDA setup, Megatron code debugging, and scaling‑law experiments—rapidly sharpen coding skills, while SFT work focuses mainly on data labeling and offers slower technical growth.

AI Engineering · SFT · Skill development
0 likes · 7 min read
Alibaba Cloud Developer
Jul 22, 2024 · Artificial Intelligence

How Alibaba’s Logistics AI Overcame B2B Large Model Challenges

Alibaba’s logistics AI team shares their year‑long journey building a vertical‑domain large language model for logistics, detailing model alignment, Text2API, RAG, SFT techniques, challenges like accuracy and knowledge‑base maintenance, and showcasing real‑world applications such as chatbots, DingTalk assistants, and custom AI assistants.

Model Alignment · RAG · SFT
0 likes · 16 min read
DaTaobao Tech
Jul 19, 2024 · Artificial Intelligence

Practices and Techniques for Vertical Domain Large Language Models

Vertical domain large language models, fine‑tuned on specialized data, deliver higher expertise and task performance, but require continual knowledge updates and careful alignment; techniques such as BPO‑guided instruction tuning (+1.8% accuracy), Reflexion‑based Text2API (+4% API correctness), advanced RAG preprocessing, and SFT combined with ORPO (+5.2% gain) demonstrate notable improvements while underscoring remaining challenges and collaborative opportunities.

AI · Alignment · RAG
0 likes · 9 min read
Baidu App Technology
May 8, 2024 · Artificial Intelligence

How AI Can Auto‑Generate Standardized Git Commit Messages

This article details the design, implementation, and evaluation of an AI‑powered tool that automatically creates compliant Git commit messages by leveraging large language models, custom plugins, and performance‑focused optimizations to improve developer productivity and commit quality.

AI · Git · LLM
0 likes · 16 min read
DataFunTalk
Apr 29, 2024 · Artificial Intelligence

Practical Experience and Q&A Exploration of Patent Large Models

This article presents a comprehensive overview of the development, training, data preparation, algorithmic strategies, evaluation methods, and RAG integration for a domain‑specific patent large language model, highlighting challenges, practical results, and future research directions.

Domain-specific Model · Patent AI · RAG
0 likes · 19 min read
NewBeeNLP
Feb 22, 2024 · Artificial Intelligence

Practical Tips for CPT, SFT, and LoRA in Large Language Model Fine‑Tuning

This article shares hands‑on guidance on using continual pre‑training (CPT), supervised fine‑tuning (SFT), and LoRA adapters for large language models, covering dataset size requirements, learning‑rate scheduling, warm‑up ratios, epoch strategies, and practical routing choices based on real‑world experiments.

CPT · LLM fine-tuning · LoRA
0 likes · 12 min read
Baobao Algorithm Notes
Oct 25, 2023 · Artificial Intelligence

How Mixed Data Shapes LLaMA SFT: Scaling Trends, Conflict Zones, and the DMT Remedy

This article investigates how mixing data from mathematical reasoning, code generation, and general instruction-following tasks influences supervised fine‑tuning of LLaMA models, revealing distinct scaling curves, resource‑dependent performance conflicts, and a two‑stage DMT strategy that mitigates catastrophic forgetting while boosting overall capability.

DMT Strategy · LLaMA · Model Fine‑tuning
0 likes · 14 min read
Baobao Algorithm Notes
Aug 18, 2023 · Artificial Intelligence

Unlocking Domain-Specific Large Model Training: Proven Tricks and Pitfalls

This article shares practical techniques for continued pre‑training of domain‑specific large models, including data selection, mixing ratios with general data, multi‑task instruction pre‑training, resource‑aware fine‑tuning strategies, evaluation set design, vocabulary considerations, and deployment constraints for 7‑13B models.

AI research · Model Evaluation · SFT
0 likes · 9 min read
Baobao Algorithm Notes
Jul 19, 2023 · Artificial Intelligence

Llama 2’s Breakthroughs: Architecture, Data, and Training Tricks Explained

Llama 2 advances open‑source large‑model research by expanding context length to 4096, adopting GQA attention, scaling training data to 2 trillion tokens, and introducing refined SFT and RLHF techniques such as Ghost Attention, margin‑based reward modeling, and iterative rejection sampling, all detailed in Meta’s 76‑page report.

Llama-2 · RLHF · SFT
0 likes · 8 min read