Tagged articles

SFT

44 articles · Page 1 of 1

Machine Learning Algorithms & Natural Language Processing

Jun 7, 2026 · Artificial Intelligence

AgentDoG 1.5: A Lightweight, Extensible Framework for Trajectory‑Level Agent Safety

AgentDoG 1.5 expands AI‑agent safety from final replies to complete execution trajectories, introducing the ATBench family for fine‑grained evaluation, a taxonomy‑guided DataEngine for high‑quality data generation, and demonstrating substantial safety gains in both SFT/RL training and online guardrail deployment with lightweight models.

AI safety · ATBench · AgentDoG

0 likes · 14 min read

Machine Learning Algorithms & Natural Language Processing

May 28, 2026 · Artificial Intelligence

Synthesizing Agentic Factual SFT/Mid‑train Data: Query Filtering, Trajectory Generation, and Tool Usage

The article outlines a practical pipeline for creating agentic factual SFT and mid‑train datasets, covering how to define training goals, filter and classify queries, label processing tags, format trajectory samples, differentiate SFT from mid‑train data, and avoid common pitfalls when generating evidence‑driven AI training data.

Data Synthesis · SFT · agentic AI

0 likes · 10 min read

Machine Heart

May 18, 2026 · Artificial Intelligence

Dynamic Difficulty-Adaptive Training Gains Momentum: Huawei’s EDCO Cited at ICML 2026

EDCO, Huawei’s entropy‑based dynamic curriculum method, continuously selects the most uncertain samples for domain‑specific LLM fine‑tuning, achieving higher accuracy and more stable gradients across communication, medical, and legal tasks while cutting entropy‑estimation cost by over 80%.

EDCO · RLFT · SFT

0 likes · 11 min read

Machine Learning Algorithms & Natural Language Processing

May 17, 2026 · Artificial Intelligence

How to Build Agentic Factual SFT and Mid‑Train Datasets: Query Selection, Trajectory Generation, and Tool Usage

This article outlines a systematic approach for creating agentic factual SFT and Mid‑train data, covering the definition of training goals, query filtering, two‑layer classification and labeling, trajectory format, differences between Mid‑train and SFT, a practical synthesis pipeline, and common pitfalls to avoid.

Data Synthesis · SFT · agentic AI

0 likes · 11 min read

Wu Shixiong's Large Model Academy

May 13, 2026 · Artificial Intelligence

How to Explain a Jump from 71% to 94% Tool‑Calling Accuracy in a JD Interview

The article walks through a JD interview scenario where a candidate explains how a tool‑calling accuracy metric rose from 71% to 94% by detailing the full SFT data‑engineering pipeline, teacher‑model trajectory generation, quality validation, evaluation methodology, and interview‑ready talking points.

Data Engineering · Evaluation · Function Calling

0 likes · 19 min read

SuanNi

May 5, 2026 · Artificial Intelligence

Why Making AI Warm Leads to More Hallucinations – Insights from a Nature Study

A systematic experiment by the Oxford Internet Institute shows that adding a friendly, empathetic personality to large language models via supervised fine‑tuning dramatically raises factual error rates—especially under emotional prompts—while cold, concise tuning leaves accuracy intact.

AI safety · Hallucination · Large Language Models

0 likes · 9 min read

360 Tech Engineering

Apr 28, 2026 · Artificial Intelligence

How 360 AI Institute Boosted Airline Translation Accuracy from 70% to 96%

The 360 AI Research Institute tackled the zero‑tolerance translation demands of airline maintenance by building a specialized parallel corpus and applying RAG‑enhanced, SFT‑fine‑tuned, and RL‑reinforced models, raising Chinese‑to‑English translation accuracy from 70% to 96% and enabling a one‑month rollout.

AI translation · RAG · SFT

0 likes · 5 min read

Old Zhang's AI Learning

Apr 26, 2026 · Artificial Intelligence

Distilling Claude Opus into Qwen3.6-27B – GGUF Lets You Run Locally on Consumer GPUs

The preview model Qwopus3.6-27B‑v1, distilled from Claude Opus onto Qwen3.6‑27B using SFT with the Unsloth stack and a curated 12 K high‑quality inference sample set, is evaluated on agentic reasoning, front‑end design, and Canvas/WebGL tasks with an RTX 5090, and can be deployed locally via llama.cpp GGUF quantizations with detailed memory guidelines.

Apache 2.0 · Claude Opus · GGUF

0 likes · 7 min read

CodeTrend

Apr 24, 2026 · Artificial Intelligence

How Large Language Models Acquire Tool‑Calling Ability: SFT, RLHF & LoRA Explained

The article explains why pretrained LLMs cannot call tools, then breaks down the three‑stage training pipeline—Supervised Fine‑Tuning, Reinforcement Learning from Human Feedback, and knowledge distillation—showing how each step teaches models to read tool schemas, decide when to invoke a tool, generate JSON calls, and finally transfer the capability to smaller models with LoRA.

AI training · Function Calling · Knowledge Distillation

0 likes · 19 min read

Machine Heart

Apr 22, 2026 · Artificial Intelligence

Can LLMs Boost Reasoning Alone? Introducing SePT’s Simple Online Self‑Training

SePT (Self‑evolving Post‑Training) shows that a large language model can improve its mathematical reasoning ability by about ten percentage points using a reward‑free online self‑training loop that decouples generation temperature from standard SFT, matching or surpassing RL‑based methods without harming general performance.

LLM · SFT · SePT

0 likes · 9 min read

Data Party THU

Apr 12, 2026 · Artificial Intelligence

What’s Driving the Next Wave of LLM Post‑Training? A Deep Dive into SFT, RLHF, GRPO and Emerging Trends

This article systematically reviews the core post‑training techniques for large language models—including supervised fine‑tuning, RLHF, PPO, GRPO, DPO, RLVR and Agentic RL—explains their evolution, compares their trade‑offs, and highlights the most promising research directions for 2025‑2026.

AI alignment · GRPO · LLM

0 likes · 20 min read

Machine Heart

Apr 4, 2026 · Artificial Intelligence

SFT Scores Don’t Predict RL Potential: Adaptive Early‑Stop Loss for LLMs

The authors show that high SFT accuracy does not guarantee strong RL performance because over‑fitting reduces output diversity, and they propose Adaptive Early‑Stop Loss (AESL), a diversity‑aware early‑stopping objective that dynamically weights token and subsequence losses, yielding consistently better RL results on multiple LLMs and math benchmarks.

AESL · Diversity · LLM

0 likes · 11 min read

AI Engineer Programming

Mar 28, 2026 · Artificial Intelligence

How to Start Training Your Own AI Model: A Complete Roadmap

This guide maps the end-to-end process for building a small AI model—from leveraging open-source base models and applying SFT with LoRA/QLoRA, through alignment techniques like DPO or ORPO, to low-cost distillation and final quantization for local deployment, while recommending free GPU resources and essential tooling.

AI · Distillation · LoRA

0 likes · 12 min read

PMTalk Product Manager Community

Jan 5, 2026 · Artificial Intelligence

Turning Base Models from Semi‑Finished to Killer AI Products: A PM’s Playbook

The article breaks down how AI product managers can transform a raw base model into a market‑ready, high‑impact product by applying supervised fine‑tuning, tool‑use routing, RLHF alignment, and chain‑of‑thought reasoning, while highlighting trade‑offs, cost shifts, and evaluation metrics.

Artificial Intelligence · Product Management · RLHF

0 likes · 13 min read

Baobao Algorithm Notes

Dec 7, 2025 · Artificial Intelligence

Can RL Really Boost LLM Reasoning? A Critical Review of Recent Findings

This article critically examines recent RL‑for‑LLM studies, revealing that reinforcement learning improves search efficiency but does not extend the intrinsic reasoning capabilities of base models, and explores the underlying model‑conditioned optimization bias, comparisons with SFT distillation, and the trade‑off with catastrophic forgetting.

Catastrophic Forgetting · LLM · Model Optimization

0 likes · 11 min read

Amap Tech

Nov 19, 2025 · Artificial Intelligence

How Gaode’s Spacetime‑GR Model Boosts POI Recommendation with AI‑Powered SFT and DPO

Gaode transforms its map app into a dynamic, AI‑driven “living map” by fine‑tuning the large Spacetime‑GR model through embedding‑based and generative ranking SFT, DPO alignment, and multimodal augmentation, achieving significant offline CTR‑AUC improvements and online CTR gains in POI recommendation.

AI recommendation · DPO · Multimodal

0 likes · 12 min read

Baobao Algorithm Notes

Oct 30, 2025 · Artificial Intelligence

Why LLM RL Training Crashes While SFT Stays Stable: Insights & Tricks

The article examines the fundamental similarity between SFT and RL loss functions for large language models, explains why RL training is prone to instability, discusses infrastructure and data quality challenges, and reviews practical tricks and reward‑model considerations for more reliable RL fine‑tuning.

AI · LLM · Reward Modeling

0 likes · 11 min read

Bilibili Tech

Oct 17, 2025 · Artificial Intelligence

How Bilibili’s Multimodal Team Won 2nd Place at ICCV MIPI with a Novel SFT+GRPO Strategy

This article details how Bilibili’s multimedia lab leveraged a multimodal training pipeline combining data‑compressed SFT and the GRPO reinforcement‑learning algorithm to achieve a 13.5% metric boost and secure second place in the ICCV MIPI Detailed Image Quality Assessment competition.

GRPO · MIPI competition · SFT

0 likes · 15 min read

Data Party THU

Sep 15, 2025 · Artificial Intelligence

Why Merge SFT and RL? Exploring Unified Fine‑Tuning Strategies for LLMs

This article examines the necessity of integrating Supervised Fine‑Tuning (SFT) with Reinforcement Learning (RL) for large language models, surveys alternating, sample‑reuse, simultaneous, and hint‑guided fusion methods, presents the underlying loss functions, and discusses practical trade‑offs such as entropy collapse and importance‑sampling corrections.

AI · LLM · RL

0 likes · 14 min read

xkx's Tech General Store

Sep 10, 2025 · Artificial Intelligence

Exploring WebDancer: Alibaba’s WebAgent that Solves Complex Queries Automatically

This article walks through installing Alibaba's WebDancer agent, explains its SFT‑plus‑RL training pipeline—including data construction, trajectory sampling, supervised fine‑tuning, and reinforcement learning—compares it with the earlier WebWalker, and demonstrates its multi‑step reasoning on a real‑world query.

AI Agent · Alibaba · LLM Agents

0 likes · 10 min read

Alibaba Cloud Big Data AI Platform

Jul 23, 2025 · Artificial Intelligence

How to Distill Large Language Models for Efficient Text Generation with EasyDistill

This guide explains how to use the EasyDistill framework and Alibaba Cloud PAI to distill large language models for high‑quality text generation, covering model deployment, SFT and DPO training data construction, code examples, configuration files, and best practices for achieving resource‑efficient, high‑performance student models.

DPO · EasyDistill · Large Language Models

0 likes · 14 min read

Alibaba Cloud Big Data AI Platform

Jul 16, 2025 · Artificial Intelligence

Master Post-Training: Fine-Tune LLMs with SFT, DPO, and GRPO on Alibaba PAI

This article explains post‑training concepts, compares SFT, DPO, and GRPO fine‑tuning methods, and provides step‑by‑step guidance for using Alibaba Cloud's PAI platform—including Model Gallery and DSW—to fine‑tune large language models with code examples and practical tips.

DPO · GRPO · LLM

0 likes · 14 min read

DataFunTalk

Jun 29, 2025 · Artificial Intelligence

Large Models Boost Douyin User Experience: Expert Insights

In an interview at the DA Digital Intelligence Conference, ByteDance AI specialist Cai Conghuai explains how large language models, combined with techniques like SFT, DPO, and RAG, are reshaping Douyin's user‑experience signal detection, root‑cause analysis, and evaluation, while outlining future AI‑agent breakthroughs.

AI · DPO · Large Language Models

0 likes · 12 min read

DataFunSummit

Jun 22, 2025 · Artificial Intelligence

How Vivo’s BlueHeart AI Assistant Optimizes Post‑Conversation Recommendations with LLMs

In a detailed interview, Vivo AI engineer Liang Tianan explains how the BlueHeart Small V assistant leverages large language models, multi‑stage recall, ranking, and reward‑model fine‑tuning (SFT/DPO) to generate high‑quality, diverse post‑dialogue recommendation items while balancing latency, cost, and evaluation challenges.

DPO · LLM · SFT

0 likes · 15 min read

Baobao Algorithm Notes

Mar 27, 2025 · Artificial Intelligence

Why a Robust Training Pipeline Beats Fancy LLM Tricks – Lessons from DAPO

The article analyzes the DAPO technical report, showing how dynamic‑sampling pipelines and token‑level loss handling in SFT and RL training outperform ad‑hoc algorithm tricks, and compares the training dynamics of reinforce_baseline and GRPO with concrete code examples.

Dynamic Sampling · GRPO · LLM

0 likes · 8 min read

Bilibili Tech

Jan 14, 2025 · Artificial Intelligence

Technical Practices and Productization of Intelligent Advertising Title Generation for Bilibili

We built an LLM‑powered system for Bilibili that automatically creates ad titles from user keywords, employing fluency, style, and quality classifiers, mixed domain data cleaning, and alignment methods such as SFT, DPO and KTO, resulting in a product that now generates about ten percent of daily titles and drives significant ad spend.

AI alignment · Ad Title Generation · Bilibili

0 likes · 24 min read

Baobao Algorithm Notes

Nov 18, 2024 · Artificial Intelligence

Boosting Vision‑Language Model Performance: Prompt‑First vs. Fine‑Tuning Strategies

This guide explains when to rely on prompt engineering versus SFT fine‑tuning for Vision‑Language Models, emphasizing data quality, appropriate dataset sizes, training epochs, hyper‑parameter tuning, and practical steps to build robust VLM pipelines.

AI · DPO · Data Quality

0 likes · 10 min read

Baobao Algorithm Notes

Nov 13, 2024 · Artificial Intelligence

Why Cleaning SFT Data Is a Nightmare: Hidden JSON Formatting Pitfalls

Cleaning SFT data for LLMs is surprisingly complex, as subtle JSON formatting variations, inconsistent markdown wrappers, intent settings, and unit handling can cause model inconsistencies, requiring unified standards, careful prompt design, and extensive manual review to ensure reliable training outputs.

JSON Formatting · LLM data cleaning · Model Training

0 likes · 8 min read

Baobao Algorithm Notes

Oct 7, 2024 · Artificial Intelligence

Mastering LLM Supervised Fine‑Tuning: Practical Tips, Data Strategies, and Debugging

This article provides a comprehensive, experience‑driven guide to supervised fine‑tuning (SFT) of large language models, covering special tokens, latency considerations, data diversity and production, training frameworks and hyper‑parameters, over‑/under‑fitting diagnostics, and evaluation metrics such as helpfulness, honesty, and harmlessness.

AI · Data Engineering · LLM

0 likes · 40 min read

NewBeeNLP

Sep 5, 2024 · Artificial Intelligence

Why RLHF Is Irreplaceable: Uncovering the Limits of SFT

The article analyzes why supervised fine‑tuning (SFT) cannot replace reinforcement learning from human feedback (RLHF), highlighting SFT's lack of negative feedback and backward‑looking capability, and explains how RLHF’s reward model addresses these fundamental shortcomings.

Language Models · RLHF · Reward Modeling

0 likes · 7 min read

NewBeeNLP

Sep 3, 2024 · Industry Insights

Why Pre‑training Teams Boost New Engineers’ Skills Faster Than SFT Teams

The answer explains that joining a pre‑training team accelerates a newcomer’s engineering abilities through hands‑on work with large‑scale data pipelines, distributed training code, and debugging, while SFT teams focus mainly on data labeling, making pre‑training the more effective path for rapid skill growth.

AI · Career Advice · Engineering Skills

0 likes · 6 min read

Baobao Algorithm Notes

Aug 29, 2024 · Artificial Intelligence

Why RLHF Is Essential: The Limits of SFT and the Power of Reward Modeling

The article analyzes why Reinforcement Learning from Human Feedback (RLHF) cannot be replaced by Supervised Fine‑Tuning (SFT), highlighting SFT's lack of negative feedback, its one‑directional attention limitation, and how RLHF's reward models provide crucial safety and performance improvements for large language models.

AI alignment · Large Language Models · RLHF

0 likes · 9 min read

Baobao Algorithm Notes

Aug 29, 2024 · Industry Insights

Why Pretraining Boosts New Engineers More Than SFT: A Practical Guide

The answer argues that fresh graduates should join pre‑training teams because the required engineering tasks—large‑scale data crawling, Hadoop/Spark pipelines, torch and CUDA setup, megatron code debugging, and scaling‑law experiments—rapidly sharpen coding skills, while SFT work focuses mainly on data labeling and offers slower technical growth.

AI Engineering · Career Advice · SFT

0 likes · 7 min read

Baobao Algorithm Notes

Aug 8, 2024 · Artificial Intelligence

Turning LLM Fine‑Tuning into a Skill‑Building Journey: Practical Strategies

The article breaks down multiple practical approaches for data preparation, training code handling, and experiment analysis in large‑language‑model fine‑tuning, showing how deeper engagement in each step can boost personal expertise even when final model performance appears similar.

Artificial Intelligence · LLM · SFT

0 likes · 9 min read

Data Thinking Notes

Aug 1, 2024 · Artificial Intelligence

Unlocking Vertical Domain LLMs: Advantages, Challenges, and Alignment Strategies

Over the past year our team explored applying large language models to specialized domains, detailing their professional benefits, unique challenges such as accuracy and knowledge‑base maintenance, and presenting solutions like alignment enhancement via BPO, Text2API, RAG, and advanced SFT/DPO techniques.

Large Language Models · RAG · SFT

0 likes · 10 min read

Alibaba Cloud Developer

Jul 22, 2024 · Artificial Intelligence

How Alibaba’s Logistics AI Overcame B2B Large Model Challenges

Alibaba’s logistics AI team shares their year‑long journey building a vertical‑domain large language model for logistics, detailing model alignment, Text2API, RAG, SFT techniques, challenges like accuracy and knowledge‑base maintenance, and showcasing real‑world applications such as chatbots, DingTalk assistants, and custom AI assistants.

RAG · SFT · Text2API

0 likes · 16 min read

DaTaobao Tech

Jul 19, 2024 · Artificial Intelligence

Practices and Techniques for Vertical Domain Large Language Models

Vertical domain large language models, fine‑tuned on specialized data, deliver higher expertise and task performance, but require continual knowledge updates and careful alignment; techniques such as BPO‑guided instruction tuning (+1.8% accuracy), Reflexion‑based Text2API (+4% API correctness), advanced RAG preprocessing, and SFT combined with ORPO (+5.2% gain) demonstrate notable improvements while underscoring remaining challenges and collaborative opportunities.

AI · RAG · SFT

0 likes · 9 min read

Baidu App Technology

May 8, 2024 · Artificial Intelligence

How AI Can Auto‑Generate Standardized Git Commit Messages

This article details the design, implementation, and evaluation of an AI‑powered tool that automatically creates compliant Git commit messages by leveraging large language models, custom plugins, and performance‑focused optimizations to improve developer productivity and commit quality.

AI · Git · LLM

0 likes · 16 min read

DataFunTalk

Apr 29, 2024 · Artificial Intelligence

Practical Experience and Q&A Exploration of Patent Large Models

This article presents a comprehensive overview of the development, training, data preparation, algorithmic strategies, evaluation methods, and RAG integration for a domain‑specific patent large language model, highlighting challenges, practical results, and future research directions.

Domain-specific Model · Patent AI · RAG

0 likes · 19 min read

JD Retail Technology

Mar 4, 2024 · Artificial Intelligence

How JD Retail Integrates LLMs with SFT, RAG, and AI Agents for Real-World Impact

This article examines JD Retail's end‑to‑end large language model framework that combines supervised fine‑tuning, retrieval‑augmented generation, and ReAct‑based AI agents to overcome retail‑specific challenges, improve model accuracy, reduce hallucinations, and enable autonomous multi‑step business workflows.

AI Agent · Artificial Intelligence · LLM

0 likes · 20 min read

NewBeeNLP

Feb 22, 2024 · Artificial Intelligence

Practical Tips for CPT, SFT, and LoRA in Large Language Model Fine‑Tuning

This article shares hands‑on guidance on using continual pre‑training (CPT), supervised fine‑tuning (SFT), and LoRA adapters for large language models, covering dataset size requirements, learning‑rate scheduling, warm‑up ratios, epoch strategies, and practical routing choices based on real‑world experiments.

CPT · LLM fine-tuning · LoRA

0 likes · 12 min read

Baobao Algorithm Notes

Oct 25, 2023 · Artificial Intelligence

How Mixed Data Shapes LLaMA SFT: Scaling Trends, Conflict Zones, and the DMT Remedy

This article investigates how mixing data from mathematical reasoning, code generation, and general instruction-following tasks influences supervised fine‑tuning of LLaMA models, revealing distinct scaling curves, resource‑dependent performance conflicts, and a two‑stage DMT strategy that mitigates catastrophic forgetting while boosting overall capability.

DMT Strategy · Data Scaling · LLaMA

0 likes · 14 min read

Baobao Algorithm Notes

Aug 18, 2023 · Artificial Intelligence

Unlocking Domain-Specific Large Model Training: Proven Tricks and Pitfalls

This article shares practical techniques for continued pre‑training of domain‑specific large models, including data selection, mixing ratios with general data, multi‑task instruction pre‑training, resource‑aware fine‑tuning strategies, evaluation set design, vocabulary considerations, and deployment constraints for 7‑13B models.

AI research · SFT · model evaluation

0 likes · 9 min read

Baobao Algorithm Notes

Jul 19, 2023 · Artificial Intelligence

Llama 2’s Breakthroughs: Architecture, Data, and Training Tricks Explained

Llama 2 advances open‑source large‑model research by expanding context length to 4096, adopting GQA attention, scaling training data to 2 trillion tokens, and introducing refined SFT and RLHF techniques such as Ghost Attention, margin‑based reward modeling, and iterative rejection sampling, all detailed in Meta’s 76‑page report.

Llama 2 · Open-source AI · RLHF

0 likes · 8 min read
