Tag

Supervised Fine-tuning


DataFunTalk
Mar 9, 2025 · Artificial Intelligence

Critique Fine-Tuning (CFT): Boosting Large Language Model Reasoning with Minimal Data

The paper introduces Critique Fine-Tuning (CFT), a method that replaces simple imitation in supervised fine-tuning with critique-based learning. Trained on only 50K samples, CFT achieves superior reasoning performance on mathematical benchmarks, outperforming traditional reinforcement-learning approaches that require millions of examples.

AI reasoning · Critique Fine-Tuning · Large Language Models
0 likes · 7 min read
DataFunTalk
Feb 16, 2025 · Artificial Intelligence

Understanding Reasoning LLMs: DeepSeek R1 Variants, Inference‑Time Scaling, and Training Strategies

This article explains what reasoning language models are, outlines their strengths and weaknesses, and details DeepSeek R1's three variants and their training pipelines: pure reinforcement learning, SFT + RL, and distillation. It also covers inference-time scaling techniques and related research such as Sky-T1 and TinyZero.

DeepSeek · Model Distillation · Supervised Fine-tuning
0 likes · 16 min read
Architecture Digest
Feb 7, 2025 · Artificial Intelligence

Open-Source Replication of OpenAI’s o1 Model Achieves Superior Performance with Minimal Cost

A recent study by Fei-Fei Li's team shows that supervised fine-tuning of the open-source Qwen2.5-32B-Instruct model can replicate, and even surpass, the reasoning ability of OpenAI's o1-preview at a fraction of the computational cost, demonstrating a cheap yet powerful approach to developing large language models.

Large Language Models · Model Distillation · Open-source
0 likes · 6 min read
Bilibili Tech
Nov 5, 2024 · Artificial Intelligence

Bilibili's In-House Role-Playing Large Language Model: Architecture, Training Stages, Evaluation, and Demonstrations

Bilibili's in-house role-playing large language model is built on the Index architecture and refined through pre-training, supervised fine-tuning, and preference optimization (PPO and DPO). It achieved top scores on the Chinese CharacterEval benchmark, surpassing rival models, while incorporating safety alignment and showcasing consistent, personality-driven dialogue examples.

Evaluation Benchmark · Pretraining · Supervised Fine-tuning
0 likes · 13 min read
Rare Earth Juejin Tech Community
Dec 24, 2023 · Artificial Intelligence

Llama 2: Open Foundation and Fine‑Tuned Chat Models – Overview, Training, and RLHF Details

This article provides a comprehensive English overview of Meta's Llama 2 family, covering the model sizes, pre-training data, architectural improvements, supervised fine-tuning, reinforcement learning with human feedback, safety evaluations, reward-model training, and the iterative optimization techniques used to produce the high-performing Llama 2-Chat models.

AI safety · Llama 2 · Open-source
0 likes · 33 min read
DataFunSummit
Feb 10, 2023 · Artificial Intelligence

Why ChatGPT Shows Strong General Intelligence: Insights from Andrew Ng’s DeepLearning.AI Article

The article explains how techniques such as reinforcement learning from human feedback, instruction fine-tuning, supervised fine-tuning, and chain-of-thought prompting contribute to ChatGPT's impressive general-intelligence performance, as analyzed by DeepLearning.AI founder Andrew Ng.

Artificial Intelligence · Chain-of-Thought · ChatGPT
0 likes · 2 min read