Tagged articles

LLM fine-tuning

25 articles

Jun 30, 2026 · Artificial Intelligence

How to Fine‑Tune LLMs in 2026: Overcome the 30‑40% Error Wall with GRPO and RULER

Teams building LLM‑powered products often hit a wall where 30‑40% of responses are wrong and the model never learns from its mistakes. The article explains how modern fine‑tuning with GRPO‑based reinforcement learning and the open‑source ART framework, together with the RULER reward‑free evaluator, lets small open‑source models beat larger ones on cost, latency, and accuracy.

ART framework · GRPO · LLM fine-tuning

0 likes · 9 min read

Wu Shixiong's Large Model Academy

May 13, 2026 · Artificial Intelligence

How to Explain a Jump from 71% to 94% Tool‑Calling Accuracy in a JD Interview

The article walks through a JD interview scenario where a candidate explains how a tool‑calling accuracy metric rose from 71% to 94% by detailing the full SFT data‑engineering pipeline, teacher‑model trajectory generation, quality validation, evaluation methodology, and interview‑ready talking points.

Data Engineering · Evaluation · Function Calling

0 likes · 19 min read

AI Cyberspace

Jan 29, 2026 · Artificial Intelligence

Step‑by‑Step Guide to Efficient LLM Fine‑Tuning with LoRA, QLoRA, and Llama‑Factory

This tutorial explains the concepts, methods, and practical commands for fine‑tuning large language models using efficient techniques like LoRA and QLoRA, covering model selection, resource considerations, Docker deployment, dataset preparation, training configuration, evaluation metrics, model merging, and deployment with GGUF and Ollama.

GGUF · GPU memory optimization · LLM fine-tuning

0 likes · 27 min read

AI Engineering

Jan 7, 2026 · Artificial Intelligence

Unsloth-MLX: Fine‑Tune LLMs on Mac and Seamlessly Move Code to Cloud GPUs

Unsloth‑MLX leverages Apple’s MLX framework to let Mac users with Apple Silicon fine‑tune large language models locally with a single import change, offering zero‑cost migration to cloud GPUs, supporting SFT, DPO, ORPO, GRPO training, and export to HuggingFace or GGUF formats.

Apple Silicon · GPU cloud · LLM fine-tuning

0 likes · 4 min read

DataFunSummit

Nov 3, 2025 · Artificial Intelligence

Boosting Private Agentic AI: LLM Post‑Training, DPO, and End‑to‑End Evaluation

This article shares practical experience on deploying private Agentic AI, covering background, architecture design, challenges, data generation, reinforcement learning with DPO, automated multi‑dimensional evaluation, and future plans for open‑source models and richer tool integration.

DPO · LLM fine-tuning · agentic AI

0 likes · 16 min read

Baobao Algorithm Notes

Oct 31, 2025 · Artificial Intelligence

How Risk‑Sensitive Reinforcement Learning Improves LLM Pass@K Performance

This article analyzes why standard reinforcement learning can degrade Pass@K metrics after fine‑tuning large language models, introduces a risk‑sensitive RL objective that reshapes the advantage estimator, and demonstrates through bandit and mathematical‑reasoning experiments that the RS‑GRPO method consistently boosts diversity and overall Pass@K scores across multiple LLMs.

Exploration-Exploitation · LLM fine-tuning · RS-GRPO

0 likes · 12 min read

Code Mala Tang

Oct 9, 2025 · Artificial Intelligence

Fine‑Tune a Language Model for Band Trivia with Hugging Face PEFT

This tutorial walks through installing Python dependencies, preparing a JSON‑based QA dataset, and using Hugging Face's PEFT library to fine‑tune a small FLAN‑T5 model so it can answer questions about AC/DC and other bands without that knowledge being supplied in the prompt at inference time.

FAQ model · Hugging Face · LLM fine-tuning

0 likes · 12 min read

Data Party THU

Sep 2, 2025 · Artificial Intelligence

Gradient-Based Multi-Objective Deep Learning: Theory, Algorithms, and LLM Applications

This tutorial provides a systematic overview of gradient‑based multi‑objective optimization for deep learning, covering core solution strategies, algorithmic details, convergence and generalization analyses, and demonstrates how these methods can be applied to fine‑tune and align large language models.

Deep Learning · Gradient Methods · LLM fine-tuning

0 likes · 3 min read

21CTO

Aug 30, 2025 · Artificial Intelligence

10 Must‑Use Open‑Source AI Tools Every Developer Should Try

This article presents a curated list of ten open‑source AI tools—from instant prototyping agents and reactive notebooks to fast LLM fine‑tuning, ethical hacking assistants, local ChatGPT interfaces, and database‑integrated machine learning—explaining their key features, benefits, and why developers should adopt them to boost productivity and maintain privacy.

AI coding assistant · LLM fine-tuning · Open-source AI

0 likes · 19 min read

AI Algorithm Path

Jul 19, 2025 · Artificial Intelligence

Understanding LoRA and QLoRA: Techniques for Efficient LLM Fine‑Tuning

This article explains how low‑rank adaptation (LoRA) and its quantized variant (QLoRA) compress large language model weights, reduce training cost, and enable flexible adapter switching, while detailing matrix decomposition, training mechanics, and trade‑offs with concrete examples and quantitative analysis.

Adapter · LLM fine-tuning · LoRA

0 likes · 11 min read

Amap Tech

May 19, 2025 · Artificial Intelligence

Group Policy Gradient: Direct Objective Optimization for Faster Reinforcement Learning

The article introduces Group Policy Gradient (GPG), a reinforcement‑learning framework that eliminates surrogate loss functions and critic models, directly optimizes the original objective, reduces bias and variance, and achieves state‑of‑the‑art performance on both single‑modal and multimodal tasks.

AI research · LLM fine-tuning · bias reduction

0 likes · 7 min read

Alibaba Cloud Native

Mar 23, 2025 · Cloud Native

Fine‑Tune Large Language Models on Kubernetes with Argo Workflows

This article explains the challenges of fine‑tuning large language models, why Argo Workflows is an ideal Kubernetes‑native solution, and provides a step‑by‑step example using DeepSeek, covering data preparation, model selection, training, evaluation, and the benefits of automation and scalability.

AI pipelines · Argo Workflows · LLM fine-tuning

0 likes · 8 min read

Sohu Tech Products

Mar 19, 2025 · Artificial Intelligence

Easy DataSet: An Open‑Source Tool for Building Domain‑Specific Datasets and Fine‑Tuning Large Language Models

The article introduces Easy DataSet, an open‑source tool that streamlines the creation of domain‑specific datasets by aggregating public data sources, chunking Markdown documents, generating and managing QA pairs with configurable LLM endpoints, and exporting them in common formats, while outlining its architecture and future roadmap.

AI · Data Management · LLM fine-tuning

0 likes · 30 min read

Ops Development & AI Practice

Mar 19, 2025 · Artificial Intelligence

How to Fine‑Tune Large Language Models: From PEFT to Knowledge Injection

This article provides a comprehensive guide to customizing pre‑trained large language models through fine‑tuning techniques—including parameter‑efficient methods, data preparation, knowledge injection, and robust evaluation—offering practical steps, best practices, and domain‑specific considerations for achieving superior task performance.

LLM fine-tuning · data preparation · knowledge injection

0 likes · 18 min read

Architect

Mar 9, 2025 · Artificial Intelligence

Experiments with Reinforcement Learning Fine‑Tuning of a 0.5B Qwen Model on the KK Dataset

The author reports a series of reinforcement‑learning‑based fine‑tuning experiments on a 0.5‑billion‑parameter Qwen instruct model using the KK dataset, detailing reward design adjustments, curriculum‑style data scaling, observed convergence issues, and hypotheses about why small models fail to develop long reasoning chains.

LLM fine-tuning · curriculum-learning · reinforcement learning

0 likes · 11 min read

Cognitive Technology Team

Feb 24, 2025 · Artificial Intelligence

Fine-Tuning Large Language Models with LoRA: A Step-by-Step Guide and Code Example

This article demonstrates the before-and-after effects of fine‑tuning a large language model, explains the concept with analogies, details hardware setup, dataset preparation, LoRA configuration, training arguments, and provides complete Python code for a pure‑framework fine‑tuning workflow.

HuggingFace · LLM fine-tuning · LoRA

0 likes · 24 min read

DevOps

May 29, 2024 · Artificial Intelligence

End-to-End Task-Oriented Dialogue Agent Construction Using Monte Carlo Simulation and LLM Fine-Tuning

This article presents an end‑to‑end approach for building task‑oriented dialogue agents by simulating user behavior with Monte Carlo methods, generating training data via LLMs, and efficiently fine‑tuning multiple large language models using LLaMA Factory, demonstrating significant improvements in intent recognition, slot filling, and contextual understanding.

Data Generation · LLM fine-tuning · Monte Carlo simulation

0 likes · 17 min read

360 Smart Cloud

Apr 15, 2024 · Artificial Intelligence

Fine‑Tuning Qwen‑14B Large Language Model: A Complete Guide

This article provides a comprehensive tutorial on fine‑tuning the Qwen‑14B large language model, covering the motivation, fine‑tuning concepts, step‑by‑step workflow, required code, DeepSpeed training parameters, testing scripts, and deployment using FastChat and the 360AI platform.

AI model deployment · DeepSpeed · FastChat

0 likes · 9 min read

NewBeeNLP

Feb 22, 2024 · Artificial Intelligence

Practical Tips for CPT, SFT, and LoRA in Large Language Model Fine‑Tuning

This article shares hands‑on guidance on using continual pre‑training (CPT), supervised fine‑tuning (SFT), and LoRA adapters for large language models, covering dataset size requirements, learning‑rate scheduling, warm‑up ratios, epoch strategies, and practical routing choices based on real‑world experiments.

CPT · LLM fine-tuning · LoRA

0 likes · 12 min read

NewBeeNLP

Feb 5, 2024 · Artificial Intelligence

How HiFT Slashes GPU Memory for LLM Fine‑Tuning with Hierarchical Optimization

HiFT introduces a layer‑wise hierarchical fine‑tuning strategy that freezes most parameters per step, reduces optimizer state memory, and adapts mixed‑precision training, enabling 7B and 13B models to be fine‑tuned on 16‑31 GB GPUs while maintaining competitive performance.

GPU memory · HiFT · LLM fine-tuning

0 likes · 12 min read

DataFunTalk

Dec 29, 2023 · Artificial Intelligence

Enterprise Knowledge Assistant: Leveraging Vector Databases and Large Language Models

This article explores the emerging enterprise knowledge assistant paradigm in the era of large models, detailing traditional knowledge management challenges, solution architecture using vector databases and LLMs, core technologies such as ETL pipelines, reranking, secure fine‑tuning, and future prospects for intelligent enterprise applications.

Enterprise AI · Knowledge Management · LLM fine-tuning

0 likes · 11 min read

Baidu Geek Talk

Dec 19, 2023 · Industry Insights

Inside Baidu Search Innovation Contest: Winning AI Solutions Across Five Tracks

The second Baidu Search Innovation Contest attracted over 2,800 participants from 45 regions, featured five AI‑focused tracks, and highlighted champion teams that employed techniques such as LoRA‑fine‑tuned LLMs, vector‑intersection Top‑K search, GPU‑optimized algorithms, and diffusion‑based image generation to push the boundaries of search technology.

AI competition · Diffusion Models · GPU Optimization

0 likes · 12 min read

DaTaobao Tech

Oct 25, 2023 · Artificial Intelligence

Prompt Engineering, LLM Supervised Fine‑Tuning, and Mobile Tmall AI Assistant Application

The article explains prompt engineering techniques, supervised fine‑tuning of large language models, and their practical deployment in the Mobile Tmall AI shopping assistant, detailing ChatGPT’s generation steps, Transformer architecture, prompt clarity, delimiters, role‑play, few‑shot and chain‑of‑thought prompting, SFT versus pre‑training, LoRA adapters, data collection, Qwen‑14B training configuration, SDK‑based inference, and comprehensive evaluation.

AI assistant · LLM fine-tuning · Model Deployment

0 likes · 14 min read

Baobao Algorithm Notes

Jul 5, 2023 · Artificial Intelligence

Session‑Level Sample Organization for Decoder‑Only LLM Fine‑Tuning

This article explains how to restructure multi‑turn dialogue data into single session‑level training samples for decoder‑only large language models, leveraging causal attention and simple position IDs, and provides a concrete implementation, performance results, and a gradient‑weight analysis.

ChatGLM2 · LLM fine-tuning · Prompt engineering

0 likes · 7 min read

phodal

Apr 12, 2023 · Artificial Intelligence

Four AI‑Driven Code Generation Techniques: From Example‑Based to Metadata‑Assisted

This article explores four distinct fine‑tuned LLaMA/ChatGLM approaches for AI‑assisted code generation—example‑based, test‑driven, metadata‑augmented, and information‑matching—detailing their training data, prompts, sample inputs and outputs, and evaluating their strengths, limitations, and suitable application scenarios.

AI code generation · LLM fine-tuning · code synthesis

0 likes · 11 min read
