Tagged articles

Qwen2.5

10 articles

Apr 27, 2026 · Artificial Intelligence

ACL 2026: Unveiling a Predictive Scaling Law for Reinforcement Learning Fine‑Tuning of Large Models

The paper presents a systematic empirical study that derives a power‑law scaling formula for reinforcement‑learning post‑training of large language models. It reports accurate prediction of performance both across models and within a single model, saturation of learning efficiency, benefits from data reuse, and validity across architectures.

Data Reuse · Large Language Models · Llama 3

0 likes · 11 min read


PaperAgent

Feb 7, 2026 · Artificial Intelligence

Can 13 Parameters Match Full‑Scale Fine‑Tuning? TinyLoRA’s RL Breakthrough

TinyLoRA, a Meta‑proposed method that fine‑tunes Qwen2.5‑7B with only 13 trainable parameters (26 bytes), achieves 91% accuracy on GSM8K under reinforcement learning, revealing that ultra‑low‑parameter RL can rival full‑scale supervised fine‑tuning.

GSM8K · Qwen2.5 · TinyLoRA

0 likes · 7 min read


Ubuntu

Jan 24, 2026 · Artificial Intelligence

Unlock Full‑Stack AI Coding on Ubuntu with Ollama and CC Switch

This step‑by‑step guide shows how to replace cloud‑based AI coding tools with a private, zero‑cost workflow on Ubuntu by installing Ollama, configuring systemd, adding DeepSeek or Qwen2.5 models, installing Claude, Codex and Gemini CLIs, and routing them through CC Switch.

AI coding · CC Switch · Claude Code

0 likes · 7 min read


Fun with Large Models

Jun 12, 2025 · Artificial Intelligence

Implement GRPO to Give LLMs Reasoning Ability with Qwen2.5‑0.5B

This article explains the GRPO reinforcement‑learning algorithm, shows its core idea of internal group competition without a separate evaluator model, and provides a complete, step‑by‑step code walkthrough—including environment setup, dataset preparation, reward‑function design, training configuration, and evaluation—using the Qwen2.5‑0.5B‑Instruct model on the GSM8K math dataset.

GRPO · GSM8K · Qwen2.5

0 likes · 23 min read


Alibaba Cloud Big Data AI Platform

Mar 12, 2025 · Artificial Intelligence

Deploy, Fine‑Tune, and Compress DistilQwen2.5 on Alibaba Cloud PAI – A Complete Guide

This article walks through the full workflow for using Alibaba Cloud's open‑source DistilQwen2.5 models on the PAI platform, covering environment setup, model deployment, fine‑tuning with SFT and DPO, evaluation, and model compression for resource‑constrained scenarios.

DistilQwen2.5 · Large Language Model · PAI

0 likes · 13 min read


Big Data Technology Architecture

Feb 9, 2025 · Artificial Intelligence

Reproducing DeepSeek R1 Reasoning Ability with GRPO on Qwen2.5‑7B in Colab

This article explains how to replicate DeepSeek R1's slow‑thinking reasoning using the GRPO reinforcement‑learning algorithm on the Qwen2.5‑7B model in a free Colab notebook, covering the underlying chain‑of‑thought (CoT) concept, reward‑function design, data preparation, training configuration, and observed results.

GRPO · LLM · Python

0 likes · 14 min read


Software Engineering 3.0 Era

Feb 6, 2025 · Artificial Intelligence

Training an Inference Model Rivaling OpenAI o1 and DeepSeek R1 for Under $50 in 26 Minutes

Researchers from Stanford and the University of Washington trained the s1 reasoning model in just 26 minutes using under $50 of cloud credits, achieving performance comparable to OpenAI's o1 and DeepSeek's R1 by building a curated 1,000‑sample dataset and applying budget forcing, a test‑time scaling technique.

AI benchmarking · Qwen2.5 · budget enforcement

0 likes · 7 min read


Baobao Algorithm Notes

Jan 8, 2025 · Artificial Intelligence

Inside Llama 3.1, DeepSeek‑V3, TÜLU 3 & Qwen 2.5: A Deep Dive into Post‑Training Techniques

This article compiles and analyzes the post‑training pipelines of Llama 3.1, DeepSeek‑V3, TÜLU 3 and Qwen 2.5, detailing their data compositions, SFT, reward modeling, DPO, GRPO, RLVR methods, hyper‑parameters, and practical tricks for large‑language‑model alignment.

DPO · DeepSeek-V3 · Llama3.1

0 likes · 22 min read


Alibaba Cloud Native

Dec 26, 2024 · Cloud Computing

Deploy Qwen2.5 LLM on Alibaba Cloud Function Compute: A Step‑by‑Step Guide

This guide explains how to deploy the Qwen2.5 large language model on Alibaba Cloud Function Compute using Ollama and Open WebUI, covering model selection, resource configuration, deployment steps, interface setup, multilingual capabilities, and automatic scaling for high‑concurrency workloads.

AI model deployment · Cloud Computing · Function Compute

0 likes · 10 min read


NewBeeNLP

Dec 23, 2024 · Artificial Intelligence

What’s New in Qwen2.5? A Deep Dive into the Latest LLM Advances

The Qwen2.5 Technical Report introduces a new series of large language models with up to 72B parameters, describes pre‑training data expanded to 18 trillion tokens along with advanced supervised fine‑tuning and reinforcement learning pipelines, and demonstrates strong performance across comprehension, reasoning, coding, and long‑context tasks.

LLM · Large Language Model · Qwen2.5

0 likes · 5 min read
