Tagged articles
12 articles
Machine Heart
Apr 15, 2026 · Artificial Intelligence

DataFlex: An Industrial‑Grade Dynamic Data Training System for Large Models

DataFlex, built on LLaMA‑Factory, offers a unified, reproducible infrastructure that dynamically selects, mixes, and re‑weights training data, turning data into a controllable optimization object and delivering measurable gains in training efficiency and model performance for large‑scale AI models.

DataFlex · Data‑Centric AI · Dynamic Data Training
0 likes · 14 min read
AIWalker
Mar 16, 2026 · Artificial Intelligence

DETR Drops Hungarian Matching: Double Training Speed, +4.2 AP on Large Objects

Beyond-Hungarian replaces the costly Hungarian assignment in DETR with a differentiable, query‑free matching scheme that halves training latency, boosts large‑object AP by 4.2 points, and introduces a GT‑Probe module and dual‑loss framework, while detailing trade‑offs, ablations, and future challenges.

DETR · GT-Probe · Hungarian matching
0 likes · 14 min read
AI Explorer
Mar 15, 2026 · Artificial Intelligence

Large Models May Break Language Training Dependence, Redefining Intelligence

A new study suggests that large AI models could reduce their reliance on massive text corpora by early‑fusing multimodal data such as video and sensor streams, potentially slashing training costs, improving generalization, and prompting a shift toward more embodied notions of intelligence.

AI research · Embodied Intelligence · Multimodal Learning
0 likes · 6 min read
Baobao Algorithm Notes
Dec 22, 2025 · Artificial Intelligence

Which Agentic RL Framework Wins? A Deep Dive into AReaL, Seer, Slime & verl

This article analyzes the training‑efficiency challenges of multi‑turn agentic reinforcement learning and compares four recent open‑source frameworks: AReaL (Ant), Seer (Moonshot), Slime (Zhipu) and verl (ByteDance). It examines their asynchronous inference designs, rollout‑train separation, long‑context handling, off‑policy mitigation, and system‑level optimizations to guide framework selection.

Asynchronous Inference · RL Systems · Training efficiency
0 likes · 18 min read
Xiaohongshu Tech REDtech
Jun 6, 2025 · Artificial Intelligence

How dots.llm1 Sets New Benchmarks for Open‑Source MoE Language Models

dots.llm1, an open‑source 142‑billion‑parameter Mixture‑of‑Experts language model from hi lab, achieves Qwen2.5‑72B‑level performance after training on 11.2 T high‑quality tokens, and the release includes full models, intermediate checkpoints, and detailed training pipelines for the research community.

AI research · Mixture of Experts · Training efficiency
0 likes · 10 min read
DataFunTalk
Mar 9, 2025 · Artificial Intelligence

Critique Fine-Tuning (CFT): Boosting Large Language Model Reasoning with Minimal Data

The paper introduces Critique Fine-Tuning (CFT), a method that replaces simple imitation in supervised fine‑tuning with critique‑based learning, achieving superior reasoning performance on mathematical benchmarks using only 50 K samples, outperforming traditional reinforcement‑learning approaches that require millions of examples.

AI reasoning · Critique Fine-Tuning · Mathematical Benchmarks
0 likes · 7 min read
ZhongAn Tech Team
Feb 16, 2025 · Artificial Intelligence

DeepSeek R1 and V3: Model Innovations, Industry Impact, and Future Trends

The article reviews DeepSeek's open‑source R1 and V3 large language models, highlighting their technical breakthroughs, cost advantages, expert opinions, industry adoption across chips, cloud services, and applications, and discusses future directions for model scaling, distillation, and AI competition.

AI competition · AI industry · DeepSeek
0 likes · 13 min read
Huawei Cloud Developer Alliance
Feb 8, 2025 · Artificial Intelligence

Why DeepSeek V3 and R1 Are Redefining Low‑Cost AI: Architecture, Training Tricks, and Industry Impact

This article analyses DeepSeek's V3 and R1 models, explaining how their innovative MoE architecture, Multi‑Head Latent Attention, low‑cost training strategies, and distributed‑training optimizations deliver high‑performance large language models while reducing GPU/NPU demand and sparking industry excitement.

AI inference · DeepSeek · Mixture of Experts
0 likes · 16 min read
AIWalker
Jan 21, 2025 · Artificial Intelligence

PKU Introduces Next Patch Prediction for Image Generation, Cutting Training Cost to ~0.6×

The paper proposes a Next Patch Prediction (NPP) paradigm that groups image tokens into high‑density patches, enabling autoregressive models to predict patches instead of individual tokens, which reduces training cost to about 0.6× and improves ImageNet FID scores by up to 1.0 across models ranging from 100 M to 1.4 B parameters.

Autoregressive Models · FID improvement · LlamaGen
0 likes · 10 min read
Baobao Algorithm Notes
Mar 28, 2024 · Artificial Intelligence

How Qwen1.5‑MoE‑A2.7B Matches 7B LLM Performance with Only 2.7B Activated Parameters

Qwen1.5‑MoE‑A2.7B is a Mixture‑of‑Experts model with 2.7 billion activated parameters that delivers performance comparable to leading 7 billion‑parameter LLMs while cutting training cost by 75% and boosting inference speed by 1.74×. The article details its architecture, benchmarks, efficiency analysis, and deployment steps.

MoE · Model Benchmark · Qwen
0 likes · 13 min read
DataFunTalk
Dec 27, 2022 · Artificial Intelligence

Efficient Training for Very Large‑Scale Face Recognition and the FFC Framework

This article reviews the challenges of ultra‑large‑scale face recognition, presents existing solutions such as metric learning, PFC and VFC, and details the proposed FFC framework with dual loaders, ID groups, probe and gallery networks, plus experimental results showing its cost‑effective performance.

AI · Computer Vision · Deep Learning
0 likes · 7 min read