Tagged articles

12 articles

Machine Learning Algorithms & Natural Language Processing

May 14, 2026 · Artificial Intelligence

Boosting LLM Pre‑training 2.5× Without Architecture Changes or Extra Compute

Nous Research introduces Token Superposition Training, which groups tokens into bags, averages their embeddings, and predicts token groups without altering model architecture or adding compute, achieving up to 2.5× faster pre‑training while maintaining standard inference.

LLM Pretraining · MCE Loss · MoE

10 min read

Machine Heart

Apr 15, 2026 · Artificial Intelligence

DataFlex: An Industrial‑Grade Dynamic Data Training System for Large Models

DataFlex, built on LLaMA‑Factory, offers a unified, reproducible infrastructure that dynamically selects, mixes, and re‑weights training data, turning data into a controllable optimization object and delivering measurable gains in training efficiency and model performance for large‑scale AI models.

DataFlex · Data‑Centric AI · Dynamic Data Training

14 min read

AIWalker

Mar 16, 2026 · Artificial Intelligence

DETR Drops Hungarian Matching: Double Training Speed, +4.2 AP on Large Objects

Beyond-Hungarian replaces the costly Hungarian assignment in DETR with a differentiable, query‑free matching scheme that halves training latency, boosts large‑object AP by 4.2 points, and introduces a GT‑Probe module and dual‑loss framework, while detailing trade‑offs, ablations, and future challenges.

DETR · GT-Probe · Hungarian matching

14 min read

AI Explorer

Mar 15, 2026 · Artificial Intelligence

Large Models May Break Language Training Dependence, Redefining Intelligence

A new study suggests that large AI models could reduce their reliance on massive text corpora by early‑fusing multimodal data such as video and sensor streams, potentially slashing training costs, improving generalization, and prompting a shift toward more embodied notions of intelligence.

AI research · Embodied Intelligence · Multimodal Learning

6 min read

Baobao Algorithm Notes

Dec 22, 2025 · Artificial Intelligence

Which Agentic RL Framework Wins? A Deep Dive into AReal, Seer, Slime & verl

This article analyzes the training‑efficiency challenges of multi‑turn agentic reinforcement learning and compares four recent open‑source frameworks: AReal (Ant), Seer (Moonshot), Slime (Zhipu) and verl (ByteDance). It examines their asynchronous inference designs, rollout‑train separation, long‑context handling, off‑policy mitigation, and system‑level optimizations to guide framework selection.

Asynchronous Inference · RL Systems · Training efficiency

18 min read

Xiaohongshu Tech REDtech

Jun 6, 2025 · Artificial Intelligence

How dots.llm1 Sets New Benchmarks for Open‑Source MoE Language Models

dots.llm1, an open‑source 142‑billion‑parameter Mixture‑of‑Experts language model from hi lab, achieves Qwen2.5‑72B‑level performance after training on 11.2 T high‑quality tokens, and the release includes full models, intermediate checkpoints, and detailed training pipelines for the research community.

AI research · Mixture of Experts · Training efficiency

10 min read

DataFunTalk

Mar 9, 2025 · Artificial Intelligence

Critique Fine-Tuning (CFT): Boosting Large Language Model Reasoning with Minimal Data

The paper introduces Critique Fine-Tuning (CFT), a method that replaces simple imitation in supervised fine‑tuning with critique‑based learning, achieving superior reasoning performance on mathematical benchmarks using only 50 K samples, outperforming traditional reinforcement‑learning approaches that require millions of examples.

AI reasoning · Critique Fine-Tuning · Mathematical Benchmarks

7 min read

ZhongAn Tech Team

Feb 16, 2025 · Artificial Intelligence

DeepSeek R1 and V3: Model Innovations, Industry Impact, and Future Trends

The article reviews DeepSeek's open‑source R1 and V3 large language models, highlighting their technical breakthroughs, cost advantages, expert opinions, industry adoption across chips, cloud services, and applications, and discusses future directions for model scaling, distillation, and AI competition.

AI competition · AI industry · DeepSeek

13 min read

Huawei Cloud Developer Alliance

Feb 8, 2025 · Artificial Intelligence

Why DeepSeek V3 and R1 Are Redefining Low‑Cost AI: Architecture, Training Tricks, and Industry Impact

This article analyses DeepSeek's V3 and R1 models, explaining how their innovative MoE architecture, Multi‑Head Latent Attention, low‑cost training strategies, and distributed‑training optimizations deliver high‑performance large language models while reducing GPU/NPU demand and sparking industry excitement.

AI inference · DeepSeek · Mixture of Experts

16 min read

AIWalker

Jan 21, 2025 · Artificial Intelligence

PKU Introduces Next Patch Prediction for Image Generation, Cutting Training Cost to ~0.6×

The paper proposes a Next Patch Prediction (NPP) paradigm that groups image tokens into high‑density patches, enabling autoregressive models to predict patches instead of individual tokens, which reduces training cost to about 0.6× and improves ImageNet FID scores by up to 1.0 across models ranging from 100 M to 1.4 B parameters.

Autoregressive Models · FID improvement · LlamaGen

10 min read

Baobao Algorithm Notes

Mar 28, 2024 · Artificial Intelligence

How Qwen1.5‑MoE‑A2.7B Matches 7B LLM Performance with Only 2.7B Activated Parameters

Qwen1.5‑MoE‑A2.7B is a Mixture‑of‑Experts model with only 2.7 billion activated parameters that delivers performance comparable to leading 7 billion‑parameter LLMs while cutting training cost by 75% and boosting inference speed by 1.74×. The article details its architecture, benchmarks, efficiency analysis, and deployment steps.

MoE · Model Benchmark · Qwen

13 min read

DataFunTalk

Dec 27, 2022 · Artificial Intelligence

Efficient Training for Very Large‑Scale Face Recognition and the FFC Framework

This article reviews the challenges of ultra‑large‑scale face recognition, presents existing solutions such as metric learning, PFC and VFC, and details the proposed FFC framework with dual loaders, ID groups, probe and gallery networks, plus experimental results showing its cost‑effective performance.

AI · Computer Vision · Deep Learning

7 min read