Tagged articles
4 articles
Page 1 of 1
Wu Shixiong's Large Model Academy
Wu Shixiong's Large Model Academy
Oct 22, 2025 · Artificial Intelligence

Mastering LLM Training: A Step‑by‑Step Blueprint from Data to Alignment

This guide walks through the complete end‑to‑end process of training a large language model from scratch, covering data collection, cleaning, tokenization, pre‑training objectives and engineering, post‑training alignment methods, scaling laws, over‑fitting mitigation, and gradient‑stability techniques.

AlignmentLLMgradient stability
0 likes · 9 min read
Mastering LLM Training: A Step‑by‑Step Blueprint from Data to Alignment
DevOps
DevOps
Apr 7, 2025 · Artificial Intelligence

Meta Llama 4 Scout, Maverick, and Behemoth: Architecture, NoPE Innovation, and Training Advances

The article introduces Meta's newly open‑sourced Llama 4 series—including Scout with a 1 billion‑token context window, Maverick with 400 billion parameters, and the upcoming Behemoth teacher model—detailing their expert‑mix architecture, the NoPE positional‑encoding removal, training pipelines, performance benchmarks, and infrastructure improvements for large‑scale AI research.

AI researchContext WindowLlama 4
0 likes · 8 min read
Meta Llama 4 Scout, Maverick, and Behemoth: Architecture, NoPE Innovation, and Training Advances
NewBeeNLP
NewBeeNLP
Mar 20, 2024 · Artificial Intelligence

How Open‑Sora 1.0 Replicates Sora: Architecture, Training Pipeline & Performance Insights

This article provides a comprehensive technical walkthrough of Open‑Sora 1.0, covering its Diffusion‑Transformer architecture, three‑stage training strategy, data‑preprocessing scripts, generation quality, and the Colossal‑AI acceleration that together make Sora‑level video synthesis openly reproducible.

AI videoDiffusion TransformerOpen-Sora
0 likes · 12 min read
How Open‑Sora 1.0 Replicates Sora: Architecture, Training Pipeline & Performance Insights
Tencent Architect
Tencent Architect
Jul 29, 2021 · Artificial Intelligence

Performance Optimization of Advertising Coarse‑Ranking Training on the Light Framework

This article analyzes the bottlenecks of advertising coarse‑ranking training on the Light framework and presents a series of optimizations—including parallel data download, thread‑queue buffering, integer‑to‑string conversion with fmt, and zlib replacement with czlib—that together achieve up to 58% QPS improvement and notable CPU efficiency gains.

AdvertisingCPU/GPU efficiencyData Parallelism
0 likes · 11 min read
Performance Optimization of Advertising Coarse‑Ranking Training on the Light Framework