SuanNi
Mar 6, 2026 · Artificial Intelligence

How Step 3.5 Flash Bridges the Gap to Top LLMs with Sparse Expert Architecture

Step 3.5 Flash, a 196‑billion‑parameter sparse‑mixture‑of‑experts LLM, combines sliding‑window and full attention, multi‑token prediction, and a custom Steptron training framework to achieve performance on par with leading models while improving long‑context efficiency and training stability.

benchmark · sparse expert · training infrastructure
11 min read
Baobao Algorithm Notes
Sep 2, 2025 · Artificial Intelligence

How LongCat‑Flash Achieves Record Speed and Efficiency for a 560B MoE Model

LongCat‑Flash is a 560‑billion‑parameter Mixture‑of‑Experts LLM that combines a dynamic zero‑computation expert design, shortcut‑connected MoE communication, variance‑aligned scaling, and a three‑stage agent‑centric pre‑training pipeline, delivering over 100 tokens per second (TPS) on H800 GPUs at a cost of $0.70 per million tokens.

Artificial Intelligence · Large Language Model · LongCat-Flash
23 min read
Baobao Algorithm Notes
Jul 8, 2024 · Industry Insights

Why Large‑Model Deployment Stalls: Robots, Scaling Laws, and Multimodal Frontiers

The article analyzes current challenges in deploying large AI models, covering robot automation, scaling‑law limits, vertical‑domain use cases, multimodal breakthroughs, algorithmic evolution, and the hardware‑software trade‑offs of training and inference infrastructure, while questioning the ROI and practical feasibility of current deployments.

Robotics · algorithm evolution · inference infrastructure
21 min read