SuanNi
Mar 6, 2026 · Artificial Intelligence

How Step 3.5 Flash Bridges the Gap to Top LLMs with Sparse Expert Architecture

Step 3.5 Flash, a 196‑billion‑parameter sparse‑mixture‑of‑experts LLM, combines sliding‑window and full attention, multi‑token prediction, and a custom Steptron training framework to achieve performance on par with leading models while improving long‑context efficiency and training stability.

benchmark · sparse expert · training infrastructure
11 min read
Baobao Algorithm Notes
Sep 2, 2025 · Artificial Intelligence

How LongCat‑Flash Achieves Record Speed and Efficiency for a 560B MoE Model

LongCat‑Flash is a 560‑billion‑parameter Mixture‑of‑Experts LLM that combines a dynamic zero‑computation expert design, shortcut‑connected MoE communication, variance‑aligned scaling, and a three‑stage agent‑centric pre‑training pipeline, delivering over 100 tokens per second (TPS) on H800 GPUs at a cost of $0.70 per million tokens.

Artificial Intelligence · Large Language Model · LongCat-Flash
23 min read
Baobao Algorithm Notes
Jul 8, 2024 · Industry Insights

Why Large‑Model Deployment Stalls: Robots, Scaling Laws, and Multimodal Frontiers

The article analyzes current challenges in deploying large AI models, covering robot automation, scaling‑law limits, vertical‑domain use cases, multimodal breakthroughs, algorithmic evolution, and the hardware‑software trade‑offs of training and inference infrastructure, while questioning the ROI and practical feasibility of current deployments.

Robotics · algorithm evolution · inference infrastructure
21 min read