Machine Learning Algorithms & Natural Language Processing
Apr 25, 2026 · Artificial Intelligence

DeepSeek V4 Unveiled: 1M‑Token Context and New Architecture Challenge Closed‑Source LLMs

DeepSeek V4 introduces two flagship models—V4‑Pro with 1.6 T parameters and V4‑Flash with 284 B parameters—offering a million‑token context, mixed attention (CSA + HCA), manifold‑constrained residuals, and the Muon optimizer, delivering open‑source performance that rivals top closed‑source LLMs while cutting inference cost dramatically.

1M context · DeepSeek · Large Language Model
10 min read
PaperAgent
Apr 24, 2026 · Artificial Intelligence

DeepSeek‑V4 Open‑Sources Its Million‑Token Architecture and Calls Out Claude Opus 4.6

DeepSeek‑V4’s open‑source technical report reveals a hybrid CSA/HCA attention design, manifold‑constrained residuals, and the Muon optimizer, which together cut per‑token FLOPs to 27% and the KV cache to 10% at 1M tokens; benchmark results show it outperforming Claude Opus 4.6 on most tasks while still lagging on complex instruction following and multi‑turn dialogue.

AI Architecture · Claude Opus · DeepSeek V4
11 min read
Architects' Tech Alliance
Apr 24, 2026 · Artificial Intelligence

DeepSeek V4 Launches with 1M‑Token Context, Dual Versions and Native Chinese Chip Support

On April 24, 2026, DeepSeek released the V4 preview featuring two models—V4‑Pro, with a 1.6 T‑parameter MoE architecture, and V4‑Flash, with 284 B parameters—both offering a 1‑million‑token context, up to 384K output tokens, new step‑wise reasoning modes, and full native compatibility with Huawei Ascend and Cambricon chips, while delivering major efficiency gains and benchmark‑leading performance.

1M token context · Cambricon · DeepSeek
7 min read
Machine Heart
Apr 24, 2026 · Artificial Intelligence

DeepSeek V4 Unveiled: Dual Versions with 1M Token Context and New Mixed‑Attention Architecture

DeepSeek V4 launches two models—Flash and Pro—both supporting up to a 1‑million‑token context and 384K output tokens, offering non‑thinking and thinking modes with a reasoning_effort parameter, and featuring mixed attention, manifold‑constrained hyperconnections, the Muon optimizer, massive training data, and up to a 73% FLOPs reduction versus V3.

AI model · Cambricon · DeepSeek V4
5 min read
AI Algorithm Path
Sep 14, 2025 · Artificial Intelligence

Qwen3-Next: Achieving Unmatched Training and Inference Cost‑Effectiveness

Alibaba's Qwen team unveils Qwen3-Next, a hybrid mixture‑of‑experts LLM with 80 B total parameters but only 3 B active, delivering training costs under one‑tenth those of comparable dense models and more than ten‑fold inference throughput for long contexts, while matching or surpassing larger models on benchmark tasks.

AI · LLM · Multi‑Token Prediction
9 min read
Meituan Technology Team
Jan 28, 2021 · Artificial Intelligence

Trajectory Prediction Algorithm for Autonomous Vehicles: Winning Solutions in NeurIPS 2020 INTERPRET Challenge

Meituan’s unmanned delivery team secured first place in the Generalizability track and second place in the Regular track of the NeurIPS 2020 INTERPRET trajectory‑prediction challenge by employing a mixed‑attention graph transformer with a dual‑channel GRU and adaptive map processing, achieving ADEs of 0.5339 m and 0.1912 m, respectively.

NeurIPS · autonomous vehicles · graph neural network
15 min read