Old Zhang's AI Learning
Apr 28, 2026 · Artificial Intelligence

vLLM 0.20 Arrives with DeepSeek V4 Support – What’s New?

The vLLM 0.20.0 release dramatically upgrades the inference engine with DeepSeek V4 support, default CUDA 13, PyTorch 2.11, Transformers v5 compatibility, FlashAttention 4 MLA prefill, TurboQuant 2‑bit KV cache, an online quantization front‑end, IR enhancements, Model Runner V2 features, and a slew of new models, while providing detailed installation and upgrade guidance.
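
For readers who want to kick the tires on the new release, here is a minimal sketch using vLLM's standard Python offline-inference API; the model id is a placeholder assumption, not a repo name confirmed by the article.

```python
# Minimal vLLM offline-inference sketch using the standard Python API.
# "deepseek-ai/DeepSeek-V4-Flash" is a placeholder model id, not a name
# confirmed by the article; substitute whatever id the release ships under.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Flash",  # hypothetical model id
    trust_remote_code=True,                  # DeepSeek checkpoints usually require this
)

params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(["Summarize the key changes in this vLLM release."], params)
print(outputs[0].outputs[0].text)
```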

CUDA 13 · DeepSeek V4 · FlashAttention
Old Meng AI Explorer
Apr 27, 2026 · Artificial Intelligence

DeepSeek V4 Unveiled: 1M‑Token Context for All Models – A Complete Developer Guide

DeepSeek V4, released on April 24, offers a 1 million-token context as a standard feature across both the Pro and Flash variants, delivers top-tier agent and reasoning performance, and cuts costs dramatically compared with GPT-5.5; the guide walks through integration step by step and covers broad hardware support.
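
Since the guide covers API integration, a hedged sketch of the usual pattern may help: DeepSeek's existing API is OpenAI-compatible, so the standard openai client applies; the model name below is an assumption, not taken from the article.

```python
# Calling an OpenAI-compatible DeepSeek endpoint with the official openai client.
# The base URL matches DeepSeek's existing API; "deepseek-v4" is an assumed model
# name, not one confirmed by the article.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key="YOUR_DEEPSEEK_API_KEY",
)

resp = client.chat.completions.create(
    model="deepseek-v4",  # hypothetical model id
    messages=[
        {"role": "user", "content": "Summarize this 500-page contract in ten bullet points."}
    ],
    max_tokens=1024,
)
print(resp.choices[0].message.content)
```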

1M token context · AI hardware support · API integration
Java Web Project
Apr 27, 2026 · Artificial Intelligence

DeepSeek V4 Meets Claude Code: A Cost‑Effective Leap in Open‑Source LLM Performance

The DeepSeek V4 preview, released quietly on April 24, offers two models with a 1M-token context at roughly 1/16 the price of Claude Opus and near-par performance on SWE-bench and LiveCodeBench; integrating it with Claude Code enables rapid project understanding, bug detection, refactoring, testing, and documentation, saving days of work for under ¥6.

Agentic Coding · Claude Code · DeepSeek V4
CodeTrend
Apr 26, 2026 · Artificial Intelligence

Why DeepSeek V4 Can Run on Huawei Ascend: A Deep Technical Breakdown

The article analyzes why most open-source large models cannot run on Huawei's Ascend NPUs, detailing the CUDA-centric ecosystem, Ascend's CANN software stack, three core technical hurdles, and the deep collaboration and tooling that enabled DeepSeek V4's successful adaptation.

AI model porting · CANN · DeepSeek V4
Old Zhang's AI Learning
Apr 26, 2026 · Artificial Intelligence

Why Deploying DeepSeek‑V4 Locally with vLLM Is So Challenging

The article dissects DeepSeek-V4's local deployment with vLLM, explaining the steep hardware requirements, the complex heterogeneous KV-cache architecture, and the aggressive kernel-fusion and multi-stream optimizations that together make long-context inference both memory-intensive and engineering-heavy.
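
To make the memory claim concrete, here is a back-of-envelope KV-cache estimate; every dimension is an illustrative assumption rather than DeepSeek-V4's actual configuration.

```python
# Back-of-envelope KV-cache sizing for long-context inference. Every number
# here is an illustrative assumption, NOT DeepSeek-V4's real configuration;
# the point is only to show why million-token contexts strain GPU memory.
layers        = 60         # assumed transformer layers
kv_heads      = 8          # assumed KV heads after GQA/MLA-style compression
head_dim      = 128        # assumed per-head dimension
bytes_per_val = 2          # FP16/BF16 cache entries
context_len   = 1_000_000  # 1M-token context

# 2x for keys + values, per layer, per head, per token
kv_bytes = 2 * layers * kv_heads * head_dim * bytes_per_val * context_len
print(f"KV cache for one 1M-token sequence: {kv_bytes / 1e9:.0f} GB")
# ~246 GB under these assumptions, more than two 96 GB GPUs can hold, which is
# why compressed or heterogeneous KV-cache schemes and cache quantization matter.
```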

DeepSeek V4 · GPU memory · KV cache
Architecture & Thinking
Apr 26, 2026 · Artificial Intelligence

DeepSeek V4: How Million‑Token Context and Open‑Source Design Redefine AI Ecosystems

DeepSeek V4, released on April 24, 2026, introduces a 1‑million‑token context via DSA sparse attention, offers Pro and Flash variants, adapts to domestic AI chips, cuts compute costs dramatically, and leverages open‑source weights to challenge the dominance of closed‑source LLMs, reshaping the global AI landscape.

AI hardware adaptation · Agentic AI · DeepSeek V4
Old Zhang's AI Learning
Apr 25, 2026 · Artificial Intelligence

Deploying DeepSeek‑V4‑Flash Locally on 2 × NVIDIA H20 (96 GB) – Quick Performance Test

This article walks through deploying DeepSeek-V4-Flash on a server with two NVIDIA H20 GPUs (96 GB each), covering model download, Docker image preparation, launch-script tweaks, and memory compression via FP8 and expert parallelism, then reports the observed concurrency limits and tokens-per-second throughput, including a test with the model's thinking mode disabled.
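
A rough sketch of what such a two-GPU launch can look like with vLLM's Python API is below; the model id is hypothetical and the exact flags depend on the vLLM version, so treat it as an outline rather than the article's actual script.

```python
# Sketch of a two-GPU vLLM launch along the lines the article describes:
# tensor parallelism across both H20s, FP8 weight and KV-cache compression,
# and expert parallelism for the MoE layers. Model id and exact flag
# availability depend on your vLLM version.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Flash",  # hypothetical model id
    tensor_parallel_size=2,                 # split across the two H20 GPUs
    quantization="fp8",                     # FP8 weight compression
    kv_cache_dtype="fp8",                   # FP8 KV cache to stretch 96 GB per GPU
    enable_expert_parallel=True,            # expert parallelism for MoE layers
    gpu_memory_utilization=0.95,
    trust_remote_code=True,
)

out = llm.generate(["ping"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```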

DeepSeek V4 · Docker · FP8 quantization
Machine Learning Algorithms & Natural Language Processing
Apr 25, 2026 · Artificial Intelligence

Why DeepSeek‑V4 Took Twice as Long: Inside the Training‑Stability Challenges and Engineering Hacks

The DeepSeek-V4 technical report reveals that the model's doubled training time stems from massive token and parameter scaling and severe training-stability issues in MoE layers, and details a suite of engineering solutions, including Anticipatory Routing, SwiGLU Clamping, specialist expert training, and a custom sandbox cluster, while also exposing high hallucination rates despite impressive benchmark performance.
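
As a rough illustration of the SwiGLU-clamping idea (a generic numerical-stability guard, not the exact scheme or threshold from the DeepSeek-V4 report), a clamped SwiGLU block might look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClampedSwiGLU(nn.Module):
    """Generic SwiGLU feed-forward block with a clamp on the gated activation.

    Illustrates the general idea behind SwiGLU clamping; the clamp threshold
    and placement are assumptions, not values from the DeepSeek-V4 report.
    """
    def __init__(self, d_model: int, d_ff: int, clamp_value: float = 50.0):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)
        self.clamp_value = clamp_value

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Clamp the gated activation so rare large values cannot blow past the
        # FP16/BF16 range and destabilize training.
        h = F.silu(self.w_gate(x)) * self.w_up(x)
        h = h.clamp(min=-self.clamp_value, max=self.clamp_value)
        return self.w_down(h)
```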

Benchmark · DeepSeek V4 · Generative Reward Model
ArcThink
Apr 25, 2026 · Artificial Intelligence

DeepSeek V4’s Silent Launch: 1.6 T Parameters, Triple Innovation, and Redefined Accessibility

DeepSeek V4 quietly debuted as a 1.6-trillion-parameter MoE model, introducing CSA+HCA compressed attention, mHC manifold-constrained hyperconnections, and the Muon optimizer; it achieves a 1M-token context at a quarter of V3's cost, top Codeforces and LiveCodeBench scores, pricing at one-seventh that of Opus, MIT open-source licensing, and dual-stack Ascend NPU / NVIDIA GPU support.
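
Muon is a publicly documented optimizer that orthogonalizes momentum updates with a Newton-Schulz iteration; the sketch below shows that core step in generic form and is not DeepSeek's internal implementation.

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2-D update matrix.

    Core step of the publicly released Muon optimizer (momentum orthogonalized
    by Newton-Schulz); coefficients follow the public reference code. Generic
    sketch only, not DeepSeek's internal implementation.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + 1e-7)            # scale so singular values are <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T
    for _ in range(steps):
        m = x @ x.T
        x = a * x + (b * m + c * m @ m) @ x
    return x.T if transposed else x

# The orthogonalized matrix then replaces the raw momentum in the weight update.
update = newton_schulz_orthogonalize(torch.randn(1024, 4096))
```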

Benchmark · DeepSeek V4 · Large Language Model
DataFunTalk
Apr 25, 2026 · Artificial Intelligence

DeepSeek‑V4 vs GPT‑5.5: First Real‑World Tests Reveal Surprising Results

On the day GPT‑5.5 launched, DeepSeek‑V4 followed, and a series of head‑to‑head tests—including a logic puzzle, an IMO math problem, HTML generation, game‑engine coding, token‑efficiency measurement, and a network‑security challenge—showed GPT‑5.5 generally leading while DeepSeek demonstrated notable strengths and cost advantages.

AI model benchmark · AI security · DeepSeek V4
Su San Talks Tech
Apr 25, 2026 · Artificial Intelligence

GPT-5.5 vs DeepSeek V4: Which Model Wins the AI Race?

The article compares OpenAI's GPT‑5.5 and DeepSeek V4 on architecture, inference efficiency, benchmark performance, pricing, and ecosystem openness, offering scenario‑based recommendations to help developers choose the model that best fits their cost, performance, and deployment needs.

AI model comparison · DeepSeek V4 · GPT-5.5
PaperAgent
Apr 24, 2026 · Artificial Intelligence

DeepSeek‑V4 Open‑Sources Its Million‑Token Architecture and Calls Out Claude Opus 4.6

DeepSeek-V4's open-source report reveals a hybrid CSA/HCA attention design, manifold-constrained residuals, and the Muon optimizer, which together cut per-token FLOPs to 27% and KV-cache size to 10% at 1M tokens; benchmark results show it outperforming Claude Opus 4.6 on most tasks while still lagging on complex instruction following and multi-turn dialogue.

AI Architecture · Benchmark · Claude Opus
SuanNi
Apr 24, 2026 · Artificial Intelligence

DeepSeek-V4 Launches: Million-Token Context Becomes Affordable for All

DeepSeek-V4 introduces a hybrid attention architecture, manifold‑constrained hyper‑connections, and the Muon optimizer to cut inference FLOPs and KV cache dramatically, enabling open‑source models to handle million‑token contexts at a fraction of the cost of leading closed‑source services while matching their performance.

Benchmark · DeepSeek V4 · Hybrid attention
IT Services Circle
Apr 24, 2026 · Artificial Intelligence

DeepSeek V4 Released: Open-Source LLM Challenges Closed-Source Leaders and Partners with Huawei Chips

DeepSeek V4 launches in two versions, Pro and Flash, offering a 1M-token context, enhanced agent capabilities, improved world-knowledge and reasoning performance, a new token-compression attention mechanism with DSA sparse attention, Huawei compute support, updated APIs, and a migration plan for legacy models.

1M context · API integration · DSA sparse attention
AI Explorer
Apr 24, 2026 · Artificial Intelligence

DeepSeek-V4 Raises the Bar: 1.6T‑Parameter Open‑Source Model Challenges Closed‑Source Giants

DeepSeek-V4 introduces two open-source LLMs, V4-Pro with 1.6 trillion total parameters and V4-Flash with 284 billion, offering a 1 million-token context window, hybrid attention, multi-head compression, and the new Muon optimizer, all released under an MIT license and with performance that rivals top closed-source models.

DeepSeek V4 · Hybrid attention · Large Language Model
Tech Musings
Apr 24, 2026 · Artificial Intelligence

DeepSeek-V4 Unveiled: 1M Context Length and Ascend Compute Power

DeepSeek has launched the open‑source DeepSeek‑V4 series, offering Pro and Flash models with a 1 million token context window, a novel sparse attention mechanism, performance that rivals Opus 4.6 on coding and knowledge benchmarks, tiered pricing, and future cost reductions once Ascend 950 supernodes become widely available.

1M context · AI benchmarking · DeepSeek V4
Machine Heart
Apr 24, 2026 · Artificial Intelligence

DeepSeek V4 Unveiled: Dual Versions with 1M Token Context and New Mixed‑Attention Architecture

DeepSeek V4 launches two models, Flash and Pro, both supporting up to a 1M-token context and 384K output tokens, offering non-thinking and thinking modes controlled by a reasoning_effort parameter, and featuring mixed attention, manifold-constrained hyperconnections, the Muon optimizer, massive training data, and up to a 73% FLOPs reduction versus V3.
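
The reasoning_effort parameter is named in the article, but the endpoint, model id, and accepted values below are assumptions; with an OpenAI-compatible client, provider-specific fields are typically passed through extra_body.

```python
# Sketch of toggling the thinking mode via the reasoning_effort parameter the
# article mentions. Base URL and model id are assumptions; accepted values for
# reasoning_effort are also an assumption, not taken from the article.
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_DEEPSEEK_API_KEY")

resp = client.chat.completions.create(
    model="deepseek-v4",                      # hypothetical model id
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    extra_body={"reasoning_effort": "high"},  # provider-specific field passed through
)
print(resp.choices[0].message.content)
```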

AI model · Cambricon · DeepSeek V4