Tagged articles

DeepSeek-V3

9 articles

Jun 22, 2026 · Artificial Intelligence

How ORGEval Revealed DeepSeek‑V3’s Surprising Modeling Strength

The paper introduces ORGEval, a graph‑theoretic evaluation framework that replaces costly solvers with bipartite‑graph isomorphism checks, proves a sufficient condition for WL‑test correctness, and shows on the Bench4Opt benchmark that DeepSeek‑V3 outperforms leading reasoning models in speed, consistency, and overall modeling accuracy.

DeepSeek-V3 · LLM evaluation · ORGEval

12 min read

Python Programming Learning Circle

Sep 16, 2025 · Artificial Intelligence

Boost Your Python Coding with DeepSeek‑V3 in PyCharm: A Step‑by‑Step Guide

This tutorial walks you through integrating the 671‑billion‑parameter DeepSeek‑V3 model into PyCharm via the Continue plugin, covering API key creation, plugin installation, configuration of model parameters, and practical code‑explanation and modification demos to enhance your Python development workflow.

AI code assistance · Continue plugin · DeepSeek-V3

5 min read

IT Services Circle

Jul 22, 2025 · Artificial Intelligence

Why Kimi K2 Overtook DeepSeek to Become the Top Open‑Source AI Model

Kimi K2 has surged to the global open‑source #1 spot, ranking fifth overall and rivaling top closed‑source models, thanks to strong multi‑turn dialogue, programming, and complex‑prompt abilities, extensive community adoption, and a refined DeepSeek V3‑based architecture.

AI performance · DeepSeek-V3 · Kimi K2

8 min read

Software Engineering 3.0 Era

May 15, 2025 · Artificial Intelligence

DeepSeek‑V3 Paper Reveals Breakthrough Hardware‑Software Co‑Design for AI Efficiency

DeepSeek‑V3 demonstrates that a tightly coupled hardware‑software design—featuring a memory‑saving MLA cache, a compute‑efficient DeepSeekMoE, a multi‑token prediction module, FP8 training, LogFMT compression, and an optimized eight‑plane fat‑tree network—can train a competitive LLM with only 2,048 H800 GPUs, cutting compute by up to 80% and boosting generation speed by 1.8×.

DeepSeek-V3 · FP8 training · LLM

12 min read

Architect

Feb 16, 2025 · Artificial Intelligence

DeepSeek-V3, DeepSeek-R1, and Janus‑Pro: Architecture, Training Techniques, and Performance Insights

This article provides an in‑depth technical overview of the DeepSeek‑V3, DeepSeek‑R1, and Janus‑Pro models, covering their Mixture‑of‑Experts architecture, novel MLA attention, auxiliary‑loss‑free load balancing, multi‑token prediction, FP8 mixed‑precision training, efficient cross‑node communication, reinforcement‑learning pipelines, multimodal modeling strategies, performance comparisons, cost statistics, and current limitations.

AI Architecture · DeepSeek-V3 · FP8 training

18 min read

Architects' Tech Alliance

Feb 12, 2025 · Artificial Intelligence

DeepSeek‑V3 Training Efficiency, Knowledge Distillation, and the Risks of Synthetic Data

The article examines DeepSeek‑V3’s low‑cost training using 2,048 H800 GPUs, explains how knowledge distillation and high‑quality data improve efficiency, discusses expert concerns about training on AI‑generated content, and outlines the limitations and ceiling effect of distillation techniques.

AI Training Efficiency · AI safety · DeepSeek-V3

7 min read

Alibaba Cloud Big Data AI Platform

Jan 10, 2025 · Artificial Intelligence

Deploy DeepSeek‑V3 LLM on Alibaba Cloud with One‑Click Model Gallery

This article introduces the 671‑billion‑parameter DeepSeek‑V3 Mixture‑of‑Experts LLM, explains the PAI‑Model Gallery platform that aggregates top AI models, and provides a step‑by‑step guide to deploy DeepSeek‑V3 on Alibaba Cloud’s PAI‑EAS service with zero‑code configuration.

AI Deployment · Alibaba Cloud · DeepSeek-V3

5 min read

Baobao Algorithm Notes

Jan 8, 2025 · Artificial Intelligence

Inside Llama 3.1, DeepSeek‑V3, TÜLU 3 & Qwen 2.5: A Deep Dive into Post‑Training Techniques

This article compiles and analyzes the post‑training pipelines of Llama 3.1, DeepSeek‑V3, TÜLU 3 and Qwen 2.5, detailing their data compositions, SFT, reward modeling, DPO, GRPO, and RLVR methods, hyper‑parameters, and practical tricks for large‑language‑model alignment.

DPO · DeepSeek-V3 · Llama3.1

22 min read

Baobao Algorithm Notes

Jan 3, 2025 · Artificial Intelligence

How DeepSeek-V3 Achieves Massive Scale with FP8, MoE, and System Optimizations

The article examines DeepSeek‑V3’s architecture and training pipeline, highlighting its use of MLA and a highly granular MoE design, pioneering FP8 mixed‑precision training, fine‑grained per‑tile quantization, advanced parallelism strategies, and inference optimizations such as prefill/decode (PD) separation and NanoFlow to achieve unprecedented efficiency on limited GPU resources.

DeepSeek-V3 · FP8 · Inference Optimization

10 min read
