Tagged articles
566 articles
PaperAgent
Jan 10, 2026 · Artificial Intelligence

DeepSeek V4 Unveiled: Why Its Coding Power Beats Claude and GPT

The article covers DeepSeek's newly announced V4 model, the successor to its December 2024 V3 release, which it says demonstrates coding abilities superior to the Claude and GPT series. It also details the model's data composition, infrastructure, training costs, failed experimental attempts, and expanded benchmark comparisons, along with a comprehensive safety report.

AI model analysis · DeepSeek · V4
0 likes · 4 min read
AI Insight Log
Jan 9, 2026 · Industry Insights

Anthropic Blocks xAI from Claude; DeepSeek V4 Targets Code Supremacy at New Year

Anthropic abruptly cut off xAI employees’ access to its Claude model, labeling them a competitor, prompting xAI co‑founder Tony Wu to view the loss as both a short‑term productivity hit and a catalyst for accelerating its own coding AI, while Chinese startup DeepSeek is rumored to launch V4 during the upcoming Chinese New Year, claiming code‑generation capabilities that surpass current Anthropic and OpenAI models.

AI competition · Anthropic · Claude
0 likes · 5 min read
Mingyi World Elasticsearch
Jan 3, 2026 · Artificial Intelligence

Build Your Own AI Coding Assistant in 5 Minutes: A Hands‑On Guide

The article analyzes common pain points of traditional AI coding chats—repetitive context input, lengthy prompts, and generic answers—and demonstrates how to create a persistent, expert‑level AI coding assistant using Coco AI, with step‑by‑step configuration, example prompts, and future RAG enhancements.

AI Agent · Coco AI · DeepSeek
0 likes · 9 min read
Design Hub
Jan 2, 2026 · Artificial Intelligence

DeepSeek’s “Mathematical Tight‑Fit” Tames AI: Constraints Drive Performance Gains

DeepSeek’s new mHC architecture replaces unconstrained hyper‑connections with manifold‑constrained doubly‑stochastic matrices, stabilizing large‑scale training, reducing signal explosion from 3000× to 1.6×, and delivering consistent accuracy improvements across BBH, DROP, GSM8K, and MMLU benchmarks while adding only 6.7% training overhead.

AI training stability · DeepSeek · hyper-connections
0 likes · 10 min read
AI Insight Log
Jan 1, 2026 · Artificial Intelligence

Can DeepSeek’s mHC Architecture Break ResNet’s Decade-Long Dominance?

DeepSeek’s new paper “mHC: Manifold‑Constrained Hyper‑Connections” proposes a novel architecture that replaces traditional residual connections with mathematically constrained hyper‑connections. On a 27B model it adds a modest 6.7% to training time while delivering significant stability gains and better performance on the BBH, DROP and GSM8K benchmarks.

DeepSeek · LLM training · ResNet
0 likes · 8 min read
Baidu Geek Talk
Dec 24, 2025 · Artificial Intelligence

Context Parallelism Slashes TTFT by 80% for 128K-Token LLMs

The article explains how Baidu’s Baige team integrated a Context Parallelism strategy into DeepSeek V3.2, detailing the DSA architecture, the limitations of traditional tensor and sequence parallelism, and how CP distributes computation and memory across GPUs to achieve up to an 80% reduction in time‑to‑first‑token (TTFT) latency for ultra‑long 128K‑token contexts.

Context Parallelism · DeepSeek · LLM
0 likes · 9 min read
Baidu Intelligent Cloud Tech Hub
Dec 17, 2025 · Artificial Intelligence

How AFD Splits Attention and FFN to Boost DeepSeek‑V3 Inference by Up to 19%

The article details the Attention‑FFN Disaggregation (AFD) technique used by Baidu Baige to separate self‑attention and feed‑forward network stages in DeepSeek‑V3 models, describing multi‑stage scheduling, three‑batch overlap, communication optimizations, and performance results that achieve up to 19% throughput improvement under a 100 ms SLO.

3BO · AFD · Attention-FFN Disaggregation
0 likes · 17 min read
21CTO
Dec 11, 2025 · Artificial Intelligence

Why DeepSeek’s Founder Made Nature’s 2025 Top‑10 Scientists List

Nature’s 2025 “Nature’s 10” list highlighted DeepSeek founder Liang Wenfeng for his breakthrough in AI transparency, noting his open‑weight model’s impact on researchers, while also detailing the model’s low‑cost performance and the other distinguished scientists honored that year.

DeepSeek · Liang Wenfeng · Nature's 10
0 likes · 3 min read
Data Party THU
Dec 10, 2025 · Artificial Intelligence

How DeepSeek‑V3.2 Cuts Inference Cost and Boosts Agent Skills with Sparse Attention

DeepSeek's V3.2 release introduces a dual‑model lineup, a Sparse Attention architecture that halves long‑context inference cost, a post‑training reinforcement‑learning pipeline that exceeds 10% of pre‑training compute, and a revamped agent framework that dramatically improves tool‑use and reasoning performance across benchmarks.

Agentic AI · DeepSeek · Model Optimization
0 likes · 11 min read
Old Meng AI Explorer
Dec 7, 2025 · Artificial Intelligence

Why DeepSeek-Math-V2 Is the New Benchmark for Rigorous AI Math Reasoning

DeepSeek-Math-V2, an open‑source math reasoning model from DeepSeek, introduces a self‑verification mechanism that ensures step‑by‑step logical correctness, achieving gold‑medal scores in IMO 2025, CMO 2024 and near‑perfect results in the Putnam 2024 competition, while offering free, extensible deployment for research, training, and scientific computation.

AI Math · DeepSeek · Mathematical Reasoning
0 likes · 13 min read
Instant Consumer Technology Team
Dec 5, 2025 · Artificial Intelligence

Transform Complex Prompts into Reusable AI Skills and Hook DeepSeek into Claude Code

This article explains how to replace cumbersome, city‑specific prompt strings with modular AI Skills, demonstrates the food‑diorama‑skill that generates 3D gourmet dioramas, and provides a step‑by‑step guide for connecting the DeepSeek V3.2 model to Claude Code using environment variables or the CC Switch GUI.

AI · Claude · DeepSeek
0 likes · 8 min read
Frontend AI Walk
Dec 5, 2025 · Artificial Intelligence

Master Prompt Engineering: From Random Chat to Precise Control with Zero-shot, Few-shot, and Chain‑of‑Thought

This article explains how to converse effectively with large language models by mastering three core prompting techniques—Zero‑shot, Few‑shot, and Chain‑of‑Thought—illustrated with front‑end analogies, code snippets, and a step‑by‑step DeepSeek JSON‑generation exercise that shows common pitfalls and best practices.

DeepSeek · Few-Shot · JSON generation
0 likes · 12 min read
Fun with Large Models
Dec 5, 2025 · Artificial Intelligence

DeepSeek Math V2 & V3.2: A Plain‑Language Deep Dive into Core Innovations

This article provides a detailed, easy‑to‑understand analysis of DeepSeek‑Math‑V2’s self‑verification training method and DeepSeek‑V3.2’s GRPO framework, sparse‑attention DSA mechanism, massive agent data pipeline, and benchmark results that place both models among the world’s top open‑source large language models.

DeepSeek · GRPO · LLM
0 likes · 19 min read
Aikesheng Open Source Community
Dec 4, 2025 · Artificial Intelligence

Gemini 3 Pro vs DeepSeek‑V3.2‑Exp: Which LLM Dominates SQL Understanding, Optimization, and Dialect Conversion?

This report evaluates the professional‑grade LLMs Gemini 3 Pro and DeepSeek‑V3.2‑Exp on three SQL‑related dimensions—understanding, optimization, and dialect conversion—using the SCALE benchmark, presenting detailed scores, strengths, weaknesses, and practical recommendations for database engineers and decision makers.

DeepSeek · Gemini · LLM
0 likes · 16 min read
PaperAgent
Dec 2, 2025 · Artificial Intelligence

How DeepSeek‑V3.2’s New Agent Architecture Bridges the Gap to Closed‑Source LLMs

DeepSeek‑V3.2 introduces a reinforced‑agent framework that combines a synthetic task factory, scaling reinforcement learning, and advanced context management, achieving the highest open‑source agent scores and narrowing the performance gap with leading closed‑source models such as Claude‑4.5‑Sonnet, GPT‑5‑High, and Gemini‑3.0‑Pro.

AI agents · Agent Architecture · DeepSeek
0 likes · 7 min read
Baidu Intelligent Cloud Tech Hub
Nov 25, 2025 · Artificial Intelligence

Why DeepSeek‑V3.2‑Exp Lost Performance and How a Simple RoPE Fix Restored It

The Baidu Baige team discovered that DeepSeek‑V3.2‑Exp’s long‑context performance lagged behind the official report, traced the issue to a subtle RoPE layout mismatch in the open‑source inference demo, collaborated with DeepSeek to fix it, and verified that the model’s speed and accuracy fully recovered across multiple benchmarks.

AI Infrastructure · DeepSeek · LLM inference
0 likes · 9 min read
ShiZhen AI
Oct 24, 2025 · Artificial Intelligence

Why GPT‑5 Lost 72% While Chinese AI Models Gained 32% in the NOF1.AI Alpha Arena

The NOF1.AI Alpha Arena benchmark shows Chinese models like Qwen3 Max and DeepSeek out‑performing GPT‑5, delivering +32.42% and +22.46% returns respectively, while GPT‑5 suffers a -72.49% loss, highlighting the impact of trade frequency, risk control, and profit‑to‑loss ratios in AI‑driven crypto trading.

AI trading · Alpha Arena · DeepSeek
0 likes · 14 min read
DataFunTalk
Oct 20, 2025 · Artificial Intelligence

How DeepSeek-OCR Achieves 10× Context Compression with Vision Tokens

DeepSeek-OCR, a newly open‑sourced 3B‑parameter OCR model, uses a novel DeepEncoder and a 3B MoE decoder to compress long‑text contexts into visual tokens, achieving up to 10× compression with 97% accuracy and demonstrating strong practical performance on benchmarks and multilingual documents.

DeepSeek · Multimodal AI · OCR
0 likes · 11 min read
Zhihu Tech Column
Sep 23, 2025 · Backend Development

Build a High‑Performance AI Chatbot with FUST Microservices and DeepSeek

This tutorial walks through using Zhihu's open‑source FUST microservice framework together with DeepSeek's language model API to design, implement, and deploy a scalable, high‑performance intelligent Q&A system, covering architecture, data models, service layers, and deployment scripts.

AI chatbot · DeepSeek · FUST
0 likes · 16 min read
Code Wrench
Sep 22, 2025 · Artificial Intelligence

Build a Private ChatGPT on Your Laptop with Ollama, DeepSeek‑R1 and Go MCP

This guide walks you through installing Ollama, pulling the open‑source DeepSeek‑R1:1.5B model, wrapping it with a Go‑based Model Context Protocol (MCP) server, creating a client example, and enhancing the experience with Open‑WebUI while offering performance‑tuning tips.

DeepSeek · Go · Local AI
0 likes · 9 min read
Data Party THU
Sep 21, 2025 · Artificial Intelligence

Building a Mini‑DeepSeek‑V3: Transformer Block and MTP Implementation on Limited Compute

This article walks through the design and implementation of a Mini‑DeepSeek‑V3 language model, detailing how to assemble the core Transformer block, integrate Multi‑Token Prediction (MTP) modules, construct the overall architecture, and compute the combined loss—all using modest GPU resources and a single‑card or DDP training setup.

AI · DeepSeek · MTP
0 likes · 12 min read
Data Party THU
Sep 20, 2025 · Artificial Intelligence

How DeepSeek Trained an LLM for Just $294K – Inside the R1 Model

The article reports that DeepSeek’s R1 large language model, detailed in a peer‑reviewed Nature paper, was trained for about $294,000 (roughly $300k) using Nvidia H800 chips and novel pure reinforcement‑learning techniques, achieving competitive performance while remaining open‑source.

DeepSeek · Nvidia H800 · Peer Review
0 likes · 9 min read
Data Party THU
Sep 19, 2025 · Artificial Intelligence

How DeepSeek R1 Redefines AI Reasoning with Pure Reinforcement Learning

DeepSeek R1 replaces traditional supervised fine‑tuning with a pure reinforcement‑learning pipeline, introducing the GRPO algorithm and a four‑stage training regime that dramatically lowers cost, boosts reasoning and code‑generation performance, and raises important ethical, privacy, and societal considerations for large language models.

AI reasoning · DeepSeek · GRPO
0 likes · 14 min read
DataFunTalk
Sep 18, 2025 · Artificial Intelligence

How DeepSeek‑R1’s Reinforcement Learning Earned a Nature Cover

DeepSeek‑R1, the first peer‑reviewed large language model, leveraged a pure reinforcement‑learning framework and the novel GRPO algorithm to achieve breakthrough reasoning performance, low training cost, and widespread acclaim, culminating in a Nature magazine cover story.

AI reasoning · DeepSeek · GRPO
0 likes · 14 min read
Raymond Ops
Sep 14, 2025 · Artificial Intelligence

Create AI Videos with DeepSeek + Tongyi Wanxiang: Step-by-Step Guide

This article explains how to leverage the Chinese AI multimodal platform Tongyi Wanxiang together with DeepSeek to generate high-quality AI videos, covering AI video fundamentals, core features, application scenarios, detailed workflow, script creation, video synthesis, and Java API integration with code examples.

AI video generation · DeepSeek · Java SDK
0 likes · 25 min read
Aikesheng Open Source Community
Sep 4, 2025 · Artificial Intelligence

How GPT‑5, DeepSeek‑V3.1 and SQLShift Stack Up in the August 2025 SQL LLM Benchmark

The August 2025 SCALE benchmark evaluates new AI models—including the GPT‑5 family, DeepSeek‑V3.1, and the SQLShift tool—across SQL understanding, optimization, and dialect conversion, revealing distinct strengths, weaknesses, and the growing advantage of specialized tools over generic large language models.

AI · Benchmark · DeepSeek
0 likes · 15 min read
Dunmao Tech Hub
Sep 1, 2025 · Artificial Intelligence

Deploy DeepSeek‑r1 Locally with a One‑Click Ollama Script

This guide walks you through a Bash script that automatically checks for Ollama, installs it if missing, lets you choose a DeepSeek‑r1 model size, starts the Ollama service, and runs the selected model locally, complete with usage examples and a token‑cost note.

AI · DeepSeek · Model Deployment
0 likes · 7 min read
IT Services Circle
Aug 28, 2025 · Artificial Intelligence

Why DeepSeek V3.1 Keeps Spitting the ‘Extreme’ Token and How to Fix It

Developers using DeepSeek V3.1's API have reported that the model intermittently inserts the Chinese character “极” (or its variants) into generated code, a bug that spreads across multiple platforms and threatens high‑precision code generation, prompting community workarounds and speculation about its root causes.

AI model bug · DeepSeek · LLM
0 likes · 6 min read
Aikesheng Open Source Community
Aug 28, 2025 · Artificial Intelligence

How Does DeepSeek‑V3.1 Perform on Professional SQL Tasks? A Detailed Benchmark

This report objectively evaluates DeepSeek‑V3.1 on professional‑grade SQL tasks, presenting its balanced strengths in understanding, optimization, and dialect conversion, highlighting its top scores in syntax error detection and Chinese database conversion while exposing weaknesses in execution‑plan analysis and large‑SQL transformations.

DeepSeek · LLM · artificial intelligence
0 likes · 8 min read
Efficient Ops
Aug 27, 2025 · Artificial Intelligence

Why DeepSeek V3.1 Randomly Inserts the Chinese Character “极” – Token Bug Explained

DeepSeek’s latest V3.1 model unexpectedly injects the Chinese character “极” into generated text, a token‑ID mix‑up that breaks code compilation, JSON parsing, and academic writing, with users tracing the issue to adjacent token IDs and two main hypotheses of dataset contamination or model shortcut.

AI Safety · DeepSeek · Language Model
0 likes · 4 min read
Architects' Tech Alliance
Aug 26, 2025 · Artificial Intelligence

How DeepSeek‑V3.1’s New FP8 Precision Supercharges Domestic Chip Performance

DeepSeek‑V3.1 introduces the UE8M0 FP8 Scale precision, cutting memory usage by up to 75% and enabling next‑generation Chinese chips such as Ascend 910B to run 128K context models efficiently, while the ecosystem rapidly adopts FP8, yet challenges in IP autonomy and software maturity remain before global competitiveness is achieved.

AI hardware · DeepSeek · Domestic Chips
0 likes · 10 min read
Raymond Ops
Aug 26, 2025 · Artificial Intelligence

How to Deploy DeepSeek R1 Locally: Versions, Hardware, and UI Tools

This guide explains DeepSeek R1’s model variants, hardware requirements, local installation steps using Ollama, LM Studio or Docker, and how to add visual interfaces like Open‑WebUI and Dify for a complete on‑premise AI solution.

DeepSeek · Dify · Hardware Requirements
0 likes · 14 min read
IT Services Circle
Aug 24, 2025 · Artificial Intelligence

What Is UE8M0 FP8 and Why It’s Boosting China’s Next‑Gen AI Chips

The article explains the UE8M0 FP8 precision format, its MXFP8 origins, how it reduces bandwidth and power consumption, and why Chinese AI chip makers like Cambricon, HaiGuang and Moore Threads are rapidly adopting it, signaling a shift toward domestic AI hardware independence.

AI hardware · Chinese chips · DeepSeek
0 likes · 10 min read
Fun with Large Models
Aug 22, 2025 · Artificial Intelligence

Step‑by‑Step Guide: Building a PDF‑Based RAG Knowledge Base with LangChain, Streamlit, DashScope & DeepSeek

This tutorial shows how to create a lightweight Retrieval‑Augmented Generation (RAG) system that indexes multiple PDF files, stores their embeddings in a FAISS vector database, and answers user queries through a LangChain agent powered by DashScope embeddings and the DeepSeek‑Chat model, all wrapped in a Streamlit UI.

Dashscope · DeepSeek · FAISS
0 likes · 13 min read
AI Algorithm Path
Aug 20, 2025 · Artificial Intelligence

DeepSeek V3.1 Open‑Source: Unlocking a New Era of Long‑Context AI

DeepSeek V3.1, a 685‑billion‑parameter open‑source model, supports up to 128,000 tokens, delivers mixed‑architecture capabilities, matches top‑tier closed systems in benchmarks, and its rapid community adoption signals a shift toward democratized AI development and new industry dynamics.

AI Performance · DeepSeek · large language model
0 likes · 6 min read
Architects' Tech Alliance
Aug 13, 2025 · Artificial Intelligence

Can DeepSeek Survive the AI Arms Race? A Deep Dive into Its Challenges

DeepSeek, a fast‑rising large‑model contender, boasts impressive NLP and code‑generation capabilities, yet faces steep hurdles—including security concerns, industry‑specific customization gaps, slowing innovation, fierce competition from OpenAI, Google, and Alibaba’s Qwen3, and fragmented open‑source ecosystems—that cast doubt on its long‑term prospects.

AI competition · DeepSeek · Model Evaluation
0 likes · 12 min read
JD Tech Talk
Aug 6, 2025 · Artificial Intelligence

How to Deploy JoyAgent AI Agent on JD Cloud in Four Simple Steps

This guide walks you through deploying JD Cloud’s open‑source JoyAgent AI agent using the JoyAgent‑Genie image, covering host creation, firewall configuration, model and search engine setup, and service startup, enabling you to access the Genie interface via a public IP.

DeepSeek · JD Cloud · JoyAgent
0 likes · 4 min read
IT Services Circle
Jul 21, 2025 · Artificial Intelligence

Why Is DeepSeek’s R1 Losing Users? Inside the Market Shift and Strategy

DeepSeek’s R1, once hailed as a breakthrough AI model with explosive growth, now faces a sharp decline in user traffic and market share, prompting analysis of user migration to third‑party platforms, performance bottlenecks, and contrasting strategies with rivals like Anthropic.

AI model · Anthropic · DeepSeek
0 likes · 8 min read
Tech Freedom Circle
Jul 17, 2025 · Artificial Intelligence

DeepSeek V3 Architecture Deep Dive: MoE, MLA, DualPipe, FP8 Mixed Precision & Multi‑Token Prediction

This article provides a detailed technical analysis of DeepSeek‑V3, covering its MoE architecture, the novel Multi‑head Latent Attention (MLA) mechanism, the DualPipe pipeline‑parallel algorithm, mixed‑precision FP8 training, and the Multi‑Token Prediction (MTP) inference improvements that together boost performance and efficiency.

DeepSeek · Distributed Training · DualPipe
0 likes · 44 min read
Fun with Large Models
Jul 17, 2025 · Artificial Intelligence

How to Integrate Large Models with LangChain: A Step‑by‑Step Tutorial

This tutorial explains LangChain's core modules and three‑layer architecture, shows how to set up a Python environment, and provides concrete code examples for connecting SiliconFlow Qwen3‑8B and DeepSeek models via the init_chat_model API, including result inspection and references to official documentation.

DeepSeek · LangChain · Python
0 likes · 9 min read
Tencent Technical Engineering
Jul 11, 2025 · Artificial Intelligence

How DeepSeek Achieved 15,800+ Tokens/s: Full‑Stack Inference Optimizations

This article details the Angel‑HCF team's end‑to‑end DeepSeek inference optimizations—including PD separation, multi‑layer MTP, EP and DP parallelism, hardware‑aware kernels, and load‑balancing strategies—that boost throughput to over 15,800 tokens per second while keeping per‑token latency under 50 ms.

AI Performance · DeepSeek · GPU utilization
0 likes · 13 min read
ITPUB
Jul 7, 2025 · Operations

How to Build a DeepSeek AI Ops Platform: Architecture & Implementation

This article presents a comprehensive blueprint for constructing a DeepSeek-powered AI Ops platform, detailing the six‑module architecture, data collection stack, AI engine deployment options, application and interaction layers, implementation road‑map, model training, security measures, cost estimates, and risk mitigation strategies.

AI Ops · DeepSeek · Infrastructure as Code
0 likes · 8 min read
DataFunTalk
Jul 6, 2025 · Artificial Intelligence

Why DeepSeek’s Low‑Cost Tokenomics Are Losing Market Share to Anthropic and OpenAI

The article analyses DeepSeek’s unconventional low‑price, high‑latency strategy, its token‑pricing and KPI trade‑offs, and compares its performance, hardware choices, and market share with Anthropic, OpenAI, Google and other AI providers, while also discussing the rise of inference‑as‑a‑service and rumors about DeepSeek R2.

AI models · DeepSeek · Tokenomics
0 likes · 14 min read
DataFunTalk
Jul 5, 2025 · Artificial Intelligence

DeepSeek R1T2 Chimera: Faster, High‑Performance LLM with Assembly of Experts

The DeepSeek R1T2 Chimera model, an open‑source LLM built with Assembly of Experts technology, delivers up to 200% faster inference than R1‑0528, surpasses R1 on GPQA‑Diamond and AIME‑24 benchmarks, and offers a 671‑billion‑parameter MoE architecture, though it lacks function‑calling support and trails the highest‑end R1‑0528 on the toughest tests.

AI · Assembly of Experts · DeepSeek
0 likes · 5 min read
Java Architecture Diary
Jun 25, 2025 · Artificial Intelligence

Build a Text‑to‑SQL Chatbot with Spring AI and DeepSeek LLM

This tutorial walks through creating a natural‑language‑to‑SQL chatbot using Spring AI, configuring a MySQL school database with Flyway, defining system prompts for a DeepSeek LLM, implementing service beans and a REST API, and interacting with the bot via curl commands.

Chatbot · DeepSeek · Java
0 likes · 15 min read
Sohu Tech Products
Jun 11, 2025 · Artificial Intelligence

How DeepSeek and TiDB AI Are Redefining Data Engines for the Large‑Model Era

This article explores DeepSeek's open‑source large‑model breakthroughs, PingCAP's AI‑enhanced database roadmap, TiDB.AI's retrieval‑augmented generation framework, the unified TiDB data engine, and practical Q&A insights on knowledge‑graph construction, vector search, and AI‑driven SQL generation.

AI · DeepSeek · Knowledge Graph
0 likes · 15 min read
DataFunTalk
Jun 11, 2025 · Backend Development

Master Modern Web Scraping: From Classic Tools to DeepSeek AI Integration

This article provides a comprehensive overview of web‑scraping technologies, compares popular tools such as requests, BeautifulSoup and Selenium, introduces AI‑assisted crawling with DeepSeek, and walks through practical steps for using BrightData’s platform to collect industry data, complete with ready‑to‑run Python code.

BrightData · DeepSeek · Python
0 likes · 13 min read
ITFLY8 Architecture Home
Jun 10, 2025 · Artificial Intelligence

DeepSeek Evolution: Technical Highlights, Architecture, and Performance Explained

This article examines DeepSeek’s various versions, detailing their core modules, underlying principles, architectural diagrams, and performance metrics, offering practical guidance for enthusiasts, professionals, and practitioners while inspiring further exploration of artificial intelligence innovations.

DeepSeek · Model architecture · Tech Overview
0 likes · 2 min read
Java Architecture Diary
Jun 5, 2025 · Artificial Intelligence

Unlock AI Reasoning: How Ollama’s New ‘Thinking’ Feature Works

Version 0.9.0 of Ollama introduces a ‘thinking’ control that lets users view and manage the AI model’s reasoning process, with detailed CLI commands, REST API usage, model support list, scripting options, and advanced Modelfile configurations for models like DeepSeek R1 and Qwen 3.

AI reasoning · CLI · DeepSeek
0 likes · 6 min read
Data Thinking Notes
Jun 4, 2025 · Artificial Intelligence

How DeepSeek AI Model is Revolutionizing China’s State Enterprises – Over 100 Deployment Cases

The DeepSeek large language model has been extensively deployed across more than 100 central and local Chinese state‑owned enterprises, spanning sectors such as energy, manufacturing, transportation, finance, telecommunications, construction, and public services, driving intelligent transformation through applications like smart scheduling, risk assessment, intelligent customer service, and AI‑enhanced office automation.

AI deployment · DeepSeek · Industrial AI
0 likes · 38 min read
Java Web Project
Jun 4, 2025 · Artificial Intelligence

Why DeepSeek V3 Stands Out: Architecture, Performance, and Open‑Source Edge

The article analyzes DeepSeek's rapid adoption, detailing its seven core models, the third‑generation MoE architecture, FP8 mixed‑precision training, 128K context window, benchmark superiority on MMLU/HumanEval/CMMLU, low training cost, and fully open‑source release, while also introducing a companion guide for developers.

AI Architecture · DeepSeek · FP8 training
0 likes · 9 min read
ITFLY8 Architecture Home
May 30, 2025 · Artificial Intelligence

How to Use AI (DeepSeek) for Efficient Official Document Drafting

This guide explains how DeepSeek can assist in quickly generating, structuring, and polishing official documents by defining clear requirements, building logical frameworks, filling content modules, adjusting style, and performing thorough proofreading, while also providing numerous prompt templates for common government and corporate paperwork.

AI writing · DeepSeek · Government
0 likes · 13 min read
Efficient Ops
May 29, 2025 · Artificial Intelligence

DeepSeek R1 0528 Update: New Features, Performance Gains Over OpenAI o3

DeepSeek quietly launched the R1 0528 model, which early testers report matches OpenAI’s o3 in benchmarks and style, while adding deeper chain‑of‑thought reasoning, better writing output, and extended thinking windows, and the announcement is followed by a promotion for the GOPS Global Ops Conference.

AI Performance · DeepSeek · Model Update
0 likes · 3 min read
Alibaba Cloud Developer
May 27, 2025 · Artificial Intelligence

How to Build AI-Powered Java Apps with Spring AI and DeepSeek

This guide walks Java developers through integrating Spring AI with large‑model services such as DeepSeek, covering setup, API key configuration, code examples for synchronous and streaming calls, reactive implementation, monitoring with Actuator, and compatibility with OpenAI‑style APIs.

AI integration · DeepSeek · Java
0 likes · 9 min read
Java One
May 26, 2025 · Artificial Intelligence

Integrate ProxyAI with JetBrains IDEA for Seamless AI‑Powered Coding

This guide walks you through installing the ProxyAI plugin in JetBrains IDEA, configuring default and DeepSeek‑V3 models, obtaining API keys, and using features like chat, FIM code completion, bug detection, and code explanation to boost development efficiency.

AI code assistant · DeepSeek · IDEA
0 likes · 8 min read
IT Services Circle
May 25, 2025 · Artificial Intelligence

DeepSeek Core Technologies and Model Innovations: DeepSeek‑V3 and DeepSeek‑R1 Technical Overview

The article provides a detailed technical overview of DeepSeek's flagship large language models, DeepSeek‑V3 and DeepSeek‑R1, describing their MoE architecture, training frameworks, reinforcement‑learning based fine‑tuning, inference optimizations, and the broader impact of these innovations on the AI landscape while also promoting related books and resources.

AI · DeepSeek · Mixture of Experts
0 likes · 10 min read
Java Architecture Diary
May 21, 2025 · Artificial Intelligence

Spring AI 1.0 Launch: Production‑Ready Java AI Framework Unveiled

Spring AI 1.0, the first production‑grade Java AI framework, introduces ready‑to‑use APIs, seamless model integration, an enterprise‑level RAG engine, smart tool calling, and three development modes, empowering developers to rapidly build, customize, and fully control AI applications with major model providers such as OpenAI, Anthropic, and DeepSeek.

AI Framework · DeepSeek · Java AI
0 likes · 13 min read
Architect's Guide
May 18, 2025 · Backend Development

Integrating DeepSeek AI with a WeChat Public Account: A Step‑by‑Step Backend Tutorial

This tutorial walks beginners through obtaining a DeepSeek API key, setting up an Alibaba Cloud ECS instance, configuring the server and WeChat public platform, installing required Python dependencies, editing configuration files, and finally running the chatbot so the public account can interact with the DeepSeek large‑language model.

API · Backend · DeepSeek
0 likes · 12 min read
DataFunSummit
May 17, 2025 · Artificial Intelligence

Integrating Knowledge Graphs with DeepSeek AI for Enterprise Knowledge Management

This presentation explores how combining knowledge graphs with DeepSeek large‑model agents can revolutionize enterprise knowledge management, detailing DeepSeek’s technical strengths, the graph‑model complementarity paradigm, various knowledge types, practical frameworks, case studies, and future outlooks for AI‑enhanced intelligent systems.

DeepSeek · Enterprise Knowledge Management · artificial intelligence
0 likes · 23 min read
Architects' Tech Alliance
May 16, 2025 · Industry Insights

Can DeepSeek Survive the AI Arms Race? A Deep Dive into Its Challenges and Competition

The article provides a comprehensive analysis of DeepSeek’s rise in the large‑model market, examining its technical merits, security and customization hurdles, slowing innovation, fierce competition from OpenAI, Google and Alibaba’s Qwen3, as well as the fragility of its open‑source ecosystem and data preparation, ultimately questioning its long‑term viability.

AI models · DeepSeek · Industry analysis
0 likes · 13 min read
Data Thinking Notes
May 13, 2025 · Information Security

DeepSeek Security: Top 5 Model Threats and How to Defend

This report examines DeepSeek’s security and reliability by detailing five core model threats—DDoS attacks, unlimited inference, vulnerability exploitation, data poisoning, and jailbreak—alongside two private‑deployment risks and three external threats such as counterfeit apps, offering targeted mitigation strategies to help users safely adopt the platform.

AI security · DeepSeek · model safety
0 likes · 8 min read
Architect's Guide
May 13, 2025 · Artificial Intelligence

DeepSeek Model Distillation Technology: Overview, Innovations, Architecture, Training, Performance, and Challenges

This article provides a comprehensive overview of DeepSeek's model distillation technology, detailing its definition, key innovations, architecture, training methods, performance gains, and the remaining challenges such as the implicit performance ceiling and multimodal data distillation.

AI Optimization · DeepSeek · Knowledge Transfer
0 likes · 14 min read
Linux Kernel Journey
May 8, 2025 · Artificial Intelligence

How Tencent’s TRMT Tech Delivered a Huge Speedup to DeepSeek’s Large‑Model Network

DeepSeek engineers highlighted Tencent’s open‑source TRMT and DeepEP contributions that boost GPU‑to‑GPU communication by up to 300%, double RoCE performance and add a further 30% gain on InfiniBand, while addressing lane‑utilization and CPU‑control bottlenecks through three targeted optimizations.

DeepEP · DeepSeek · GPU communication
0 likes · 6 min read
JD Tech
May 8, 2025 · Artificial Intelligence

The Emerging Boom of Large Model Applications and Why 2025 Will Be the Turning Point

Amid the AI wave, applications built on large language models such as DeepSeek R1 are poised for explosive growth in 2025, driven by open-source releases, low-cost access, and superior reasoning. Successful deployment requires four key factors (domain expertise, knowledge bases, robust search, and engineered agent architectures) to unlock value beyond simple chat.

2025 · AI applications · Agent Architecture
0 likes · 10 min read
DevOps
May 5, 2025 · Artificial Intelligence

DeepSeek Releases Math‑Specialized Large Model V2 and ProverBench Evaluation Suite

DeepSeek has quietly open‑sourced a new mathematics‑focused large language model, DeepSeek‑Prover‑V2 (available in 671B and 7B variants), achieving 88.9% on MiniF2F and strong results on PutnamBench, alongside the high‑quality ProverBench dataset and a novel recursive theorem‑proving pipeline.

AI · DeepSeek · Mathematical Reasoning
0 likes · 4 min read
ITPUB
May 5, 2025 · Operations

Turn Zabbix Alerts into an AI‑Powered Personal Assistant

This guide shows how to integrate Zabbix with a locally deployed DeepSeek large language model via Webhook, enabling automatic analysis of alert causes and solutions, feeding results back to operators through dashboards or enterprise WeChat, and dramatically reducing MTTR and manual effort.

AI · Alert Automation · DeepSeek
0 likes · 5 min read
Architects' Tech Alliance
May 2, 2025 · Artificial Intelligence

DeepSeek‑Prover‑V2‑671B: A Massive AI Model for Formal Mathematical Theorem Proving

DeepSeek‑Prover‑V2‑671B, a 671‑billion‑parameter AI model released on Hugging Face, dramatically advances formal mathematical theorem proving with an MoE architecture, FP8 quantization, a 163K‑token context, superior performance over GPT‑4 Turbo and other models, and broad implications for research and industry.

AI · DeepSeek · FP8 quantization
0 likes · 11 min read
Data Thinking Notes
Apr 27, 2025 · Artificial Intelligence

Step‑by‑Step MCP Demo: Build Server and Claude/DeepSeek Clients

This guide walks developers through creating a complete MCP application, covering the workflow, server setup with Python, debugging tools, and client implementation using both Claude and DeepSeek models, complete with code snippets, environment configuration, and testing procedures to demonstrate end‑to‑end LLM tool integration.

Claude · DeepSeek · LLM
0 likes · 10 min read
Baobao Algorithm Notes
Apr 27, 2025 · Artificial Intelligence

How DeepSeek R1T‑Chimera Cuts Tokens by 40% Without Fine‑Tuning

The DeepSeek‑R1T‑Chimera model merges DeepSeek‑R1 reasoning with the V3‑0324 architecture, reusing most V3 weights and swapping in only the R1 routed experts (highlighted in blue in the article's diagram). It achieves the same intelligence as R1 while reducing output tokens by about 40% and running faster, all without any fine‑tuning or distillation.

DeepSeek · LLM · Parameter Reuse
0 likes · 5 min read
Big Data Tech Team
Apr 23, 2025 · Industry Insights

10 Powerful Ways DeepSeek Transforms Data Warehousing

DeepSeek leverages AI to automate multi‑source integration, data cleaning, warehouse modeling, real‑time processing, governance, metadata management, reporting, cloud scaling, and decision support, offering twelve distinct use cases that boost efficiency, intelligence, and scalability of modern data warehouses.

AI · Data Warehouse · DeepSeek
0 likes · 9 min read
Data Thinking Notes
Apr 22, 2025 · Artificial Intelligence

How DeepSeek AI is Transforming Agriculture, Manufacturing, Finance, Healthcare, and Education

The Zhejiang University IT Center report highlights DeepSeek's AI technology across more than 40 real‑world cases in agriculture, manufacturing, finance, healthcare and education, demonstrating data‑driven, intelligent solutions that accelerate industry transformation and modernization.

AI applications · Data-driven transformation · DeepSeek
0 likes · 3 min read
Big Data Technology & Architecture
Apr 22, 2025 · Artificial Intelligence

Introduction to Retrieval‑Augmented Generation (RAG) and Vector Indexing with StarRocks and DeepSeek

This article explains the fundamentals of Retrieval‑Augmented Generation, demonstrates how to create and query vector indexes using StarRocks, shows how DeepSeek provides embeddings and answer generation, and walks through a complete end‑to‑end RAG pipeline with code examples and a web UI.

AI · DeepSeek · Embedding
0 likes · 20 min read
Big Data Tech Team
Apr 21, 2025 · Industry Insights

8 Practical Ways DeepSeek Boosts Data Quality for Better Governance

This guide outlines eight concrete methods DeepSeek uses to improve data quality—including automated cleaning, validation, classification, monitoring, governance standards, anomaly detection, integration, and intelligent analysis—providing actionable steps for organizations to enhance data accuracy, completeness, consistency, and usability.

Data Integration · Data Quality · DeepSeek
0 likes · 5 min read
dbaplus Community
Apr 21, 2025 · Operations

Turn Zabbix Alerts into AI‑Powered Insights with DeepSeek

This guide shows how to integrate Zabbix with a locally deployed DeepSeek large language model via Webhook, enabling automatic analysis of alerts, generation of root‑cause explanations and remediation suggestions, and delivering results through WeChat bots, dashboards, or email to reduce MTTR and manual effort.

AI Ops · Alert Automation · DeepSeek
0 likes · 4 min read
AIWalker
Apr 17, 2025 · Artificial Intelligence

Unveiling DeepSeek’s Janus Series: Decoupled Visual Encoding for Unified Multimodal Understanding and Generation

This article provides an in‑depth analysis of DeepSeek’s Janus and Janus‑Pro models, explaining how decoupling visual encoding resolves the conflict between multimodal understanding and generation, detailing training stages, data scaling, architectural choices, and presenting extensive benchmark results that demonstrate significant performance gains.

Benchmark · DeepSeek · Janus
0 likes · 23 min read
Big Data Tech Team
Apr 14, 2025 · Industry Insights

How DeepSeek AI is Transforming Data Warehouses: From Automation to Real‑Time Insights

DeepSeek leverages large‑model AI to automate requirement analysis, intelligent modeling, performance tuning, and value extraction in data warehouses, addressing low development efficiency, high O&M cost, latency, and lack of intelligence while showcasing concrete use‑case results across finance, e‑commerce, and manufacturing.

AI · Automation · Data Warehouse
0 likes · 9 min read
Open Source Linux
Apr 14, 2025 · Artificial Intelligence

How to Deploy DeepSeek Locally: Step‑by‑Step Guide for Offline AI

This guide compares DeepSeek’s local and online versions, outlines hardware and privacy advantages of offline deployment, and provides a detailed step‑by‑step tutorial—including Ollama installation, model selection, command execution, and UI plugin setup—to help users run DeepSeek on their own machines.

AI model · DeepSeek · Ollama
0 likes · 6 min read