DeepSeek V3.1 Review: 128K Context, Knowledge, Programming & Agent Skills Near Claude 4

DeepSeek V3.1, released on August 19, expands the context window to 128K tokens and refreshes the knowledge base to July 2024. The author's benchmarks show its programming and agent capabilities now rival Claude 4, illustrated with detailed prompt examples, code-generation demos, and performance comparisons.

Fun with Large Models

DeepSeek released V3.1 on August 19, marking the biggest update since January with a 128K token context window and a knowledge base refreshed to July 2024, positioned as a strong response to GPT‑5.

V3.1 is the default model when the “deep thinking” option is unchecked in DeepSeek’s web UI, and the OpenWebUI API likewise points to V3.1. Simple dialogue tests confirm the knowledge base extends to July 2024.
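A knowledge-cutoff spot-check like the one described can be run through DeepSeek’s OpenAI-compatible chat endpoint. A minimal sketch, assuming the endpoint and model name from DeepSeek’s public API docs (`deepseek-chat` is the non-thinking mode; adjust for your deployment):

```javascript
// Build a deterministic probe request for the knowledge cutoff.
// Endpoint and model name assume DeepSeek's OpenAI-compatible API.
function buildCutoffProbe(question) {
  return {
    model: "deepseek-chat",            // V3.1, "deep thinking" unchecked
    messages: [
      { role: "system", content: "Answer briefly." },
      { role: "user", content: question },
    ],
    temperature: 0,                    // deterministic for repeat tests
  };
}

// Send the probe and return the model's reply text.
async function askDeepSeek(apiKey, question) {
  const res = await fetch("https://api.deepseek.com/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildCutoffProbe(question)),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Asking about events shortly before and after July 2024 and comparing the answers is enough to bracket the cutoff.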

Beyond longer context, the author evaluated programming and agent abilities. Benchmarks show performance comparable to Claude 4.1 Opus in most dimensions, with some gaps in physical reasoning and code robustness.

Programming evaluation

The author reused classic test cases (ball‑rolling, particle‑vortex, audio generator, coffee landing page, etc.) and added new prompts such as a solar‑system simulator. Six dimensions were measured: code accuracy, intent recognition, front‑end rendering, physics adherence, code quality, and willingness to generate long code.

Results: Compared with Claude 4.1 Opus, DeepSeek V3.1 matches it in accuracy and front‑end rendering, but lags in physics fidelity and robustness. Sample outputs for the solar‑system simulator and coffee landing page illustrate clearer, more modern designs from DeepSeek.

Please generate a standalone HTML file that includes an interactive solar‑system simulator.

All code must be contained in a single HTML file; do not reference external libraries or files.

The simulator should have:
- Dark background representing space.
- Central glowing yellow sphere as the Sun.
- Eight planets (Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune) with distinct colors and relative sizes; Saturn should have rings.
- Elliptical or circular orbits for each planet, roughly concentric.
- Title “Solar System Simulation” centered at the top.

Animation:
- Planets orbit the Sun at speeds proportional to their real‑world orbital velocities (inner planets faster).
- Initial positions set on load; optionally paused or running at default speed.

Controls:
- “Start”, “Pause”, “Reset” buttons.
- “Speed” slider ranging from 1× to 5×, displaying the current multiplier.

Technical notes:
- Use HTML for structure, CSS for layout and styling, and JavaScript (preferably with a `<canvas>` element) for the animation and controls.
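The animation requirement above reduces to simple angular motion: each planet’s angle advances at a rate inversely proportional to its orbital period, scaled by the slider multiplier. A minimal sketch of that core math (planet list, periods, and function names are illustrative, not taken from any model’s output; canvas drawing is omitted):

```javascript
// Orbital periods in Earth years (rounded) and canvas orbit radii in pixels.
const PLANETS = [
  { name: "Mercury", period: 0.24, radius: 60 },
  { name: "Venus",   period: 0.62, radius: 90 },
  { name: "Earth",   period: 1.0,  radius: 120 },
  { name: "Mars",    period: 1.88, radius: 150 },
];

// Angle (radians) of a planet after `t` simulated years at a speed multiplier:
// shorter periods sweep faster, matching the "inner planets faster" requirement.
function orbitAngle(periodYears, t, speed = 1) {
  return ((2 * Math.PI * t * speed) / periodYears) % (2 * Math.PI);
}

// Canvas-space position of a planet on a circular orbit around a sun at (cx, cy).
function planetPosition(planet, t, speed, cx, cy) {
  const a = orbitAngle(planet.period, t, speed);
  return {
    x: cx + planet.radius * Math.cos(a),
    y: cy + planet.radius * Math.sin(a),
  };
}
```

In the full simulator, a `requestAnimationFrame` loop would advance `t`, call `planetPosition` for each planet, and redraw the canvas; the Start/Pause buttons toggle the loop and the slider sets `speed`.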

Agent performance

Agent tests referenced a Bilibili reviewer who integrated DeepSeek V3.1 into a custom knowledge‑base retrieval system. The author reports that intent recognition, keyword extraction, agent call accuracy, and long‑document generation are on par with GPT‑5 and Claude 4.1 Opus.
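One way such an agent-call accuracy check could be scored: define a retrieval tool in the OpenAI-compatible tools format and validate that the model’s tool call names the right function with non-empty extracted keywords. A hedged sketch (the `kb_search` tool name and its schema are illustrative, not from the reviewer’s system):

```javascript
// Illustrative tools-format definition for a knowledge-base search function.
const tools = [{
  type: "function",
  function: {
    name: "kb_search",
    description: "Search the knowledge base for relevant passages.",
    parameters: {
      type: "object",
      properties: { keywords: { type: "array", items: { type: "string" } } },
      required: ["keywords"],
    },
  },
}];

// Score one test case: did the model call the right tool with usable keywords?
function validToolCall(call) {
  if (!call || call.function?.name !== "kb_search") return false;
  let args;
  try { args = JSON.parse(call.function.arguments); } catch { return false; }
  return Array.isArray(args.keywords) && args.keywords.length > 0;
}
```

Running a batch of queries through this validator yields the kind of agent-call accuracy figure the article compares across models.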

Side‑by‑side screenshots of generated code and UI highlight DeepSeek’s strengths, along with occasional weaknesses in more complex scenarios such as audio generation and very long code synthesis.

Conclusion: DeepSeek V3.1 re‑establishes the company’s leading position in the large‑model field, demonstrating that steady technical accumulation can compete with top‑tier models.

Tags: DeepSeek · Large Language Model · Context Length · Agent Evaluation · Claude 4 · Programming Evaluation
Written by

Fun with Large Models

A Master’s graduate of Beijing Institute of Technology with four papers in top journals, formerly a developer at ByteDance and Alibaba, now researching large models at a major state‑owned enterprise. Committed to sharing concise, practical experience in AI large‑model development, in the belief that large models will become as essential as PCs. Let’s start experimenting now!
