DeepSeek V4 Pro vs GPT‑5.3 Codex High: Direct Code‑Generation Test Reveals the Gap
A two‑stage evaluation compares DeepSeek V4 Pro and GPT‑5.3 Codex High on a TypeScript LRU‑Cache task and a markdown‑inspection CLI project, showing DeepSeek leads on basic code correctness while GPT‑5.3 delivers a more complete engineering solution, with detailed scores and analysis.
Test Design
The evaluation uses a two‑layer approach. The first layer tests basic TypeScript coding ability with an LRU‑Cache implementation. The second layer tests end‑to‑end agent engineering ability by building a small CLI called md‑inspector that recursively scans Markdown files and produces a quality report.
Round 1 – LRU Cache
Requirements:
O(1) get / put operations
Configurable capacity, including handling capacity = 0
Five test cases covering edge conditions
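The requirements above can be illustrated with a minimal sketch of the Map plus doubly-linked-list design both models converged on. All names here (LRUCache, Entry) are illustrative and not taken from either model's actual output; has and clear are omitted for brevity.

```typescript
// Illustrative sketch, not either model's output: Map for O(1) lookup,
// doubly linked list for O(1) recency reordering and eviction.
type Entry<K, V> = { key: K; value: V; prev?: Entry<K, V>; next?: Entry<K, V> };

class LRUCache<K, V> {
  private map = new Map<K, Entry<K, V>>();
  private head?: Entry<K, V>; // most recently used
  private tail?: Entry<K, V>; // least recently used

  constructor(private capacity: number) {
    // Rejects NaN, Infinity, floats, and negatives in one check.
    if (!Number.isInteger(capacity) || capacity < 0) {
      throw new RangeError(`capacity must be a non-negative integer, got ${capacity}`);
    }
  }

  get(key: K): V | undefined {
    const e = this.map.get(key);
    if (!e) return undefined;
    this.moveToFront(e);
    return e.value;
  }

  put(key: K, value: V): void {
    if (this.capacity === 0) return; // capacity 0: store nothing
    const existing = this.map.get(key);
    if (existing) {
      existing.value = value;
      this.moveToFront(existing);
      return;
    }
    if (this.map.size >= this.capacity && this.tail) {
      this.map.delete(this.tail.key); // evict least recently used
      this.unlink(this.tail);
    }
    const e: Entry<K, V> = { key, value };
    this.linkFront(e);
    this.map.set(key, e);
  }

  get size(): number { return this.map.size; }

  private unlink(e: Entry<K, V>): void {
    if (e.prev) e.prev.next = e.next; else this.head = e.next;
    if (e.next) e.next.prev = e.prev; else this.tail = e.prev;
    e.prev = e.next = undefined;
  }

  private linkFront(e: Entry<K, V>): void {
    e.next = this.head;
    if (this.head) this.head.prev = e;
    this.head = e;
    if (!this.tail) this.tail = e;
  }

  private moveToFront(e: Entry<K, V>): void {
    if (this.head === e) return;
    this.unlink(e);
    this.linkFront(e);
  }
}

const cache = new LRUCache<string, number>(2);
cache.put("a", 1);
cache.put("b", 2);
cache.get("a");    // "a" becomes most recently used
cache.put("c", 3); // evicts "b"
console.log(cache.get("b")); // undefined
console.log(cache.get("a")); // 1
```

Every operation touches the Map once and relinks a constant number of nodes, which is what makes both get and put O(1).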
Scoring after three interaction rounds:
DeepSeek V4 Pro : first attempt 8.2 → final 9.0
GPT‑5.3 Codex High : first attempt 7.8 → final 8.6
DeepSeek V4 Pro's initial solution used a Map plus a doubly‑linked list, then added:
Generic type parameters
Constructor validation for non‑negative integer capacity
Additional API methods: size, has, clear
A Vitest test suite covering boundary cases
Explicit complexity notes
Separated link‑node and data‑node types to avoid the unsafe cast null as unknown as K
API semantics: get returns V | undefined; tryGet returns an object with found: true/false to distinguish a cache miss from a cached undefined
GPT‑5.3 Codex High also started with a canonical Map + linked‑list design and then upgraded it:
Replaced the circular‑sentinel design
Removed the unsafe null as unknown cast
Added tests for illegal capacities (NaN, Infinity, negative numbers, floating‑point values)
Structured the hit result as {hit, value}
Weaknesses noted for GPT‑5.3 were fewer regression tests, less detailed design rationale, and slightly weaker engineering narration.
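The miss-versus-cached-undefined distinction both models addressed can be sketched in a few lines. The class and field names here (TinyCache, found) are illustrative; the article reports the two models used found and hit respectively for the flag.

```typescript
// Illustrative sketch: why a plain get() cannot distinguish a cache miss
// from a value that was cached as undefined, while tryGet() can.
type Lookup<V> = { found: true; value: V } | { found: false };

class TinyCache<K, V> {
  private store = new Map<K, V>();
  put(key: K, value: V): void { this.store.set(key, value); }
  // get() conflates "not cached" with "cached undefined"...
  get(key: K): V | undefined { return this.store.get(key); }
  // ...tryGet() separates them with an explicit flag.
  tryGet(key: K): Lookup<V> {
    return this.store.has(key)
      ? { found: true, value: this.store.get(key) as V }
      : { found: false };
  }
}

const c = new TinyCache<string, number | undefined>();
c.put("x", undefined);
console.log(c.get("x"));          // undefined, indistinguishable from a miss
console.log(c.tryGet("x"));       // { found: true, value: undefined }
console.log(c.tryGet("missing")); // { found: false }
```

The discriminated-union return type also lets TypeScript narrow value access to the found: true branch.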
Round 2 – Markdown CLI ( md‑inspector )
Task: implement a TypeScript CLI that recursively scans a directory of Markdown files and outputs a quality report. The task required handling real‑world edge cases:
Empty directories and non‑existent directories
Missing or multiple H1 headings
Image links that should not be counted as normal links
Links inside fenced code blocks that should be ignored
Cross‑platform path handling (Windows vs macOS/Linux)
File‑read failures that must produce warnings instead of crashing
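Two of the trickier edge cases above, image links and links inside fenced code blocks, can be handled with a small line-scanning helper. This is an illustrative sketch, not code from either model; it uses the same regex-based approach the article later notes both solutions took.

```typescript
// Illustrative helper: count Markdown links while ignoring image links
// (![alt](url)) and anything inside fenced code blocks.
function countMarkdownLinks(markdown: string): number {
  let inFence = false;
  let count = 0;
  for (const line of markdown.split(/\r?\n/)) { // tolerates Windows line endings
    if (/^\s*(```|~~~)/.test(line)) { inFence = !inFence; continue; }
    if (inFence) continue;
    // Match [text](url) but not ![alt](url), via a negative lookbehind on "!".
    const matches = line.match(/(?<!!)\[[^\]]*\]\([^)]*\)/g);
    if (matches) count += matches.length;
  }
  return count;
}

const doc = [
  "# Title",
  "A [real link](https://example.com) and an image ![logo](logo.png).",
  "```",
  "[ignored](in-code)",
  "```",
].join("\n");
console.log(countMarkdownLinks(doc)); // 1
```

A real implementation would also need to track tilde versus backtick fences separately; this sketch shows the core idea only.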
Constraints:
Use only Node built‑in modules
Reasonable file splitting
At least eight Vitest tests
Clear execution and verification instructions
Self‑review step at the end
Scoring:
GPT‑5.3 Codex High : 8.7 (rank 1) – described as the most mature code agent
DeepSeek V4 Pro : 8.0 (rank 2) – usable initial project but less stable in finalization
GPT‑5.3 strengths :
Explicitly stated requirement assumptions and implementation plan (requirements → initialization → module decomposition → error handling → testing → self‑review)
Modular project structure (scanner, analyzer, path handling, report generator, entry point)
Test coverage exceeded the minimum with ten Vitest tests
All npm test and npx tsc --noEmit checks passed
CLI error semantics matched the specification: missing directories produce JSON‑formatted warnings rather than crashes
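The JSON-warning contract for missing directories can be sketched with Node built-ins only. The field names (level, code, path, message) are assumptions; the article only specifies that missing directories must produce JSON-formatted warnings rather than crashes.

```typescript
// Hedged sketch of the "warn, don't crash" error semantics. Field names are
// assumptions, not the actual md-inspector output format.
import { statSync } from "node:fs";

type ScanWarning = { level: "warning"; code: string; path: string; message: string };

function checkDirectory(dir: string): ScanWarning | null {
  try {
    const s = statSync(dir);
    if (!s.isDirectory()) {
      return { level: "warning", code: "NOT_A_DIRECTORY", path: dir, message: `${dir} is not a directory` };
    }
    return null; // safe to scan
  } catch {
    return { level: "warning", code: "MISSING_DIRECTORY", path: dir, message: `${dir} does not exist` };
  }
}

const warning = checkDirectory("./no-such-dir");
if (warning) {
  // Emit a machine-readable warning instead of throwing or calling process.exit.
  console.log(JSON.stringify(warning));
}
```

Returning a value instead of writing to stderr and exiting is exactly the distinction the scoring penalized DeepSeek for in this round.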
GPT‑5.3 weaknesses :
Markdown parsing based on regular expressions instead of an AST
Custom word‑count assumptions without external justification
Limited cross‑platform failure testing
Coarse error handling during the scanning phase
DeepSeek V4 Pro weaknesses :
TypeScript compilation failed (npx tsc --noEmit) due to missing @types/node, causing type‑resolution errors for node:fs/promises and process
Error semantics used stderr plus an exit code instead of the required JSON warning format
Insufficient tolerance for scanning‑phase failures
End‑to‑end CLI behavior tests were limited, focusing more on internal modules than on actual command‑line execution
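The last weakness, testing internal modules rather than the actual command line, can be addressed by spawning the CLI as a real child process. This sketch uses only Node built-ins; since the article gives no entry-point path for md-inspector, a node -e script stands in for the real binary.

```typescript
// Hedged sketch: an end-to-end test runs the CLI as a child process and asserts
// on observable behavior (exit code, output), not on internal module calls.
import { spawnSync } from "node:child_process";

// Stand-in script that crashes on a missing directory, i.e. the exact behavior
// an end-to-end test would catch (the real tool should emit a warning instead).
const cliScript = `
  const dir = process.argv[1];
  require("node:fs").statSync(dir); // throws if the directory is missing
`;

const result = spawnSync(process.execPath, ["-e", cliScript, "no-such-dir"], {
  encoding: "utf8",
});
console.log(result.status !== 0); // true: the stand-in crashed instead of warning
```

A unit test on the scanner module would never have surfaced this, which is why the article weights end-to-end CLI behavior tests separately.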
Combined Findings
Basic code generation and first‑answer correctness: DeepSeek V4 Pro performed better.
Engineering closure, test completeness, and delivery stability: GPT‑5.3 Codex High outperformed.
Overall ranking: GPT‑5.3 Codex High > DeepSeek V4 Pro, with the gap attributable to engineering finish quality rather than a fundamental capability gap.
Claude Code Context
Claude Code is a code agent capable of reading and writing project files, executing commands, running tests and builds, iteratively fixing issues, and maintaining goal consistency across multi‑step task chains. This capability explains why single‑answer correctness is insufficient: stable multi‑step progress determines the real development experience.