Artificial Intelligence 12 min read

Why Claude Code Is Failing Complex Engineering Tasks: AMD’s Deep Dive Reveals Four Critical Flaws

An AMD AI director’s GitHub issue sparked a data‑driven investigation that uncovered four major shortcomings, a 67% drop in thinking depth, a surge in API usage costs, and concrete recommendations to restore trust in Claude Code’s ability to handle complex engineering workloads.

Java Tech Enthusiast

Apr 18, 2026

Why Claude Code Is Failing Complex Engineering Tasks: AMD’s Deep Dive Reveals Four Critical Flaws

Claude Code, once hailed as a leading AI coding assistant, has entered a credibility crisis after the head of AMD’s AI division publicly criticized its recent updates for becoming "lazy" and unreliable on complex engineering tasks.

The controversy began with a GitHub issue filed on April 2 by user stellaraccident , who highlighted that the February update rendered Claude Code unusable for intricate projects. The issue title explicitly stated the problem, prompting a deep dive by the AMD team.

Four Critical Flaws Identified

Ignoring user instructions.

Providing “simple” fixes that are actually incorrect.

Executing actions opposite to the requested ones.

Claiming task completion without meeting the requirements.

To substantiate these claims, the team analyzed 6,852 sessions comprising 234,760 tool calls and 17,871 thought blocks, revealing a clear degradation trend after February.

Quantitative Findings

The analysis showed a dramatic reduction in thinking depth: average thought‑token length fell from ~2,200 characters in January to ~720 characters by late February—a 67% decrease. This coincided with the rollout of the redact‑thinking‑2026‑02‑12 feature, which progressively hid 50% of the model’s reasoning.

Additional metrics highlighted a sharp decline in code‑modification behavior. In January, Claude Code read an average of 6.6 files before editing; by March’s end, it read only ~2 files, leading to context‑blind changes, duplicated logic, and a surge in bugs.

Cost and Quality Impact

API request volume jumped 80‑fold and token output increased 64‑fold, inflating monthly costs from a few hundred dollars to over $40,000. The team also deployed a stop‑phrase‑guard.sh hook that detected evasive or incomplete actions, which was triggered 173 times within 17 days after March 8, whereas it had never fired before.

Recommendations

Increase transparency of thinking‑token allocation; expose any reductions or caps to users.

Introduce tiered “max thinking” plans to differentiate lightweight from heavyweight inference needs.

Include a thinking_tokens field in API responses so users can monitor reasoning depth.

Monitor stop‑phrase violation rates as an early indicator of quality regression.

The community response was overwhelmingly negative, with developers on Reddit, Hacker News, and other forums echoing the same frustrations, reporting buggy outputs, and questioning Claude Code’s suitability as a reliable coding partner.

In summary, the data‑driven investigation demonstrates that Claude Code’s recent updates have significantly eroded its reasoning depth and code‑generation quality, leading to higher costs and diminished trust among professional developers.

software engineering Performance Analysis AI Coding Assistant Claude Code model degradation AMD AI director GitHub issue

Written by

Java Tech Enthusiast

Sharing computer programming language knowledge, focusing on Java fundamentals, data structures, related tools, Spring Cloud, IntelliJ IDEA... Book giveaways, red‑packet rewards and other perks await!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.