Why AI Coding Agents Miss the Mark—and How to Make Them Work
The article analyzes the hype around AI coding tools such as OpenClaw: the false demand behind many agent projects, the trap of building tools before a real need exists, the quality gaps in AI‑generated code, and the practical strategies (spec‑first coding, bottleneck identification, and multi‑model orchestration) that actually improve productivity.
False demand vs. real need
Community observations (e.g., @hq4ai on X) suggest that AI‑driven summarisation and code generation are often pseudo‑needs: users spend time configuring agents and supplying extensive context, yet the net productivity gain remains unclear.
Tool‑first, demand‑later trap
Examples include the OpenClaw craze and a three‑layer agent architecture (Command → Agent → Skill) that demanded 15 minutes of setup for a code change that takes about a second by hand. Large language models (LLMs) already act as excellent interpreters of intent; adding extra intent‑recognition or routing layers yields little value.
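As a concrete (and entirely hypothetical) illustration, the sketch below contrasts the three‑layer dispatch with the single call it replaces; every name in it is invented rather than taken from any real framework:

```python
from dataclasses import dataclass

# Hypothetical sketch of the Command -> Agent -> Skill indirection.
# All names are illustrative; no real framework is implied.

@dataclass
class Command:
    intent: str
    args: str

def parse_command(user_input: str) -> Command:
    # Layer 1: intent recognition the LLM would do anyway.
    intent = "edit" if "change" in user_input else "explain"
    return Command(intent=intent, args=user_input)

SKILLS = {
    "edit": lambda args: f"[edit skill] {args}",
    "explain": lambda args: f"[explain skill] {args}",
}

def three_layer(user_input: str) -> str:
    command = parse_command(user_input)   # layer 1: Command
    skill = SKILLS[command.intent]        # layers 2-3: route to Agent, pick Skill
    return skill(command.args)

def direct(user_input: str) -> str:
    # One call: the model performs its own intent recognition and routing,
    # so the layers above add setup cost without adding capability.
    return f"LLM prompt: {user_input}"

print(three_layer("change the retry count to 3"))
print(direct("change the retry count to 3"))
```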
Quality gaps in AI‑generated code
Common issues include:
Debugging takes longer because developers must first reconstruct the model's reasoning rather than their own.
Severe performance regressions, e.g., an LLM‑rewritten Rust port of SQLite was 20,171× slower because a single line was missing from the query planner (see the sketch after this list).
Passing tests does not guarantee correct behaviour, performance, or handling of edge cases.
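The SQLite case is worth making concrete. The toy Python sketch below (an analogy only, not SQLite's actual Rust code; the is_ipk name is borrowed from the case study) shows how omitting one planner check silently turns an indexed lookup into a full scan while every correctness test still passes:

```python
# Toy query planner: dropping a single check (whether a column is the
# integer primary key) changes an O(1) rowid lookup into an O(n) scan.
# Results are identical either way, so tests pass; only speed collapses.

def plan_lookup(table, column, value, check_ipk=True):
    # The crucial line: recognise the integer primary key as the rowid index.
    if check_ipk and column == table["ipk"]:
        return ("rowid_lookup", lambda: table["rows"].get(value))
    # Fallback chosen when the check is missing: scan every row.
    return ("full_scan", lambda: next(
        (row for row in table["rows"].values() if row[column] == value), None))

table = {"ipk": "id",
         "rows": {i: {"id": i, "name": f"user{i}"} for i in range(100_000)}}

for check in (True, False):
    strategy, run = plan_lookup(table, "id", 99_999, check_ipk=check)
    assert run() == {"id": 99_999, "name": "user99999"}  # both "pass the tests"
    print(strategy)  # rowid_lookup vs. full_scan: same answer, ~n times slower
```

Both strategies return the same row, which is exactly why a test suite alone cannot catch the regression.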
Spec‑first coding
Adopt a "Spec Coding" workflow: let the LLM draft a specification (what to do and how), have engineers review the spec, then generate code. The specification becomes the high‑level artifact that is reviewed, treating the LLM as a compiler rather than a source of truth.
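A minimal sketch of that loop, assuming a generic chat‑completion stub called llm (the function names and prompts are illustrative, not a prescribed API):

```python
def llm(prompt: str) -> str:
    # Stand-in for a real model call (any chat-completion API would do).
    return f"<model output for: {prompt[:40]}...>"

def spec_coding(requirement: str, approve) -> str:
    """Spec-first loop: humans review the spec, then the model acts as a compiler."""
    spec = llm("Draft a precise spec (goal, approach, inputs/outputs, "
               "edge cases) for: " + requirement)
    if not approve(spec):                   # the human gate happens HERE,
        raise ValueError("spec rejected")   # before any code exists
    return llm("Implement exactly this spec, nothing more:\n" + spec)

code = spec_coding("add retry with backoff to the HTTP client",
                   approve=lambda spec: True)  # replace with a real review step
```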
Identifying the real bottleneck
Applying Goldratt’s Theory of Constraints shows that the review stage is the true bottleneck. Running multiple Claude Code sessions in parallel speeds up generation but overwhelms the reviewer, leading to duplicated logic and missed optimisations.
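The arithmetic behind this is simple to sketch; the rates below are invented for illustration, not measured:

```python
# Back-of-the-envelope constraint model: system throughput is
# min(generation rate, review rate), so adding parallel agent sessions
# only grows the unreviewed backlog.

REVIEW_RATE = 2           # changes a reviewer can properly review per hour
GEN_RATE_PER_SESSION = 3  # changes one agent session can generate per hour

for sessions in (1, 2, 4, 8):
    generated = sessions * GEN_RATE_PER_SESSION
    shipped = min(generated, REVIEW_RATE)      # Goldratt: the constraint wins
    backlog_per_hour = generated - shipped
    print(f"{sessions} sessions: {generated} generated/h, "
          f"{shipped} shipped/h, +{backlog_per_hour}/h unreviewed")
```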
Cognitive traps
Anchoring on the first AI solution and failing to consider alternatives.
Illusory speed: developers feel 20 % faster while objective measurements show a 19 % slowdown (METR study).
Over‑confidence of AI agents, which can mask underlying misunderstandings.
Mitigations: request 2‑3 alternative solutions with pros/cons, pre‑define decision points before prompting, and explicitly ask the model for failure modes.
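One way to bake those three mitigations into a reusable prompt template (the wording is illustrative; adapt it to your own checklist):

```python
# A reusable anti-anchoring prompt. Phrasing is an assumption, not a
# standard; the point is to force alternatives, decision points, and
# failure modes before any code is written.

ANTI_ANCHORING_PROMPT = """\
Task: {task}

Before writing any code:
1. Propose 2-3 distinct approaches with the pros and cons of each.
2. Stop at these decision points and wait for my choice: {decision_points}.
3. For your recommended approach, list its likely failure modes and the
   edge cases your implementation would NOT handle.
"""

prompt = ANTI_ANCHORING_PROMPT.format(
    task="add caching to the search endpoint",
    decision_points="cache key design, invalidation strategy",
)
print(prompt)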
Multi‑model division of labour
Different LLMs excel at different tasks. For example, Opus is strong at requirement analysis and code generation, while Codex is better at code review. Using Opus to write logic and then handing it to Codex for review uncovers issues that neither model catches alone.
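A sketch of that handoff, with call_model standing in for whatever model APIs you actually use (the model names mirror the article's example, but any pair with complementary strengths works):

```python
# Writer/reviewer split across two models. call_model is a placeholder;
# replace it with real API calls to your providers.

def call_model(model: str, prompt: str) -> str:
    return f"<{model} output>"  # stub for illustration

def write_then_review(task: str) -> tuple[str, str]:
    draft = call_model("opus", f"Implement: {task}")
    review = call_model("codex",
                        "Review this code for bugs, performance regressions, "
                        "and unhandled edge cases:\n" + draft)
    return draft, review

code, findings = write_then_review("parse RFC 3339 timestamps")
```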
Adaptive skills
For an AI tool to remain useful it must understand a developer’s workflow, accumulate context, and suggest relevant skills automatically. A daily‑report skill that remembers project context dramatically reduces friction.
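As a rough sketch, such a skill might persist accumulated context between runs so the user stops re‑explaining the project every day; the file format and fields below are assumptions for illustration:

```python
import json
from pathlib import Path

# Minimal "skill" that accumulates project context across runs.
# The file name and schema are invented for this sketch.

CONTEXT_FILE = Path(".daily_report_context.json")

def load_context() -> dict:
    return json.loads(CONTEXT_FILE.read_text()) if CONTEXT_FILE.exists() else {}

def daily_report(todays_commits: list[str]) -> str:
    ctx = load_context()
    projects = set(ctx.get("projects", [])) | {c.split(":")[0] for c in todays_commits}
    ctx["projects"] = sorted(projects)      # context accumulates day over day
    CONTEXT_FILE.write_text(json.dumps(ctx))
    # Only today's delta is new input; the standing context rides along.
    return f"Report covering {ctx['projects']}: {todays_commits}"

print(daily_report(["billing: fix rounding bug", "api: add rate limits"]))
```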
Practical takeaways
Identify the real bottleneck (often the review stage) before adding AI tools.
Avoid anchoring by asking for multiple solutions and explicitly questioning edge cases.
Leverage specialised models for their strengths rather than a single "all‑in‑one" agent.
Continuously evaluate whether a tool saves time; if it is unused for a week, reconsider its value.
Key examples and metrics
SQLite Rust rewrite case study: LLM‑generated code compiled and passed all tests but exhibited a 20,171‑fold slowdown because a single line (is_ipk) was omitted from the query planner.
METR developer study (2025): 16 experienced developers using AI‑assisted coding were on average 19 % slower, despite perceiving a 20 % speedup.
Agent performance limits: in UI, network, and concurrency‑heavy scenarios, current agents perform no better than traditional tooling.
Reference URLs
https://x.com/karpathy/status/1886192184808149383
https://x.com/hq4ai/status/2028047870985633961
https://blog.katanaquant.com/p/your-llm-doesnt-write-correct-code
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
https://cloud.tencent.com/developer/article/2631822