Do AI Coding Agents Introduce Critical Security Flaws? Insights from a Vibe-Coding Study

A Tenzai research team had five popular AI coding agents each build three vibe-coded applications from the same prompt, uncovering comparable bug counts across agents but severe vulnerabilities only in the Claude, Devin, and Codex outputs, highlighting systemic authorization flaws and the risks of vibe-coded development.

Background

Researchers at the security startup Tenzai, led by Ori David, evaluated how five popular AI coding agents—Cursor, Claude Code, OpenAI Codex, Replit, and Devin—generate small vibe-coded applications. Each agent was given the same detailed prompt and asked to produce three distinct apps.

Findings

The study found a comparable number of bugs in every implementation, but only the code produced by Claude, Devin, and Codex contained vulnerabilities classified as severe.

Example Vulnerability

Claude generated the following PHP snippet for an e‑commerce product‑deletion endpoint:

// If authenticated, enforce ownership check
if ($user) {
    // Admin can delete any product; seller can delete only own product
    if ($user['role'] !== 'admin' && $product['seller_id'] != $user['id']) {
        sendJsonResponse(['error' => 'Delete failed', 'code' => 'FORBIDDEN'], 403);
    }
}
// Delete product
$stmt = $db->prepare("DELETE FROM products WHERE id = ?");
$stmt->execute([$id]);

The logic checks the user’s role only when $user is truthy. If the request is unauthenticated, the condition is skipped and the deletion runs unchecked, allowing an attacker to delete arbitrary products.
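A fail-closed version of this check denies the request unless the caller is affirmatively authorized, so an unauthenticated request can never fall through to the deletion. The following sketch illustrates the pattern (in Python rather than the generated PHP; the function and field names are hypothetical, not from the study):

```python
def can_delete_product(user, product):
    """Fail closed: deny unless the caller is affirmatively authorized."""
    if user is None:
        # Unauthenticated requests are always rejected, instead of
        # silently skipping the ownership check as in the flawed snippet.
        return False
    if user.get("role") == "admin":
        return True  # admins may delete any product
    return product.get("seller_id") == user.get("id")  # sellers: own products only

# The endpoint would then run the DELETE only when this returns True,
# and respond 403 (and stop) otherwise.
```

The key design difference is that authorization is a positive gate in front of the destructive action, not a conditional side branch that can be bypassed.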

Common Weaknesses

Across the agents, the generated applications handled typical injection attacks (SQL injection, XSS) reasonably well, but performed poorly on authorization and business‑logic checks. For example, most agents allowed customers to order a negative quantity of items, and sellers could set negative prices for products.
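Defenses against this class of bug are straightforward server-side value checks applied before an order or listing touches the database. A minimal sketch of such validation (in Python; the function name and error messages are illustrative, not from the study):

```python
def validate_order_line(quantity, unit_price):
    """Reject business-logic-invalid values before they reach the database."""
    errors = []
    # Quantities must be whole and positive: no ordering -3 items.
    if not isinstance(quantity, int) or quantity < 1:
        errors.append("quantity must be a positive integer")
    # Prices must not be negative: no sellers listing items at -$10.
    if unit_price < 0:
        errors.append("price must not be negative")
    return errors
```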

Other frequent issues included susceptibility to server‑side request forgery (SSRF) and the omission of security‑hardening measures such as security headers.
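A basic SSRF guard refuses outbound URLs that resolve to internal addresses. The sketch below (Python, using only the standard library) shows the idea; it is not a complete defense, since real mitigations must also handle redirects, DNS rebinding, and allow-listing:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_outbound_url(url):
    """Allow only http(s) URLs whose host resolves to public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable host: fail closed
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        # Block loopback, RFC 1918, and link-local (cloud metadata) ranges.
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            return False
    return True
```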

Implications

The results demonstrate that AI coding agents do not guarantee secure code. The underlying problem is not a flaw in the agents themselves but the “Vibe” coding model, which enables users with limited programming expertise—or those who rely solely on prompt engineering—to produce functional applications without understanding the security implications.

If simple Vibe‑generated apps already contain critical bugs, more complex systems built with the same approach are likely to be even less secure.

Conclusion

While AI agents can sometimes spot the vulnerabilities they introduce, it remains an open question how effectively they can remediate them without human oversight. Rigorous manual code review by experienced developers is still essential, especially under tight release schedules that pressure teams to cut corners on secure‑coding best practices.

Tags: code generation, software security, Vibe Coding, AI safety, security vulnerabilities, AI coding agents
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
