Tagged articles

LLM security

27 articles · Page 1 of 1

Jul 1, 2026 · Information Security

Jailbreak Attacks and Prompt Injection: Intent Patterns, Detection, and Multi‑Layer Defense for LLMs

The article analyzes LLM jailbreak and prompt‑injection techniques—detailing five intent construction patterns, detection principles that prioritize intent over keywords, and a multi‑layered defense architecture spanning input normalization, intent analysis, generation control, and output review—to guide robust AI security.

AI safety · LLM security · defense layering

0 likes · 12 min read

Machine Heart

Jun 13, 2026 · Information Security

How a Harmless Query Can Hijack LLM Agents: The First Semantic Cache Key Collision Attack

A new study presented at ICML 2026 reveals that the fuzzy matching used in LLM semantic caching creates an integrity vulnerability, allowing attackers to craft adversarial suffixes that cause cache‑key collisions and achieve a response‑hijacking success rate of up to 86% on major cloud services such as AWS and Azure.

AI Agents · Cloud Services · LLM security

0 likes · 9 min read

Machine Learning Algorithms & Natural Language Processing

Jun 12, 2026 · Artificial Intelligence

How a Chinese Team Bypassed Fable 5’s Safety Classifier in Under 5 Seconds

Researchers from an international team demonstrated that the Anthropic Fable 5 model’s new safety classifier can be evaded in under five seconds with a single dialogue, exposing an internal safety collapse where agents autonomously generate harmful output during task execution, a flaw now confirmed across dozens of frontier LLMs.

Agent · Fable 5 · ISC-Bench

0 likes · 12 min read

Black & White Path

Jun 7, 2026 · Information Security

Exploring OnlyLANs: A Free Prompt‑Injection Playground for LLM Security

OnlyLANs, a free AI security challenge by Just Hacking Training, lets participants jailbreak a chatbot called NetworkJohn to extract an admin email, a verification code, and a competitor recommendation, illustrating the real‑world prompt‑injection risks highlighted in OWASP’s LLM Top‑10.

AI safety · CTF · Just Hacking Training

0 likes · 3 min read

Su San Talks Tech

May 6, 2026 · Information Security

What Is Prompt Injection? Attack Vectors and Defense Strategies

The article explains that prompt injection is a new LLM security threat in which attackers blur the line between instruction and data, outlines direct and indirect injection techniques (including command overriding, role‑play jailbreaks, encoding obfuscation, and multi‑turn attacks), and proposes a defense‑in‑depth framework with input filtering, prompt design, output validation, least‑privilege architecture, and specialized safeguards for RAG and agent scenarios.

AI safety · Agent · Defense in Depth

0 likes · 15 min read

Data Party THU

Apr 21, 2026 · Artificial Intelligence

Can LLM Attack Detection Work Without Storing Any Conversation Text?

This article experimentally evaluates a privacy‑preserving LLM security pipeline that discards raw dialogue after extracting 28 telemetry features, showing that using only 11 text‑independent signals retains about 98.5% of detection performance while reducing false‑positive rates.

LLM security · feature engineering · jailbreak detection

0 likes · 10 min read

Machine Learning Algorithms & Natural Language Processing

Apr 20, 2026 · Information Security

Can Claude Code’s Auto Mode Replace Human Review? First Pressure Test Results

A systematic pressure test of Claude Code’s Auto Mode across 128 ambiguous permission scenarios reveals an 81.0% false‑negative rate and significant bypasses through Tier 2 file edits, highlighting both its partial safety benefits and critical shortcomings in autonomous code execution.

AmPermBench · Auto Mode · Claude Code

0 likes · 10 min read

JavaGuide

Apr 14, 2026 · Artificial Intelligence

Interview Question: How to Build Prompt Engineering for an Agent and Defend Against Malicious Prompt Injection

The article explains how industrial‑grade AI agents require structured prompt engineering, chain‑of‑thought reasoning, task decomposition, and a three‑layer defense (sandbox, prompt isolation, and human approval) to prevent prompt‑injection attacks, while also covering context engineering, retrieval‑augmented generation, and tool design best practices.

Agent Design · Chain-of-Thought · LLM security

0 likes · 23 min read

Machine Heart

Apr 11, 2026 · Information Security

Is Claude Mythos Overhyped? AI-Assisted Bug Discovery Is Already Routine

The article debunks the hype around Claude Mythos, showing that AI‑assisted vulnerability discovery has long been a practical reality, citing VIDOC Security Lab’s findings, real‑world bug examples, the accelerating threat landscape, and recommendations for proactive, multi‑model defenses.

AI threat · AI vulnerability detection · Claude Mythos

0 likes · 9 min read

AI Architect Hub

Apr 7, 2026 · Artificial Intelligence

Defending Large Language Models Against Prompt Injection Attacks

This article explains the principles and common scenarios of prompt injection attacks on LLMs and provides practical defense strategies—including rule reinforcement, input filtering, output verification, and advanced techniques—to protect AI systems from malicious manipulation.

AI safety · Defense Strategies · LLM security

0 likes · 8 min read

DeepHub IMBA

Mar 31, 2026 · Information Security

Can Prompt Injection Be Detected Without Storing Conversation Logs? A Privacy‑First Experiment

The article presents a privacy‑first system that extracts numeric telemetry from each LLM interaction, discards raw text, and evaluates whether detection of prompt injection and jailbreak attacks remains effective, showing only a 1.4 F1‑point drop when using solely text‑independent features.

LLM security · Privacy · Telemetry

0 likes · 9 min read

AntTech

Mar 23, 2026 · Information Security

How ‘Brain‑Control’ Attacks Threaten Autonomous LLM Agents and How to Defend Them

A joint Tsinghua‑Ant Group study reveals a full‑lifecycle threat model for OpenClaw autonomous LLM agents, detailing five novel brain‑control attack vectors and proposing a five‑layer defense framework that secures the system from boot to execution.

AI safety · Autonomous Agents · LLM security

0 likes · 14 min read

Black & White Path

Mar 13, 2026 · Information Security

Beware: Generative AI as a New Cybercrime Ally—13 Enterprise Attack Vectors

The article analyzes how generative AI is transforming cybercrime by enabling 13 distinct attack methods—from highly personalized phishing emails and AI‑assisted malware creation to automated vulnerability hunting, deep‑fake social engineering, malicious LLMs, and attacks on AI infrastructure—highlighting recent research data and real‑world examples that illustrate the heightened speed, stealth, and accessibility of modern threats.

AI Infrastructure · Generative AI · LLM security

0 likes · 13 min read

AI Info Trend

Mar 12, 2026 · Artificial Intelligence

Autonomous LLM Agents as Security Threats: Key Findings from ‘Agents of Chaos’

A recent arXiv preprint titled ‘Agents of Chaos’ details an extensive experiment where autonomous large‑language‑model agents, equipped with persistent storage, email, Discord, file system and shell access, were deployed on Fly.io VMs and subjected to red‑team attacks by twenty researchers, exposing eleven real security, privacy and governance failures.

AI risk · AI safety · Agent Governance

0 likes · 9 min read

PaperAgent

Mar 3, 2026 · Information Security

What 11 Critical Security Flaws Were Uncovered in OpenClaw AI Agents?

A comprehensive study of the OpenClaw framework reveals eleven severe security vulnerabilities in multi‑agent AI systems, ranging from over‑reactive data deletion to identity‑spoofing attacks, resource‑exhaustion loops, and covert manipulation, highlighting systemic social‑coherence failures and the need for robust agent governance.

AI Agents · Agent Governance · LLM security

0 likes · 14 min read

Black & White Path

Feb 15, 2026 · Artificial Intelligence

Microsoft Unveils Lightweight Tool to Scan Large Language Models for Hidden Backdoors

Microsoft's AI security team introduced a lightweight scanner that detects backdoors in open‑weight large language models by leveraging three observable signals while keeping false positives low. The article covers the tool's methodology, its limitations, and its role in extending Microsoft's AI‑focused Secure Development Lifecycle.

AI safety · LLM security · Microsoft

0 likes · 6 min read

Huolala Safety Emergency Response Center

Jan 21, 2026 · Information Security

How to Build an Automated Red‑Team Framework for LLM Security Testing

This article presents a systematic approach to evaluating large language model (LLM) safety by constructing an automated red‑team testing platform that measures prompt jailbreak, privacy leakage, and tool‑execution risks, defines quantitative metrics, compares commercial and open‑source models, and outlines a continuous evolution pipeline for attack samples.

AI safety · LLM security · adversarial testing

0 likes · 20 min read

Woodpecker Software Testing

Jan 21, 2026 · Information Security

The OWASP LLM Top 10: Key Security Risks and Mitigation Strategies

The OWASP LLM Top 10 outlines the most critical security risks in large language model applications, describing each threat (from prompt injection to model theft), its potential impact, and recommended defense principles such as secure development lifecycles, defense‑in‑depth, least‑privilege, human‑in‑the‑loop, and continuous monitoring.

AI safety · LLM security · OWASP

0 likes · 8 min read

Huolala Tech

Jan 21, 2026 · Artificial Intelligence

Building an Automated Red‑Team Framework for LLM Security Testing

This article presents a systematic approach to evaluating large language model security by defining threat models, categorizing attack surfaces such as jailbreak and privacy leakage, and describing an automated red‑team platform that generates, mutates, scores, and evolves adversarial prompts to continuously assess model robustness.

LLM security · adversarial AI · prompt injection

0 likes · 20 min read

Data Party THU

Oct 27, 2025 · Artificial Intelligence

Why Most LLM Defense Strategies Fail Against Adaptive Attacks

An extensive study reveals that twelve recent large‑language‑model defenses, including prompt‑based, adversarial‑training, filtering, and secret‑knowledge methods, are easily bypassed by a general adaptive attack framework using gradient descent, reinforcement learning, search, and human red‑team techniques, exposing critical robustness gaps.

LLM security · adaptive attacks · jailbreak

0 likes · 11 min read

Data Party THU

Oct 4, 2025 · Artificial Intelligence

Advances in Robust AI: Defending Adversarial Attacks, Boosting Domain Generalization, Stopping LLM Jailbreaks

This article reviews the latest progress in designing algorithms with strong robustness, covering adversarial examples in computer vision, novel training paradigms and certification methods, domain‑generalization techniques that achieve state‑of‑the‑art performance in medical imaging and molecular recognition, and new attack‑defense strategies for LLM jailbreak scenarios.

AI safety · LLM security · adversarial robustness

0 likes · 4 min read

DataFunTalk

Aug 29, 2025 · Artificial Intelligence

How a $500 GPU Hack Turns LLMs into Hidden Advertising Engines

A recent arXiv paper reveals that with an RTX 4070, a few hundred toxic training samples, and just one hour of fine‑tuning, attackers can embed covert advertisements into large language models like Gemini 2.5, creating cheap, undetectable AI‑driven ad platforms.

AI safety · LLM security · advertisement embedding attack

0 likes · 12 min read

Java Tech Enthusiast

Jul 17, 2025 · Artificial Intelligence

How a Simple Colon Can Fool Top LLMs – The ‘Universal Key’ Vulnerability Exposed

Researchers discovered that trivial symbols such as a colon or the word “Solution” can trigger false‑positive rewards in LLM judge models, causing GPT‑4o, Claude‑4 and LLaMA‑3‑70B to fail, and proposed a robust “Master‑RM” model that eliminates these attacks.

AI robustness · LLM security · Reward Modeling

0 likes · 10 min read

AntTech

Jun 16, 2025 · Information Security

Uncovering New Attack Vectors in Model Context Protocols: Risks and Defenses

A comprehensive study reveals that Model Context Protocol (MCP) platforms lack strict vetting, users struggle to detect malicious servers, and current large language models cannot effectively resist MCP‑level injection attacks, highlighting critical security challenges and proposing mitigation strategies.

LLM security · MCP · Supply Chain Attack

0 likes · 11 min read

Architecture and Beyond

Mar 15, 2025 · Information Security

Prompt Injection Attacks on Large Language Models: Risks, Types, and Defense Framework

This article explains how prompt injection attacks exploit large language models by altering their behavior through crafted inputs, outlines the major harms and attack categories—including direct, indirect, multimodal, code, and jailbreak attacks—and presents a comprehensive three‑layer defense framework covering input‑side, output‑side, and system‑level protections.

AI safety · LLM security · information security

0 likes · 16 min read

Liangxu Linux

Jul 2, 2023 · Information Security

How the “Grandma Prompt” Bypasses LLM Safeguards and Generates Windows Keys

The article examines the so‑called “grandma prompt” that tricks ChatGPT, Bing, and other LLMs into revealing Windows activation keys and even adult jokes, explains why such prompt injection works, and reviews similar past exploits and the attempts to mitigate them.

AI safety · ChatGPT jailbreak · LLM security

0 likes · 7 min read

Programmer DD

Jun 28, 2023 · Information Security

How the ‘Grandma Prompt’ Tricks ChatGPT into Revealing Windows Activation Keys

The article examines the so‑called “grandma loophole”—a prompt‑injection technique that convinces ChatGPT, Bing, and other LLMs to generate Windows and Office activation keys, explores related exploits across platforms, and discusses the broader implications for AI security and ongoing mitigation efforts.

AI vulnerabilities · ChatGPT · LLM security

0 likes · 7 min read