Tagged articles
23 articles
Su San Talks Tech
May 6, 2026 · Information Security

What Is Prompt Injection? Attack Vectors and Defense Strategies

The article explains that prompt injection is a new LLM security threat in which attackers blur the line between instructions and data. It outlines direct and indirect injection techniques, including command overriding, role‑play jailbreaks, encoding obfuscation, and multi‑turn attacks, and proposes a defense‑in‑depth framework with input filtering, prompt design, output validation, least‑privilege architecture, and specialized safeguards for RAG and agent scenarios.

AI Safety · Agent · Defense in Depth
0 likes · 15 min read
Data Party THU
Apr 21, 2026 · Artificial Intelligence

Can LLM Attack Detection Work Without Storing Any Conversation Text?

This article experimentally evaluates a privacy‑preserving LLM security pipeline that discards raw dialogue after extracting 28 telemetry features, showing that using only 11 text‑independent signals retains about 98.5% of detection performance while reducing false‑positive rates.

LLM Security · feature engineering · jailbreak detection
0 likes · 10 min read

Can Claude Code’s Auto Mode Replace Human Review? First Pressure Test Results

A systematic pressure test of Claude Code’s Auto Mode across 128 ambiguous permission scenarios reveals an 81.0% false‑negative rate and significant bypasses through Tier 2 file edits, highlighting both its partial safety benefits and critical shortcomings in autonomous code execution.

AmPermBench · Claude Code · False negative rate
0 likes · 10 min read
JavaGuide
Apr 14, 2026 · Artificial Intelligence

Interview Question: How to Build Prompt Engineering for an Agent and Defend Against Malicious Prompt Injection

The article explains how industrial‑grade AI agents require structured prompt engineering, chain‑of‑thought reasoning, task decomposition, and a three‑layer defense (sandbox, prompt isolation, and human approval) to prevent prompt‑injection attacks, while also covering context engineering, retrieval‑augmented generation, and tool design best practices.

Agent Design · Context Engineering · LLM Security
0 likes · 23 min read
Machine Heart
Apr 11, 2026 · Information Security

Is Claude Mythos Overhyped? AI-Assisted Bug Discovery Is Already Routine

The article debunks the hype around Claude Mythos, showing that AI‑assisted vulnerability discovery has long been a practical reality, citing VIDOC Security Lab’s findings, real‑world bug examples, the accelerating threat landscape, and recommendations for proactive, multi‑model defenses.

AI threat · AI vulnerability detection · Claude Mythos
0 likes · 9 min read
AI Architect Hub
Apr 7, 2026 · Artificial Intelligence

Defending Large Language Models Against Prompt Injection Attacks

This article explains the principles and common scenarios of prompt injection attacks on LLMs and provides practical defense strategies—including rule reinforcement, input filtering, output verification, and advanced techniques—to protect AI systems from malicious manipulation.

AI Safety · Defense Strategies · LLM Security
0 likes · 8 min read
DeepHub IMBA
Mar 31, 2026 · Information Security

Can Prompt Injection Be Detected Without Storing Conversation Logs? A Privacy‑First Experiment

The article presents a privacy‑first system that extracts numeric telemetry from each LLM interaction, discards raw text, and evaluates whether detection of prompt injection and jailbreak attacks remains effective, showing only a 1.4 F1‑point drop when using solely text‑independent features.

LLM Security · behavioral features · jailbreak detection
0 likes · 9 min read
Black & White Path
Mar 13, 2026 · Information Security

Beware: Generative AI as a New Cybercrime Ally—13 Enterprise Attack Vectors

The article analyzes how generative AI is transforming cybercrime by enabling 13 distinct attack methods—from highly personalized phishing emails and AI‑assisted malware creation to automated vulnerability hunting, deep‑fake social engineering, malicious LLMs, and attacks on AI infrastructure—highlighting recent research data and real‑world examples that illustrate the heightened speed, stealth, and accessibility of modern threats.

AI Infrastructure · LLM Security · cybercrime
0 likes · 13 min read
AI Info Trend
Mar 12, 2026 · Artificial Intelligence

Autonomous LLM Agents as Security Threats: Key Findings from ‘Agents of Chaos’

A recent arXiv preprint titled ‘Agents of Chaos’ details an extensive experiment where autonomous large‑language‑model agents, equipped with persistent storage, email, Discord, file system and shell access, were deployed on Fly.io VMs and subjected to red‑team attacks by twenty researchers, exposing eleven real security, privacy and governance failures.

AI Safety · AI risk · Autonomous Agents
0 likes · 9 min read
PaperAgent
Mar 3, 2026 · Information Security

What 11 Critical Security Flaws Were Uncovered in OpenClaw AI Agents?

A comprehensive study of the OpenClaw framework reveals eleven severe security vulnerabilities in multi‑agent AI systems, ranging from over‑reactive data deletion to identity‑spoofing attacks, resource‑exhaustion loops, and covert manipulation, highlighting systemic social‑coherence failures and the need for robust agent governance.

AI agents · LLM Security · OpenClaw
0 likes · 14 min read
Black & White Path
Feb 15, 2026 · Artificial Intelligence

Microsoft Unveils Lightweight Tool to Scan Large Language Models for Hidden Backdoors

Microsoft's AI security team introduced a lightweight scanner that detects backdoors in open‑weight large language models using three observable signals, with a low false‑positive rate. The article covers the tool's methodology, its limitations, and its role in extending Microsoft's AI‑focused Secure Development Lifecycle.

AI Safety · LLM Security · Microsoft
0 likes · 6 min read
Huolala Safety Emergency Response Center
Jan 21, 2026 · Information Security

How to Build an Automated Red‑Team Framework for LLM Security Testing

This article presents a systematic approach to evaluating large language model (LLM) safety by constructing an automated red‑team testing platform that measures prompt jailbreak, privacy leakage, and tool‑execution risks, defines quantitative metrics, compares commercial and open‑source models, and outlines a continuous evolution pipeline for attack samples.

AI Safety · Automated Testing · LLM Security
0 likes · 20 min read
Woodpecker Software Testing
Jan 21, 2026 · Information Security

The OWASP LLM Top 10: Key Security Risks and Mitigation Strategies

The OWASP LLM Top 10 outlines the most critical security risks in large language model applications. The article describes each threat, from prompt injection to model theft, along with its potential impact and recommended defense principles such as secure development lifecycles, defense‑in‑depth, least‑privilege, human‑in‑the‑loop, and continuous monitoring.

AI Safety · LLM Security · OWASP
0 likes · 8 min read
Huolala Tech
Jan 21, 2026 · Artificial Intelligence

Building an Automated Red‑Team Framework for LLM Security Testing

This article presents a systematic approach to evaluating large language model security by defining threat models, categorizing attack surfaces such as jailbreak and privacy leakage, and describing an automated red‑team platform that generates, mutates, scores, and evolves adversarial prompts to continuously assess model robustness.

LLM Security · Red Team · adversarial AI
0 likes · 20 min read
Data Party THU
Oct 27, 2025 · Artificial Intelligence

Why Most LLM Defense Strategies Fail Against Adaptive Attacks

An extensive study reveals that twelve recent large‑language‑model defenses, including prompt‑based, adversarial‑training, filtering, and secret‑knowledge methods, are easily bypassed by a general adaptive attack framework using gradient descent, reinforcement learning, search, and human red‑team techniques, exposing critical robustness gaps.

LLM Security · adaptive attacks · jailbreak
0 likes · 11 min read
Data Party THU
Oct 4, 2025 · Artificial Intelligence

Advances in Robust AI: Defending Adversarial Attacks, Boosting Domain Generalization, Stopping LLM Jailbreaks

This article reviews the latest progress in designing algorithms with strong robustness, covering adversarial examples in computer vision, novel training paradigms and certification methods, domain‑generalization techniques that achieve state‑of‑the‑art performance in medical imaging and molecular recognition, and new attack‑defense strategies for LLM jailbreak scenarios.

AI Safety · LLM Security · adversarial robustness
0 likes · 4 min read
DataFunTalk
Aug 29, 2025 · Artificial Intelligence

How a $500 GPU Hack Turns LLMs into Hidden Advertising Engines

A recent arXiv paper reveals that with an RTX 4070, a few hundred poisoned training samples, and just one hour of fine‑tuning, attackers can embed covert advertisements into large language models like Gemini 2.5, creating cheap, undetectable AI‑driven ad platforms.

AI Safety · LLM Security · advertisement embedding attack
0 likes · 12 min read
AntTech
Jun 16, 2025 · Information Security

Uncovering New Attack Vectors in Model Context Protocols: Risks and Defenses

A comprehensive study reveals that Model Context Protocol (MCP) platforms lack strict vetting, users struggle to detect malicious servers, and current large language models cannot effectively resist MCP‑level injection attacks, highlighting critical security challenges and proposing mitigation strategies.

LLM Security · MCP · information security
0 likes · 11 min read
Architecture and Beyond
Mar 15, 2025 · Information Security

Prompt Injection Attacks on Large Language Models: Risks, Types, and Defense Framework

This article explains how prompt injection attacks exploit large language models by altering their behavior through crafted inputs, outlines the major harms and attack categories—including direct, indirect, multimodal, code, and jailbreak attacks—and presents a comprehensive three‑layer defense framework covering input‑side, output‑side, and system‑level protections.

AI Safety · LLM Security · information security
0 likes · 16 min read
Programmer DD
Jun 28, 2023 · Information Security

How the ‘Grandma Prompt’ Tricks ChatGPT into Revealing Windows Activation Keys

The article examines the so‑called “grandma loophole”—a prompt‑injection technique that convinces ChatGPT, Bing, and other LLMs to generate Windows and Office activation keys, explores related exploits across platforms, and discusses the broader implications for AI security and ongoing mitigation efforts.

AI vulnerabilities · ChatGPT · LLM Security
0 likes · 7 min read