Tagged articles

jailbreak detection

Apr 21, 2026 · Artificial Intelligence

Can LLM Attack Detection Work Without Storing Any Conversation Text?

This article experimentally evaluates a privacy‑preserving LLM security pipeline that discards raw dialogue after extracting 28 telemetry features, showing that using only 11 text‑independent signals retains about 98.5% of detection performance while reducing false‑positive rates.

LLM security · feature engineering · jailbreak detection

10 min read

DeepHub IMBA

Mar 31, 2026 · Information Security

Can Prompt Injection Be Detected Without Storing Conversation Logs? A Privacy‑First Experiment

The article presents a privacy‑first system that extracts numeric telemetry from each LLM interaction, discards raw text, and evaluates whether detection of prompt injection and jailbreak attacks remains effective, showing only a 1.4 F1‑point drop when using solely text‑independent features.

LLM security · Privacy · Telemetry

9 min read

Data Party THU

Aug 11, 2025 · Artificial Intelligence

Can Hidden Signals Reveal Multimodal Model Jailbreaks? Introducing HiddenDetect

This article presents HiddenDetect, a training‑free method that leverages refusal‑semantic vectors and layer‑wise activation analysis to detect jailbreak attempts in multimodal large language models, revealing distinct safety signals across text and image modalities and demonstrating strong performance on several LVLM benchmarks.

LVLM · Large Language Models · Multimodal

7 min read

AI Frontier Lectures

Jul 27, 2025 · Information Security

Can Hidden Activations Expose Multimodal Model Jailbreaks?

The paper reveals that large multimodal language models retain refusal signals in their hidden states even after jailbreak attempts, and proposes a training‑free detection method that leverages these signals to identify unsafe inputs across text and image modalities with strong generalization.

AI safety · LVLM security · hidden activation analysis

7 min read

58 Tech

Jan 8, 2020 · Information Security

iOS Security Hardening: Code Obfuscation, Anti‑Debugging, Signature Verification, and Anomaly Detection

This article explains the principles and practical implementations of iOS security hardening techniques—including code obfuscation, anti‑debugging, signature verification, and abnormal data detection—illustrated with real‑world examples from the 58.com iOS client.

Anti-debugging · code-obfuscation · iOS

16 min read
