Tagged articles
5 articles
Data Party THU
Apr 21, 2026 · Artificial Intelligence

Can LLM Attack Detection Work Without Storing Any Conversation Text?

This article experimentally evaluates a privacy‑preserving LLM security pipeline that discards raw dialogue after extracting 28 telemetry features, showing that using only 11 text‑independent signals retains about 98.5% of detection performance while reducing false‑positive rates.

LLM Security · feature engineering · jailbreak detection
0 likes · 10 min read
DeepHub IMBA
Mar 31, 2026 · Information Security

Can Prompt Injection Be Detected Without Storing Conversation Logs? A Privacy‑First Experiment

The article presents a privacy‑first system that extracts numeric telemetry from each LLM interaction and discards the raw text, then evaluates whether detection of prompt injection and jailbreak attacks remains effective. It reports only a 1.4 F1‑point drop when using solely text‑independent features.

LLM Security · behavioral features · jailbreak detection
0 likes · 9 min read
Data Party THU
Aug 11, 2025 · Artificial Intelligence

Can Hidden Signals Reveal Multimodal Model Jailbreaks? Introducing HiddenDetect

This article presents HiddenDetect, a training‑free method that leverages refusal‑semantic vectors and layer‑wise activation analysis to detect jailbreak attempts in multimodal large language models, revealing distinct safety signals across text and image modalities and demonstrating strong performance on several LVLM benchmarks.

LVLM · Multimodal · activation analysis
0 likes · 7 min read
AI Frontier Lectures
Jul 27, 2025 · Information Security

Can Hidden Activations Expose Multimodal Model Jailbreaks?

The paper reveals that large multimodal language models retain refusal signals in their hidden states even after jailbreak attempts, and proposes a training‑free detection method that leverages these signals to identify unsafe inputs across text and image modalities with strong generalization.

AI Safety · LVLM security · hidden activation analysis
0 likes · 7 min read