Baobao Algorithm Notes
May 13, 2024 · Artificial Intelligence
How to Detect Test Set Leakage in Black‑Box Language Models
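This ICLR 2024 paper introduces a black‑box method for detecting test‑set leakage in large language models. It compares the log‑probability a model assigns to a benchmark in its original order with the log‑probabilities of shuffled orderings: if the model never saw the benchmark, the order of the examples should not matter. The paper also proposes a more scalable sharded likelihood test and demonstrates its effectiveness on several open‑source models, where it flags a potential leak in Mistral‑7B.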
LLM evaluation · language model security · shuffled likelihood test
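The comparison described in the summary can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the paper's implementation: `log_prob` stands in for a hypothetical function that returns the model's log‑probability of the examples concatenated in the given order, the two toy scorers replace a real language model, and the function names and default parameters are invented for this example. The sharded variant returns only a t‑statistic over per‑shard differences; turning it into a p‑value would need a t‑distribution, which is left out to keep the sketch dependency‑free.

```python
import math
import random
import statistics
from typing import Callable, Sequence

# Hypothetical scoring interface: log-probability of the examples in this order.
LogProbFn = Callable[[Sequence[str]], float]


def permutation_p_value(examples: Sequence[str], log_prob: LogProbFn,
                        n_perm: int = 99, seed: int = 0) -> float:
    """Share of orderings (canonical included) scoring at least as high as canonical."""
    rng = random.Random(seed)
    canonical = log_prob(examples)
    hits = 0
    for _ in range(n_perm):
        shuffled = list(examples)
        rng.shuffle(shuffled)
        if log_prob(shuffled) >= canonical:
            hits += 1
    return (1 + hits) / (1 + n_perm)


def sharded_t_statistic(examples: Sequence[str], log_prob: LogProbFn,
                        n_shards: int = 10, n_perm: int = 20, seed: int = 0) -> float:
    """One-sample t-statistic of (canonical - mean shuffled) log-prob across shards."""
    rng = random.Random(seed)
    size = len(examples) // n_shards
    diffs = []
    for i in range(n_shards):
        shard = list(examples[i * size:(i + 1) * size])
        canonical = log_prob(shard)
        shuffled_scores = []
        for _ in range(n_perm):
            perm = shard[:]
            rng.shuffle(perm)
            shuffled_scores.append(log_prob(perm))
        diffs.append(canonical - statistics.mean(shuffled_scores))
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:  # degenerate case: every shard gave the same difference
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(diffs)))


def memorizing_scorer(seq: Sequence[str]) -> float:
    """Toy 'leaked' model: rewards neighbours that appear in the original order."""
    return float(sum(1 for a, b in zip(seq, seq[1:]) if int(b[1:]) == int(a[1:]) + 1))


def order_blind_scorer(seq: Sequence[str]) -> float:
    """Toy 'clean' model: the score ignores ordering entirely."""
    return -float(sum(len(s) for s in seq))


examples = [f"q{i}" for i in range(50)]
p_leaked = permutation_p_value(examples, memorizing_scorer)
p_clean = permutation_p_value(examples, order_blind_scorer)
t_leaked = sharded_t_statistic(examples, memorizing_scorer)
t_clean = sharded_t_statistic(examples, order_blind_scorer)
```

With a real model, `log_prob` would be replaced by a call that sums token log‑probabilities over the concatenated examples. A small p‑value or a large positive t‑statistic suggests the model prefers the published ordering, which is the signal the test looks for.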
