DeepSeek‑V4‑Lite‑285B Hits 100% Recall in 256K Token Tests – A Needle‑in‑a‑Haystack Benchmark

Community testing of DeepSeek's rumored V4‑Lite‑285B model with the OpenAI MRCR 8‑pin benchmark shows perfect 1.0000 scores on several 128K‑token samples and on a 256K‑token sample: 100% recall within the native 256K context, dropping to roughly 60% beyond it. One caveat raised by testers is that the "needle‑in‑a‑haystack" method may be exploitable by DSA‑style mechanisms.

AI Engineering

Recent community reports indicate that DeepSeek is testing a new model, likely the DeepSeek‑V4‑Lite‑285B, evaluated with the OpenAI MRCR 8‑pin benchmark to assess information retrieval performance in ultra‑long contexts.

In the 128K‑token test, multiple samples achieved outstanding scores:

Index 32: 132,828 tokens, score 1.0000 ✅

Index 45: 135,258 tokens, score 1.0000 ✅

Index 78: 130,660 tokens, score 0.9821 🆗

For the 256K‑token test, a sample also reached perfect performance:

Index 97: 253,819 tokens, score 1.0000 ✅
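For readers unfamiliar with the method, a needle‑in‑a‑haystack test hides a few distinctive facts ("needles") at random positions inside a long filler context and asks the model to retrieve them; the score is the fraction recovered. The following is a minimal sketch of that idea — the needle template, function names, and token accounting are illustrative assumptions, not the actual MRCR implementation:

```python
import random

def build_haystack(needles, filler_sentence, target_tokens, seed=0):
    """Scatter needle sentences at random positions inside filler text.

    `needles` maps a key (e.g. "pin-3") to the fact the model must recall.
    Returns the haystack string and the ground-truth answers.
    """
    rng = random.Random(seed)
    # Crude token proxy: one filler sentence contributes roughly its word count.
    n_fillers = max(1, target_tokens // max(1, len(filler_sentence.split())))
    lines = [filler_sentence] * n_fillers
    for key, fact in needles.items():
        pos = rng.randrange(len(lines))
        lines.insert(pos, f"The secret value for {key} is {fact}.")
    return "\n".join(lines), dict(needles)

def score_recall(ground_truth, model_answers):
    """Fraction of needles whose fact appears in the model's answer (0.0-1.0)."""
    hits = sum(
        1 for key, fact in ground_truth.items()
        if fact in model_answers.get(key, "")
    )
    return hits / len(ground_truth)

# Eight pins, mirroring the "8-pin" setup reported above.
needles = {f"pin-{i}": f"token-{i * 7}" for i in range(8)}
haystack, truth = build_haystack(
    needles, "The quick brown fox jumps over the lazy dog.", 1000
)
perfect = {k: f"The secret value for {k} is {v}." for k, v in truth.items()}
print(score_recall(truth, perfect))  # a model quoting every needle scores 1.0
```

In a real evaluation, `model_answers` would come from querying the model under test with the haystack as context; here a perfect answer set stands in for it.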

The model name in the test was a placeholder, likely to avoid premature disclosure of the actual version.

[Screenshot: test results]

The forum‑shared results indicate that the model attains a 100% recall rate within its native 256K‑token context, but recall drops to roughly 60% once the context length exceeds that range.

[Screenshot: web application]

A user warned that the "needle‑in‑a‑haystack" testing method might be gamed by DSA (possibly DeepSeek Sparse Attention, the sparse‑attention mechanism DeepSeek introduced in V3.2), which could learn to recognize the distinctive markers used in the test pattern directly; if so, the impressive scores may not fully reflect real‑world long‑context performance.
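The concern is that needles are typically written in a highly regular template, so anything that keys on the template tokens can retrieve them without genuinely modeling the rest of the context. A toy illustration of this failure mode, using a hypothetical needle format (the template and names are assumptions for demonstration, not taken from MRCR):

```python
import re

# A needle injected into filler text with a fixed, recognizable template.
haystack = (
    "Plenty of ordinary filler text goes here. " * 50
    + "The secret value for pin-3 is token-21. "
    + "More filler text follows the needle. " * 50
)

def cheat_retrieve(text, key):
    """Recover a needle by matching its template alone.

    No comprehension of the surrounding context is required, which is why
    a distinctive marker format can inflate needle-in-a-haystack scores.
    """
    m = re.search(rf"The secret value for {re.escape(key)} is (\S+?)\.", text)
    return m.group(1) if m else None

print(cheat_retrieve(haystack, "pin-3"))  # → token-21
```

A retrieval mechanism that behaves like this regex would score perfectly on such a benchmark while telling us little about genuine long‑context reasoning, which is the substance of the user's warning.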

Although the model has not yet appeared on Hugging Face, the community remains highly interested in DeepSeek's progress on long‑context handling: an effective one‑million‑token window would represent a significant breakthrough for open‑source models.

The above information is unverified and provided for reference only.

Tags: LLM, DeepSeek, long context, recall rate, token benchmark
Written by AI Engineering

Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).