API‑Only Probes Reveal GPT, Claude, Gemini Parameter Counts – Community Buzz
A new arXiv paper introduces Incompressible Knowledge Probes, a method for estimating the size of large language models through black‑box API calls alone. The authors fit a log‑linear relation on 89 open‑source models and use it to produce parameter estimates for GPT‑5.5, Claude Opus, Gemini and others; the numbers have sparked heated community debate.
Researchers led by Li Bojie published an arXiv paper titled "Incompressible Knowledge Probes: Estimating Black‑Box LLM Parameter Counts via Factual Capacity", proposing a framework that infers the parameter scale of any LLM solely through black‑box API queries.
The idea originated from a three‑year informal test in which the team repeatedly asked various LLMs a niche question about the "Zhejiang University Hackergame" CTF competition. After DeepSeek‑V4’s release, the team spent four days building a formal Incompressible Knowledge Probes (IKP) dataset of 1,400 questions divided into seven scarcity levels, and evaluated 188 models from 27 vendors.
The core hypothesis is that while logical reasoning can be compressed or distilled, memorization of obscure factual knowledge cannot be significantly reduced and mainly depends on the model’s physical parameter count.
Using 89 open‑source models with known parameter counts (ranging from 1.35 × 10⁸ to 1.6 × 10¹², that is, 135 million to 1.6 trillion), the authors fitted a log‑linear relationship between factual accuracy and parameter count, achieving R² = 0.917. They then applied this relationship to closed‑source models to estimate their sizes.
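To make the calibrate-then-invert idea concrete, here is a minimal sketch. It assumes the relation takes the form accuracy = a + b · log10(parameters) and is fitted by ordinary least squares; the paper's exact functional form and fitting procedure are not given in this summary. The function names and the four calibration points are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def fit_log_linear(params, accuracies):
    """Ordinary least squares fit of accuracy = a + b * log10(params)."""
    xs = [math.log10(p) for p in params]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def estimate_params(accuracy, a, b):
    """Invert the fitted line to get a parameter-count estimate."""
    return 10 ** ((accuracy - a) / b)

# Made-up calibration points (NOT data from the paper).
calib_params = [1e9, 1e10, 1e11, 1e12]
calib_acc = [0.10, 0.25, 0.40, 0.55]

a, b = fit_log_linear(calib_params, calib_acc)

# A closed model scoring 0.475 on the same probes would be placed here.
estimate = estimate_params(0.475, a, b)
print(f"slope per decade: {b:.3f}, estimated parameters: {estimate:.3g}")
```

Because the fit is linear in the logarithm of the parameter count, a small error in measured accuracy becomes a multiplicative error in the size estimate, which is one reason such estimates carry wide intervals.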
Estimated parameter counts (the 90% confidence interval spans roughly 0.3× to 3× each point estimate) include:
GPT‑5.5: ~9 trillion parameters
Claude Opus 4.7: ~4 trillion parameters
GPT‑5.4: ~2.2 trillion parameters
Claude Sonnet 4.6: ~1.7 trillion parameters
Gemini 2.5 Pro: ~1.2 trillion parameters
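The width of that interval is easier to appreciate with the numbers written out. The sketch below assumes the 0.3–3× range is a multiplicative factor applied to each point estimate; the variable names are illustrative, and the point estimates are the ones listed above.

```python
# Multiplicative bounds of the reported 90% interval (assumed reading).
LOW_FACTOR, HIGH_FACTOR = 0.3, 3.0

# Point estimates from the list above, in trillions of parameters.
point_estimates_trillions = {
    "GPT-5.5": 9.0,
    "Claude Opus 4.7": 4.0,
    "GPT-5.4": 2.2,
    "Claude Sonnet 4.6": 1.7,
    "Gemini 2.5 Pro": 1.2,
}

# Map each model to its (low, high) bounds in trillions.
intervals = {
    name: (value * LOW_FACTOR, value * HIGH_FACTOR)
    for name, value in point_estimates_trillions.items()
}

for name, (low, high) in intervals.items():
    print(f"{name}: {low:.2f}T to {high:.2f}T")
```

Under this reading, the GPT‑5.5 estimate ranges from about 2.7 trillion to 27 trillion parameters, and the intervals of adjacent models in the list overlap considerably.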
The paper also reports two additional observations. (1) Citation count and h‑index do not reliably predict whether a researcher is remembered by models; models preferentially retain works with domain impact. (2) Across three years, the time coefficient of factual memory for 96 open‑source models is statistically near zero. This contradicts the previously suggested “Densing Law” and implies that reasoning capacity may be saturating while factual capacity remains tied to parameter scale.
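One way to read the "time coefficient" is as the coefficient c in a regression of the form accuracy = a + b · log10(parameters) + c · release_year. That form is an assumption made here for illustration; the paper's actual specification is not given in this summary. The sketch below fits it by least squares on made-up data in which accuracy depends only on size, so the recovered c comes out near zero. All names and data points are hypothetical.

```python
import math

def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def fit_with_time(params, years, accuracies):
    """Least squares fit of accuracy = a + b*log10(params) + c*years."""
    rows = [[1.0, math.log10(p), t] for p, t in zip(params, years)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, accuracies)) for i in range(3)]
    return solve_linear_system(xtx, xty)

# Made-up models: sizes and release years (offset from the earliest release).
sizes = [1e9, 1e10, 1e11, 1e12, 1e10, 1e11]
release_years = [0, 0, 1, 1, 2, 2]
# Synthetic accuracies that depend on size only, with no time trend.
scores = [-1.25 + 0.15 * math.log10(p) for p in sizes]

a, b, c = fit_with_time(sizes, release_years, scores)
print(f"size coefficient b = {b:.3f}, time coefficient c = {c:.2e}")
```

If newer models of the same size recalled more facts, c would be clearly positive; a value near zero is what the reported finding corresponds to under this assumed form.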
The community’s reaction has been mixed. Some users argue that a 9 trillion‑parameter GPT‑5.5 would exceed OpenAI’s current infrastructure capabilities and that the observed performance gains do not match a ten‑fold parameter increase, suggesting that a 2× scaling is more plausible. Others note that synthetic‑data fine‑tuning can boost factual recall, potentially undermining the “facts are incompressible” premise.
Discussions also highlight differences between Mixture‑of‑Experts (MoE) and dense architectures. Despite the controversy, many contributors offer constructive suggestions, such as analyzing MoE and dense models separately in future studies to better observe scaling trends.
