API‑Only Probes Reveal GPT, Claude, Gemini Parameter Counts – Community Buzz

A new arXiv paper introduces Incompressible Knowledge Probes (IKP), which estimate the size of large language models through black‑box API calls alone. The authors fit a log‑linear relation on 89 open‑source models and use it to produce parameter estimates for GPT‑5.5, Claude Opus, Gemini and others, which have sparked heated community debate.

Machine Heart

Researchers led by Li Bojie published an arXiv paper titled "Incompressible Knowledge Probes: Estimating Black‑Box LLM Parameter Counts via Factual Capacity", proposing a framework that infers the parameter scale of any LLM solely through black‑box API queries.

The idea grew out of an informal test that ran for three years, in which the team repeatedly asked various LLMs a niche question about the "Zhejiang University Hackergame" CTF competition. After DeepSeek‑V4 was released, the team spent four days building a formal IKP dataset of 1,400 questions divided into seven scarcity levels, and used it to evaluate 188 models from 27 vendors.
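The evaluation described above amounts to asking each model the same questions and scoring recall per scarcity level. The following is a minimal sketch of that loop, not the paper's harness: the `ask` callable, the question fields, the substring grading rule, and the two sample questions are all hypothetical placeholders.

```python
from collections import defaultdict

def accuracy_by_level(questions, ask):
    """Return {scarcity_level: fraction answered correctly}.

    questions: iterable of dicts with 'level', 'prompt', 'answer' (hypothetical schema).
    ask: any callable that sends a prompt to a model and returns its reply as text.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for question in questions:
        reply = ask(question["prompt"])
        total[question["level"]] += 1
        # Assumed grading rule: case-insensitive substring match.
        if question["answer"].lower() in reply.lower():
            correct[question["level"]] += 1
    return {level: correct[level] / total[level] for level in sorted(total)}

# Placeholder questions: one common fact, one stand-in for an obscure fact.
sample_questions = [
    {"level": 1, "prompt": "What is the capital of France?", "answer": "Paris"},
    {"level": 7, "prompt": "Placeholder obscure question", "answer": "placeholder-answer"},
]

def stub_model(prompt):
    """Stand-in for a real API call; it only 'knows' the common fact."""
    return "Paris" if "France" in prompt else "I don't know."

per_level_accuracy = accuracy_by_level(sample_questions, stub_model)
print(per_level_accuracy)
```

Passing the model as a plain callable keeps the scorer independent of any particular vendor API, which is the only access a black‑box probe needs.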

The core hypothesis is that while logical reasoning can be compressed or distilled, memorization of obscure factual knowledge cannot be significantly reduced and mainly depends on the model’s physical parameter count.

Using 89 open‑source models with known parameter counts (ranging from 1.35 × 10⁸ to 1.6 × 10¹² parameters), the authors fitted a log‑linear relationship between factual accuracy and parameter count, achieving R² = 0.917. They then applied this relationship to closed‑source models to estimate their sizes.
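A log‑linear calibration of this kind can be sketched in a few lines. The example below assumes the form accuracy = a + b · log10(parameters), fits it by ordinary least squares, and inverts it to turn a measured accuracy into a size estimate. The function names and the four calibration points are made up for illustration; they are not the paper's data or code.

```python
import math

def fit_log_linear(param_counts, accuracies):
    """Least-squares fit of accuracy = intercept + slope * log10(param_count)."""
    xs = [math.log10(p) for p in param_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, accuracies))
    ss_tot = sum((y - mean_y) ** 2 for y in accuracies)
    return intercept, slope, 1.0 - ss_res / ss_tot

def estimate_param_count(accuracy, intercept, slope):
    """Invert the fitted line: measured accuracy -> estimated parameter count."""
    return 10 ** ((accuracy - intercept) / slope)

# Made-up calibration points standing in for open models of known size.
calibration_params = [1e9, 1e10, 1e11, 1e12]
calibration_accuracy = [0.11, 0.24, 0.41, 0.54]

intercept, slope, r_squared = fit_log_linear(calibration_params, calibration_accuracy)
estimate = estimate_param_count(0.47, intercept, slope)
print(f"slope={slope:.3f} R^2={r_squared:.3f} estimate={estimate:.2e}")
```

Because the fit is linear in log10 of the parameter count, a small error in measured accuracy becomes a multiplicative error in the size estimate, which is why the resulting intervals are naturally expressed as factors.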

Estimated parameter counts (each with a 90% confidence interval of roughly 0.3× to 3× the point estimate) include:

GPT‑5.5: ~9 trillion parameters

Claude Opus 4.7: ~4 trillion parameters

GPT‑5.4: ~2.2 trillion parameters

Claude Sonnet 4.6: ~1.7 trillion parameters

Gemini 2.5 Pro: ~1.2 trillion parameters
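To see what the stated interval implies, the short sketch below multiplies each point estimate by 0.3 and 3. It assumes the "0.3–3×" figure is a pair of multiplicative factors applied to every estimate, which is one reading of the article rather than something it spells out; the helper names are illustrative.

```python
# Point estimates from the list above, in trillions of parameters.
POINT_ESTIMATES_TRILLIONS = {
    "GPT-5.5": 9.0,
    "Claude Opus 4.7": 4.0,
    "GPT-5.4": 2.2,
    "Claude Sonnet 4.6": 1.7,
    "Gemini 2.5 Pro": 1.2,
}

def interval(point, low_factor=0.3, high_factor=3.0):
    """Assumed reading: bounds are the point estimate times 0.3 and times 3."""
    return point * low_factor, point * high_factor

def intervals_overlap(first, second):
    """True when two (low, high) ranges share at least one value."""
    return first[0] <= second[1] and second[0] <= first[1]

for name, point in POINT_ESTIMATES_TRILLIONS.items():
    low, high = interval(point)
    print(f"{name}: {low:.2f}T to {high:.2f}T")
```

Under this reading, the GPT‑5.5 range runs from about 2.7 to 27 trillion parameters and overlaps the Gemini 2.5 Pro range of about 0.36 to 3.6 trillion, so the ordering of the listed models is less certain than the point estimates alone suggest.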

The paper also reports two additional observations. (1) Citation count and h‑index do not reliably predict whether models remember a researcher; models preferentially retain work with impact in its domain. (2) Across three years of releases, the time coefficient of factual memory for 96 open‑source models is statistically close to zero. This contradicts the previously suggested “Densing Law” and implies that reasoning capacity may be saturating while factual capacity remains tied to parameter scale.
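One way to picture a "time coefficient" is as the third term in a regression of the form accuracy = a + b · log10(parameters) + c · release_time, where c near zero means newer models recall no more facts than older ones of the same size. The sketch below assumes that form, which the article does not confirm, and uses made‑up data in which accuracy depends on size only. It reports the coefficient but not its statistical significance.

```python
import math

def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution

def fit_with_time(param_counts, years, accuracies):
    """Least squares for accuracy = a + b*log10(params) + c*year; returns [a, b, c]."""
    rows = [[1.0, math.log10(p), t] for p, t in zip(param_counts, years)]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, accuracies)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Made-up data: accuracy is a function of size only, not of release year.
sizes = [1e9, 1e10, 1e11, 1e9, 1e10, 1e11]
years_since_first_release = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
observed_accuracy = [0.10, 0.25, 0.40, 0.10, 0.25, 0.40]

intercept, size_slope, time_coefficient = fit_with_time(
    sizes, years_since_first_release, observed_accuracy
)
print(f"size slope={size_slope:.3f}, time coefficient={time_coefficient:.6f}")
```

On real measurements the coefficient would come with a standard error, and the "statistically close to zero" claim rests on that error, which this sketch omits.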

The community’s reaction has been mixed. Some users argue that a 9‑trillion‑parameter GPT‑5.5 would exceed OpenAI’s current infrastructure and that its performance gains do not match a ten‑fold parameter increase, making roughly 2× scaling more plausible. Others note that fine‑tuning on synthetic data can boost factual recall, which could undermine the “facts are incompressible” premise.

Despite the controversy, many contributors offer constructive suggestions, chiefly that future studies analyze Mixture‑of‑Experts (MoE) and dense models separately to better observe scaling trends.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
