New March 2026 Paper Exposes Fraudulent Third‑Party APIs for Large Language Models
A recent arXiv study audited 17 popular shadow APIs used in 187 papers, finding up to a 47.21% performance gap versus official models—e.g., Gemini‑2.5‑flash’s accuracy drops from 83.82% to about 37% on MedQA—highlighting serious reliability and safety risks of unofficial LLM services.
A newly posted arXiv paper (arXiv:2603.01919v1) investigates the rapidly growing ecosystem of “shadow APIs” that promise cheaper or unrestricted access to cutting‑edge large language models such as GPT‑5 and Gemini‑2.5, bypassing pricing, payment, or regional restrictions.
The authors compiled a list of 17 shadow‑API services that have been referenced in 187 academic publications. They evaluated each service along three dimensions—Utility, Safety, and Model Verification. One of the most widely used projects on GitHub has nearly 60,000 stars and over 5,900 citations, underscoring the prevalence of these services in research.
Experimental results reveal a substantial performance discrepancy between official APIs and their shadow counterparts, with a maximum deviation of 47.21%. For instance, on the high‑risk medical benchmark MedQA, the official Gemini‑2.5‑flash model achieves 83.82% accuracy, whereas the tested shadow API’s accuracy collapses to roughly 37%.
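To make the reported gap concrete, here is a minimal sketch of how such an accuracy discrepancy could be computed once answers from both services have been collected. This is not the paper's actual evaluation harness; the question set, answer letters, and helper names below are purely illustrative.

```python
# Hypothetical sketch: quantifying the accuracy gap between an official API
# and a shadow API on the same fixed benchmark. Predictions are assumed to
# already be collected as lists of answer choices.

def accuracy(predictions, gold):
    """Fraction of predictions that exactly match the gold answers."""
    assert len(predictions) == len(gold)
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

def accuracy_gap(official_preds, shadow_preds, gold):
    """Gap between the two services, in percentage points."""
    return (accuracy(official_preds, gold) - accuracy(shadow_preds, gold)) * 100

# Toy illustration with made-up multiple-choice answers (MedQA-style A/B/C/D):
gold = ["A", "C", "B", "D", "A"]
official = ["A", "C", "B", "D", "B"]   # 4/5 correct -> 80%
shadow = ["A", "B", "D", "D", "B"]     # 2/5 correct -> 40%
print(accuracy_gap(official, shadow, gold))  # 40.0
```

On the paper's numbers, the same calculation (83.82% vs. roughly 37%) yields a gap of about 47 percentage points, matching the maximum deviation the authors report.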
The paper includes detailed tables and charts documenting these gaps; this summary does not reproduce the full data and instead encourages readers to consult the original study.
These findings raise serious concerns about the reliability of downstream applications that depend on shadow APIs, and they threaten the reproducibility of scientific research that assumes parity with official models. The author also criticizes dismissive reactions that ignore these risks, noting that the lack of trustworthy third‑party services hampers broader development.