New March 2026 Paper Exposes Fraudulent Third‑Party APIs for Large Language Models

A recent arXiv study audited 17 popular shadow APIs used in 187 papers and found a performance gap of up to 47.21 percentage points versus official models. For example, Gemini‑2.5‑flash’s accuracy on MedQA drops from 83.82% to about 37%. The findings highlight serious reliability and safety risks of unofficial LLM services.

AI safety · large language models · model verification

DeepHub IMBA

Mar 6, 2026 · Artificial Intelligence

Shadow APIs vs Official LLMs: Up to 47% Performance Gap Revealed in New Study

A recent arXiv paper audits 17 widely used shadow APIs and shows that their benchmark performance can fall short of official large language model APIs by as much as 47.21 percentage points, with accuracy on the MedQA benchmark dropping from 83.82% to around 37%, raising serious reliability concerns.

AI safety · large language models · model verification
