Free LLM API Tokens: Complete Provider List, Limits, and Usage Tips
This guide compiles free large‑language‑model APIs from official vendors and third‑party platforms, detailing each service's token quotas, rate limits, base URLs, usage restrictions, and available models. It also offers practical advice on token optimization, multi‑platform rotation, rate‑limit handling, and key security.
Free LLM APIs – Official Provider Quotas
Cohere
Application URL: dashboard.cohere.com/api-keys
Free quota: 1,000 API calls per month
Rate limit: 20 RPM
Base URL: https://api.cohere.com/v2
Usage restriction: non‑commercial only
Available models: Command A (111B, 256K context, 4K output, text), Command R+, Command R, Command R7B, Embed 4 (text + image), Rerank 3.5 (10 RPM)
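Cohere's v2 chat endpoint is not OpenAI‑compatible, so it needs its own request shape. A minimal sketch with `requests`; the model name and response fields follow Cohere's documented v2 format but should be verified against the current docs:

```python
import os
import requests

# Minimal sketch of a Cohere v2 chat call. Model name and response shape
# are assumptions based on Cohere's v2 API docs; verify before relying on them.
resp = requests.post(
    "https://api.cohere.com/v2/chat",
    headers={"Authorization": f"Bearer {os.environ['COHERE_API_KEY']}"},
    json={
        "model": "command-r",  # illustrative; any model from the free list above
        "messages": [{"role": "user", "content": "Summarize RAG in one sentence."}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["message"]["content"][0]["text"])
```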
Google Gemini
Application URL: aistudio.google.com/app/apikey
Rate limits: Flash 10 RPM / 250 RPD, Flash‑Lite 15 RPM / 1,000 RPD
Base URL: https://generativelanguage.googleapis.com/v1beta
Usage note: free‑tier prompts may be used by Google to improve its products
Available models: Gemini 2.5 Flash and Flash‑Lite (1M context, 65K output, text + image + audio + video)
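The Gemini REST API uses its own request shape rather than the OpenAI format. A minimal sketch of a generateContent call; the model name is illustrative and the field names should be checked against Google's current v1beta docs:

```python
import os
import requests

# Minimal sketch of a Gemini generateContent call; field names follow
# Google's documented v1beta format but should be verified.
url = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-2.5-flash:generateContent"
)
resp = requests.post(
    url,
    params={"key": os.environ["GEMINI_API_KEY"]},
    json={"contents": [{"parts": [{"text": "Explain context windows briefly."}]}]},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])
```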
Mistral AI
Application URL: console.mistral.ai/api-keys (register for the Experiment plan)
Free quota: ~1 billion tokens per month
Rate limit: ~1 RPS · 500 K TPM per model
Base URL: https://api.mistral.ai/v1
Available models (all free): Mistral Small 4 (256K context, 256K output, text + image + code), Mistral Medium 3 (128K/128K, text), Mistral Large 3 (256K/256K, text), Mistral Nemo 12B (128K/128K, text), Codestral (256K/256K, code‑only), Pixtral Large (128K/128K, text + image)
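Mistral, like most of the platforms further down this list, exposes an OpenAI‑style chat‑completions endpoint, so one generic snippet covers the common case. A minimal sketch with `requests`; the model name is illustrative:

```python
import os
import requests

# Generic OpenAI-style chat completion; the same pattern works for most
# providers in this guide by swapping the base URL, API key, and model name.
resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-small-latest",  # illustrative; pick any free-tier model
        "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```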
Zhipu AI (Z AI)
Application URL: open.bigmodel.cn/usercenter/apikeys
Rate limit: 1 concurrent request per model
Base URL: https://open.bigmodel.cn/api/paas/v4
Free models: GLM‑4.7‑Flash (200K context, 128K output, text), GLM‑4.5‑Flash (128K context, ~8K output, text), GLM‑4.6V‑Flash (128K context, ~4K output, text + image)
Free LLM APIs – Third‑Party Inference Platforms
Cerebras
Application URL: cloud.cerebras.ai
Daily token limit: 1 M tokens (shared)
Rate limit: 30 RPM · 14,400 RPD · 1 M TPD
Base URL: https://api.cerebras.ai/v1
Free models: llama3.1‑8b, gpt‑oss‑120b, qwen‑3‑235b‑a22b‑instruct‑2507, zai‑glm‑4.7 (all 8K context on free tier)
Groq
Application URL: console.groq.com/keys
Rate limit: 30 RPM · 14,400 RPD
Base URL: https://api.groq.com/openai/v1
Free models include: llama‑3.3‑70B‑versatile (131K context, 32K output), llama‑3.1‑8b‑instant (131K/131K), llama‑4‑scout‑17b (131K/8K, text + vision), llama‑4‑maverick‑17b (131K/8K, 15 RPM · 500 RPD), qwen3‑32b, gpt‑oss‑120b, kimi‑k2‑instruct (262K/262K), deepseek‑r1‑distill‑70b (131K/8K, inference), whisper‑large‑v3/turbo (audio → text, 20 RPM)
GitHub Models
Application URL: github.com/marketplace/models
Single‑request limit: 8K input / 4K output
Base URL: https://models.inference.ai.azure.com
Selected free models (45+): gpt‑4.1, gpt‑4.1‑mini, o4‑mini, o3‑mini, gpt‑4o, Llama‑4‑Scout‑17B‑16E, Llama‑4‑Maverick‑17B‑128E, DeepSeek‑R1, Meta‑Llama‑3.3‑70B, Mistral‑Small‑3.1, plus 35 more (text / image)
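GitHub Models authenticates with a GitHub personal access token rather than a separate API key. A sketch using the OpenAI Python SDK with an overridden base URL; the model ID is illustrative and should be confirmed against the Marketplace listing:

```python
import os
from openai import OpenAI

# Assumes a GitHub personal access token in GITHUB_TOKEN; the base URL and
# token-as-api-key pattern follow GitHub Models' documented setup.
client = OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"],
)
completion = client.chat.completions.create(
    model="gpt-4.1-mini",  # illustrative; any model from the Marketplace list
    messages=[{"role": "user", "content": "What is a context window?"}],
)
print(completion.choices[0].message.content)
```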
OpenRouter
Application URL: openrouter.ai/keys (add ":free" suffix to model name for free tier)
Rate limit: 20 RPM · 200 RPD shared across all free models
Base URL: https://openrouter.ai/api/v1
Selected free models: deepseek‑r1‑0528, deepseek‑chat‑v3‑0324, qwen‑3.6‑plus, qwen‑3‑coder‑480b‑a35b, meta‑llama‑4‑scout, meta‑llama‑4‑maverick, openai‑gpt‑oss‑120b, nvidia‑nemotron‑3‑super‑120b, google‑gemma‑4‑31b‑it, mistralai‑devstral‑2512, minimax‑minimax‑m2.5, plus ~23 more (see openrouter.ai/models?q=:free)
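On OpenRouter the ":free" suffix goes directly into the model ID. A sketch with the OpenAI SDK pointed at the OpenRouter endpoint; the model ID is illustrative, so check openrouter.ai/models for current free IDs:

```python
import os
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint; the ":free" suffix
# selects the free-tier variant of a model. Model ID is illustrative.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
completion = client.chat.completions.create(
    model="deepseek/deepseek-r1-0528:free",
    messages=[{"role": "user", "content": "Summarize the CAP theorem."}],
)
print(completion.choices[0].message.content)
```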
NVIDIA NIM
Application URL: build.nvidia.com/explore/discover (join NVIDIA Developer Program for free access)
Rate limit: ~40 RPM, no daily token cap
Base URL: https://integrate.api.nvidia.com/v1
Selected free models (100+): deepseek‑ai/deepseek‑r1 (128K / ~163K), nvidia/nemotron‑3‑super‑120b‑a12b (262K / 262K), nvidia/llama‑3.1‑nemotron‑ultra‑253b (128K / 4K), meta/llama‑3.1‑405b‑instruct (128K / 4K), qwen/qwen2.5‑72b‑instruct (128K / 8K), minimax/minimax‑m2.7 (128K / 8K), nvidia/nemotron‑nano‑2‑vl (128K / 8K, vision + text + video), plus ~90 more (text, image, video, audio, embeddings)
SiliconFlow
Application URL: cloud.siliconflow.cn/account/ak (register for 14 CNY credit)
Free‑tier rate: 1,000 RPM · 50 K TPM
Base URL: https://api.siliconflow.cn/v1
Free models: Qwen‑3‑8B, DeepSeek‑R1‑0528‑Qwen‑3‑8B, DeepSeek‑R1‑Distill‑Qwen‑7B, THUDM/glm‑4‑9b‑chat, THUDM/GLM‑4.1V‑9B‑Thinking (text, vision, inference), DeepSeek‑OCR (visual OCR)
How to Maximize Free Tokens
Assign Models by Task Type
Lightweight daily tasks (code completion, formatting): use Zhipu GLM‑4.7‑Flash / GLM‑4.5‑Flash (permanently free)
Moderately complex tasks (debugging, refactoring, doc generation): use GLM Coding Plan Lite / Qwen‑3.6‑Plus (free on OpenRouter)
Very long document processing: Kimi API (unlimited tokens) or Gemini (1 M context)
Complex multi‑step agent workflows: GLM‑5.1 or MiniMax M2.7
Top‑tier inference needs: Claude Opus 4.6/4.7 (pay‑as‑you‑go)
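One way to encode the assignments above is a small routing table keyed by task type. A minimal sketch; the provider and model identifiers are illustrative placeholders to be replaced with the exact IDs exposed by the platforms you actually enable:

```python
# Illustrative task-to-model routing table mirroring the assignments above.
# Provider/model identifiers are placeholders, not verified API model IDs.
MODEL_ROUTES = {
    "lightweight":  {"provider": "zhipu",      "model": "glm-4.5-flash"},
    "medium":       {"provider": "openrouter", "model": "qwen-3.6-plus:free"},
    "long_context": {"provider": "gemini",     "model": "gemini-2.5-flash"},
    "agent":        {"provider": "minimax",    "model": "minimax-m2.7"},
    "premium":      {"provider": "anthropic",  "model": "claude-opus"},
}

def route(task_type: str) -> dict:
    """Return the provider/model pair for a task type, defaulting to lightweight."""
    return MODEL_ROUTES.get(task_type, MODEL_ROUTES["lightweight"])
```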
Build a Multi‑Platform Rotation
Do not funnel all traffic through a single provider. Use OpenRouter as a unified entry point and switch between GLM‑5.1, Qwen‑3.6‑Plus, MiniMax M2.5, etc. When a platform hits its rate limit, automatically fall back to the next one.
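A minimal fallback sketch: try each configured OpenAI‑compatible provider in order and move to the next one when a rate‑limit response comes back. The environment‑variable names and model IDs are illustrative:

```python
import os
import requests

# Ordered list of OpenAI-compatible providers to try; entries are illustrative.
PROVIDERS = [
    {"base_url": "https://openrouter.ai/api/v1", "key_env": "OPENROUTER_API_KEY",
     "model": "deepseek/deepseek-chat-v3-0324:free"},
    {"base_url": "https://api.groq.com/openai/v1", "key_env": "GROQ_API_KEY",
     "model": "llama-3.3-70b-versatile"},
    {"base_url": "https://api.cerebras.ai/v1", "key_env": "CEREBRAS_API_KEY",
     "model": "llama3.1-8b"},
]

def chat(prompt: str) -> str:
    for p in PROVIDERS:
        resp = requests.post(
            f"{p['base_url']}/chat/completions",
            headers={"Authorization": f"Bearer {os.environ[p['key_env']]}"},
            json={"model": p["model"],
                  "messages": [{"role": "user", "content": prompt}]},
            timeout=60,
        )
        if resp.status_code == 429:  # rate-limited: fall back to the next provider
            continue
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
    raise RuntimeError("All providers are rate-limited; try again later.")
```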
Handle Rate‑Limit Restrictions
Almost every free API enforces RPM and RPD limits. Implement exponential back‑off retry logic in code: on receiving a 429 error, wait, then retry instead of crashing.
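A minimal back‑off sketch that can wrap any of the HTTP calls above; the delay schedule is an arbitrary example:

```python
import random
import time
import requests

def post_with_backoff(url: str, *, headers: dict, json: dict, max_retries: int = 5):
    """Retry on HTTP 429 with exponential back-off plus jitter."""
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=json, timeout=60)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp
        # Honor Retry-After when the server sends it; otherwise back off exponentially.
        delay = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay + random.random())
    raise RuntimeError("Still rate-limited after retries.")
```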
Secure API Keys
Leaked keys allow others to consume your quota or incur charges. Never hard‑code keys in source files or push them to public repositories; store them in environment variables or a dedicated secret‑management system.
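In practice this just means reading the key from the environment (or a .env file kept out of version control) instead of embedding it in code. A minimal sketch; the variable name is illustrative:

```python
import os

# Fail fast and loudly if the key is missing rather than hard-coding a fallback.
api_key = os.environ.get("OPENROUTER_API_KEY")
if not api_key:
    raise RuntimeError(
        "OPENROUTER_API_KEY is not set; export it in your shell or load it "
        "from a .env file that is listed in .gitignore."
    )
```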
Key Takeaways
For independent developers and students, combining Mistral (≈1 B tokens per month), Groq (14,400 RPD), and GitHub Models (GPT‑4.1 / o4‑mini) enables completely zero‑cost validation of an early‑stage AI product. Free quotas are suitable for development, testing, and learning, but should not be relied on in production: they carry no SLA guarantees or priority queuing, and rate limits are strict.
Data source: GitHub search; repository mnfst/awesome-free-llm-apis (CC0 license, continuously updated).