LLM Application Development Tips (1): How to Choose the Right Model
With a growing array of overseas and domestic LLM APIs in 2024, this guide explains how to pick the right model: start with a top‑tier option such as GPT‑4o for feasibility testing, then move to a cost‑effective or Chinese alternative, weighing price, inference speed, context window, API compatibility, and rate limits along the way.
Problem Statement
In 2024 the market offers many LLM APIs from overseas (OpenAI GPT series, Anthropic Claude, Google Gemini) and numerous domestic providers, plus open‑source models for private deployment. Developers need guidance on which model to adopt for a new AI‑driven project.
Initiation Phase
During project initiation, the author recommends using a top‑tier model to accurately gauge the upper bound of LLM capability and to assess feasibility, risk, and potential ROI. Benchmarks such as AlignBench, MMLU, GSM8K, MATH, BBH, and HumanEval can help identify the strongest models. At the time of writing, OpenAI's GPT‑4o (released May 2024) is presented as the default choice: it delivers roughly twice the inference speed at half the price of GPT‑4 Turbo while maintaining leading benchmark scores.
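To make the feasibility check concrete, here is a minimal sketch using the official openai Python SDK (v1.x); the system and user prompts are placeholders for your own representative task, and OPENAI_API_KEY is assumed to be set in the environment.

```python
# Feasibility probe: send a representative task to a top-tier model and
# inspect the answer before committing to an architecture.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # Placeholder prompts: substitute a task that represents your product.
        {"role": "system", "content": "You are a careful assistant for contract review."},
        {"role": "user", "content": "Summarize the termination clause in the contract below:\n..."},
    ],
    temperature=0,  # near-deterministic output keeps feasibility runs comparable
)
print(response.choices[0].message.content)
```

If even the strongest available model cannot handle the task acceptably, that is a cheap early signal to rethink the product before any engineering investment.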
For individual developers who face access barriers to OpenAI’s API, the author mentions alternatives like the API2D aggregation platform or the GitHub Models platform.
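Both routes typically expose an OpenAI-compatible interface, so the same SDK works with only the endpoint and key swapped. The endpoint below matched the GitHub Models preview at the time of writing and may have changed since; treat it as an illustration and verify against the platform's current documentation.

```python
# Calling GPT-4o through the GitHub Models preview endpoint; authentication
# uses a GitHub personal access token rather than an OpenAI key.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://models.inference.ai.azure.com",  # preview endpoint; may change
    api_key=os.environ["GITHUB_TOKEN"],  # a GitHub PAT, not an OpenAI key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Reply with one sentence to confirm access."}],
)
print(response.choices[0].message.content)
```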
Deployment Phase
When moving to production, cost‑effectiveness becomes critical. The author suggests keeping the top‑tier model for proof‑of‑concept, then switching to a second‑tier model that offers a better price‑performance ratio, using refined system prompts to approach the performance of the original model.
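A small comparison harness makes the switch-down decision empirical rather than a guess: run the same test questions through the proven top-tier model and the cheaper candidate, then score or eyeball the outputs side by side. The model names and the refined system prompt below are placeholders, not a recommendation.

```python
# Side-by-side check: does the cheaper candidate, with a refined system
# prompt, come close enough to the proven baseline on your own test set?
from openai import OpenAI

client = OpenAI()
REFINED_SYSTEM_PROMPT = "You are a meticulous assistant. Answer concisely and cite the source text."

def ask(model: str, question: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": REFINED_SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    return response.choices[0].message.content

test_set = ["Representative question 1 ...", "Representative question 2 ..."]
for question in test_set:
    print("baseline :", ask("gpt-4o", question))       # proven top-tier model
    print("candidate:", ask("gpt-4o-mini", question))  # hypothetical cheaper pick
```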
Domestic Models
Chinese LLMs have rapidly closed the gap with overseas leaders, and in some scenarios they already outperform them. For products that must be deployed within China, domestic models are the preferred option.
Other Decision Factors
Price
LLM APIs are typically priced per token. Some providers charge the same rate for input and output tokens, while others differentiate (output tokens are usually more expensive). To compare costs, estimate the expected input and output token volume for your workload, then balance price against the performance you actually need.
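A back-of-the-envelope calculation is usually enough for this comparison. The prices below are illustrative placeholders, not current list prices; substitute the figures from each provider's pricing page.

```python
# Rough monthly-cost comparison for a given traffic profile.
PRICES_PER_1M_TOKENS = {  # (input USD, output USD) -- placeholder numbers
    "top-tier-model": (5.00, 15.00),
    "second-tier-model": (0.50, 1.50),
}

def monthly_cost(model: str, requests_per_day: int,
                 input_tokens: int, output_tokens: int, days: int = 30) -> float:
    """Estimated monthly spend, given average tokens per request."""
    price_in, price_out = PRICES_PER_1M_TOKENS[model]
    per_request = (input_tokens * price_in + output_tokens * price_out) / 1_000_000
    return per_request * requests_per_day * days

for model in PRICES_PER_1M_TOKENS:
    print(f"{model}: ${monthly_cost(model, 10_000, 1_500, 400):,.2f}/month")
```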
Inference Speed
Latency directly impacts user experience in conversational applications; for streaming interfaces, time to first token matters as much as total generation time. Consistently low latency also signals that the provider has adequate hardware capacity and operational robustness.
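Published numbers rarely match your region and traffic pattern, so measure. Here is a rough probe, again assuming the openai SDK v1.x; run many trials at realistic concurrency before drawing conclusions.

```python
# Measure time to first streamed token and total completion time.
import time

from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
first_token_at = None
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "List three uses of a paperclip."}],
    stream=True,
)
for chunk in stream:
    # Some stream events carry no content; record the first one that does.
    if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
        first_token_at = time.perf_counter()
total = time.perf_counter() - start

print(f"time to first token: {first_token_at - start:.2f}s, total: {total:.2f}s")
```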
Context Window
A model's context window is the total number of tokens it can handle in a single request (input plus output). It determines how much material, such as long documents, retrieved passages, or conversation history, can be carried in one call.
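Before sending long inputs, estimate their token count against the window. The sketch below uses tiktoken, which covers OpenAI models (recent releases map gpt-4o to the o200k_base encoding); other vendors ship their own tokenizers, so treat the count as an approximation elsewhere. The 128K window and the output reserve are assumptions to adjust per model.

```python
# Check a document's token count against the model's context window.
from pathlib import Path

import tiktoken

CONTEXT_WINDOW = 128_000      # gpt-4o's advertised window at the time of writing
RESERVED_FOR_OUTPUT = 4_096   # leave headroom for the model's reply

encoding = tiktoken.encoding_for_model("gpt-4o")
document = Path("contract.txt").read_text(encoding="utf-8")  # placeholder file
n_tokens = len(encoding.encode(document))

if n_tokens > CONTEXT_WINDOW - RESERVED_FOR_OUTPUT:
    print(f"Too long ({n_tokens} tokens): chunk or summarize first.")
else:
    print(f"Fits: {n_tokens} tokens.")
```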
API Compatibility
OpenAI’s API has become the de facto industry standard, and many open‑source tools assume compatibility with it. Consequently, the author prefers models that expose an OpenAI‑compatible API, such as Kimi (Moonshot), DeepSeek, Lingyi, and MiniMax.
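In practice this means switching providers is a configuration change rather than a code change. The base URLs and model names below are the publicly documented ones at the time of writing; confirm them against each vendor's documentation before use.

```python
# One client factory for several OpenAI-compatible providers.
import os

from openai import OpenAI

PROVIDERS = {
    # name: (base_url, env var holding the key, default model)
    "openai":   ("https://api.openai.com/v1",  "OPENAI_API_KEY",   "gpt-4o"),
    "moonshot": ("https://api.moonshot.cn/v1", "MOONSHOT_API_KEY", "moonshot-v1-8k"),
    "deepseek": ("https://api.deepseek.com",   "DEEPSEEK_API_KEY", "deepseek-chat"),
}

def make_client(name: str) -> tuple[OpenAI, str]:
    base_url, key_env, default_model = PROVIDERS[name]
    return OpenAI(base_url=base_url, api_key=os.environ[key_env]), default_model

client, model = make_client("deepseek")  # swap the name to change providers
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "你好"}],
)
print(response.choices[0].message.content)
```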
Rate Limits
Rate limits, typically caps on requests and tokens per minute, are often overlooked during development but can take a service down once production traffic ramps up. Load‑test against expected peak traffic, and build in graceful handling of limit errors, before launch.
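One common mitigation is retrying rate-limited calls with exponential backoff, sketched here with the openai SDK's RateLimitError; in production you would also cap concurrency and monitor the provider's per-minute quotas.

```python
# Retry on HTTP 429 with exponential backoff and jitter.
import random
import time

from openai import OpenAI, RateLimitError

client = OpenAI()

def chat_with_retry(messages, model="gpt-4o", max_retries=5):
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(delay + random.random())  # jitter avoids thundering herds
            delay *= 2
```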
Conclusion
The article equips readers with a step‑by‑step framework for selecting an LLM: start with the most capable model to validate the idea, then transition to a cost‑effective or domestic alternative while weighing price, speed, context length, API compatibility, and rate limits.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.