Estimating the Resource and Cost Requirements for Large Language Model Training and Inference
The article analyses the computational resources, hardware costs, and human investment needed to train and serve large language models such as GPT‑3, discusses practical cost calculations, highlights the challenges faced by Chinese AI teams, and argues for sustained, long‑term funding to achieve meaningful breakthroughs.
ChatGPT’s popularity has sparked widespread discussion about large language models (LLMs), prompting Chinese tech giants to assess the resource requirements and the gap with U.S. capabilities, emphasizing the need for realistic expectations.
The author references several background sources on LLMs, including NVIDIA’s solution page and Wikipedia’s overview.
Initially, the author endorsed LLMs on faith but doubted their practical business value, given limited evidence of short-term ROI, a sentiment shared by many decision-makers at domestic tech firms.
After experimenting with ChatGPT, the author’s skepticism lessened, noting that logical coherence, long‑context handling, and fluent generation are reliable indicators of model capability.
Historical advances from word2vec to InstructGPT and RETRO set the stage for current LLMs. The author then estimates serving costs for a GPT‑3 175B model on a DGX A100 8‑GPU server running FasterTransformer, assuming a batch size of 128 and input and output lengths of 200 tokens each, arriving at roughly 0.3 ¢ per 1,000 tokens.
Actual input/output length distributions may differ from the assumed 200‑token average.
Batch size 128 optimizes GPU utilization, whereas real‑time interactive scenarios typically use batch size 1, increasing costs.
OpenAI's and Microsoft's inference services likely do not rely solely on FasterTransformer, so the latency estimate is approximate.
Production pipelines often add upstream/downstream modules (e.g., intervention, caching) that affect cost.
The derived serving cost aligns with publicly reported OpenAI figures; if a business can absorb a few cents per 1,000 tokens, the technology can be scaled across industries.
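The per-token figure can be reproduced with a back-of-envelope calculation. All inputs below are illustrative assumptions: the $10k/month server rental matches the Azure figure cited later in the article, while the 40-second per-batch latency is a hypothetical placeholder (actual FasterTransformer latency depends on precision, parallelism, and sequence configuration):

```python
# Back-of-envelope serving cost for GPT-3 175B on one DGX A100 (8 GPUs).
# All inputs are illustrative assumptions, not measured values.

SERVER_COST_PER_MONTH_USD = 10_000  # assumed rental for an 8x A100 server
BATCH_SIZE = 128                    # concurrent sequences per batch
INPUT_TOKENS = 200                  # assumed average input length
OUTPUT_TOKENS = 200                 # assumed average output length
BATCH_LATENCY_S = 40.0              # hypothetical end-to-end latency per batch

cost_per_hour = SERVER_COST_PER_MONTH_USD / (30 * 24)
billed_tokens_per_batch = BATCH_SIZE * (INPUT_TOKENS + OUTPUT_TOKENS)
tokens_per_hour = billed_tokens_per_batch * (3600 / BATCH_LATENCY_S)

cents_per_1k_tokens = cost_per_hour / tokens_per_hour * 1000 * 100
print(f"~{cents_per_1k_tokens:.2f} cents per 1,000 tokens")
```

Under these assumptions the result lands near the article's ~0.3 ¢ per 1,000 tokens; halving the batch latency would halve the cost, which is why real measured latency matters.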
For training cost estimation, the author uses a token‑based workload model: total training compute is approximately 6 TP FLOPs (T total tokens, P parameter count), so training time is 6 TP divided by the number of GPUs times the sustained per‑GPU throughput X, referencing NVIDIA's Megatron‑LM paper for a 175 B‑parameter model trained on 300 B tokens.
Applying this formula, a single epoch of training a 175 B model on 300 B tokens would require about 26 days on 1,024 A100 GPUs with 80 GB memory.
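The 26-day figure follows directly from that formula. The ~140 TFLOP/s sustained per-GPU throughput is an assumed value, chosen to be in the range NVIDIA reported for Megatron-LM-style training on A100s (roughly 45% of BF16 peak):

```python
# Training-time estimate: total compute ~ 6 * T * P FLOPs.
T = 300e9      # training tokens
P = 175e9      # model parameters
N_GPUS = 1024  # A100 80 GB GPUs
X = 140e12     # assumed sustained FLOP/s per GPU (~45% of A100 BF16 peak)

total_flops = 6 * T * P            # ~3.15e23 FLOPs
seconds = total_flops / (N_GPUS * X)
days = seconds / 86400
print(f"{days:.1f} days")
```

This yields roughly 25 days, consistent with the article's ~26-day estimate once checkpointing and restart overhead are included.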
Based on Azure’s pricing, an 8‑GPU A100 server costs roughly $10 k per month on a three‑year commitment. Completing a full training run within a month on 1,024 GPUs (128 servers) puts the hardware cost alone above $1.28 M/month, rising to over $2.56 M/month on a one‑year lease.
Adding a 10 % buffer for unexpected issues and hardware failures is reasonable.
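The monthly hardware bill is simple arithmetic once the cluster size is fixed. Prices below are the article's assumed Azure figures, with the one-year lease taken as roughly double the committed rate:

```python
N_GPUS = 1024
GPUS_PER_SERVER = 8
COST_3YR_COMMIT = 10_000  # USD per server-month, three-year commitment (assumed)
COST_1YR_LEASE = 20_000   # USD per server-month, ~2x on a one-year lease (assumed)
BUFFER = 1.10             # 10% margin for failures, reruns, and surprises

servers = N_GPUS // GPUS_PER_SERVER       # 128 servers
monthly_3yr = servers * COST_3YR_COMMIT   # $1.28M/month
monthly_1yr = servers * COST_1YR_LEASE    # $2.56M/month
print(monthly_3yr * BUFFER, monthly_1yr * BUFFER)
```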
Human resources are also significant: an elite AI systems engineering team plus model experts could cost around $10 M per year.
Combining hardware, overhead, and personnel yields an annual budget of approximately 120 M–240 M RMB; scaling hardware for parallel experiments pushes this to 200 M–500 M RMB.
Over a five‑year horizon, total investment could reach 1 B–2.5 B RMB, comparable to the capital of a high‑potential chip company.
Such sustained, non‑return‑driven funding (on the order of billions of RMB over five years) is essential, yet many venture capital firms demand near‑term monetization, creating friction for long‑term AI research.
Talent and hardware constraints are acute: while AI systems engineers are relatively abundant, top‑tier model‑building expertise is scarce, and geopolitical tensions limit access to the latest GPUs (e.g., H100 vs. domestically available H800/A800). Mobilizing 2,000–5,000 A100 cards from cloud providers or government inventories would be necessary but requires coordinated industry effort.
The author urges a realistic appraisal of these challenges, sustained investment, and avoidance of hype‑driven, short‑sighted approaches to ensure meaningful progress in LLM research within China.
He also praises the U.S. research ecosystem for its tolerance of trial‑and‑error and long‑term focus, suggesting that similar cultural and institutional support is needed domestically.