Why Pretraining Boosts New Engineers More Than SFT: A Practical Guide
The answer argues that fresh graduates should join pre-training teams because the required engineering tasks (large-scale data crawling, Hadoop/Spark pipelines, Torch and CUDA environment setup, Megatron code debugging, and scaling-law experiments) rapidly sharpen coding skills, while SFT work centers on data labeling and offers slower technical growth.
Recommendation: Choose Pretrain
For a newly hired graduate, the top priority is to improve engineering and coding ability. The author recommends joining a pre‑training team because its core tasks provide faster and deeper technical growth.
Core Skills Required in Each Team
Pretrain team: the mandatory work includes crawling internet data; cleaning and deduplicating data at scale (with Hadoop, Spark, etc.); configuring various Torch and CUDA environments; understanding, modifying, and optimizing Megatron code; debugging multi-machine communication errors; mastering the "alchemy" of training (data mixture ratios, learning-rate schedules, optimizer choices, curriculum learning, loss-curve analysis, and scaling-law experiments); and running benchmarks to validate model capability.
SFT team: training frameworks and alchemy tricks are optional; the required work mainly consists of labeling data manually, using GPT-4 to label data, teaching annotators how to label, using user feedback to improve the data, and synthesizing data with methods taken from papers.
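One of the pretrain "alchemy" knobs listed above is the learning-rate schedule. The sketch below shows linear warmup followed by cosine decay, a widely used shape. It is only an illustration: the function name `lr_at_step` and every parameter value are hypothetical, and training frameworks ship their own scheduler implementations with their own options.

```python
import math


def lr_at_step(step: int, max_lr: float, min_lr: float,
               warmup_steps: int, total_steps: int) -> float:
    """Hypothetical helper: linear warmup, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Linear warmup: ramps from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        # Past the end of the schedule, hold the floor value.
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    # Cosine factor goes from 1 (start of decay) toward 0 (end of decay).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


if __name__ == "__main__":
    # Illustrative values only; real runs tune these per model size.
    for s in (0, 1_999, 2_000, 50_000, 99_999, 100_000):
        print(s, lr_at_step(s, max_lr=3e-4, min_lr=3e-5,
                            warmup_steps=2_000, total_steps=100_000))
```

Warmup keeps early updates small while optimizer statistics are still noisy, and the decay floor (`min_lr`) stops the rate from collapsing to zero; choosing these values, and reading the loss curve to judge them, is exactly the kind of hands-on tuning the pretrain role involves.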
Transition Difficulty Between Teams
Because the mandatory skills differ, a pretrain engineer can quickly pick up SFT tasks after a day of reviewing training data, whereas an SFT engineer may need two weeks to learn Megatron before contributing to pretrain work.
Pretrain core technology: training code.
SFT core technology: training data.
Miscellaneous Work ("Dirty Work") Comparison
Pretrain: cleaning massive, noisy internet data into clean training data, which involves building small models to score data, writing extensive rules to filter garbage text, researching domain‑specific data characteristics, and filtering domain data.
SFT: manual data labeling and iteratively finding the best GPT‑4 prompts for labeling.
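The "rules to filter garbage text" from the pretrain item above can be made concrete with a small sketch. Everything here is an assumption for illustration: the thresholds, the blocklist phrases, and the function names are hypothetical, and the whitespace word split assumes a space-delimited language such as English. Real pipelines run logic like this on Hadoop/Spark at much larger scale, usually alongside small-model quality scores and near-duplicate detection (for example, MinHash) rather than only the exact hashing shown here.

```python
import hashlib
import re

# Hypothetical thresholds, for illustration only; production pipelines
# tune rules like these per language and per data source.
MIN_WORDS = 50
MAX_SYMBOL_RATIO = 0.1
MAX_DUP_LINE_FRACTION = 0.3
BLOCKLIST = ("lorem ipsum", "click here to subscribe")


def passes_rules(text: str) -> bool:
    """Return True if a document survives a few simple heuristic filters."""
    words = text.split()
    if len(words) < MIN_WORDS:
        return False  # too short to be useful training text
    # Count "words" that contain no letter or digit at all (pure punctuation/markup).
    symbols = sum(1 for w in words if not re.search(r"\w", w))
    if symbols / len(words) > MAX_SYMBOL_RATIO:
        return False  # dominated by punctuation or markup debris
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if lines:
        dup_fraction = 1 - len(set(lines)) / len(lines)
        if dup_fraction > MAX_DUP_LINE_FRACTION:
            return False  # repeated boilerplate lines (menus, footers)
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BLOCKLIST)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def clean_corpus(docs):
    """Apply the rule filters, then drop exact duplicates after normalization."""
    seen = set()
    kept = []
    for doc in docs:
        if not passes_rules(doc):
            continue
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(doc)
    return kept
```

Each rule is cheap and easy to audit, which is why such filters tend to accumulate into long lists; the trade-off is that every threshold can also discard good data, so in practice they are checked against samples and downstream benchmarks.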
Advice for Newcomers
The biggest pain points of pretrain work are the lack of short-term returns and the long-term difficulty of training large models like LLaMA or Qwen. As a fresh graduate, however, you are protected by your "new graduate" status: you can take on the hardest tasks and grow quickly, provided you steer clear of truly destructive mistakes such as running rm -rf *. After a few years, you may be pushed toward revenue-driven work instead of technically challenging work.
The author acknowledges that SFT also has technical depth but believes pretrain offers a faster path to solid engineering fundamentals, which is especially valuable for newcomers.
Regardless of the team you join, the key is to seek mentorship, ask many questions, and proactively learn pretrain‑related knowledge even if your official duties focus on SFT.
This article has been distilled and summarized from source material, then republished for learning and reference.
Baobao Algorithm Notes
Author of the BaiMian large model, offering technology and industry insights.
