Baobao Algorithm Notes
Nov 13, 2024 · Artificial Intelligence

Why Cleaning SFT Data Is a Nightmare: Hidden JSON Formatting Pitfalls

Cleaning SFT data for LLMs is surprisingly complex. Subtle JSON formatting variations, inconsistent markdown wrappers, intent settings, and unit handling can all make model outputs inconsistent. Reliable training outputs therefore require unified standards, careful prompt design, and extensive manual review.
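To make the formatting problem concrete, here is a minimal sketch of one possible normalization step for a JSON answer in an SFT sample. The function name `normalize_json_answer` and the chosen canonical form (single line, sorted keys, default separators) are assumptions for illustration, not the standard the article settles on.

```python
import json
import re

# Optional Markdown fence around the payload, e.g. ```json ... ``` or ``` ... ```.
# The language tag is only consumed when it is followed by a newline.
_FENCE = re.compile(r"^\s*```(?:[A-Za-z0-9_-]*\n)?(.*?)```\s*$", re.DOTALL)


def normalize_json_answer(raw: str) -> str:
    """Return one canonical serialization for a JSON answer (hypothetical helper).

    Assumed canonical form: no Markdown wrapper, single line, sorted keys.
    Raises json.JSONDecodeError if the payload is not valid JSON, so bad
    samples surface for manual review instead of being silently kept.
    """
    match = _FENCE.match(raw)
    payload = match.group(1) if match else raw
    obj = json.loads(payload)
    # ensure_ascii=False keeps non-ASCII text readable instead of \uXXXX escapes.
    return json.dumps(obj, ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    wrapped = '```json\n{"b": 1,"a":2}\n```'
    bare = '{"a":2,\n "b":1}'
    # Both variants collapse to the same string after normalization.
    print(normalize_json_answer(wrapped) == normalize_json_answer(bare))
```

A step like this only unifies the surface form. Choices such as units or whether answers should carry a wrapper at all still have to be decided and reviewed by hand.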

JSON formatting · LLM data cleaning · Model Training