How Vicuna-13B Achieves ChatGPT‑Level Performance with Low‑Cost Open‑Source Training
The Vicuna-13B open-source chatbot, fine-tuned from LLaMA on roughly 70,000 user-shared ShareGPT conversations, reaches more than 90% of the quality of ChatGPT and Google Bard while costing only about $300 to train, thanks to memory optimizations, multi-turn dialogue loss handling, and cheap spot-instance training.
Vicuna-13B Open‑Source Chatbot Overview
Large language models (LLMs) have transformed chatbot systems, as exemplified by OpenAI's ChatGPT, but their training recipes and architectures remain opaque, hindering research and open-source innovation. Inspired by Meta's LLaMA and Stanford's Alpaca, researchers from UC Berkeley, CMU, Stanford, and UC San Diego released Vicuna-13B, an open-source chatbot built on an enhanced dataset and scalable training and serving infrastructure.
Vicuna‑13B is fine‑tuned from the LLaMA base model using approximately 70,000 user‑shared conversations collected from ShareGPT.com. Compared with Stanford Alpaca and other open models, it demonstrates competitive performance.
Preliminary evaluation using GPT-4 as a judge shows Vicuna-13B reaches over 90% of the quality of OpenAI's ChatGPT and Google Bard, and surpasses LLaMA and Alpaca on more than 90% of questions. Training cost was about $300, and the training and serving code, along with an online demo, are publicly released for non-commercial use.
To ensure data quality, the Vicuna team converted HTML back to markdown, filtered low‑quality samples, and split long dialogues to fit the model's maximum context length. Improvements over Alpaca include:
Memory optimization: the maximum context length was increased from Alpaca's 512 tokens to 2048, with gradient checkpointing and FlashAttention used to alleviate the resulting GPU memory pressure.
Multi-turn dialogue handling: the training loss was adjusted to account for multi-turn conversations, with the fine-tuning loss computed only on the chatbot's outputs.
Spot instance cost reduction: leveraging SkyPilot managed spot instances lowered the 13B model's training cost from roughly $1,000 to $300.
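The multi-turn loss adjustment above can be sketched as a label-masking step: tokens from the human turns are assigned an ignore index so only the chatbot's tokens contribute to the cross-entropy loss. The function below is a minimal illustration with made-up token IDs, not the project's actual preprocessing code.

```python
# Sketch: mask the fine-tuning loss so only assistant turns contribute.
# The token IDs here are hypothetical; in real training the masking is
# applied to tokenizer output, but the idea is the same.
IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss skips this label value

def mask_labels(token_ids, roles):
    """Return labels where human-turn tokens are replaced by IGNORE_INDEX.

    token_ids: flat list of token ids for the whole conversation
    roles:     parallel list with "human" or "assistant" per token
    """
    return [
        tid if role == "assistant" else IGNORE_INDEX
        for tid, role in zip(token_ids, roles)
    ]

# Example: a two-turn exchange; only assistant tokens keep their ids.
ids = [11, 12, 13, 21, 22, 31, 41, 42]
roles = ["human"] * 3 + ["assistant"] * 2 + ["human"] + ["assistant"] * 2
labels = mask_labels(ids, roles)
```

With labels built this way, the model still attends to the full conversation context, but gradient updates come only from predicting the chatbot's replies.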
The team also built a serving system that uses distributed workers to serve multiple models, supporting GPU workers from both local clusters and cloud providers. SkyPilot's fault-tolerant controller and managed spot features let the system run on cheap spot instances across clouds, reducing serving costs.
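The controller-plus-workers design can be sketched as a registry that maps model names to worker addresses and dispatches requests round-robin. This is a minimal toy, not the project's actual serving code; the real system also handles heartbeats, fault tolerance, and worker loss.

```python
# Minimal sketch of a controller dispatching requests to model workers.
# Class and worker names are illustrative assumptions.
from collections import defaultdict
from itertools import cycle

class Controller:
    def __init__(self):
        self._workers = defaultdict(list)  # model name -> worker addresses
        self._cursors = {}                 # model name -> round-robin iterator

    def register(self, model, address):
        """A worker announces it can serve `model` at `address`."""
        self._workers[model].append(address)
        self._cursors[model] = cycle(self._workers[model])

    def dispatch(self, model):
        """Pick the next worker for `model`, round-robin."""
        if model not in self._cursors:
            raise KeyError(f"no workers registered for {model}")
        return next(self._cursors[model])

ctrl = Controller()
ctrl.register("vicuna-13b", "gpu-worker-1:21002")
ctrl.register("vicuna-13b", "gpu-worker-2:21002")
first = ctrl.dispatch("vicuna-13b")
second = ctrl.dispatch("vicuna-13b")
```

Because workers can live in a local cluster or any cloud, the controller only needs reachable addresses; where the GPUs physically run is irrelevant to the dispatch logic.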
Training details: about 70,000 ShareGPT dialogues were used, the Alpaca training script was enhanced to handle multi-turn conversations and longer sequences, and training completed in one day on eight A100 GPUs using PyTorch FSDP. For evaluation, 80 curated questions were created and judged by GPT-4, comparing outputs from LLaMA, Alpaca, ChatGPT, Bard, and Vicuna.
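The GPT-4-as-judge evaluation boils down to assembling a pairwise prompt per question and asking the judge model to score both answers. The exact prompt wording used by the Vicuna team may differ; the helper below is an assumed shape for illustration only.

```python
# Sketch of a pairwise judge prompt for GPT-4-style evaluation.
# The prompt text is an assumption, not the team's actual template.
def build_judge_prompt(question, answer_a, answer_b):
    """Assemble one prompt asking the judge model to score two answers."""
    return (
        "You are a helpful and precise assistant for checking answer quality.\n"
        f"[Question]\n{question}\n\n"
        f"[Assistant A]\n{answer_a}\n\n"
        f"[Assistant B]\n{answer_b}\n\n"
        "Rate each assistant on a scale of 1 to 10 and briefly explain why."
    )

prompt = build_judge_prompt(
    "Explain gradient checkpointing in one sentence.",
    "It trades extra compute for memory by re-running forward passes.",
    "It is a way to save model checkpoints to disk.",
)
```

Each of the 80 questions yields one such prompt per model pair; the judge's numeric scores are then aggregated into the win/tie percentages reported above.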
Findings show GPT-4 rates Vicuna's answers higher than those of LLaMA and Alpaca on over 90% of questions and competitive with the proprietary models; in 45% of cases Vicuna's answers match or exceed ChatGPT's. However, Vicuna still struggles with reasoning, math, and coding tasks, and its safety measures rely on OpenAI's moderation API to filter inappropriate inputs.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"