How Vicuna-13B Achieves ChatGPT‑Level Performance with Low‑Cost Open‑Source Training

The Vicuna-13B open‑source chatbot, fine‑tuned from LLaMA on 70k ShareGPT conversations, reaches more than 90% of the quality of OpenAI's ChatGPT and Google Bard while costing only about $300 to train, thanks to memory optimizations, multi‑turn dialogue handling, and cheap spot‑instance training.

Programmer DD

Vicuna-13B Open‑Source Chatbot Overview

Large language models (LLMs) have transformed chatbot systems, as exemplified by OpenAI's ChatGPT, but their training and architecture details remain opaque, hindering research and open‑source innovation. Inspired by Meta's LLaMA and Stanford's Alpaca, researchers from UC Berkeley, CMU, Stanford, and UC San Diego released Vicuna‑13B, an open‑source chatbot built on an enhanced dataset and scalable infrastructure.

Vicuna‑13B is fine‑tuned from the LLaMA base model using approximately 70,000 user‑shared conversations collected from ShareGPT.com. Compared with Stanford Alpaca and other open models, it demonstrates competitive performance.

Preliminary evaluation using GPT‑4 as a judge shows Vicuna‑13B reaches over 90% of the quality of OpenAI ChatGPT and Google Bard, and surpasses LLaMA and Alpaca on more than 90% of questions. Training cost was about $300, and the training and serving code, along with an online demo, are publicly released for non‑commercial use.

To ensure data quality, the Vicuna team converted HTML back to markdown, filtered low‑quality samples, and split long dialogues to fit the model's maximum context length. Improvements over Alpaca include:

Memory optimization: the maximum context length was increased from 512 to 2,048 tokens, with gradient checkpointing and FlashAttention used to alleviate GPU memory pressure.

Multi‑turn dialogue handling: the training loss was adjusted to account for multi‑turn conversations, with the fine‑tuning loss computed only on the chatbot's outputs.

Spot instance cost reduction: leveraging SkyPilot managed spot instances lowered the 13B model's training cost from roughly $1,000 to $300.
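The multi‑turn loss adjustment above can be sketched in plain Python: tokens belonging to assistant turns keep their labels, while all other tokens are masked with the conventional `-100` ignore index so they contribute nothing to the cross‑entropy loss. The token IDs and helper names below are illustrative stand‑ins, not the project's actual preprocessing code; a real pipeline would use the LLaMA tokenizer.

```python
# Sketch of multi-turn loss masking: compute the fine-tuning loss only on
# chatbot (assistant) tokens. -100 is the label value that PyTorch's
# cross-entropy loss ignores by default.
IGNORE_INDEX = -100

def build_labels(turns):
    """turns: list of (role, token_ids) pairs for one conversation.
    Returns (input_ids, labels), where labels for non-assistant tokens
    are masked out with IGNORE_INDEX."""
    input_ids, labels = [], []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)  # model learns to predict these
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # masked out
    return input_ids, labels

# Toy two-turn conversation with made-up token IDs.
conversation = [
    ("user", [101, 102, 103]),
    ("assistant", [201, 202]),
    ("user", [104]),
    ("assistant", [203, 204, 205]),
]
ids, labels = build_labels(conversation)
```

Only the five assistant tokens keep real labels; the four user tokens are masked, so gradient updates come exclusively from the chatbot's side of the dialogue.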

The team also built a serving system that uses distributed workers to serve multiple models, supporting GPU workers from both local clusters and cloud providers. SkyPilot's fault‑tolerant controller and managed spot features enable cheap spot instances across clouds, reducing serving costs.
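In highly simplified form, a controller dispatching requests across a pool of model workers might look like the following sketch. The `Worker` class and the round‑robin policy are illustrative assumptions, not the project's actual serving code, which also handles worker registration, health checks, and failure recovery.

```python
from itertools import cycle

class Worker:
    """Hypothetical stand-in for a GPU worker hosting one model replica,
    whether on a local cluster or a cloud spot instance."""
    def __init__(self, name):
        self.name = name

    def generate(self, prompt):
        # A real worker would run model inference; here we just echo.
        return f"[{self.name}] reply to: {prompt}"

class Controller:
    """Round-robin dispatch across registered workers (illustrative only)."""
    def __init__(self, workers):
        self._workers = cycle(workers)

    def handle(self, prompt):
        return next(self._workers).generate(prompt)

controller = Controller([Worker("local-a100"), Worker("spot-cloud-1")])
replies = [controller.handle(f"question {i}") for i in range(4)]
```

Requests alternate between the two workers, which is the basic idea behind spreading load over a mix of local and spot‑instance GPUs.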

Training details: about 70k ShareGPT dialogues were used, the Alpaca training script was enhanced for longer sequences, and training completed in one day on eight A100 GPUs with PyTorch FSDP. For evaluation, 80 curated questions were created and judged by GPT‑4, comparing outputs from LLaMA, Alpaca, ChatGPT, Bard, and Vicuna.
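The GPT‑4‑as‑judge setup can be illustrated with a minimal, hypothetical prompt builder. The template, rubric, and output format below are assumptions for the sake of the sketch; the Vicuna team's actual evaluation uses its own prompt and scoring scheme.

```python
def build_judge_prompt(question, answer_a, answer_b):
    """Build a hypothetical pairwise-comparison prompt for an LLM judge.
    (Illustrative template only, not the Vicuna team's actual prompt.)"""
    return (
        "You are an impartial judge. Rate the two answers to the question "
        "below on a 1-10 scale and reply with exactly 'score_a,score_b'.\n\n"
        f"Question: {question}\n\n"
        f"Answer A: {answer_a}\n\n"
        f"Answer B: {answer_b}\n"
    )

def parse_scores(judge_output):
    """Parse the assumed 'score_a,score_b' reply format into two floats."""
    a, b = judge_output.strip().split(",")
    return float(a), float(b)

prompt = build_judge_prompt(
    "Explain gradient checkpointing in one sentence.",
    "It trades compute for memory by recomputing activations.",
    "It is a way to save GPU memory during training.",
)
scores = parse_scores("8,7")  # e.g., a judge reply of "8,7"
```

In the actual study, prompts like this were sent to GPT‑4 for each of the 80 questions, and the returned scores were aggregated to compare LLaMA, Alpaca, ChatGPT, Bard, and Vicuna.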

Findings show GPT‑4 rates Vicuna’s answers higher than LLaMA/Alpaca on over 90% of questions and comparable to proprietary models; in 45% of cases Vicuna matches or exceeds ChatGPT. However, Vicuna still struggles with reasoning, math, and coding tasks, and safety measures rely on OpenAI’s moderation API to filter inappropriate inputs.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: AI, LLM, Evaluation, Chatbot, Vicuna, Open-source
Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
