Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models
This paper introduces Parrot, a system that enhances the multi-turn instruction-following capabilities of large language models (LLMs) through synthetic data generation and context-aware preference optimization (CaPO), achieving significant performance improvements with limited training data.
Parrot targets a known weakness of open chat models: in later turns of a dialogue, real user queries often contain references to earlier turns (anaphora) and omissions (ellipsis), which models trained mostly on single-turn data handle poorly. The framework first trains a dedicated asker model, Parrot-Ask, to generate human-like multi-turn dialogues, then fine-tunes a chat model on this synthetic data and applies context-aware preference optimization (CaPO) to strengthen context understanding. With only 40k training samples, the resulting model outperforms baselines such as Vicuna, achieving up to a 7% absolute improvement.
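To make the data-synthesis step concrete, the sketch below shows one plausible shape of the Parrot-Ask loop: an asker model proposes the next user turn conditioned on the dialogue so far, and a responder model answers it. The function names `ask_next_question` and `answer_question` are hypothetical stand-ins (the paper uses Parrot-Ask and ChatGPT for these roles); the stub callables in the usage example exist only so the script runs end to end.

```python
# Minimal sketch of a Parrot-style data-synthesis loop (assumptions noted
# in the lead-in: `ask_next_question` / `answer_question` are placeholders
# for Parrot-Ask and ChatGPT calls, not the paper's actual API).
from typing import Callable, Dict, List

Turn = Dict[str, str]  # {"role": "user" | "assistant", "content": ...}

def synthesize_dialogue(
    seed_question: str,
    ask_next_question: Callable[[List[Turn]], str],
    answer_question: Callable[[List[Turn]], str],
    num_rounds: int = 8,
) -> List[Turn]:
    """Alternate asker/responder turns to build one multi-turn dialogue."""
    history: List[Turn] = [{"role": "user", "content": seed_question}]
    for round_idx in range(num_rounds):
        # Responder (e.g., ChatGPT) answers the latest user turn.
        history.append({"role": "assistant", "content": answer_question(history)})
        if round_idx == num_rounds - 1:
            break
        # Asker (e.g., Parrot-Ask) generates a human-like follow-up, which
        # may refer back to earlier turns via anaphora or ellipsis.
        history.append({"role": "user", "content": ask_next_question(history)})
    return history

if __name__ == "__main__":
    # Toy usage with stub callables; replace with real model inference.
    dialogue = synthesize_dialogue(
        "What is preference optimization?",
        ask_next_question=lambda h: "Can you give an example of it?",
        answer_question=lambda h: f"(answer to: {h[-1]['content']})",
        num_rounds=3,
    )
    for turn in dialogue:
        print(f"{turn['role']}: {turn['content']}")
```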
The paper also develops MT-Bench++, an evaluation benchmark for multi-turn instruction following that extends each dialogue to eight rounds. Experimental results show that Parrot-Chat, the optimized model, surpasses existing open-source models on both MT-Bench and MT-Bench++. The CaPO strategy, which constructs negative examples that simulate context-related errors, adds a further 2.4% improvement when multiple error scenarios are combined.
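One way to read CaPO is as a preference-pair construction: the "chosen" response is generated with the full dialogue context, while the "rejected" response simulates a context error (for example, answering the last turn with earlier turns stripped). The sketch below plugs such pairs into the standard DPO loss of Rafailov et al. (2023); whether Parrot's CaPO uses this exact objective is an assumption, so treat this as illustrative rather than the paper's method.

```python
# Hedged sketch: scoring CaPO-style preference pairs with a standard DPO
# loss. The pairing scheme (context-aware chosen vs. context-blind rejected)
# follows the paper's description; the DPO formulation is an assumption.
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(chosen | full context)
    policy_rejected_logps: torch.Tensor,  # log p_theta(rejected | full context)
    ref_chosen_logps: torch.Tensor,       # same quantities under a frozen
    ref_rejected_logps: torch.Tensor,     # reference model
    beta: float = 0.1,
) -> torch.Tensor:
    """Standard DPO loss over a batch of (chosen, rejected) pairs."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Push the policy to rank the context-aware response above the
    # context-blind one, with the margin scaled by beta.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy batch of sequence-level log-probabilities.
policy_c = torch.tensor([-12.3, -15.1])
policy_r = torch.tensor([-14.0, -15.0])
ref_c = torch.tensor([-13.0, -15.5])
ref_r = torch.tensor([-13.5, -14.8])
print(dpo_loss(policy_c, policy_r, ref_c, ref_r))
```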
Key contributions include a new data-collection methodology that uses Parrot-Ask to generate high-quality multi-turn instructions, and a systematic evaluation framework. Although limited by dataset size and its reliance on ChatGPT for data generation, the work advances LLM capabilities in real-world dialogue scenarios.
Kuaishou Tech