How MaFengWo’s mfw-32B Travel LLM Outperforms DeepSeek‑R1 in Speed and Accuracy

This article details the development, training, and evaluation of MaFengWo's 32‑billion‑parameter travel large language model (mfw‑32B): its itinerary planning, personalized demand capture, and budget management; its speed and resource‑efficiency advantages over DeepSeek‑R1; and the SFT and reinforcement‑learning stages that enabled these gains.


1. Overview

In April 2025, MaFengWo's large‑model R&D team launched its self‑developed travel LLM, mfw‑32B, and deployed it in the MaFengWo app. In travel‑specific scenario testing, mfw‑32B matches DeepSeek‑R1 in itinerary‑planning quality while beating it on response speed and resource consumption, with fewer hallucinations and a lower knowledge‑error rate.

2. Application Scenarios

mfw‑32B addresses key travel planning challenges:

Accurate personalized demand capture: Handles multi‑turn long‑context dialogues to extract and summarize user preferences for itinerary generation.

Intelligent time‑window constraints: Leverages massive static travel data and a real‑time knowledge base to infer opening times and optimal visiting periods, converting them into precise time‑window data.

Fine‑grained budget management: Considers transportation, accommodation, meals, tickets, and shopping costs, optimizing plans within the user‑specified total budget.

Optimal itinerary generation: Analyzes the interplay of personalized needs, time windows, and budget to produce conflict‑free, efficient schedules presented as clear daily itineraries or route maps (a minimal scheduling sketch follows this list).
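To make these constraints concrete, here is a minimal, self‑contained sketch of greedy single‑day scheduling under time windows and a budget. The `Activity` fields, the earliest‑closing‑first heuristic, and all numbers are illustrative assumptions, not MaFengWo's actual planner:

```python
from dataclasses import dataclass

@dataclass
class Activity:
    name: str
    open_hour: float    # earliest visit time (24h clock)
    close_hour: float   # latest finish time
    duration: float     # hours needed on site
    cost: float         # estimated ticket/transport cost

def plan_day(activities, day_start=9.0, day_end=19.0, budget=500.0):
    """Greedily schedule activities that fit their time windows and the budget."""
    schedule, clock, spent = [], day_start, 0.0
    # Visit earlier-closing attractions first so tight windows are not missed.
    for act in sorted(activities, key=lambda a: a.close_hour):
        start = max(clock, act.open_hour)            # wait for opening if needed
        if start + act.duration > min(act.close_hour, day_end):
            continue                                 # window or day ends too soon
        if spent + act.cost > budget:
            continue                                 # would exceed the budget
        schedule.append((start, act))
        clock, spent = start + act.duration, spent + act.cost
    return schedule, spent

day, total = plan_day([
    Activity("Palace Museum", 8.5, 17.0, 3.0, 60.0),
    Activity("Hutong walk",   0.0, 24.0, 1.5, 0.0),
    Activity("Summer Palace", 7.0, 18.0, 3.0, 30.0),
])
for start, act in day:
    print(f"{start:05.2f}h  {act.name}  (¥{act.cost:.0f})")
print(f"total spend ¥{total:.0f}")
```

A production planner would also fold in transit time between sites and multi‑day allocation, but the same three inputs (preferences, time windows, budget) drive the decision at every step.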

3. mfw‑32B Development Process

The development consists of two stages: SFT (Supervised Fine‑Tuning) and Reinforcement Learning.

3.1 SFT Stage

Using MaFengWo's proprietary travel data, a high‑quality fine‑tuning dataset with hundreds of thousands of entries was built. Both LoRA and full‑parameter fine‑tuning were applied; evaluation showed full‑parameter fine‑tuning performed better, resulting in the initial model mfw‑32B‑sft.
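For illustration, a minimal sketch of the two SFT variants using Hugging Face transformers and peft; the base checkpoint and hyperparameters below are placeholders, not MaFengWo's actual configuration:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

BASE = "Qwen/Qwen2.5-32B-Instruct"  # hypothetical 32B base checkpoint
model = AutoModelForCausalLM.from_pretrained(BASE)

# Variant 1: LoRA -- freeze the backbone, train small low-rank adapters.
lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
lora_model = get_peft_model(model, lora_cfg)
lora_model.print_trainable_parameters()  # typically <1% of weights train

# Variant 2: full-parameter fine-tuning -- every weight stays trainable
# (the default), which the team found to perform better for mfw-32B-sft.
full_model = AutoModelForCausalLM.from_pretrained(BASE)
assert all(p.requires_grad for p in full_model.parameters())
```

Full‑parameter tuning costs far more GPU memory and compute than LoRA, so the team's choice implies the quality gap on travel data justified that expense.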

3.2 Reinforcement Learning Stage

A custom evaluation framework for travel itinerary planning was created, covering three primary dimensions: knowledge reliability, demand alignment, and execution feasibility, each with multiple sub‑metrics.
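The article names only the three primary dimensions, so the sub‑metrics and weights below are illustrative stand‑ins; a minimal sketch of how such a rubric might aggregate into a single score:

```python
# Hypothetical sub-metrics and weights; only the three dimension names
# come from the article.
RUBRIC = {
    "knowledge_reliability": {"poi_exists": 0.5, "hours_correct": 0.5},
    "demand_alignment":      {"preferences_covered": 0.6, "budget_respected": 0.4},
    "execution_feasibility": {"no_time_conflicts": 0.5, "transit_realistic": 0.5},
}
DIM_WEIGHTS = {"knowledge_reliability": 0.4,
               "demand_alignment": 0.3,
               "execution_feasibility": 0.3}

def score_itinerary(sub_scores: dict) -> float:
    """sub_scores maps (dimension, metric) -> value in [0, 1]."""
    total = 0.0
    for dim, metrics in RUBRIC.items():
        dim_score = sum(w * sub_scores.get((dim, m), 0.0)
                        for m, w in metrics.items())
        total += DIM_WEIGHTS[dim] * dim_score
    return total

print(score_itinerary({
    ("knowledge_reliability", "poi_exists"): 1.0,
    ("knowledge_reliability", "hours_correct"): 0.8,
    ("demand_alignment", "preferences_covered"): 0.9,
    ("demand_alignment", "budget_respected"): 1.0,
    ("execution_feasibility", "no_time_conflicts"): 1.0,
    ("execution_feasibility", "transit_realistic"): 0.7,
}))  # -> 0.897, a weighted score in [0, 1]
```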

Various RL algorithms (PPO, GRPO, DAPO) were tested; GRPO was selected and further refined. Improvements include a purpose‑built reward model that scores reasoning, content, relevance, and format; a normalization function that emphasizes distinctions among high scores; and a raised clipping upper bound to encourage exploration.
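The exact normalization and clipping values are not given, so the sketch below is one plausible reading in numpy: a convex reshaping that stretches gaps among high‑scoring rollouts before the usual GRPO group z‑normalization, plus an asymmetric clip range (`eps_high > eps_low`) in the spirit of DAPO's "clip‑higher". All constants are illustrative assumptions.

```python
import numpy as np

def group_advantages(rewards, sharpen=2.0):
    """Per-group normalization that widens gaps between high-scoring rollouts.

    Raising rewards (mapped to [0, 1]) to a power > 1 compresses low scores
    and stretches high ones before standard GRPO z-normalization.
    """
    r = np.asarray(rewards, dtype=np.float64)
    r01 = (r - r.min()) / (r.max() - r.min() + 1e-8)   # map group to [0, 1]
    shaped = r01 ** sharpen                            # emphasize top scores
    return (shaped - shaped.mean()) / (shaped.std() + 1e-8)

def clipped_objective(ratio, adv, eps_low=0.2, eps_high=0.28):
    """PPO-style surrogate with a raised upper clip bound (eps_high > eps_low),
    letting high-advantage tokens push the policy further and explore more."""
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    return np.minimum(ratio * adv, clipped * adv).mean()

adv = group_advantages([0.2, 0.6, 0.7, 0.95])  # one group of 4 rollouts
print(adv)  # the 0.95 rollout stands out more than under plain z-scoring
```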

3.3 Training Metrics

Training curves show stable convergence, with faster loss reduction after the reward redesign. Comparative experiments demonstrate that the optimized model converges more quickly than the baseline.

4. Performance Comparison

On a dedicated travel evaluation dataset, mfw‑32B exceeds DeepSeek‑R1 in token generation speed (80 vs. 30 tokens/s), first‑token latency (0.21 s vs. 1.6 s), and deployment footprint (2 H20 cards vs. a 2‑node, 8‑card cluster), while maintaining comparable planning quality (see the back‑of‑envelope comparison after the table).

| Model | Parameters | Deployment Resources | First‑Token Latency | Token Generation Speed |
| --- | --- | --- | --- | --- |
| DeepSeek‑R1 | 671B | 2 nodes, 8 H20 cards | 1.6 s | 30 tokens/s |
| mfw‑32B | 32B | 2 H20 cards | 0.21 s | 80 tokens/s |
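Taken together, the table implies a large end‑to‑end gap. A back‑of‑envelope comparison for a 1,000‑token itinerary (the response length is an assumption; the latency and speed figures come from the table):

```python
# End-to-end latency ≈ first-token latency + tokens / generation speed.
tokens = 1000
deepseek_r1 = 1.6 + tokens / 30    # ≈ 34.9 s
mfw_32b     = 0.21 + tokens / 80   # ≈ 12.7 s
print(f"DeepSeek-R1 ≈ {deepseek_r1:.1f} s, mfw-32B ≈ {mfw_32b:.1f} s")
```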

5. Future Work

Continue optimizing the reward model for the reinforcement‑learning stage and train a dedicated travel‑planning reward model.

Explore adversarial learning with separate question‑generation and answer‑generation models that compete, using the reward model for feedback.
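A minimal sketch of how such an adversarial loop might look; `generate`, `score`, and `reinforce` are hypothetical stand‑ins for the question model, the answer (policy) model, and the travel reward model, and the threshold is illustrative:

```python
def adversarial_round(question_model, answer_model, reward_model, n=32):
    """One round: the question model hunts for queries the answer model
    handles poorly; both models then update in opposing directions."""
    hard_cases, scored = [], []
    for _ in range(n):
        q = question_model.generate()    # propose a travel-planning query
        a = answer_model.generate(q)     # draft an itinerary
        s = reward_model.score(q, a)     # rubric-based reward in [0, 1]
        scored.append((q, a, s))
        if s < 0.5:                      # the answer model struggled here
            hard_cases.append(q)
    # Opposing updates: the answer model learns from rewards on all cases,
    # while the question model is rewarded for surfacing hard cases.
    answer_model.reinforce(scored)
    question_model.reinforce(hard_cases)
```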
