How the DB3 Team Won the Meta CRAG RAG Challenge: Prompts, Retrieval, and LoRA Fine‑Tuning

This article analyzes the Meta Comprehensive RAG (CRAG) benchmark, detailing its three tasks, evaluation metrics, and the champion DB3 team's end‑to‑end solution that combines data preprocessing, dual‑stage retrieval, prompt engineering, LoRA‑based fine‑tuning, and public data augmentation to achieve top scores across all tasks.

Baobao Algorithm Notes

Oct 16, 2024

Background

According to the challenge organizers, GPT‑4’s accuracy on questions about slow‑ or fast‑changing facts is below 15 %, and even for stable facts its accuracy on less popular (torso‑to‑tail) entities is below 35 %. Large language models (LLMs) hallucinate because of biased training data, limited context understanding, and knowledge‑representation constraints. Reducing hallucinations is essential for trustworthy LLM‑based agents.

CRAG Benchmark Overview

The Meta Comprehensive RAG (CRAG) challenge provides a rigorous evaluation protocol for Retrieval‑Augmented Generation (RAG) systems. It covers five domains, eight question types, and a mix of head, torso, and tail entities to test reasoning and synthesis. Each query has a 30‑second time budget.

Task 1 – Web‑Based Retrieval Summarization

Participants receive five webpages per question and must extract and summarize the relevant information.

Task 2 – Knowledge‑Graph & Web Fusion

A simulated API gives access to a domain‑specific knowledge graph (KG). Participants query the KG and combine the structured results with web data.

Task 3 – End‑to‑End RAG

Each question comes with 50 webpages plus API access, increasing noise and requiring efficient selection of the most useful pieces.

Evaluation Metrics

Perfect: correct answer with no hallucination.

Acceptable: useful answer with minor errors.

Missing: no concrete answer (e.g., “I don’t know”).

Incorrect: wrong or irrelevant answer.

Scoring: Perfect = 1, Acceptable = 0.5, Missing = 0, Incorrect = ‑1. Overall score is a macro‑average weighted by entity popularity (weights undisclosed).
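
The scoring rule can be sketched in a few lines. This is a minimal illustration, not the official evaluator: because the popularity weights are undisclosed, it assumes a plain unweighted mean, and the function name `crag_score` is hypothetical.

```python
# Per-label scores as stated in the scoring rule above.
SCORES = {"perfect": 1.0, "acceptable": 0.5, "missing": 0.0, "incorrect": -1.0}


def crag_score(labels):
    """Return the unweighted mean score for a list of per-answer labels.

    Assumption: the real benchmark applies undisclosed popularity weights;
    this sketch ignores them.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(SCORES[label] for label in labels) / len(labels)


labels = ["perfect", "acceptable", "missing", "incorrect", "perfect"]
print(crag_score(labels))  # (1 + 0.5 + 0 - 1 + 1) / 5 = 0.3
```

Under this rule an incorrect answer costs as much as a perfect one earns, while "I don't know" scores zero, which is why the winning solution invests so heavily in teaching the model when to abstain.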

Champion Solution (DB3 Team)

The DB3 team from Peking University achieved first place on all three tasks, with scores of 28.4 %, 42.7 % and 47.8 % respectively.

Task 1 Pipeline

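Data preprocessing: BeautifulSoup extracts the raw text from each page. LangChain's CharacterTextSplitter then splits it into child chunks (~200 tokens) and parent chunks (~700 tokens), and ParentDocumentRetriever preserves the parent‑child relationships, so that small chunks are matched against the query while their larger parents supply context to the LLM.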
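
The parent‑child idea can be illustrated without any libraries. The sketch below is not the team's code and does not use LangChain: it assumes whitespace‑separated words as a stand‑in for tokens, and the helper names `split_words` and `build_parent_child_index` are hypothetical.

```python
def split_words(words, size):
    """Split a list of words into consecutive chunks of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def build_parent_child_index(text, parent_size=700, child_size=200):
    """Return (parents, children); every child records the id of its parent.

    Assumption: words approximate tokens. Children are what a retriever
    would embed and search; parents are what would be handed to the LLM.
    """
    parents = {}
    children = []
    for parent_id, parent_words in enumerate(split_words(text.split(), parent_size)):
        parents[parent_id] = " ".join(parent_words)
        for child_words in split_words(parent_words, child_size):
            children.append({"parent_id": parent_id, "text": " ".join(child_words)})
    return parents, children


text = " ".join(f"w{i}" for i in range(1500))
parents, children = build_parent_child_index(text)
print(len(parents), len(children))  # 3 parents, 9 children
```

Once a child chunk is matched, its `parent_id` gives direct access to the wider passage around it.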

Retriever: bge‑base‑en‑v1.5 retrieves the top‑50 passages. The parent_chunk_size setting determines how many parent chunks are fed to the LLM (e.g., size 2000 → 5 chunks, size 1000 → 10 chunks), so the total amount of context passed to the model stays roughly constant.

Reranker: bge‑reranker‑v2‑m3 rescores the retrieved passages.
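
The two‑stage pattern (cheap retrieval over everything, costlier reranking over the survivors) can be sketched as follows. To stay self‑contained, the example swaps in toy scorers: a bag‑of‑words cosine stands in for the bge embedding model and a query‑term overlap ratio stands in for the bge reranker. All function names and the `final_k` default are assumptions for illustration.

```python
import math
from collections import Counter


def bow_cosine(a, b):
    """Bag-of-words cosine similarity (toy stand-in for an embedding model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def overlap_ratio(query, passage):
    """Fraction of query terms present in the passage (toy stand-in for a reranker)."""
    q = set(query.lower().split())
    return len(q & set(passage.lower().split())) / len(q) if q else 0.0


def retrieve_then_rerank(query, passages, first_stage_k=50, final_k=5):
    # Stage 1: score every passage with the cheap scorer, keep the top first_stage_k.
    candidates = sorted(passages, key=lambda p: bow_cosine(query, p), reverse=True)[:first_stage_k]
    # Stage 2: rescore only the survivors with the more expensive scorer.
    return sorted(candidates, key=lambda p: overlap_ratio(query, p), reverse=True)[:final_k]


passages = [
    "Stock prices fell today",
    "Inception is a film",
    "Christopher Nolan directed Inception in 2010",
]
print(retrieve_then_rerank("who directed inception", passages, final_k=2))
```

In a real system the second scorer is a cross‑encoder that reads query and passage together, which is too slow to run over every chunk but affordable over 50 candidates.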

Public data augmentation: Domain‑specific tables are pre‑processed into natural‑language statements keyed by entity. The movie domain uses Oscar awards plus the full MovieLens data; the finance domain uses US stock P/E ratios, market capitalization, and EPS; the music domain uses Grammy awards.
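
Turning table rows into entity‑keyed statements might look like the sketch below. The ticker, numbers, template wording, and function names are all made up for illustration; the source does not show the team's actual templates or matching logic, and the substring lookup here is a deliberate simplification.

```python
from collections import defaultdict


def rows_to_statements(rows, entity_key, template):
    """Render each row as a sentence and group the sentences by entity name."""
    index = defaultdict(list)
    for row in rows:
        index[str(row[entity_key]).lower()].append(template.format(**row))
    return dict(index)


def lookup(index, query):
    """Return statements for every indexed entity whose name appears in the query."""
    q = query.lower()
    return [s for name, statements in index.items() if name in q for s in statements]


# Hypothetical finance rows; "EXMP" and its values are not real data.
stock_rows = [{"ticker": "EXMP", "pe": 21.5, "eps": 3.2}]
index = rows_to_statements(
    stock_rows,
    entity_key="ticker",
    template="{ticker} has a P/E ratio of {pe} and earnings per share of {eps}.",
)
print(lookup(index, "What is the P/E ratio of EXMP?"))
```

Because the statements are plain sentences, they can be concatenated with web passages and passed to the LLM in the same way.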

Prompt engineering & SFT: The basic prompt includes token_limit, query_time, and a <doc> field holding the concatenated public‑data and web retrieval results, truncated to 4000 tokens. Controlled prompts instruct the model to refuse when the question is invalid or the required knowledge is absent. For SFT, each training label is one of “invalid question”, the ground‑truth answer, or “I don’t know”. LoRA adapters fine‑tune Llama‑3‑8B‑Instruct, and keeping multiple adapters allows rapid switching between sub‑tasks. Inference is accelerated with vLLM, although compatibility issues were noted.
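
A rough sketch of the prompt assembly and label selection is shown below. The template wording, the default `token_limit` of 75, the whitespace tokenizer, and the rule inside `sft_label` are all assumptions made for illustration; the source only names the prompt fields and the three label types.

```python
def truncate_tokens(text, max_tokens):
    """Keep the first max_tokens whitespace-separated tokens.

    Whitespace splitting is a rough proxy for a real tokenizer, and it
    collapses line breaks into single spaces.
    """
    return " ".join(text.split()[:max_tokens])


# Hypothetical wording; only the field names come from the article.
PROMPT_TEMPLATE = (
    "Answer the question using the references. Use at most {token_limit} tokens.\n"
    "If the question rests on a false premise, answer 'invalid question'.\n"
    "If the references do not contain the answer, answer 'I don't know'.\n"
    "Current time: {query_time}\n"
    "<doc>\n{doc}\n</doc>\n"
    "Question: {query}\n"
)


def build_prompt(query, query_time, public_results, web_results,
                 token_limit=75, doc_budget=4000):
    """Concatenate public-data and web results, truncate them, fill the template."""
    doc = truncate_tokens(" ".join(public_results + web_results), doc_budget)
    return PROMPT_TEMPLATE.format(
        token_limit=token_limit, query_time=query_time, doc=doc, query=query
    )


def sft_label(ground_truth, is_invalid_question, answer_in_context):
    """Pick one of the three SFT label types (assumed selection rule)."""
    if is_invalid_question:
        return "invalid question"
    return ground_truth if answer_in_context else "I don't know"


print(build_prompt("Who won?", "2024-03-01", ["p1 p2"], ["w1 w2 w3"], doc_budget=3))
```

Training the model to emit "I don't know" when the context lacks the answer trades a few Perfect scores for far fewer Incorrect ones, which pays off under the benchmark's ‑1 penalty.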

Task 2 & 3 Strategy

API results take priority. If the simulated KG API returns an answer other than “I don’t know”, that answer is used directly; otherwise the system falls back to the Task 1 web‑retrieval pipeline. For Task 3, a reranker first selects the top 5 of the 50 web pages, which then go through the same processing.
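
The routing logic reduces to a short fallback function. In this sketch, `kg_answer_fn`, `web_answer_fn`, and `score_fn` are hypothetical callables standing in for the KG module, the Task 1 pipeline, and the page reranker.

```python
def is_idk(answer):
    """True if the answer is empty or some spelling of "I don't know"."""
    if not answer:
        return True
    return answer.strip().lower().replace("’", "'") == "i don't know"


def answer_with_fallback(query, kg_answer_fn, web_answer_fn):
    """Prefer the knowledge-graph answer; fall back to web retrieval otherwise."""
    kg_answer = kg_answer_fn(query)
    if not is_idk(kg_answer):
        return kg_answer
    return web_answer_fn(query)


def select_top_pages(pages, score_fn, k=5):
    """Task 3 pre-filter: keep the k highest-scoring pages."""
    return sorted(pages, key=score_fn, reverse=True)[:k]


print(answer_with_fallback("q", lambda q: "I don't know", lambda q: "answer from web"))
```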

Knowledge‑Graph Retrieval Module

The module uses the LLM to generate normalized API calls, parses the responses, and converts them to natural language. The movie schema includes PERSON, MOVIE, CAST, CREW, and OSCAR tables. Instead of full SQL, the team uses a lightweight normalized API with operators such as cmp(gender,male), sort(condition,sort_key), and len to support multi‑hop queries.
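
To make the idea of a small operator language concrete, here is a toy interpreter. The operator names cmp, sort, and len come from the article, but their exact semantics are not given there; the argument conventions below (cmp filters on equality, sort takes a field and asc/desc, len counts rows) and the chained‑list format are assumptions.

```python
import re

CALL_RE = re.compile(r"^(\w+)\((.*)\)$")


def parse_call(call):
    """Parse 'name(arg1,arg2)' into (name, [arg1, arg2])."""
    match = CALL_RE.match(call.strip())
    if not match:
        raise ValueError(f"malformed call: {call!r}")
    name, arg_str = match.groups()
    args = [a.strip() for a in arg_str.split(",")] if arg_str.strip() else []
    return name, args


def run_pipeline(rows, calls):
    """Apply each call in order; every step consumes the previous step's output."""
    result = rows
    for call in calls:
        name, args = parse_call(call)
        if name == "cmp":  # assumed: keep rows where field equals value
            field, value = args
            result = [r for r in result if str(r[field]) == value]
        elif name == "sort":  # assumed: order rows by field, asc or desc
            field, order = args
            result = sorted(result, key=lambda r: r[field], reverse=(order == "desc"))
        elif name == "len":  # assumed: count the remaining rows
            result = len(result)
        else:
            raise ValueError(f"unknown operator: {name}")
    return result


# Hypothetical PERSON rows.
people = [
    {"name": "A", "gender": "male", "birth_year": 1970},
    {"name": "B", "gender": "female", "birth_year": 1980},
    {"name": "C", "gender": "male", "birth_year": 1960},
]
print(run_pipeline(people, ["cmp(gender,male)", "len()"]))  # 2
```

A restricted grammar like this is easier for an 8B model to emit correctly than free‑form SQL, and malformed calls are cheap to detect and reject.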

API Generation Prompt

The prompt supplies Schema_info, API_rules, the query string, and a few hand‑picked in‑context examples. After generating 100 synthetic examples, erroneous ones are added back for robustness. The same approach is applied across all five domains.
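
Assembling such a few‑shot prompt is mostly string formatting. The layout, section labels, and example content below are hypothetical; only the four ingredients (schema information, API rules, examples, query) come from the description above.

```python
def build_api_prompt(schema_info, api_rules, examples, query):
    """Build a few-shot prompt that ends where the model should write the API call."""
    shots = "\n".join(f"Query: {q}\nAPI: {api}" for q, api in examples)
    return (
        f"Schema:\n{schema_info}\n\n"
        f"Rules:\n{api_rules}\n\n"
        f"Examples:\n{shots}\n\n"
        f"Query: {query}\nAPI:"
    )


prompt = build_api_prompt(
    schema_info="PERSON(name, gender, birth_year)",
    api_rules="Use only cmp, sort, and len.",
    examples=[("How many people are male?", "cmp(gender,male) len()")],
    query="How many people are female?",
)
print(prompt)
```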

Fine‑Tuning the API Generator

Ground‑truth pairs of queries and API calls are first generated by GPT‑4 and manually verified. These high‑quality pairs are then used to fine‑tune the LLM with LoRA, which improves API generation within the competition’s time budget.
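
Preparing the verified pairs for fine‑tuning could look like the following. The record layout (prompt/completion fields in JSONL), the `verified` flag, and the sample pair are assumptions; the source does not describe the team's training data format.

```python
import json


def to_sft_records(pairs):
    """Keep only manually verified pairs and format them as prompt/completion records."""
    return [
        {"prompt": f"Query: {p['query']}\nAPI:", "completion": " " + p["api"]}
        for p in pairs
        if p.get("verified")
    ]


def to_jsonl(records):
    """Serialize the records as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Hypothetical pairs; the second failed manual review and is dropped.
pairs = [
    {"query": "How many people are male?", "api": "cmp(gender,male) len()", "verified": True},
    {"query": "Who is the oldest?", "api": "sort(", "verified": False},
]
print(to_jsonl(to_sft_records(pairs)))
```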

Insights

The champion’s approach leverages strong database expertise: extensive API redesign, careful prompt crafting to suppress hallucinations, and SFT on data produced with help from a stronger LLM. The solution is practical rather than flashy, showing that modern LLMs have markedly improved but still require systematic engineering for reliable, production‑grade performance.

Resources

Competition page: https://www.aicrowd.com/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024

Paper: https://arxiv.org/pdf/2410.00005

Code repository: https://gitlab.aicrowd.com/jiazunchen/kdd2024cup-crag-db3
