Research on Domain Large Models by Fudan University Knowledge Factory Lab
This article presents Fudan University's Knowledge Factory Lab research on domain large models, covering background, challenges, data selection, source‑enhanced tagging, capability improvements, self‑correction, collaborative workflows, and retrieval‑augmented generation for practical AI deployment.
Background: GPT‑4 marks a turning point, showing strong world knowledge but also high inference cost and limited domain applicability.
Challenges: High inference cost, capability gaps in complex decision‑making, and lack of collaboration with existing enterprise workflows hinder practical adoption.
Domain Adaptation: Discusses data quality and proportion for domain LLMs, introduces source‑enhanced tagging (e.g., “wiki”, “news”, “novel”) to improve reliability, and presents a hierarchical corpus classification scheme for better fine‑tuning.
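The source-enhanced tagging idea can be sketched as a small preprocessing step: each pre-training document is prefixed with a tag naming its origin, so the model can condition on source reliability. The tag names and formatting convention below are illustrative assumptions, not the lab's exact scheme.

```python
# Source-enhanced tagging sketch: prepend a provenance tag to each
# pre-training document. Tag vocabulary and fallback tag are assumptions.
SOURCE_TAGS = {"wiki": "<wiki>", "news": "<news>", "novel": "<novel>"}

def tag_document(text: str, source: str) -> str:
    """Prefix a document with its source tag (generic fallback for unknowns)."""
    tag = SOURCE_TAGS.get(source, "<web>")
    return f"{tag} {text}"

corpus = [("Paris is the capital of France.", "wiki"),
          ("Local team wins championship.", "news")]
tagged = [tag_document(text, source) for text, source in corpus]
# tagged[0] == "<wiki> Paris is the capital of France."
```

At inference time the same tags can be placed in the prompt to steer the model toward the register (encyclopedic, journalistic, fictional) appropriate to the task.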
Capability Enhancement: Focuses on improving instruction following, JSON output, and self‑correction via multi‑step answer generation (PAM), as well as command‑generation correction based on error feedback.
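The correction-from-error-feedback loop for structured output can be sketched as follows: generate, validate, and on failure feed the concrete error message back into the prompt and retry. `llm` here is any callable mapping a prompt string to a completion string; all names and the retry budget are illustrative assumptions.

```python
import json

def generate_with_self_correction(llm, prompt: str, max_retries: int = 3):
    """Ask the model for JSON output; on a parse error, append the error
    message to the prompt and retry. A sketch of error-feedback-based
    correction, not the exact pipeline from the talk."""
    attempt_prompt = prompt
    for _ in range(max_retries):
        raw = llm(attempt_prompt)
        try:
            return json.loads(raw)  # validation step: must be well-formed JSON
        except json.JSONDecodeError as err:
            # Feed the concrete parser error back so the model can repair it.
            attempt_prompt = (f"{prompt}\n\nYour previous output was not valid "
                             f"JSON ({err}). Output only valid JSON.")
    raise ValueError("model failed to produce valid JSON within retry budget")
```

The same pattern generalizes beyond JSON: any checker that yields a machine-readable error (schema validator, command dry-run) can drive the retry loop.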
Collaborative Work: Proposes a hybrid workflow where traditional models handle most tasks while LLMs are reserved for open‑world reasoning, knowledge‑base verification, and few‑shot learning; also describes knowledge extraction, alignment, and relation extraction pipelines.
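One minimal way to realize such a hybrid workflow is confidence-based routing: a cheap traditional classifier handles high-confidence cases, and only low-confidence, open-world inputs fall through to the LLM. The function names and the threshold value are illustrative assumptions.

```python
def route_task(text: str, classifier, llm_fallback, threshold: float = 0.9):
    """Hybrid-workflow sketch: the traditional model answers when confident;
    otherwise the (expensive) LLM is invoked. Returns (answer, handler)."""
    label, confidence = classifier(text)       # cheap traditional model
    if confidence >= threshold:
        return label, "traditional"
    return llm_fallback(text), "llm"           # reserved for hard cases
```

Routing this way keeps LLM inference cost proportional to the fraction of genuinely hard inputs rather than to total traffic.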
Retrieval‑Augmented Generation (RAG): Combines sparse (BM25) and dense (BGE) retrieval, uses source tags to choose the retrieval strategy, and enforces provenance through hard decoding constraints: quoted spans are wrapped in special brackets and constrained to match the source text verbatim.
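Combining sparse and dense result lists requires a fusion step. Reciprocal rank fusion (RRF) is one common, score-free choice, sketched below; the talk does not specify its exact fusion scheme, so this is an illustrative assumption.

```python
def reciprocal_rank_fusion(rankings, k: int = 60):
    """Fuse multiple ranked lists of doc IDs (e.g. one from BM25, one from a
    BGE dense retriever) via reciprocal rank fusion. k=60 is the commonly
    used constant from the RRF literature."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1/(k + rank); lower ranks score higher.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d2", "d3"]   # sparse ranking
dense_hits = ["d2", "d3", "d1"]  # dense ranking
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
# fused == ["d2", "d1", "d3"]
```

Documents ranked well by both retrievers float to the top, which is the intended behavior when sparse and dense signals disagree.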
Conclusion: Summarizes the research directions for deploying domain‑specific large models in practice.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.