How Domain Large Models Are Shaping the Future of AI: Challenges and Solutions

This article reviews Fudan University's Knowledge Factory Lab research on domain large models, covering background, three major deployment challenges, data‑selection strategies, ability‑enhancement techniques, collaborative workflows, and retrieval‑augmented generation methods that aim to make large models practical for real‑world tasks.

NewBeeNLP

Background

The GPT‑4 technical report notes that the model is still at an early stage of artificial general intelligence (AGI), and future versions such as GPT‑4.5 or GPT‑5 may exhibit stronger AGI traits. GPT‑4's strong world‑knowledge and common‑sense capabilities have sparked the question of whether large models will render traditional knowledge engineering obsolete.

Three Core Challenges

Challenge 1 – Inference Cost

Running and serving large models demands massive computational resources; even optimized deployments take several seconds to process a single text‑analysis request. When context length reaches hundreds of thousands of tokens, the required compute skyrockets, making repeated large‑scale use economically impractical even for companies willing to invest.

Challenge 2 – Decision‑Making Limitations

While large models excel at open‑ended chat, they struggle in rigorous industrial or commercial scenarios such as reliable code generation. Current models (e.g., GPT‑3, GPT‑4) still require extensive prompt engineering and multiple interactions, and it remains uncertain whether upcoming versions will meet strict enterprise requirements.

Challenge 3 – Collaboration and Controllability

Integrating a powerful model into existing business processes demands alignment with workflows, risk management, and human‑in‑the‑loop control. Rather than replacing legacy systems, the realistic goal is to let large models assist specific pipeline stages where their open‑world reasoning adds unique value.

Domain Adaptation

Domain‑specific large models (e.g., medical, finance) are built via continual pre‑training, but data quality and proportion are critical. The authors categorize training data into three layers: unnecessary base data, overly fine‑grained data (e.g., real‑time stock prices), and high‑value industry data, the latter being scarce and costly.

To address data‑source bias, they prepend a special token indicating provenance (e.g., [WIKI], [NEWS], [NOVEL]) to each corpus segment. Experiments show that this simple “source‑enhancement” improves performance on several downstream tasks, sometimes surpassing larger, unenhanced models.
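The source‑enhancement idea above can be sketched in a few lines. The token names ([WIKI], [NEWS], [NOVEL]) follow the article; the function name and sample data are illustrative assumptions, not the authors' code.

```python
# Sketch of "source-enhancement": prefix each corpus segment with a
# special provenance token before it enters pre-training.
SOURCE_TOKENS = {"wikipedia": "[WIKI]", "news": "[NEWS]", "fiction": "[NOVEL]"}

def tag_with_source(segment: str, source: str) -> str:
    """Prepend the provenance token for `source` to a corpus segment."""
    token = SOURCE_TOKENS.get(source)
    if token is None:
        raise ValueError(f"unknown source: {source}")
    return f"{token} {segment}"

corpus = [
    ("Paris is the capital of France.", "wikipedia"),
    ("Shares rallied after the earnings call.", "news"),
]
tagged = [tag_with_source(text, src) for text, src in corpus]
print(tagged[0])  # [WIKI] Paris is the capital of France.
```

As the article notes, the exact surface form of the tag matters less than giving the model a consistent signal to distinguish provenance.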

Further findings reveal that providing task‑relevant context (e.g., a research paper for analysis or a sci‑fi prompt for generation) yields additional gains, and that the exact textual form of the source tag is less important than the fact that the model can distinguish provenance.

Ability Enhancement

The team emphasizes improving complex‑instruction compliance rather than solely boosting benchmark scores. They define eight instruction dimensions (format constraints, content constraints, etc.) and automatically generate combinatorial test cases. Evaluation focuses on binary correctness, enabling programmatic assessment.
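The combinatorial test generation with binary scoring might look like the following minimal sketch. The four constraint dimensions and their checkers here are invented stand-ins for the eight dimensions the authors define.

```python
# Hypothetical sketch: enumerate constraint combinations and score an
# answer as pass/fail (binary correctness), enabling programmatic evaluation.
from itertools import combinations

CONSTRAINTS = {
    "uppercase":    lambda s: s == s.upper(),
    "ends_period":  lambda s: s.endswith("."),
    "max_20_words": lambda s: len(s.split()) <= 20,
    "contains_ai":  lambda s: "AI" in s,
}

def make_test_cases(max_constraints: int = 3):
    """Yield every combination of constraints up to a given size."""
    names = list(CONSTRAINTS)
    for k in range(1, max_constraints + 1):
        for combo in combinations(names, k):
            yield combo

def passes(answer: str, combo) -> bool:
    """Binary correctness: the answer must satisfy every constraint."""
    return all(CONSTRAINTS[name](answer) for name in combo)

cases = list(make_test_cases())
print(len(cases))  # C(4,1) + C(4,2) + C(4,3) = 14 test cases
print(passes("AI IS HERE.", ("uppercase", "ends_period", "contains_ai")))  # True
```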

When the number of simultaneous constraints exceeds three, most models begin to miss or partially fulfill requirements, highlighting a gap in multi‑constraint reasoning.

In industrial settings, the lack of unit awareness leads to catastrophic errors; to mitigate this, the authors construct a corpus of dimensional‑aware texts and pre‑train a model that outperforms GPT‑4 on unit‑sensitive reasoning tasks.
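To make the unit-awareness problem concrete, here is the kind of dimensional check such a model must get right; the conversion table and helper functions are illustrative, not the authors' corpus or model.

```python
# Illustrative unit-sensitive comparison: two quantities are equal only
# after normalizing their units. Confusing cm with mm is exactly the
# class of catastrophic error the article warns about.
UNIT_TO_METERS = {"mm": 0.001, "cm": 0.01, "m": 1.0, "km": 1000.0}

def to_meters(value: float, unit: str) -> float:
    """Normalize a length to meters."""
    return value * UNIT_TO_METERS[unit]

def same_length(a, b, tol=1e-9):
    """Compare two (value, unit) lengths after unit normalization."""
    return abs(to_meters(*a) - to_meters(*b)) < tol

print(same_length((2.5, "km"), (2500.0, "m")))  # True
print(same_length((30, "cm"), (30, "mm")))      # False: cm vs. mm
```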

They also introduce a self‑correction mechanism: training data are expanded from [Q/A] to [Q/A1, A2, A3] where each subsequent answer improves upon the previous. Fine‑tuning with this “Partial Answer Masking” (PAM) enables the model to iteratively refine its responses, a capability later applied to command‑generation scenarios where error feedback guides correction.
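A minimal sketch of how such [Q/A1, A2, A3] training examples might be assembled is shown below. The whitespace tokenization and the choice to mask earlier drafts out of the loss are assumptions for illustration; the paper's exact PAM scheme may differ.

```python
# Sketch of Partial Answer Masking (PAM) data construction: the question
# and earlier draft answers are masked out of the loss, so gradient
# signal comes from the refined final answer.
def build_pam_example(question, answers):
    """Concatenate question and successive answers; return tokens plus a
    per-token loss mask (1 = contributes to the loss, 0 = masked)."""
    tokens, loss_mask = [], []
    q_tokens = question.split()
    tokens += q_tokens
    loss_mask += [0] * len(q_tokens)  # never train on the question itself
    for i, ans in enumerate(answers):
        a_tokens = ans.split()
        tokens += a_tokens
        # assumption: train only on the final answer; mask earlier drafts
        weight = 1 if i == len(answers) - 1 else 0
        loss_mask += [weight] * len(a_tokens)
    return tokens, loss_mask

tokens, mask = build_pam_example(
    "What is 2+2 ?",
    ["It is 5 .", "Sorry , it is 4 ."],
)
print(sum(mask))  # 6: loss applies only to the tokens of the final answer
```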

Collaborative Workflows

Rather than pursuing an end‑to‑end answer from raw documents, the authors advocate a hybrid pipeline: traditional BERT‑style models handle the bulk of extraction and validation tasks (80‑90% accuracy), while large models are reserved for knowledge‑base correction, commonsense verification, and few‑shot learning where open‑world reasoning is indispensable.

Three concrete projects illustrate this synergy:

1. A fine‑tuned extraction model reaches 92% accuracy on a news‑entity task with only 300 labeled examples, far surpassing ChatGPT's roughly 60% on the same task, because the fine‑tuned model is aligned to the task rather than relying on prompting alone.

2. A composite knowledge‑extraction system combines entity extraction, entity alignment, and relation extraction, allocating a large or small model per sub‑task to exceed single‑model SOTA results.

3. Domain‑specific commonsense verification uses specially crafted prompts to let the model detect factual errors, outperforming rule‑based reasoning.
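The division of labor behind these projects can be summarized as a simple routing rule. The task names and model labels below are invented for illustration; the principle, routine extraction to small models and open-world reasoning to large ones, is the article's.

```python
# Hedged sketch of the hybrid pipeline's routing logic: BERT-style
# models handle the high-volume extraction work; the large model is
# invoked only where open-world reasoning is indispensable.
def route(task_type: str) -> str:
    """Decide which model family handles a given sub-task."""
    small_model_tasks = {"entity_extraction", "relation_extraction", "validation"}
    llm_tasks = {"kb_correction", "commonsense_check", "few_shot_new_type"}
    if task_type in small_model_tasks:
        return "bert-extractor"
    if task_type in llm_tasks:
        return "large-model"
    raise ValueError(f"unknown task: {task_type}")

print(route("entity_extraction"))  # bert-extractor
print(route("commonsense_check"))  # large-model
```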

For NL‑to‑SQL and similar tasks, the authors stress the necessity of domain‑specific fine‑tuning so that the model understands business‑specific terminology (e.g., “best‑performing fund”).
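One lightweight way to realize this, sketched below under assumptions, is a domain glossary pass that rewrites jargon into explicit, queryable criteria before SQL generation. The glossary entry is an invented example built on the article's "best‑performing fund" case.

```python
# Illustrative pre-processing step for NL-to-SQL: expand business jargon
# into precise definitions the model can map onto schema columns.
GLOSSARY = {
    "best-performing fund": "fund with the highest 1-year return",
}

def expand_jargon(question: str) -> str:
    """Replace known domain terms with their precise definitions."""
    for term, definition in GLOSSARY.items():
        question = question.replace(term, definition)
    return question

q = "Show me the best-performing fund this quarter"
print(expand_jargon(q))
# Show me the fund with the highest 1-year return this quarter
```

In practice this mapping is what domain-specific fine-tuning internalizes; the explicit glossary just makes the requirement visible.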

Finally, the paper discusses Retrieval‑Augmented Generation (RAG). Sparse retrieval (BM25) excels on precise queries, while dense retrieval (BGE) offers semantic coverage but can retrieve irrelevant content. The proposed solution dynamically blends both methods based on the presence of domain‑specific terms, ensuring that generated answers can be traced back to source documents for reliability.
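The dynamic blending of sparse and dense scores might look like the following sketch. The domain-term list, weights, and scoring interface are stand-ins, not the paper's implementation.

```python
# Minimal sketch of term-aware score blending: queries containing
# domain-specific vocabulary lean on exact (BM25-style) matching;
# otherwise dense semantic similarity dominates.
DOMAIN_TERMS = {"bm25", "sharpe ratio", "net asset value"}

def blend_weight(query: str) -> float:
    """Weight for the sparse score: higher when domain terms appear."""
    has_term = any(t in query.lower() for t in DOMAIN_TERMS)
    return 0.7 if has_term else 0.3

def blended_score(query: str, sparse: float, dense: float) -> float:
    """Convex combination of sparse and dense retrieval scores."""
    w = blend_weight(query)
    return w * sparse + (1 - w) * dense

print(round(blended_score("What is the Sharpe ratio formula?", 0.9, 0.4), 2))  # 0.75
print(round(blended_score("funds that did well recently", 0.2, 0.8), 2))       # 0.62
```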

The authors conclude with a decoding‑hard‑constraint technique: during fine‑tuning, special brackets mark sections that must be copied verbatim from retrieved text, guaranteeing factual fidelity while still allowing the model to reason around those anchors.
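A verifier for that copy-verbatim constraint can be sketched as follows. The «…» bracket syntax is an assumption for illustration; the article does not specify the marker tokens used during fine-tuning.

```python
# Hedged sketch: spans the model wraps in special brackets must appear
# verbatim in the retrieved source text, anchoring generation to facts.
import re

def verbatim_spans(answer: str):
    """Extract all spans the model marked as verbatim copies."""
    return re.findall(r"«(.*?)»", answer)

def check_fidelity(answer: str, source: str) -> bool:
    """Every bracketed span must occur verbatim in the retrieved text."""
    return all(span in source for span in verbatim_spans(answer))

source = "The fund's net asset value rose 3.2% in Q2."
good = "According to the report, «net asset value rose 3.2%» last quarter."
bad = "According to the report, «net asset value rose 5%» last quarter."
print(check_fidelity(good, source))  # True
print(check_fidelity(bad, source))   # False
```

During constrained decoding the same rule is enforced at generation time rather than checked afterward; this post-hoc checker just illustrates the invariant.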

Overall, the research presents a comprehensive roadmap for making domain large models practically useful through data‑aware training, instruction compliance, self‑correction, and intelligent collaboration with smaller models.

Tags: large language models, domain adaptation, knowledge extraction, model alignment, retrieval-augmented generation