How DAFNet Enables Efficient Sequential Editing of Large Language Models

This article introduces DAFNet, a dynamic auxiliary fusion framework for efficient sequential editing of large language models. DAFNet injects new knowledge at reduced resource cost while preserving reliability and generalization and mitigating hallucination. The article details the accompanying dataset, the architecture, and the evaluation results.


Recently, Alibaba Cloud AI Platform PAI, together with its security team and Prof. He Xiaofeng's group at East China Normal University, presented the paper "DAFNet: Dynamic Auxiliary Fusion for Sequential Model Editing in Large Language Models" at ACL 2024. The work tackles the high resource cost of updating large language models: new knowledge is injected sequentially through model editing, which lowers update expenses and mitigates hallucination.

Background

Large language models possess strong knowledge and reasoning abilities, but their pre-training demands massive computation, and updating them with new knowledge is correspondingly costly; efficient model editing is therefore essential. Existing methods focus on single edits and suffer from forgetting when many facts are edited sequentially.

Contributions

The paper makes two main contributions:

Construction of DAFSet, a dedicated dataset for training the auxiliary network, designed around the evaluation metrics so that the network acquires basic editing capabilities.

Proposal of a dynamic interactive auxiliary network for sequential model editing, which employs intra-editing and inter-editing attention mechanisms to capture semantic relationships among input sequences.

Dataset Overview

Data Collection

The dataset, DAFSet, is built from Wikidata triples (e_h, r, e_t) and is designed around four properties:

Recency: recent triples collected over the past 7 days for 48 common relations.

Popularity: triples involving high-traffic Wikipedia entities, with multi-hop tail selection.

Long-tailness: low-frequency entities selected by corpus frequency, KG degree, and likelihood under the base model's output probabilities (see the selection sketch after this list).

Robustness: variations in text length, context, and sentiment, with an opposite-attribute counterpart generated for each sample.
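
Below is a minimal sketch of how the long-tailness filter could work, scoring candidate entities by corpus frequency, KG degree, and the base model's likelihood. The thresholds, field names, and `Candidate` type are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    entity: str
    frequency: int      # occurrence count in a reference corpus (assumed field)
    kg_degree: int      # number of edges touching the entity in the KG
    avg_logprob: float  # base model's average token log-probability for the entity

def is_long_tail(c: Candidate,
                 max_frequency: int = 100,
                 max_degree: int = 5,
                 max_logprob: float = -4.0) -> bool:
    """Keep entities that are rare in text, sparsely connected in the KG,
    and assigned low likelihood by the base model. Thresholds are
    placeholders for illustration."""
    return (c.frequency <= max_frequency
            and c.kg_degree <= max_degree
            and c.avg_logprob <= max_logprob)

candidates = [
    Candidate("Q11111", frequency=12, kg_degree=3, avg_logprob=-6.1),
    Candidate("Q64", frequency=9800, kg_degree=120, avg_logprob=-1.2),
]
long_tail = [c for c in candidates if is_long_tail(c)]  # keeps only the rare entity
```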

Data Analysis

The distribution analysis shows that the dataset exhibits long-tail characteristics across entity frequency, KG degree, and semantic likelihood, and that it spans multiple domains, which improves generalization during editing.

Algorithm Overview

The proposed DAFNet consists of four stages: sequential edit signal acquisition, dynamic auxiliary fusion learning, intra‑editing and inter‑editing attention flows, and edit training with a loss that balances reliability, generalization, and locality.
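
To make the four stages concrete, here is a minimal sketch of the sequential editing loop, assuming a Hugging Face-style causal LM and an already-trained auxiliary network. `aux_net`, `edit_stream`, and `layers_to_edit` are illustrative names, not the paper's API.

```python
import torch

def sequential_edit(model, aux_net, edit_stream, layers_to_edit):
    """Apply a stream of edits one at a time (hypothetical interface).

    layers_to_edit: list of (name, parameter) pairs chosen for editing.
    aux_net: maps a layer name and a raw gradient to a parameter update.
    """
    for prompt_ids, target_ids in edit_stream:
        # Stage 1: sequential edit signal acquisition -- the raw
        # fine-tuning gradient of the edit loss on the target fact.
        loss = model(input_ids=prompt_ids, labels=target_ids).loss
        grads = torch.autograd.grad(loss, [p for _, p in layers_to_edit])

        # Stages 2-3: dynamic auxiliary fusion refines the gradient
        # signal (the intra-/inter-editing attention flows live inside
        # aux_net), then the fused update is applied to the weights.
        with torch.no_grad():
            for (name, param), g in zip(layers_to_edit, grads):
                param.add_(aux_net(name, g))
```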

Intra‑editing attention models token interactions within each input sequence, while inter‑editing attention captures relationships across sequences, iteratively refining representations before gradient fusion.
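
The two attention flows can be pictured with a schematic PyTorch module. The mean-pooling, head count, and single-layer structure below are assumptions made for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class EditingAttention(nn.Module):
    """Intra-editing attention mixes tokens within one edit sequence;
    inter-editing attention then mixes the pooled representations of
    the queued edits (schematic, not the published implementation)."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.intra = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.inter = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, edits: torch.Tensor) -> torch.Tensor:
        # edits: (num_edits, seq_len, d_model)
        intra_out, _ = self.intra(edits, edits, edits)      # within-sequence mixing
        pooled = intra_out.mean(dim=1).unsqueeze(0)         # (1, num_edits, d_model)
        inter_out, _ = self.inter(pooled, pooled, pooled)   # cross-edit mixing
        return inter_out.squeeze(0)                         # (num_edits, d_model)

reps = EditingAttention(d_model=64)(torch.randn(8, 16, 64))  # 8 edits, 16 tokens each
```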

Edit training employs a loss that integrates reliability, generalization, and locality to guide the model.
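
A common way to realize such a composite objective is a weighted sum of per-metric terms. The sketch below assumes cross-entropy losses for reliability and generalization and a KL term against the pre-edit model for locality; the weights and exact term forms are illustrative, not the paper's hyperparameters.

```python
import torch.nn.functional as F

def edit_training_loss(edit_logits, edit_labels,
                       para_logits, para_labels,
                       loc_logits, base_logits,
                       w_rel=1.0, w_gen=1.0, w_loc=1.0):
    # Reliability: the edited model must output the new fact.
    loss_rel = F.cross_entropy(edit_logits, edit_labels)
    # Generalization: paraphrases of the edit prompt get the same answer.
    loss_gen = F.cross_entropy(para_logits, para_labels)
    # Locality: on unrelated inputs, stay close to the pre-edit model.
    loss_loc = F.kl_div(F.log_softmax(loc_logits, dim=-1),
                        F.softmax(base_logits, dim=-1),
                        reduction="batchmean")
    return w_rel * loss_rel + w_gen * loss_gen + w_loc * loss_loc
```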

Evaluation

DAFNet was evaluated on three public model‑editing benchmarks, measuring reliability, generalization, and locality across four edit frequencies. Results show superior performance over baselines, with notable improvements on the DAFSet dataset.
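
These three metrics are typically computed as exact-match rates on the edited facts, their paraphrases, and unrelated probes. The helper below is a sketch under that assumption; `predict` and the record fields stand in for a benchmark loader and a decoding routine.

```python
def evaluate_edits(predict, records):
    """predict(prompt) -> answer string, using the edited model.
    Each record carries an edit prompt, a paraphrase, an unrelated
    probe, and the expected answers (field names are hypothetical)."""
    n = len(records)
    return {
        "reliability":    sum(predict(r["edit_prompt"]) == r["target"] for r in records) / n,
        "generalization": sum(predict(r["paraphrase"]) == r["target"] for r in records) / n,
        "locality":       sum(predict(r["unrelated_prompt"]) == r["pre_edit_answer"] for r in records) / n,
    }
```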


Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
