Alibaba Cloud PAI’s Breakthroughs in Chinese Diffusion, Prompting, and LLM Knowledge Editing
Recent ACL 2024 papers from Alibaba Cloud’s PAI platform showcase open‑source Chinese diffusion models, an interactive multi‑turn prompt generator, a long‑tail knowledge‑aware retrieval‑augmented LLM approach, and a dynamic fusion network for sequential model editing, all integrated into cloud services.
Several papers from PAI, Alibaba Cloud’s artificial intelligence platform, were recently accepted at ACL 2024, the premier conference on natural language processing. The work was carried out jointly with Alibaba Security, Prof. Jin Lian‑wen’s team at South China University of Technology, and Prof. He Xiaofeng’s team at East China Normal University. The acceptances reflect academic recognition of PAI’s research in natural language processing and multimodal algorithms.
PAI‑Diffusion: Open Chinese Text‑to‑Image Diffusion Models and Cloud Inference Service
Inspired by Stable Diffusion, the PAI team adapted the architecture to the characteristics of the Chinese language, curated and filtered Chinese pre‑training data, and optimized training to produce a family of 12 open‑source Chinese diffusion models (including base, LoRA, and ControlNet variants). According to the team, the models improve image quality and cover a diverse range of styles. Two inference tools are provided: Chinese SD WebUI, a zero‑code plugin for Stable Diffusion WebUI, and Diffusers‑API, an API for deploying the Chinese models online. The paper and the accompanying technical blog give detailed descriptions, and the work will be presented at ACL 2024.
DiffChat: Interactive Multi‑Turn Prompt Generation for Stable Diffusion
Generating high‑quality images with diffusion models often requires repeated manual prompt editing, which is time‑consuming and unpredictable. DiffChat is a text‑to‑text multi‑turn generation model that rewrites original prompts based on user instructions, producing refined prompts that guide the image generator toward desired results. The method builds a task‑specific dataset via prompt beautification and engineering, applies supervised fine‑tuning, and further improves performance with a reinforcement‑learning technique that incorporates aesthetic, human‑preference, and content‑completeness feedback, as well as dynamic action‑space correction and state‑value estimation.
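As a rough illustration of how the three feedback signals might be folded into a single scalar reward for reinforcement learning, the sketch below uses a weighted sum. The weights, the function names, and the keyword-overlap proxy for content completeness are assumptions made for illustration; the source does not say how DiffChat computes or combines these scores. The aesthetic and preference scores are passed in as plain numbers, standing in for whatever scoring models produce them.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    # Hypothetical weights; the source does not give actual values.
    aesthetic: float = 0.4
    preference: float = 0.4
    completeness: float = 0.2


def content_completeness(instruction_keywords, rewritten_prompt):
    """Fraction of requested keywords that appear in the rewritten prompt.

    A crude stand-in for a content-completeness check (assumption).
    """
    if not instruction_keywords:
        return 1.0
    prompt = rewritten_prompt.lower()
    hits = sum(1 for kw in instruction_keywords if kw.lower() in prompt)
    return hits / len(instruction_keywords)


def combined_reward(aesthetic_score, preference_score, instruction_keywords,
                    rewritten_prompt, weights=RewardWeights()):
    """Weighted sum of the three signals, each assumed to lie in [0, 1]."""
    completeness = content_completeness(instruction_keywords, rewritten_prompt)
    return (weights.aesthetic * aesthetic_score
            + weights.preference * preference_score
            + weights.completeness * completeness)
```

A rewritten prompt that drops something the user asked for loses part of the completeness term, so under this sketch a policy is not rewarded for prompts that look good but ignore the instruction.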
Long‑Tail Knowledge in Retrieval‑Augmented Large Language Models
Retrieval‑augmented generation (RAG) improves large language models (LLMs) by injecting retrieved documents, yet most methods do not ask which kinds of knowledge actually need to be retrieved. This work argues that long‑tail knowledge is where retrieval matters most, because LLMs already memorize high‑frequency world facts during pre‑training. A new metric, Generative Expected Calibration Error (GECE), combines statistical and semantic cues to measure how long‑tailed a piece of knowledge is. Retrieval is triggered only for queries involving long‑tail knowledge, yielding over 4× faster inference and consistent downstream performance gains.
Experiments also analyze token‑level probabilities and gradient magnitudes, showing that long‑tail instances have lower average word frequency (α) and smaller gradients, which the GECE metric captures.
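The gating idea can be sketched as follows. The score here is a simplified stand-in for a GECE-style measure, not the paper's definition: it takes the gap between a generation-quality score and the mean token probability, then divides by the average word frequency and a gradient term, so that rare, low-gradient instances score higher. The threshold, the function names, and the way the inputs are obtained are all hypothetical.

```python
from statistics import mean


def gece_like_score(quality, token_probs, avg_word_freq, grad_alignment, eps=1e-8):
    """Simplified long-tailedness score (an assumption, not the paper's formula).

    quality:        text-quality score of the generated answer, in [0, 1]
    token_probs:    probabilities the model assigned to its output tokens
    avg_word_freq:  average corpus frequency of the query's words (alpha)
    grad_alignment: size of the instance's gradient signal, non-negative
    """
    calibration_gap = abs(quality - mean(token_probs))
    return calibration_gap / (avg_word_freq * grad_alignment + eps)


def should_retrieve(score, threshold):
    """Trigger retrieval only for queries that look long-tail."""
    return score > threshold


def answer(query, score, threshold, generate, retrieve):
    """Call the retriever only when the score exceeds the threshold."""
    docs = retrieve(query) if should_retrieve(score, threshold) else []
    return generate(query, docs)
```

Because `retrieve` is never called for queries below the threshold, common-knowledge queries skip the retrieval step and the longer augmented prompt it produces, which is where a speedup of the kind reported above would come from.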
DAFNet: Dynamic Auxiliary Fusion for Sequential Model Editing in Large Language Models
Large language models still suffer from hallucinations and factual errors. Existing model‑editing approaches treat each correction as a one‑off edit and ignore the need for continual updates. DAFNet introduces a dynamic auxiliary fusion network that enhances semantic interaction among the facts across the whole sequence of edits, preventing catastrophic forgetting when many fact triples are edited in succession.
The method aggregates attention flows at token level and updates sequence‑level representations via multi‑layer diagonal cross‑editing attention. A new dataset, DAFSet, provides recent, popular, long‑tail, and robust editing cases, improving generality. Experiments show DAFNet outperforms strong baselines in both single‑round and sequential editing, and DAFSet boosts other auxiliary‑network methods.
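To make the difference between single‑round and sequential editing concrete, the toy harness below applies edits one after another and re-checks every earlier edit after each step, which is one simple way to expose forgetting. Everything here is illustrative: the `ToyEditor` class, its fixed-capacity memory, and the retention metric are assumptions and have no connection to DAFNet's architecture or to the paper's evaluation protocol.

```python
from collections import OrderedDict


class ToyEditor:
    """Stand-in for an edited model: remembers only the latest `capacity` edits."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = OrderedDict()

    def edit(self, subject, relation, new_object):
        key = (subject, relation)
        self.memory.pop(key, None)  # re-editing a fact moves it to the end
        self.memory[key] = new_object
        if len(self.memory) > self.capacity:
            self.memory.popitem(last=False)  # the oldest edit is forgotten

    def query(self, subject, relation):
        return self.memory.get((subject, relation))


def sequential_retention(editor, triples):
    """Apply edits in order; after each one, return the fraction of all edits
    applied so far that the editor still answers correctly.

    Assumes each (subject, relation) pair appears at most once in `triples`.
    """
    history = []
    applied = []
    for subject, relation, obj in triples:
        editor.edit(subject, relation, obj)
        applied.append((subject, relation, obj))
        correct = sum(1 for s, r, o in applied if editor.query(s, r) == o)
        history.append(correct / len(applied))
    return history
```

A single‑round evaluation would only check the most recent edit and so would report success at every step; the retention curve drops as soon as an earlier edit is lost.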
These research outcomes have been deeply integrated into various PAI modules, providing AI model training and inference services. Chinese SD WebUI integrates seamlessly with PAI‑EAS for one‑click deployment, Diffusers‑API enables cloud deployment of large text‑to‑image models, and PAI‑QuickStart offers over 50 popular LLMs with diverse training and inference options.
Paper Summary
PAI‑Diffusion: Constructing and Serving a Family of Open Chinese Diffusion Models for Text‑to‑image Synthesis on the Cloud – Authors: Wang Chengyu, Duan Zhongjie, Liu Bingyan, Zou Xinyi, Chen Cen, Jia Kui, Huang Jun
DiffChat: Learning to Chat with Text‑to‑Image Synthesis Models for Interactive Image Creation – Authors: Wang Jiapeng, Wang Chengyu, Cao Tingfeng, Huang Jun, Jin Lian‑wen
On the Role of Long‑tail Knowledge in Retrieval Augmented Large Language Models – Authors: Li Dongyang, Yan Junbing, Zhang Taolin, Wang Chengyu, He Xiaofeng, Huang Longtao, Xue Hui, Huang Jun
DAFNet: Dynamic Auxiliary Fusion for Sequential Model Editing in Large Language Models – Authors: Zhang Taolin, Chen Qizhou, Li Dongyang, Wang Chengyu, He Xiaofeng, Huang Longtao, Xue Hui, Huang Jun
