Kuaishou & Renmin AI Institute: Driving Multimodal Large Model Innovation
This article describes Kuaishou's multimodal AI research, including its 100‑billion‑parameter K7 model and its VLUA algorithm, and its partnership with Renmin University's Gaoling AI Institute. The two have launched a joint lab and produced work such as WebBrain and ChatImg, while Kuaishou continues to advance recommendation and search technologies across the short‑video ecosystem.
Background
Large‑scale pre‑trained models such as BERT, T5, and the GPT series have driven recent AI progress. Multimodal models (e.g., GPT‑4, Flamingo, Kosmos‑1, PaLM‑E) combine language, vision, and other modalities, and multimodality is widely seen as a key path toward artificial general intelligence.
Multimodal Large‑Model Landscape
Industry work focuses on unified multimodal foundation models that can serve diverse tasks, with an emphasis on practical deployment in real‑world scenarios.
Kuaishou Technical Achievements
Kuaishou has built a 100‑billion‑parameter multimodal model named K7, which powers recommendation, live‑streaming, e‑commerce, and other core services, delivering measurable online gains. Its VLUA algorithm has held the top position on the VCR (Visual Commonsense Reasoning) multimodal benchmark for over six months.
Kuaishou also released a 1.9‑trillion‑parameter ranking model that incorporates long‑term user behavior. The model leverages the PEPNet architecture (https://arxiv.org/abs/2302.01115) and a two‑stage interest network called TWIN (https://arxiv.org/abs/2302.02352), enabling fine‑grained interest modeling across millions of historical actions and supporting multi‑task, multi‑scenario learning.
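To make the two‑stage idea concrete, here is a minimal sketch of how a long behavior history might be narrowed and then summarized. It is not the TWIN implementation: the function name two_stage_interest, the dot‑product scoring, and the toy vectors are all assumptions made for illustration. Stage one scores every historical item cheaply against the candidate and keeps the top k; stage two runs softmax attention over only those k items.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, used here as a stand-in relevance score."""
    return sum(x * y for x, y in zip(a, b))


def two_stage_interest(
    target: Sequence[float],
    history: Sequence[Sequence[float]],
    k: int,
) -> Tuple[List[int], List[float]]:
    """Hypothetical two-stage interest extraction (illustrative only).

    Returns the indices kept by stage 1 and the attention-pooled
    interest vector produced by stage 2.
    """
    dim = len(target)
    if not history or k <= 0 or dim == 0:
        return [], [0.0] * dim

    # Stage 1: coarse scoring over the whole history, keep the k best items.
    scores = [dot(target, h) for h in history]
    top = sorted(range(len(history)), key=lambda i: scores[i], reverse=True)[:k]

    # Stage 2: scaled softmax attention over the surviving items only.
    logits = [scores[i] / math.sqrt(dim) for i in top]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the kept behavior vectors.
    interest = [
        sum(w * history[i][j] for w, i in zip(weights, top)) for j in range(dim)
    ]
    return top, interest


history = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
kept, interest = two_stage_interest([1.0, 0.0], history, k=2)
print(kept)  # [0, 2]
```

The point of the split is cost: the expensive attention step touches k items rather than the full history, which is what makes very long behavior sequences tractable in this kind of design.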
Kuaishou pioneered on‑device intelligent re‑ranking, deploying deep‑learning inference (and limited training) on mobile devices to exploit real‑time user feedback and device‑specific features. This work won the Best Paper award at CIKM 2022 (https://arxiv.org/abs/2208.09577).
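The general shape of on‑device re‑ranking can be sketched as follows. This is a deliberately simple heuristic stand‑in, not the learned model from the paper: the Candidate type, the rerank_on_device function, the per‑category feedback signal, and the weight parameter are all hypothetical. The server supplies a scored candidate list, and the client reorders it using feedback that is only available locally and in real time.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    video_id: str
    category: str
    server_score: float  # score assigned by the server-side ranker


def rerank_on_device(
    candidates: List[Candidate],
    feedback: Dict[str, float],
    weight: float = 0.5,
) -> List[Candidate]:
    """Reorder server candidates using recent local feedback.

    `feedback` maps a category to a signal in [-1, 1], assumed to be derived
    from the user's latest actions on the device (skips, likes, watch time).
    """
    def adjusted(c: Candidate) -> float:
        # Categories with no recent signal keep their server score.
        return c.server_score + weight * feedback.get(c.category, 0.0)

    return sorted(candidates, key=adjusted, reverse=True)


candidates = [
    Candidate("a", "sports", 0.9),
    Candidate("b", "cooking", 0.8),
    Candidate("c", "music", 0.7),
]
# The user just skipped sports clips and liked a music clip.
reranked = rerank_on_device(candidates, {"sports": -1.0, "music": 1.0})
print([c.video_id for c in reranked])  # ['c', 'b', 'a']
```

In a production system the adjusted score would come from a compact neural model running on the phone; the sketch only shows where the local signal enters the ordering.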
Collaboration with Gaoling AI Institute
On 22 April, Kuaishou and the Gaoling AI Institute of Renmin University of China established the “Renmin University of China – Kuaishou Future Media Intelligence Joint Lab”. The lab focuses on multimodal AI models, cross‑modal generation, and intelligent recommendation algorithms, and the two parties share data, compute, and talent.
Key Research Outputs
WebBrain: a retrieval‑augmented generation model that grounds its generated articles on a large web corpus (paper: https://openreview.net/pdf?id=eiuj6cNv4iI).
ChatImg: a multimodal generative model developed in China that can understand images and answer visual queries.
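The retrieve‑then‑generate pattern behind a system like WebBrain can be illustrated with a small sketch. Nothing here reflects WebBrain's actual retriever or generator: the token‑overlap scoring, the prompt wording, and the function names retrieve, build_grounded_prompt, and answer are assumptions, and the generator is passed in as an arbitrary callable.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: List[str], k: int) -> List[str]:
    """Return up to k passages ranked by token overlap with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(p)), i) for i, p in enumerate(corpus)]
    scored = [s for s in scored if s[0] > 0]  # drop unrelated passages
    scored.sort(key=lambda s: (-s[0], s[1]))  # best overlap first, stable ties
    return [corpus[i] for _, i in scored[:k]]


def build_grounded_prompt(query: str, passages: List[str]) -> str:
    """Number the retrieved passages so the generator can cite them."""
    refs = "\n".join(f"[{n}] {p}" for n, p in enumerate(passages, 1))
    return (
        f"References:\n{refs}\n\n"
        f"Write an article about: {query}\n"
        "Cite references by number."
    )


def answer(
    query: str,
    corpus: List[str],
    generate: Callable[[str], str],
    k: int = 2,
) -> str:
    """Retrieve supporting passages, then hand a grounded prompt to `generate`."""
    return generate(build_grounded_prompt(query, retrieve(query, corpus, k)))


corpus = [
    "Kuaishou is a short video platform",
    "Paris is the capital of France",
    "Short video recommendation uses ranking models",
]
print(retrieve("short video ranking", corpus, k=2))
```

Grounding works by constraining what the generator sees: it is asked to write from numbered references rather than from its parameters alone, which is what lets the output be checked against sources.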
Representative Publications
PEPNet: Parameter and Embedding Personalized Network for Infusing with Personalized Prior Information (https://arxiv.org/abs/2302.01115).
TWIN: Two‑stage Interest Network for Lifelong User Behavior Modeling (https://arxiv.org/abs/2302.02352).
Real‑time Short Video Recommendation on Mobile Devices, CIKM 2022 (https://arxiv.org/abs/2208.09577).
WebBrain: Learning to Generate Factually Correct Articles for Queries by Grounding on Large Web Corpus (https://openreview.net/pdf?id=eiuj6cNv4iI).
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.
