Kuaishou & Renmin AI Institute: Driving Multimodal Large Model Innovation

This article describes Kuaishou’s multimodal AI research, including its 100‑billion‑parameter K7 model and VLUA algorithm, and its partnership with Renmin University’s Gaoling AI Institute. Together they have launched a joint lab and produced work such as WebBrain and ChatImg, and they aim to advance recommendation and search technologies across the short‑video ecosystem.

Kuaishou Tech

Apr 23, 2023

Background

Large‑scale pre‑trained models such as BERT, T5, and the GPT series have driven recent AI progress. Multimodal models (e.g., GPT‑4, Flamingo, Kosmos‑1, PaLM‑E) combine language, vision, and other modalities, and multimodality is widely seen as a key path toward artificial general intelligence.

Multimodal Large‑Model Landscape

The industry is focusing on unified multimodal foundation models that can serve diverse tasks, with an emphasis on practical deployment in real‑world scenarios.

Kuaishou Technical Achievements

Kuaishou has built a 100‑billion‑parameter multimodal model named K7, which powers recommendation, live‑streaming, e‑commerce, and other core services and has delivered measurable online gains. Its VLUA algorithm has held the top spot on the Visual Commonsense Reasoning (VCR) multimodal benchmark for over six months.

Kuaishou also released a 1.9‑trillion‑parameter ranking model that incorporates long‑term user behavior. The model leverages the PEPNet architecture (https://arxiv.org/abs/2302.01115) and a two‑stage interest network called TWIN (https://arxiv.org/abs/2302.02352), enabling fine‑grained interest modeling across millions of historical actions and supporting multi‑task, multi‑scenario learning.
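The two‑stage idea behind lifelong behavior modeling can be illustrated with a minimal sketch. It shows only the general pattern (a coarse pass that selects the top‑k most relevant historical items for a candidate, then attention over that subset) and does not reproduce TWIN's actual architecture. The function names, the scaled dot‑product score, and the toy two‑dimensional embeddings are all assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def two_stage_interest(target, history, k):
    """Return (indices of the top-k history items, interest vector).

    target  : embedding of the candidate item (hypothetical)
    history : list of embeddings for the user's past actions (hypothetical)
    """
    scale = math.sqrt(len(target))
    # Stage 1: score every historical item against the candidate, keep top-k.
    scored = [(dot(target, h) / scale, i) for i, h in enumerate(history)]
    top = sorted(scored, key=lambda p: p[0], reverse=True)[:k]
    top_idx = [i for _, i in top]
    # Stage 2: softmax attention restricted to the selected sub-sequence,
    # reusing the same relevance scores so both stages agree on relevance.
    weights = softmax([s for s, _ in top])
    interest = [
        sum(w * history[i][d] for w, i in zip(weights, top_idx))
        for d in range(len(target))
    ]
    return top_idx, interest


if __name__ == "__main__":
    history = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    idx, vec = two_stage_interest([1.0, 0.0], history, k=2)
    print(idx, vec)
```

The first stage here is a brute‑force scan over the whole history. A production system working over very long histories would presumably need a cheaper scoring path for that stage; the sketch leaves that aside.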

Kuaishou pioneered on‑device intelligent re‑ranking, deploying deep‑learning inference (and limited training) on mobile devices to exploit real‑time user feedback and device‑specific features. This work won the Best Paper award at CIKM 2022 (https://arxiv.org/abs/2208.09577).
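As a rough illustration of why re‑ranking on the device helps, the sketch below re‑orders a batch of server‑ranked candidates using feedback the client has just observed. It swaps the learned on‑device model for a hand‑written linear adjustment, so it is a toy stand‑in rather than the method from the paper. The `Candidate` fields, the feedback encoding, and the `weight` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    video_id: str
    category: str
    server_score: float  # score assigned by the cloud ranking model


def rerank_on_device(candidates, recent_feedback, weight=0.3):
    """Re-order candidates using feedback observed on the device.

    recent_feedback maps category -> signal in [-1, 1], for example +1 for
    a like or full watch and -1 for a quick skip (hypothetical encoding).
    """
    def adjusted(c):
        return c.server_score + weight * recent_feedback.get(c.category, 0.0)

    return sorted(candidates, key=adjusted, reverse=True)


if __name__ == "__main__":
    batch = [
        Candidate("v1", "sports", 0.80),
        Candidate("v2", "cooking", 0.75),
        Candidate("v3", "sports", 0.70),
    ]
    # The user just skipped a sports clip and finished a cooking clip.
    feedback = {"sports": -1.0, "cooking": 1.0}
    print([c.video_id for c in rerank_on_device(batch, feedback)])
    # prints ['v2', 'v1', 'v3']
```

Because the adjustment runs on the client, it can react to a skip or a like before the next request reaches the server, which is the latency advantage the paragraph above describes.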

Collaboration with Gaoling AI Institute

On 22 April 2023, Kuaishou and the Gaoling AI Institute of Renmin University established the “Renmin University of China – Kuaishou Future Media Intelligence Joint Lab”. The lab focuses on multimodal AI models, cross‑modal generation, and intelligent recommendation algorithms, with the two sides sharing data, compute, and talent.

Key Research Outputs

WebBrain: a retrieval‑augmented generation model that writes factually grounded articles for queries by drawing on a large web corpus (paper: https://openreview.net/pdf?id=eiuj6cNv4iI).

ChatImg: a China‑developed multimodal generative model capable of understanding images and answering visual queries.
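The retrieval‑augmented pattern that WebBrain builds on can be sketched in a few lines: retrieve the passages most relevant to a query, then hand them to a generator as numbered references. This is a minimal sketch, not WebBrain's pipeline. Token overlap stands in for a learned retriever, the generator itself is out of scope, and the function names and prompt wording are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, corpus, k=2):
    """Return indices of up to k passages sharing the most tokens with the query."""
    q = Counter(tokenize(query))
    scored = []
    for i, passage in enumerate(corpus):
        p = Counter(tokenize(passage))
        overlap = sum(min(q[t], p[t]) for t in q)
        scored.append((overlap, i))
    # Highest overlap first; ties broken by original position.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for overlap, i in scored[:k] if overlap > 0]


def build_grounded_prompt(query, corpus, k=2):
    """Assemble generator input that carries the retrieved evidence."""
    hits = retrieve(query, corpus, k)
    refs = "\n".join(f"[{n}] {corpus[i]}" for n, i in enumerate(hits, start=1))
    return (
        f"References:\n{refs}\n\n"
        f"Write an article answering: {query}\n"
        "Cite references by number and use only facts they contain."
    )


if __name__ == "__main__":
    corpus = [
        "Short video platforms rank clips with deep models.",
        "Bread is baked from flour and water.",
        "Ranking models for short video use long user histories.",
    ]
    print(build_grounded_prompt("how do short video ranking models work", corpus))
```

Numbering the references in the prompt lets the generated text cite its sources, which is what makes the output checkable against the retrieved passages.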

Representative Publications

PEPNet: Parameter and Embedding Personalized Network for Infusing with Personalized Prior Information (https://arxiv.org/abs/2302.01115).

TWIN: Two‑stage Interest Network for Lifelong User Behavior Modeling (https://arxiv.org/abs/2302.02352).

Real‑time Short Video Recommendation on Mobile Devices, CIKM 2022 (https://arxiv.org/abs/2208.09577).

WebBrain: Learning to Generate Factually Correct Articles for Queries by Grounding on Large Web Corpus (https://openreview.net/pdf?id=eiuj6cNv4iI).

Technical Illustrations

Kuaishou vice president Wang Zhongyuan signing

Trillion‑parameter ranking model architecture

On‑device intelligent short‑video recommendation

