Artificial Intelligence · 12 min read

Kuaipedia: Building a Short‑Video Encyclopedia with Multimodal Knowledge Extraction

This article introduces Kuaipedia, Kuaishou's multimodal short‑video encyclopedia, detailing its background, system architecture, knowledge‑video recognition pipeline, multimodal entity linking techniques, and downstream applications, while also providing implementation insights and a brief Q&A.


Background – Kuaishou launched Kuaipedia to create a large‑scale multimodal short‑video encyclopedia, motivated by rising audience interest in factual and educational content, government initiatives for science popularization, and the longer lifecycle of knowledge‑oriented videos compared with entertainment clips.

System Overview – Kuaipedia consists of three core elements: Item (a unified entry for a concept or entity), Aspect (specific knowledge points related to an item), and Video (short videos that convey what, why, or how‑to information). These components are visualized in a series of diagrams illustrating the hierarchical relationship between items, aspects, and associated videos.
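The Item → Aspect → Video hierarchy can be sketched as a minimal data model (class and field names here are illustrative, not Kuaipedia's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Video:
    video_id: str
    title: str
    kind: str  # "what", "why", or "how-to"

@dataclass
class Aspect:
    name: str
    videos: list = field(default_factory=list)

@dataclass
class Item:
    name: str
    aspects: dict = field(default_factory=dict)

    def add_video(self, aspect_name: str, video: Video) -> None:
        # Create the aspect on first use, then attach the video under it.
        aspect = self.aspects.setdefault(aspect_name, Aspect(aspect_name))
        aspect.videos.append(video)

# Example: a Shiba Inu item with an aspect on correcting accidental eating.
item = Item("柴犬")
item.add_video("纠正误食", Video("v1", "如何纠正柴犬误食", "how-to"))
```

Keeping aspects as a dictionary on each item mirrors the one‑to‑many relationship the diagrams describe: one entry fans out into knowledge points, each of which collects videos.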

Construction Pipeline – The building process involves several stages:

Knowledge video identification using a multimodal fusion model that combines textual features (category, title, description), visual cues (cover image), OCR, and ASR outputs.
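This kind of late fusion can be sketched as follows; the per‑modality encoders are stubbed as fixed vectors here, whereas the real model would learn them jointly (e.g. a text encoder for title/OCR/ASR and a vision encoder for the cover image):

```python
import math

def fuse_and_score(features, weights, bias=0.0):
    """Score the concatenation of per-modality feature vectors with a
    linear head, then squash to a knowledge-video probability."""
    score = bias
    for modality, vec in features.items():
        score += sum(x * w for x, w in zip(vec, weights[modality]))
    return 1.0 / (1.0 + math.exp(-score))

# Stubbed modality embeddings; values are illustrative only.
features = {
    "title": [0.8, 0.1],
    "cover": [0.3, 0.6],
    "ocr":   [0.5, 0.2],
    "asr":   [0.4, 0.4],
}
weights = {m: [1.0, 1.0] for m in features}
prob = fuse_and_score(features, weights, bias=-2.0)
```

A video whose probability clears a tuned threshold would be routed into the knowledge pipeline; everything below it stays in the general feed.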

Query intent detection by analyzing user search behavior; queries that lead to clicks on knowledge videos are treated as knowledge‑intent queries.
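A minimal sketch of this click‑based labeling, assuming a simple click‑through‑ratio rule (the function name and 0.5 threshold are illustrative, not the production criterion):

```python
from collections import defaultdict

def knowledge_intent_queries(click_log, knowledge_videos, min_ratio=0.5):
    """Label a query as knowledge-intent when at least min_ratio of its
    clicks land on videos already identified as knowledge videos."""
    clicks = defaultdict(lambda: [0, 0])  # query -> [knowledge clicks, total clicks]
    for query, video_id in click_log:
        stats = clicks[query]
        stats[1] += 1
        if video_id in knowledge_videos:
            stats[0] += 1
    return {q for q, (k, n) in clicks.items() if k / n >= min_ratio}

# Toy log: (query, clicked video id) pairs.
log = [("柴犬 误食", "v1"), ("柴犬 误食", "v2"), ("搞笑 宠物", "v3")]
queries = knowledge_intent_queries(log, knowledge_videos={"v1", "v2"})
```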

Mention extraction with a BERT+CRF sequence labeling model to locate entity mentions (e.g., "柴犬" [Shiba Inu] and "纠正误食" [correcting accidental eating]).
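The CRF layer's decoding step can be illustrated with a toy Viterbi decoder over BIO tags; the emission and transition scores below are hand‑picked toy values, whereas in the real model they come from BERT token representations and learned CRF parameters:

```python
def viterbi(emissions, transitions, tags):
    """Decode the highest-scoring tag path under a linear-chain CRF.
    emissions[t][tag] and transitions[(prev, tag)] are log-scores."""
    # Each entry maps a final tag to (best path ending in it, its score).
    paths = {tag: ([tag], emissions[0][tag]) for tag in tags}
    for emit in emissions[1:]:
        new_paths = {}
        for tag in tags:
            # Pick the predecessor that maximizes path score + transition.
            prev = max(tags, key=lambda p: paths[p][1] + transitions[(p, tag)])
            path, score = paths[prev]
            new_paths[tag] = (path + [tag],
                              score + transitions[(prev, tag)] + emit[tag])
        paths = new_paths
    best_path, _ = max(paths.values(), key=lambda x: x[1])
    return best_path

tags = ["B", "I", "O"]
transitions = {(p, t): 0.0 for p in tags for t in tags}
transitions[("O", "I")] = -10.0  # forbid an I tag directly after O
# Toy emission scores for a three-token sequence.
emissions = [
    {"B": 2.0, "I": 0.0, "O": 0.0},
    {"B": 0.0, "I": 2.0, "O": 0.5},
    {"B": 0.0, "I": 0.0, "O": 2.0},
]
best = viterbi(emissions, transitions, tags)  # mention spans tokens 0-1
```

The transition penalty is what the CRF adds over plain per‑token classification: it rules out invalid tag sequences such as an I immediately following an O.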

Entity disambiguation linking mentions to existing encyclopedia entries, followed by embedding‑based deduplication using cosine similarity.
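The embedding‑based deduplication step can be sketched as a greedy cosine‑similarity filter (the 0.9 threshold and example vectors are illustrative):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def dedupe(entries, threshold=0.9):
    """Keep an entry only if its embedding is not a near-duplicate
    (cosine similarity >= threshold) of any already-kept entry."""
    kept = []
    for name, vec in entries:
        if all(cosine(vec, v) < threshold for _, v in kept):
            kept.append((name, vec))
    return [name for name, _ in kept]

entries = [
    ("柴犬", [1.0, 0.0, 0.1]),
    ("柴犬 (犬种)", [0.99, 0.02, 0.12]),  # near-duplicate of the first
    ("秋田犬", [0.1, 1.0, 0.0]),
]
unique = dedupe(entries)
```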

Aspect filtering via a discriminative model that retains high‑quality, relevant knowledge points.

Multimodal Knowledge Linking – After constructing the knowledge graph, a multimodal entity‑linking module aligns video‑extracted entities with the encyclopedia. It employs multi‑modal embeddings, OCR/ASR text, and a Prompt‑augmented pre‑trained language model, enhanced with Graph Neural Network (GNN) encodings, to decide whether a video should be attached to a specific item and aspect.
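The prompt‑augmentation step can be sketched as assembling the video's textual signals into a yes/no query for the language model; the template wording and field names below are illustrative, not the paper's actual prompt:

```python
def build_linking_prompt(video, item, aspect):
    """Assemble a yes/no linking prompt from a video's multimodal text
    signals; a pre-trained LM (optionally with GNN-enhanced entity
    encodings) then scores whether the video belongs under (item, aspect)."""
    return (
        f"Video title: {video['title']}\n"
        f"OCR text: {video['ocr']}\n"
        f"ASR transcript: {video['asr']}\n"
        f"Candidate item: {item}; aspect: {aspect}\n"
        "Does this video describe the aspect of the item? Answer yes or no:"
    )

video = {
    "title": "如何纠正柴犬误食",
    "ocr": "误食纠正训练",
    "asr": "今天教大家纠正柴犬误食",
}
prompt = build_linking_prompt(video, "柴犬", "纠正误食")
```

Framing the decision as a fill‑in prompt lets a single pre‑trained model handle both the item‑level and aspect‑level attachment decisions.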

Applications and Deployment – The enriched knowledge base benefits downstream tasks such as Entity Typing and Entity Linking, improving performance on both. In production, Kuaipedia is integrated into Kuaishou's playback and search interfaces, offering users knowledge‑rich video results.

Q&A Highlights – The system currently hosts tens of millions of items and aspects, with internal access already available. Public release will be gradual, with demo cases and potential author certification to ensure video authority and accuracy.

For more details, see the GitHub repository https://github.com/Kuaipedia/Kuaipedia and the technical report https://arxiv.org/abs/2211.00732 .

multimodal AI, short video, knowledge graph, knowledge extraction, entity linking, Kuaipedia
DataFunSummit
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
