Tencent Advertising Multimedia AI Platform: Intelligent Creation, Fine‑grained Understanding, Similar‑Ad Retrieval, and Smart Review
This article presents Tencent's advertising multimedia AI platform, detailing its intelligent video creation engine, fine‑grained ad content understanding, large‑scale similar‑ad retrieval system, and automated ad review pipeline, while also introducing the team and current recruitment opportunities.
Online advertising is a core revenue stream for Tencent, which drives the need for advanced multimedia AI technologies across the entire ad delivery chain, including modules such as placement, targeting, retrieval, ranking, and playback.
Intelligent Ad Creation – To address the surge in video ad demand, the team built an AI‑powered creation engine that automates video generation, rendering, quality control, and element library management, producing tens of thousands of videos daily and supporting dozens of product formats.
Key capabilities include multimodal video tagging, temporal parsing, cover image generation, CPU/GPU‑based rendering pipelines, template engineering, and a large repository of reusable assets (templates, music, effects, stickers, holiday elements).
Technical highlights fall into three areas. Video size transformation combines portrait segmentation with OCR/ASR‑driven subtitle extraction and supports 16+ resizing schemes. Image‑to‑video conversion relies on monocular depth estimation. Video‑to‑video generation uses temporal parsing and segment‑level tagging. Together, these techniques dramatically accelerate creative production.
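The size-transformation step can be illustrated with a small geometric sketch. It assumes the portrait segmentation has already been reduced to a subject bounding box, and it picks the largest crop of the target aspect ratio that stays inside the frame while following the subject. The function name, the box format, and the centering rule are illustrative assumptions, not the platform's actual resizing schemes.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def crop_window(src_w: int, src_h: int,
                target_w: int, target_h: int,
                subject: Box) -> Box:
    """Return the largest crop with aspect ratio target_w:target_h,
    centered on the subject as far as the frame borders allow.

    `subject` is a hypothetical bounding box derived from a
    segmentation mask; producing that mask is out of scope here.
    """
    # Compare aspect ratios with integer cross-multiplication.
    if src_w * target_h >= src_h * target_w:
        # Source is wider than the target: keep full height, trim width.
        crop_h = src_h
        crop_w = src_h * target_w // target_h
    else:
        # Source is taller than the target: keep full width, trim height.
        crop_w = src_w
        crop_h = src_w * target_h // target_w

    sx, sy, sw, sh = subject
    center_x = sx + sw // 2
    center_y = sy + sh // 2

    # Center on the subject, then clamp so the crop stays inside the frame.
    x = min(max(center_x - crop_w // 2, 0), src_w - crop_w)
    y = min(max(center_y - crop_h // 2, 0), src_h - crop_h)
    return (x, y, crop_w, crop_h)


# Landscape 1920x1080 frame converted to a 9:16 portrait crop:
# crop_window(1920, 1080, 9, 16, (1400, 200, 200, 600)) returns (1197, 0, 607, 1080)
```

A production system would also have to keep subtitles and logos inside the crop, which is presumably where the OCR/ASR-driven subtitle extraction comes in.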
Fine‑grained Ad Understanding – The platform builds a hierarchical semantic understanding system that parses ad content at the product, creative, and temporal levels. It constructs a product‑label knowledge graph, leverages large‑scale pre‑training for multimodal embeddings, and extracts fine‑grained tags for items, creative assets, and video sequences.
Techniques include knowledge‑graph construction with expert refinement, massive multimodal pre‑training, multi‑dimensional multimodal analysis (size, duration, narrative, elements), and temporal/elemental labeling of video shots to pinpoint which second and element drives performance.
Similar‑Ad Retrieval – To detect duplicate or near‑duplicate ads at billion‑scale, the system generates multi‑granularity fingerprints (element, creative, ad) using embedding extraction, clustering, and a custom Bitselect hashing method. It supports image‑to‑image, video‑to‑video, and material‑to‑material searches, enabling diverse downstream applications such as recommendation diversification, pre‑flight diagnostics, and automated de‑duplication.
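The article does not describe how Bitselect works, so the sketch below uses a generic stand-in: keep the highest-variance embedding dimensions, threshold each at its sample mean to get one bit, and compare the resulting binary fingerprints by Hamming distance. All names here (`fit_bits`, `fingerprint`, `hamming`) are hypothetical and only illustrate the general idea of compact binary fingerprints for near-duplicate detection.

```python
from statistics import mean, pvariance
from typing import List, Sequence, Tuple


def fit_bits(sample: Sequence[Sequence[float]],
             n_bits: int) -> Tuple[List[int], List[float]]:
    """Pick the n_bits highest-variance dimensions and a threshold for each.

    This is a generic variance-based selection, NOT the Bitselect method.
    """
    dims = range(len(sample[0]))
    variances = [pvariance([vec[d] for vec in sample]) for d in dims]
    chosen = sorted(dims, key=lambda d: variances[d], reverse=True)[:n_bits]
    thresholds = [mean(vec[d] for vec in sample) for d in chosen]
    return chosen, thresholds


def fingerprint(vec: Sequence[float],
                chosen: Sequence[int],
                thresholds: Sequence[float]) -> int:
    """Pack one bit per chosen dimension into an integer code."""
    code = 0
    for d, t in zip(chosen, thresholds):
        code = (code << 1) | (1 if vec[d] > t else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


# Toy embeddings: dimension 0 carries no information, dimension 2 the most.
sample = [[0.0, 1.0, 5.0], [0.0, 3.0, -5.0],
          [0.0, 1.0, 5.0], [0.0, 3.0, -5.0]]
chosen, thresholds = fit_bits(sample, n_bits=2)
query = fingerprint([0.1, 1.2, 4.0], chosen, thresholds)
near_duplicates = [i for i, vec in enumerate(sample)
                   if hamming(query, fingerprint(vec, chosen, thresholds)) == 0]
```

Binary codes keep memory per item small and make distance checks cheap, which is the usual motivation for hashing at billion scale. Separate codes could be built at the element, creative, and ad levels in the same way.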
The retrieval pipeline consists of three modules: embedding extraction (text, image, video), multimodal similarity search, and multimodal ranking that fuses results across modalities and applies business‑specific filtering strategies.
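The final ranking module can be pictured as a score-fusion step. The sketch below assumes each modality's similarity search returns a mapping from ad ID to a similarity score, combines them with a weighted sum, and applies a business filter before truncating to the top results. The weighted-sum rule, the weights, and the `keep` callback are assumptions made for illustration; the article does not say which fusion strategy is used.

```python
from typing import Callable, Dict, List, Tuple


def fuse_results(per_modality: Dict[str, Dict[str, float]],
                 weights: Dict[str, float],
                 keep: Callable[[str], bool] = lambda ad_id: True,
                 top_k: int = 10) -> List[Tuple[str, float]]:
    """Fuse per-modality similarity scores into one ranked list.

    per_modality: modality name -> {ad_id: similarity score}
    weights:      modality name -> fusion weight (hypothetical values)
    keep:         business-specific filter; return False to drop an ad
    """
    fused: Dict[str, float] = {}
    for modality, hits in per_modality.items():
        weight = weights.get(modality, 0.0)
        for ad_id, score in hits.items():
            # An ad missing from a modality simply contributes 0 there.
            fused[ad_id] = fused.get(ad_id, 0.0) + weight * score

    ranked = sorted(
        ((ad_id, score) for ad_id, score in fused.items() if keep(ad_id)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_k]


hits = {
    "image": {"ad_a": 0.8, "ad_b": 0.5},
    "text": {"ad_b": 1.0, "ad_c": 0.5},
}
ranking = fuse_results(hits, weights={"image": 0.5, "text": 0.5})
# A filter can, for example, exclude ads from the advertiser's own account.
filtered = fuse_results(hits, {"image": 0.5, "text": 0.5},
                        keep=lambda ad_id: ad_id != "ad_a")
```

A weighted sum rewards candidates that match in several modalities at once, which fits the material-to-material search case where an ad bundles text, images, and video.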
Smart Ad Review – The automated review platform combines four capabilities: automatic judgment (multimodal pass/fail models), similarity reuse (leveraging similar‑ad retrieval), negative detection (hundreds of high‑frequency violation types), and a rule engine for industry‑specific policies.
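One way the four capabilities could be combined is a short decision cascade. The sketch below is an assumption about how such a cascade might be ordered: hard signals (rule engine, negative detection) first, then a verdict reused from a near-duplicate ad, then the pass/fail model, with anything uncertain routed to a human. The field names, thresholds, and ordering are hypothetical and not taken from the platform.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AdSignals:
    """Hypothetical per-ad outputs of the four review capabilities."""
    rule_violations: List[str] = field(default_factory=list)      # rule engine
    detected_violations: List[str] = field(default_factory=list)  # negative detection
    similar_ad_verdict: Optional[str] = None  # "pass" / "reject" reused from a near-duplicate
    pass_probability: float = 0.5             # automatic judgment model


def review(signals: AdSignals,
           pass_threshold: float = 0.95,
           reject_threshold: float = 0.05) -> str:
    """Return "pass", "reject", or "human_review" for one ad."""
    # 1. Explicit policy or violation hits are decisive.
    if signals.rule_violations or signals.detected_violations:
        return "reject"
    # 2. Reuse the verdict of an already-reviewed near-duplicate.
    if signals.similar_ad_verdict in ("pass", "reject"):
        return signals.similar_ad_verdict
    # 3. Trust the model only when it is confident either way.
    if signals.pass_probability >= pass_threshold:
        return "pass"
    if signals.pass_probability <= reject_threshold:
        return "reject"
    # 4. Everything else goes to a human reviewer.
    return "human_review"
```

For example, `review(AdSignals(pass_probability=0.99))` yields `"pass"`, while an ad with a mid-range probability and no similar-ad verdict is sent to manual review.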
It employs multimodal multi‑label classification, OCR, and face detection and recognition (CosFace and Lie‑algebra‑based models). A high‑performance face‑chart pipeline processes large‑scale video streams with parallelized decoding and inference. Overall, the system achieves a pass rate above 98% and saves the work of more than 400 human reviewers.
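Parallelized decoding and inference is commonly arranged as a producer/consumer pipeline, so that the decoder fills a bounded queue while the model drains it in batches. The sketch below shows that pattern with Python threads. The `decode` and `infer` callables, the batch size, and the single-producer/single-consumer layout are assumptions for illustration, not a description of the actual system.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_STOP = object()  # sentinel that marks the end of the frame stream


def run_pipeline(videos: Iterable[Any],
                 decode: Callable[[Any], Iterable[Any]],
                 infer: Callable[[List[Any]], List[Any]],
                 batch_size: int = 4,
                 queue_size: int = 8) -> List[Any]:
    """Decode frames in one thread while another thread runs batched inference."""
    frames: "queue.Queue[Any]" = queue.Queue(maxsize=queue_size)
    results: List[Any] = []

    def producer() -> None:
        try:
            for video in videos:
                for frame in decode(video):
                    frames.put(frame)  # blocks when the queue is full
        finally:
            frames.put(_STOP)  # always unblock the consumer

    def consumer() -> None:
        batch: List[Any] = []
        while True:
            item = frames.get()
            if item is _STOP:
                break
            batch.append(item)
            if len(batch) == batch_size:
                results.extend(infer(batch))
                batch = []
        if batch:  # flush the final partial batch
            results.extend(infer(batch))

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Stand-ins for a real video decoder and a face model.
fake_decode = lambda video: [f"{video}-frame{i}" for i in range(3)]
fake_infer = lambda batch: [frame.upper() for frame in batch]
outputs = run_pipeline(["v1", "v2"], fake_decode, fake_infer, batch_size=4)
```

The bounded queue provides back-pressure: decoding pauses when inference falls behind, which keeps memory use flat on long video streams.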
The team, recognized in international competitions (ICDAR, ACM Multimedia) and with 50+ publications, continues to recruit researchers in computer vision, graphics, and NLP to further advance advertising AI.
Tencent Advertising Technology
Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.