
Insights from the Tencent Advertising Algorithm Competition: Model Framework and Optimization Strategies

The article shares a Tencent competition champion’s practical TensorFlow‑based video ad solution, detailing data handling, model architecture, optimization tricks, multimodal fusion techniques, and experimental observations to help participants improve performance in the 2021 Tencent Advertising Algorithm Contest.

Tencent Advertising Technology

On April 30, 2021, the preliminary round of the 2021 Tencent Advertising Algorithm Competition officially began, co‑hosted by Tencent Advertising, Tencent Cloud AI, Tencent Big Data, Tencent Recruitment, Tencent Universities, and NVIDIA, with AI platform support from TI‑ONE and NVIDIA.

The competition is jointly organized with the ACM Multimedia conference, featuring two video‑advertising tracks that have been selected for the ACM MM Grand Challenge.

Since 2020, an internal track has been opened for Tencent employees, inviting internal technical experts to compete. To help participants, a top performer from the second track, nicknamed “Stone Brother,” shares his solution.

Author Bio: Stone Brother joined Tencent in 2018 after completing his master’s degree and has been working on video understanding ever since, aiming to exchange ideas and techniques related to video understanding through this competition.

Framework Overview: The solution is built on TensorFlow, with a refactored baseline framework for rapid iteration and several key optimizations:

1. Data is read using TFRecord, pre‑serializing video frame features, audio features, and text features (ASR and OCR) into TFRecord files.

2. For the title text, BERT is replaced with TextCNN for faster early‑stage iteration; lightweight models such as TextCNN or Bi‑LSTM are recommended before moving to heavier models like BERT.

3. Various perturbation techniques are explored to improve model generalization.

4. Different learning‑rate schedules are investigated.

5. The model architecture follows the NeXtVLAD frame‑aggregation model combined with SE‑Gate multimodal feature fusion and hierarchical multi‑label classification (HMC).

6. Frame and audio features are uniformly divided into N segments (padding to 300 frames, which may be redundant for typical video lengths).
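The uniform segmentation in step 6 can be sketched as below. The 300‑frame padding length comes from the article; the segment count and feature dimension are assumed values for illustration, not the author's exact configuration:

```python
import numpy as np

def pad_and_segment(frames, max_frames=300, n_segments=10):
    """Pad (or truncate) a (T, D) frame-feature matrix to max_frames,
    then split it uniformly into n_segments equal chunks."""
    T, D = frames.shape
    if T >= max_frames:
        padded = frames[:max_frames]
    else:
        padded = np.vstack([frames, np.zeros((max_frames - T, D))])
    # max_frames must be divisible by n_segments for a uniform split
    return padded.reshape(n_segments, max_frames // n_segments, D)

# toy clip: 87 frames of 128-d features (dimensions assumed)
clip = np.random.default_rng(1).normal(size=(87, 128))
segments = pad_and_segment(clip)
print(segments.shape)   # (10, 30, 128)
```

As the article notes, padding every clip to 300 frames is often redundant for typical video lengths, so the padded tail is all zeros here.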

Thanks to TFRecord serialization, the model processes 4,500 videos on a P40 GPU in about 0.7 minutes per epoch, offering a very fast training experience.
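The TextCNN swap in step 2 can be sketched as follows: embed the tokens, run 1‑D convolutions of several widths, then max‑pool over time. This is a NumPy illustration under assumed sizes (vocabulary, embedding dimension, kernel widths), not the author's exact model:

```python
import numpy as np

def textcnn_features(token_ids, emb, kernels):
    """Minimal TextCNN: embedding lookup, multi-width 1-D convolutions
    (valid padding), ReLU, then max-over-time pooling, concatenated."""
    x = emb[token_ids]                      # (seq_len, emb_dim)
    pooled = []
    for W in kernels:                       # W: (width, emb_dim, n_filters)
        width = W.shape[0]
        conv = np.stack([
            np.tensordot(x[i:i + width], W, axes=([0, 1], [0, 1]))
            for i in range(len(x) - width + 1)
        ])                                  # (seq_len - width + 1, n_filters)
        conv = np.maximum(conv, 0.0)        # ReLU
        pooled.append(conv.max(axis=0))     # max over time
    return np.concatenate(pooled)           # (total n_filters,)

# assumed sizes: vocab 1000, dim 16, widths 2/3/4 with 8 filters each
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 16))
kernels = [rng.normal(size=(k, 16, 8)) for k in (2, 3, 4)]
feats = textcnn_features(rng.integers(0, 1000, size=30), emb, kernels)
print(feats.shape)                          # (24,)
```

Because each branch is a single convolution plus a pooling reduction, a model like this iterates far faster than BERT in the early experimentation phase.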

Experience Summary: The author iterated the model during weekends and evenings, running over 100 experiments. The listed optimizations proved beneficial and are shared for others to try, though many further improvements remain possible.

Feature Extraction Module:

• Video and audio: any open‑source feature‑extraction backbone (many action‑recognition papers release pretrained models).

• Text features (ASR, OCR): pre‑compute embeddings with BERT or integrate end‑to‑end learning, as text models are lighter than video/audio.

Frame‑Aggregation Module:

• NetVLAD or NeXtVLAD

• Transformer
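A minimal NetVLAD forward pass — soft‑assign each frame to K learnable cluster centers and aggregate the weighted residuals — can be sketched in NumPy. K and the dimensions are assumed for illustration:

```python
import numpy as np

def netvlad(frames, centers, assign_W, assign_b):
    """NetVLAD aggregation of (T, D) frame features into a (K*D,) vector.
    centers: (K, D) cluster centers; assign_W/assign_b: (D, K)/(K,)
    projection producing per-frame soft-assignment weights."""
    logits = frames @ assign_W + assign_b            # (T, K)
    logits -= logits.max(axis=1, keepdims=True)      # stable softmax
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)                # soft assignment (T, K)
    # residual of each frame to each center, weighted by its assignment
    resid = frames[:, None, :] - centers[None, :, :] # (T, K, D)
    vlad = (a[:, :, None] * resid).sum(axis=0)       # (K, D)
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-8  # intra-norm
    flat = vlad.reshape(-1)
    return flat / (np.linalg.norm(flat) + 1e-8)      # final L2 norm

rng = np.random.default_rng(2)
T, D, K = 300, 64, 8                                 # assumed sizes
out = netvlad(rng.normal(size=(T, D)), rng.normal(size=(K, D)),
              rng.normal(size=(D, K)), rng.normal(size=(K,)))
print(out.shape)   # (512,)
```

NeXtVLAD, which the solution actually uses, adds a grouped low‑rank variant of this same aggregation to cut the parameter count.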

Multimodal Feature Fusion:

• SE‑Gate

• Various attention mechanisms
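The SE‑Gate idea — squeeze‑and‑excitation‑style gating applied to the concatenated modality features — can be sketched as follows; the layer sizes and reduction ratio are assumptions, not the author's settings:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_gate(fused, W1, W2):
    """Squeeze-and-excitation gate on a fused multimodal vector:
    bottleneck projection with ReLU, then a sigmoid gate that
    reweights each dimension of the fused features."""
    squeezed = np.maximum(fused @ W1, 0.0)   # (D,) -> (D // r,) bottleneck
    gate = sigmoid(squeezed @ W2)            # back to (D,), values in (0, 1)
    return fused * gate                      # channel-wise reweighting

rng = np.random.default_rng(3)
D, r = 256, 8                                # assumed dim, reduction ratio
video, audio = rng.normal(size=128), rng.normal(size=128)
fused = np.concatenate([video, audio])       # naive concat fusion, D = 256
out = se_gate(fused, rng.normal(size=(D, D // r)), rng.normal(size=(D // r, D)))
print(out.shape)   # (256,)
```

The gate lets the model learn, per dimension, how much each modality's signal should contribute after fusion, rather than treating the concatenation uniformly.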

Output Layer:

• Multi‑level MLP

• Hierarchical Multi‑label Classification (HMC)

• Mixture‑of‑Experts (MoE)
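The Mixture‑of‑Experts output head can be sketched as a softmax gate mixing several expert classifiers; the expert count, feature dimension, and tag count below are assumed for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max()                  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def moe_head(x, expert_Ws, gate_W):
    """Mixture-of-Experts head: each expert emits independent per-tag
    sigmoid probabilities (multi-label), and a softmax gate mixes them."""
    probs = np.stack([sigmoid(x @ W) for W in expert_Ws])  # (E, C)
    gate = softmax(x @ gate_W)                             # (E,), sums to 1
    return gate @ probs                                    # (C,) mixed probs

rng = np.random.default_rng(4)
D, C, E = 256, 82, 4          # assumed feature dim, tag count, expert count
x = rng.normal(size=D)
experts = [rng.normal(size=(D, C)) * 0.05 for _ in range(E)]
scores = moe_head(x, experts, rng.normal(size=(D, E)))
print(scores.shape)   # (82,)
```

Because the gate weights are convex, the mixed scores stay valid probabilities, which combines naturally with the multi‑label setup of video tagging.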

End‑to‑End Approach: worth trying even though training samples are limited.

Experimental Tips:

1. Fix random seeds during model iteration for consistent comparisons.

2. A 10‑20% gap in GAP (Global Average Precision) between the training and validation sets indicates potential over‑fitting.

3. When modalities are trained jointly, convergence does not guarantee that each modality reaches its individual optimum, which suggests room for new fusion strategies.
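Tip 1, fixing random seeds, might look like the sketch below. It pins the standard‑library and NumPy generators; a TensorFlow project like the author's would additionally call the framework's own seed function for full determinism:

```python
import os
import random

import numpy as np

def fix_seeds(seed=2021):
    """Pin every RNG the training loop touches so experiment runs
    are directly comparable. (A TensorFlow project would also call
    tf.random.set_seed(seed).)"""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)

fix_seeds(2021)
a = np.random.rand(3)
fix_seeds(2021)
b = np.random.rand(3)   # identical draw after re-seeding
print(np.allclose(a, b))   # True
```

With seeds fixed, a metric change between two runs can be attributed to the modification under test rather than to sampling noise.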

Finally, participants are encouraged to achieve good results, share ideas, and foster an active discussion environment.

For further details, a live broadcast on May 10 will feature senior algorithm researchers discussing video ad data, label systems, and multimodal video tagging, with slides available by replying “直播” (“live broadcast”) to the official account. Registration closes on June 4, 2021.

Tags: model optimization, TensorFlow, multimodal, video understanding, competition, advertising algorithm
Written by Tencent Advertising Technology

Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.
