Lookalike Audience Extension Algorithms in iQIYI Advertising: Tag‑Based and Machine‑Learning Approaches
iQIYI uses two Lookalike audience extension methods: a tag‑based method built on weighted tag scoring, and a supervised machine‑learning method that applies logistic regression to engineered DMP and ad‑behavior features. Both improve ad performance, for example a Trueview completion rate more than 20% higher and conversion costs up to 60% lower.
In the advertising industry, Lookalike (audience extension) refers to using algorithms to find users similar to a set of high‑potential seed users provided by advertisers, thereby enabling precise targeting and reducing conversion costs.
The typical workflow involves advertisers supplying seed user IDs (often past purchasers), the platform leveraging DMP data and Lookalike algorithms to discover similar users, and finally the advertiser delivering ads to this expanded audience.
Various companies implement Lookalike using techniques such as tag selection, machine learning, collaborative filtering, neural networks, and social graph analysis.
At iQIYI, two methods were explored: a tag‑based Lookalike algorithm and a supervised machine‑learning Lookalike algorithm.
Tag‑Based Lookalike Algorithm
iQIYI constructs rich user profiles containing tens of thousands of tags (demographics, interests, etc.). Analyzing the seed users identifies their most distinctive tags (e.g., gender, age <18‑25>, shopping interest, fashion‑program preference), which are then used to retrieve additional users who share those tags.
The implementation follows a Yahoo paper that scores tags on three dimensions: similarity (overlap between tag‑covered users and seed users), novelty (proportion of new users covered by the tag), and tag quality (historical ad performance metrics such as CTR, CVR, ROI). Scores are smoothed with logarithmic functions and combined via weighted averaging to rank tags, and the top‑N tags’ covered users become the extended audience.
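The scoring and selection steps above can be sketched roughly as follows. This is only an illustration: the function names, the weights, the use of log1p for smoothing, and the way similarity and novelty are normalized are all assumptions, since the article does not give the exact formulas.

```python
import math


def score_tags(tag_users, seed_users, tag_quality, weights=(0.5, 0.3, 0.2)):
    """Score each tag by a weighted mix of similarity, novelty, and quality.

    tag_users:   dict mapping tag -> set of user IDs covered by the tag
    seed_users:  set of seed user IDs supplied by the advertiser
    tag_quality: dict mapping tag -> historical metric in [0, 1], e.g. CTR
    weights:     hypothetical weights; in practice these need tuning
    """
    if not seed_users:
        raise ValueError("seed_users must not be empty")
    w_sim, w_nov, w_qual = weights
    scores = {}
    for tag, users in tag_users.items():
        if not users:
            continue
        # Similarity: share of the seed set that the tag covers (assumed definition).
        similarity = len(users & seed_users) / len(seed_users)
        # Novelty: share of the tag's users that are not already seeds.
        novelty = len(users - seed_users) / len(users)
        quality = tag_quality.get(tag, 0.0)
        # Logarithmic smoothing, then weighted averaging.
        scores[tag] = (
            w_sim * math.log1p(similarity)
            + w_nov * math.log1p(novelty)
            + w_qual * math.log1p(quality)
        )
    return scores


def extend_audience(tag_users, seed_users, tag_quality, top_n=2):
    """Return the top-N tags and the new users they cover."""
    scores = score_tags(tag_users, seed_users, tag_quality)
    top_tags = sorted(scores, key=scores.get, reverse=True)[:top_n]
    extended = set()
    for tag in top_tags:
        extended |= tag_users[tag]
    # Seeds are already known, so only newly discovered users are returned.
    return top_tags, extended - seed_users
```

The three weights and top_n are exactly the kind of parameters that the tag‑based method requires tuning for.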
Advantages: simple, easy to implement, interpretable. Disadvantages: complex parameter tuning, coarse granularity, and limited ability to jointly compare multiple tag dimensions.
Machine‑Learning Lookalike Algorithm
To address the shortcomings of the tag‑based method, a supervised learning approach was adopted. Seed users serve as positive samples; negative samples are generated using two strategies: (1) historical negative feedback (ad skips, non‑click views) and (2) the Spy method to automatically create reliable negatives.
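The Spy idea in strategy (2) can be illustrated with a small sketch. A fraction of the seed users ("spies") is hidden among the unlabeled users, a classifier is trained to separate the remaining seeds from everything else, and unlabeled users that score below almost all spies are kept as reliable negatives. The tiny pure‑Python logistic regression, the spy_ratio and noise values, and all function names here are assumptions for illustration, not iQIYI's implementation; the original Spy technique uses a naive Bayes classifier with EM rather than logistic regression.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lr(X, y, epochs=200, lr=0.1):
    """Fit logistic regression with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def score(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def spy_reliable_negatives(positives, unlabeled, spy_ratio=0.15, noise=0.05, seed=0):
    """Return unlabeled feature vectors that look reliably negative."""
    rng = random.Random(seed)
    n_spies = max(1, int(len(positives) * spy_ratio))
    spy_idx = set(rng.sample(range(len(positives)), n_spies))
    spies = [p for i, p in enumerate(positives) if i in spy_idx]
    kept = [p for i, p in enumerate(positives) if i not in spy_idx]

    # Spies are labeled 0 together with the unlabeled users.
    X = kept + unlabeled + spies
    y = [1] * len(kept) + [0] * (len(unlabeled) + len(spies))
    w, b = train_lr(X, y)

    # Threshold chosen so that at most a `noise` fraction of spies fall below it.
    spy_scores = sorted(score(w, b, s) for s in spies)
    threshold = spy_scores[int(len(spy_scores) * noise)]
    return [u for u in unlabeled if score(w, b, u) < threshold]
```

The spies act as a calibration set: because they are known positives, the scores they receive indicate how low a true positive can score when it is treated as unlabeled.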
Model selection favored logistic regression (LR) for its interpretability, though models such as GBDT and FM were also considered. The final ranking combines predicted probability, expected order volume, and historical user visit frequency to mitigate user repetition issues.
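The article does not say how the three signals are combined. One plausible reading, sketched below purely as an assumption, is to rank users by predicted probability and keep adding them until their expected visits cover the impressions booked by the order, so that the audience is large enough that the same few users are not shown the ad over and over. The function name, the tuple layout, and the stopping rule are all hypothetical.

```python
def select_audience(users, order_impressions):
    """Pick the highest-scoring users until expected visits cover the order.

    users: list of (user_id, predicted_probability, expected_visits) tuples,
           where expected_visits is estimated from historical visit frequency
    order_impressions: number of impressions the order needs to deliver
    """
    ranked = sorted(users, key=lambda u: u[1], reverse=True)
    chosen, covered = [], 0.0
    for user_id, _prob, visits in ranked:
        if covered >= order_impressions:
            break
        chosen.append(user_id)
        covered += visits
    return chosen
```

Under this reading, a larger order simply pushes the cut‑off further down the ranked list instead of concentrating delivery on the top‑scoring users.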
Feature engineering leveraged iQIYI’s DMP data (demographics, viewing/search preferences, commercial interests) and additional ad‑related behavior features (e.g., feedback on different industry ads, freshness of ads to users).
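As a rough illustration of how such features could be fed to a logistic regression model, the sketch below one‑hot encodes categorical DMP tags and appends numeric ad‑behavior values in a single vector. The profile layout and the feature names used in the usage example (such as auto_skip_rate) are hypothetical, not taken from iQIYI's DMP.

```python
def build_index(profiles):
    """Assign a column to every DMP tag and ad-behavior feature seen."""
    names = set()
    for p in profiles:
        names.update(f"tag:{t}" for t in p["tags"])
        names.update(f"ad:{k}" for k in p["ad_behavior"])
    return {name: i for i, name in enumerate(sorted(names))}


def vectorize(profile, index):
    """Turn one user profile into a dense feature vector."""
    vec = [0.0] * len(index)
    for t in profile["tags"]:
        col = index.get(f"tag:{t}")
        if col is not None:  # tags unseen when the index was built are ignored
            vec[col] = 1.0   # one-hot encoding for categorical tags
    for k, v in profile["ad_behavior"].items():
        col = index.get(f"ad:{k}")
        if col is not None:
            vec[col] = float(v)  # numeric features keep their value
    return vec
```

A usage example with made‑up profiles:

```python
profiles = [
    {"tags": ["gender=f", "interest=fashion"], "ad_behavior": {"auto_skip_rate": 0.4}},
    {"tags": ["gender=m"], "ad_behavior": {"auto_skip_rate": 0.9, "days_since_first_seen": 3}},
]
index = build_index(profiles)
vectors = [vectorize(p, index) for p in profiles]
```

With tens of thousands of tags, a sparse representation would be the more practical choice; the dense list is used here only to keep the example short.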
Application Results in iQIYI Advertising
Lookalike has been applied to Trueview video ads and to precise targeting based on advertisers' first‑party data, with notable performance gains in both.
Trueview ads allow users to skip after 5 seconds; advertisers are charged only for viewers who watch beyond a threshold (≈30 seconds). Using Lookalike increased the Trueview completion rate by more than 20% compared to standard targeting.
For first‑party targeting, an A/B test showed that a maternity brand’s conversion cost dropped by 28.2% and a dating platform’s conversion cost fell by 60% when using Lookalike‑generated audiences versus manually selected tags.
Conclusion
iQIYI has successfully deployed both tag‑based and machine‑learning Lookalike algorithms, leveraging rich user profiles and advertising features to improve campaign effectiveness. Ongoing work will focus on further algorithmic optimization and user‑experience enhancement.
References
[1] Effective Audience Extension in Online Advertising (https://dl.acm.org/citation.cfm?id=2788603)
[2] Building Text Classifiers Using Positive and Unlabeled Examples (https://www.computer.org/csdl/proceedings/icdm/2003/1978/00/19780179-abs.html)
[3] Partially supervised classification of text documents (https://dl.acm.org/citation.cfm?id=656022)
