Design and Evolution of Incremental Indexing for Advertising Retrieval Systems

The article describes how an advertising retrieval system evolved from serial full builds to batch-parallel full builds, and finally to a hybrid of periodic full builds plus incremental indexing. During assembly, the system records the direct relationships between each unit and the entities it reads, then stores them as an inverted index. Changed units can then be found by fast reverse lookup, which reduces database load, latency, and rebuild overhead.

Bilibili Tech

This article discusses the design and evolution of incremental (real‑time) indexing in an advertising retrieval system, focusing on how to construct a complete, reliable, and maintainable ad update data flow.

Business background: Online advertising relies on an ad retrieval system in which an "index building service" creates material indexes for units (ads) and a search engine loads them to serve real-time queries. The data model is complex, involving more than 100 relational tables (unit, account, plan, creative, video, up, etc.), so database load and latency grow as the system scales.

Solution evolution: Three stages are described:
1) Full-serial: all data is queried and built serially.
2) Batch-parallel: unit IDs are split into batches and processed in parallel, respecting data dependencies.
3) Incremental + full: periodic full builds plus incremental builds for changed units.
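The batch-parallel stage can be pictured with a minimal sketch like the one below. It is not the article's implementation: the class name `BatchParallelBuilder`, the fixed-size thread pool, and the `buildBatch` callback (a stand-in for the real per-batch query-and-assemble step) are all assumptions made for illustration. Data dependencies are assumed to be handled inside `buildBatch`, so that batches are independent of one another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper: splits unit IDs into batches and builds them in parallel.
class BatchParallelBuilder {

  /** Splits ids into consecutive batches of at most batchSize elements. */
  static List<List<Long>> partition(List<Long> ids, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<Long>> batches = new ArrayList<>();
    for (int i = 0; i < ids.size(); i += batchSize) {
      int end = Math.min(i + batchSize, ids.size());
      batches.add(new ArrayList<>(ids.subList(i, end)));
    }
    return batches;
  }

  /** Runs buildBatch for every batch on a fixed pool; results keep input order. */
  static <R> List<R> buildAll(List<Long> unitIds, int batchSize, int threads,
                              Function<List<Long>, List<R>> buildBatch) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<CompletableFuture<List<R>>> futures = new ArrayList<>();
      for (List<Long> batch : partition(unitIds, batchSize)) {
        futures.add(CompletableFuture.supplyAsync(() -> buildBatch.apply(batch), pool));
      }
      List<R> results = new ArrayList<>();
      for (CompletableFuture<List<R>> future : futures) {
        results.addAll(future.join()); // rethrows a batch failure as CompletionException
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
```

Joining the futures in submission order keeps the output deterministic even though the batches themselves finish in any order.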

Incremental index: Two types of ad material indexes are defined:
- Full ad index (periodic full build).
- Incremental ad index (built only for units with data changes).
The article shows the workflow diagrams for both.

Challenge – reverse lookup of changed unit IDs: Determining which unit IDs are affected when a related entity (e.g., up, video) is updated is difficult because of missing database indexes, complex field types, and multi-hop relationships.

Proposed solution – record direct relationships: By decorating the data-access services used during the build process, the system automatically records the association between a unit ID and every other entity ID it accesses (account_id, plan_id, video_id, up_mid). Example interface definitions:

interface UnitBaseService {
  // Get basic unit info (needs further enrichment)
  UnitBase getUnitById(long unitId);
}

interface CreativeService {
  // Unit IDs are long everywhere else, so the parameter is long here too
  List<Creative> getCreativeByUnitId(long unitId);
}

interface VideoService {
  Map<Long, Video> getVideoByVideoIds(Iterable<Long> videoIds);
}

interface UpService {
  Map<Long, Up> getUpByUpMids(Iterable<Long> upMids);
}

The assembly process uses these services:

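import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class Assembler {
  UnitBaseService unitBaseService;
  CreativeService creativeService;
  VideoService videoService;
  UpService upService;

  Unit assembleUnit(long unitId) {
    UnitBase unitBase = unitBaseService.getUnitById(unitId);
    List<Creative> creativeList = creativeService.getCreativeByUnitId(unitId);
    // Extract videoIds from creativeList (getVideoId() is an assumed accessor name)
    Set<Long> videoIds = creativeList.stream()
        .map(Creative::getVideoId)
        .collect(Collectors.toSet());
    Map<Long, Video> videoMap = videoService.getVideoByVideoIds(videoIds);
    // Extract upMids from videoMap (getUpMid() is an assumed accessor name)
    Set<Long> upMids = videoMap.values().stream()
        .map(Video::getUpMid)
        .collect(Collectors.toSet());
    Map<Long, Up> upMap = upService.getUpByUpMids(upMids);
    // Assemble all data into a usable ad unit
    return doAssemble(unitBase, creativeList, videoMap, upMap);
  }
}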
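One way to picture the "decorated data-access service" idea is the sketch below. It is an illustration under assumptions, not the article's code: `RelationRecorder`, `RecordingUpService`, the `begin`/`end` calls, and the use of a thread-local to carry the unit currently being assembled are all hypothetical. `Up` and `UpService` are redeclared in simplified form only so the example compiles on its own.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-ins for the article's types (fields are assumed).
class Up {
  final long mid;
  Up(long mid) { this.mid = mid; }
}

interface UpService {
  Map<Long, Up> getUpByUpMids(Iterable<Long> upMids);
}

// Hypothetical recorder: remembers which entity IDs each unit touched.
class RelationRecorder {
  private final ThreadLocal<Long> currentUnit = new ThreadLocal<>();
  private final Map<Long, Set<String>> relations = new ConcurrentHashMap<>();

  /** Called by the assembler before it starts building a unit. */
  void begin(long unitId) { currentUnit.set(unitId); }

  /** Called by the assembler when the unit is finished. */
  void end() { currentUnit.remove(); }

  void record(String entityType, long entityId) {
    Long unitId = currentUnit.get();
    if (unitId == null) {
      return; // call made outside a unit build; nothing to attribute it to
    }
    relations.computeIfAbsent(unitId, k -> ConcurrentHashMap.newKeySet())
        .add(entityType + ":" + entityId);
  }

  Set<String> relationsOf(long unitId) {
    return relations.getOrDefault(unitId, Collections.emptySet());
  }
}

// Decorator: same interface as the real service, plus relationship recording.
class RecordingUpService implements UpService {
  private final UpService delegate;
  private final RelationRecorder recorder;

  RecordingUpService(UpService delegate, RelationRecorder recorder) {
    this.delegate = delegate;
    this.recorder = recorder;
  }

  @Override
  public Map<Long, Up> getUpByUpMids(Iterable<Long> upMids) {
    // Record requested IDs, not only found ones, so later inserts are also traceable.
    for (Long mid : upMids) {
      recorder.record("up_mid", mid);
    }
    return delegate.getUpByUpMids(upMids);
  }
}
```

Because the decorator implements the same interface, the assembler itself does not need to change; it only has to be handed the wrapped services and bracket each unit with `begin` and `end`.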

After building, the recorded direct relations are stored as an inverted index (e.g., up_mid → [unit1, unit2]), enabling fast reverse lookup when a related entity changes.
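As a rough illustration of that inversion step, the sketch below turns a per-unit relation map into an entity-to-units map. The class name `InvertedIndexBuilder` and the `"type:id"` string key format are assumptions chosen for brevity; the article does not specify how keys are encoded or where the index is stored.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: inverts unitId -> entity keys into entity key -> unitIds.
class InvertedIndexBuilder {

  /**
   * @param unitToEntities for each unit ID, the entity keys it accessed,
   *                       e.g. 1 -> {"up_mid:7", "video_id:3"}
   * @return for each entity key, the sorted set of unit IDs that depend on it
   */
  static Map<String, Set<Long>> invert(Map<Long, Set<String>> unitToEntities) {
    Map<String, Set<Long>> inverted = new TreeMap<>();
    for (Map.Entry<Long, Set<String>> entry : unitToEntities.entrySet()) {
      for (String entityKey : entry.getValue()) {
        inverted.computeIfAbsent(entityKey, k -> new TreeSet<>()).add(entry.getKey());
      }
    }
    return inverted;
  }
}
```

Looking up `"up_mid:7"` in the returned map then yields every unit whose build read that up, which is exactly the reverse lookup the relational schema could not answer cheaply.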

Change detection mechanisms:

Binlog trigger: listens to MySQL binlog, extracts changed entity IDs, and uses the inverted index to find affected unit IDs. Provides high timeliness and old‑value information.
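The lookup half of the binlog trigger might look like the sketch below. It deliberately leaves out binlog parsing: `RowChange` is a hypothetical, already-parsed change event, and the table names, the `"unit_id"` marker, and the `"type:id"` key format are assumptions for illustration only.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified view of one parsed binlog row change.
final class RowChange {
  final String table;
  final long primaryKey;

  RowChange(String table, long primaryKey) {
    this.table = table;
    this.primaryKey = primaryKey;
  }
}

// Maps changed rows to the unit IDs that must be rebuilt.
class BinlogTrigger {
  static final String UNIT = "unit_id"; // assumed marker for the unit table itself

  private final Map<String, String> tableToEntityType; // e.g. "up" -> "up_mid"
  private final Map<String, Set<Long>> invertedIndex;  // "up_mid:7" -> {1, 2}

  BinlogTrigger(Map<String, String> tableToEntityType,
                Map<String, Set<Long>> invertedIndex) {
    this.tableToEntityType = tableToEntityType;
    this.invertedIndex = invertedIndex;
  }

  Set<Long> affectedUnits(List<RowChange> changes) {
    Set<Long> units = new TreeSet<>();
    for (RowChange change : changes) {
      String entityType = tableToEntityType.get(change.table);
      if (entityType == null) {
        continue; // table does not feed the ad index
      }
      if (UNIT.equals(entityType)) {
        units.add(change.primaryKey); // the unit row itself changed
      } else {
        String key = entityType + ":" + change.primaryKey;
        units.addAll(invertedIndex.getOrDefault(key, Collections.emptySet()));
      }
    }
    return units;
  }
}
```

Collecting into a set means that a burst of changes touching the same unit produces a single rebuild request per call.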

Recent‑scan trigger: relies on a mandatory mtime column to detect recent modifications. Simpler but less timely and cannot detect hard deletes.
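A recent-scan trigger can be sketched as a checkpointed filter on mtime. In the example below an in-memory list stands in for a query of the form `SELECT id, mtime FROM some_table WHERE mtime > ?`; the class names, the millisecond timestamps, and the overlap window (re-reading slightly older rows to tolerate late commits) are assumptions, not details from the article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row projection: primary key plus modification time.
final class ScannedRow {
  final long id;
  final long mtimeMillis;

  ScannedRow(long id, long mtimeMillis) {
    this.id = id;
    this.mtimeMillis = mtimeMillis;
  }
}

// Periodically finds rows modified since the previous scan.
class RecentScanTrigger {
  private long checkpointMillis;
  private final long overlapMillis;

  RecentScanTrigger(long startMillis, long overlapMillis) {
    this.checkpointMillis = startMillis;
    this.overlapMillis = overlapMillis;
  }

  /** Returns IDs with mtime newer than (checkpoint - overlap) and advances the checkpoint. */
  List<Long> scan(List<ScannedRow> table) {
    long from = checkpointMillis - overlapMillis;
    long newest = checkpointMillis;
    List<Long> changed = new ArrayList<>();
    for (ScannedRow row : table) {
      if (row.mtimeMillis > from) {
        changed.add(row.id);
        newest = Math.max(newest, row.mtimeMillis);
      }
    }
    checkpointMillis = newest;
    return changed;
  }
}
```

The sketch also makes the stated limitation visible: a hard-deleted row is simply absent from `table`, so no scan will ever report it.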

Frequency reduction (de‑duplication) strategies are introduced to avoid unnecessary rebuilds, including field‑level filtering, hierarchical throttling, state‑based filtering, and low‑priority batching.
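Two of these strategies, field-level filtering and de-duplication of pending rebuilds, can be sketched as follows. The class name `RebuildThrottle`, the representation of a row as a field-name-to-value map, and the explicit `drain` call are assumptions; hierarchical throttling, state-based filtering, and low-priority batching are not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical filter and coalescing queue placed in front of the index builder.
class RebuildThrottle {
  private final Set<String> indexedFields;            // fields that affect the index
  private final Set<Long> pending = new LinkedHashSet<>();

  RebuildThrottle(Set<String> indexedFields) {
    this.indexedFields = indexedFields;
  }

  /** Field-level filtering: true only if an index-relevant field actually changed. */
  boolean isRelevant(Map<String, Object> before, Map<String, Object> after) {
    for (String field : indexedFields) {
      if (!Objects.equals(before.get(field), after.get(field))) {
        return true;
      }
    }
    return false;
  }

  /** De-duplication: repeated requests for the same unit collapse into one. */
  synchronized void request(long unitId) {
    pending.add(unitId);
  }

  /** Hands the coalesced unit IDs to the builder, in first-request order. */
  synchronized List<Long> drain() {
    List<Long> batch = new ArrayList<>(pending);
    pending.clear();
    return batch;
  }
}
```

Field-level filtering depends on having both the old and the new row values, which is one reason the old-value information from the binlog trigger is useful.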

Integrated workflow: The article presents a combined pipeline where the inverted relationship table is built during full indexing, updated during incremental indexing, and consulted by both binlog and scan triggers to drive selective unit rebuilds.
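Keeping the relationship table current during incremental indexing means replacing a unit's old relations with the ones recorded in its latest rebuild. The sketch below shows one way to do that in memory; the class name `RelationTable`, the extra forward map used to find stale entries, and the `"type:id"` key format are assumptions rather than the article's design.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory relationship table with incremental maintenance.
class RelationTable {
  private final Map<Long, Set<String>> forward = new HashMap<>();  // unitId -> entity keys
  private final Map<String, Set<Long>> inverted = new HashMap<>(); // entity key -> unitIds

  /** Replaces everything known about unitId with the keys from its latest build. */
  synchronized void replaceUnit(long unitId, Set<String> newKeys) {
    Set<String> oldKeys = forward.getOrDefault(unitId, Collections.emptySet());
    for (String key : oldKeys) {
      if (!newKeys.contains(key)) {
        Set<Long> units = inverted.get(key);
        if (units != null) {
          units.remove(unitId);
          if (units.isEmpty()) {
            inverted.remove(key); // no unit depends on this entity any more
          }
        }
      }
    }
    for (String key : newKeys) {
      inverted.computeIfAbsent(key, k -> new TreeSet<>()).add(unitId);
    }
    if (newKeys.isEmpty()) {
      forward.remove(unitId);
    } else {
      forward.put(unitId, new HashSet<>(newKeys));
    }
  }

  /** Reverse lookup used by both the binlog and the scan trigger. */
  synchronized Set<Long> unitsFor(String entityKey) {
    return new TreeSet<>(inverted.getOrDefault(entityKey, Collections.emptySet()));
  }
}
```

A full build can populate the same structure by calling `replaceUnit` once per unit, so both build modes share one code path for relationship maintenance.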

Future outlook: Incremental construction dramatically reduces database load, network traffic, and latency, but challenges remain, such as simplifying how consumers handle mixed full-plus-incremental material and further automating relationship capture.
