Optimizing Multi‑Dimensional User Count Computation in Feed Using Data Tagging
By deduplicating logs and assigning compact numeric tags to each user‑dimension combination, the data‑tagging method replaces costly lateral‑view expansions with a user‑level aggregation, cutting shuffle volume from terabytes to gigabytes and reducing runtime from 49 minutes to 14 minutes, enabling scalable multi‑dimensional user‑count analysis for Baidu Feed.