How Vivo Scaled Marketing Automation with Presto, Bitmap, and StarRocks
This case study details how Vivo’s marketing automation platform evolved its data‑driven architecture—from a Presto‑based wide‑table design, through a Bitmap optimization, to a StarRocks migration—addressing performance bottlenecks, reducing resource costs, and enhancing data security.
Business Background
To increase user lifetime value in a saturated market, fine‑grained, data‑driven marketing is required. The digital store platform collects three major data domains: user tags, event logs, and store LBS (location‑based service) data.
Infrastructure
Data volumes reach into the billions: more than 5 billion tag records and more than 9 billion store LBS records. MySQL cannot perform real‑time multi‑table computation at this scale, so Presto, an OLAP engine, was deployed over a Hive data warehouse.
Architecture consists of three layers:
Warehouse layer: Hive stores integrated tag, event, store LBS, and custom crowd packages.
Compute layer: Presto executes SQL against Hive for real‑time calculations.
Business layer: Provides audience selection, crowd solidification, and channel distribution logic.
Architecture Iteration
Key pain points of the Presto‑wide‑table solution:
Tag onboarding required a two‑day turnaround.
Wide tables grew beyond 300 columns, increasing maintenance cost.
Combined tag and store LBS queries exhibited minute‑level latency.
Bitmap Solution
Design: a pre‑computed auto‑increment row number (user_rn) per user serves as the bitmap index. Tag column values are pivoted into bitmap rows, each row storing a compressed bitmap of all users carrying a given tag value.
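The row‑number‑plus‑bitmap layout can be sketched as follows. This is a minimal Java illustration using java.util.BitSet; production systems typically use compressed Roaring bitmaps, and the class and method names here are hypothetical, not part of the actual platform:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class TagBitmaps {
    // user_id -> dense auto-increment row number (user_rn), assigned once.
    private final Map<String, Integer> userRn = new HashMap<>();
    // tag value -> bitmap over user_rn; one compressed row per tag value.
    private final Map<String, BitSet> tagBitmap = new HashMap<>();

    int rnOf(String userId) {
        // Assign the next row number on first sight of this user.
        return userRn.computeIfAbsent(userId, id -> userRn.size());
    }

    void tag(String userId, String tag) {
        tagBitmap.computeIfAbsent(tag, t -> new BitSet()).set(rnOf(userId));
    }

    // Users carrying both tags, computed in one bitwise AND pass
    // instead of a multi-table join.
    BitSet intersect(String tagA, String tagB) {
        BitSet result = (BitSet) tagBitmap.getOrDefault(tagA, new BitSet()).clone();
        result.and(tagBitmap.getOrDefault(tagB, new BitSet()));
        return result;
    }

    public static void main(String[] args) {
        TagBitmaps t = new TagBitmaps();
        t.tag("u1", "android");
        t.tag("u2", "android");
        t.tag("u2", "store_visitor");
        System.out.println(t.intersect("android", "store_visitor").cardinality()); // prints 1
    }
}
```

Membership and intersection over millions of users reduce to bitwise operations on compressed rows, which is why pure tag queries can reach millisecond latency.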
To integrate DMP tag services, a Presto connector plugin was built and a UDF plugin added a “Select In Bitmap” capability.
Example query:
select count(user_id)
from user_id_mapping
where day = '${day}'
  and user_rn in (select bitmap from dmp.virtual_table where rule = '#tag_rule');

Effect: tag onboarding time fell from 1.5 person‑days to 0.5 person‑days; query P90 improved to 38 s, and pure tag queries return in milliseconds.
Limitations of Presto + Bitmap:
Complex multi‑table joins and aggregations strain performance and memory.
Presto cannot write encrypted Hive tables, blocking data‑security compliance.
StarRocks Migration
StarRocks was evaluated for its integrated compute‑storage architecture, vectorized execution engine, and built‑in security features. A phased migration plan was executed: simple workloads before complex ones, and read traffic before writes.
After migration, compute and warehouse are unified in a single StarRocks layer.
Effect:
Resource cost reduced by ~93% (53 Presto nodes → 3 StarRocks nodes).
Query P95 time dropped from 65 s to 6 s.
Data security improved: no HDFS files, encryption functions available, with future automatic encryption planned.
Post‑migration Issues and Resolutions
Issue 1 – Row limit: StarRocks’ sql_select_limit variable capped result sets at 1 million rows, truncating the audiences exported for marketing SMS tasks. Raising the variable and restarting the cluster removed the limitation.
Issue 2 – Full GC: Queries that loaded entire result sets into memory triggered long Full GC pauses and request timeouts. Enabling streaming query mode via the MySQL JDBC driver (setting fetchSize and useCursorFetch=true) prevented full result‑set materialization and eliminated the GC spikes.
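The streaming fix can be sketched with MySQL Connector/J, which StarRocks accepts because its frontends speak the MySQL protocol. The host, port, database name, credentials, and fetch size below are placeholder values, not the actual deployment’s:

```java
import java.util.Properties;

public class StreamingQueryConfig {
    // Placeholder endpoint; useCursorFetch=true asks Connector/J to use a
    // server-side cursor instead of buffering the whole result set in memory.
    static final String URL =
        "jdbc:mysql://starrocks-fe:9030/dmp?useCursorFetch=true";

    static Properties props() {
        Properties p = new Properties();
        p.setProperty("user", "marketing");   // hypothetical credentials
        p.setProperty("password", "***");
        return p;
    }

    public static void main(String[] args) {
        // With a live cluster, rows would then be streamed in batches:
        //
        // try (Connection conn = DriverManager.getConnection(URL, props());
        //      PreparedStatement ps = conn.prepareStatement(
        //          "select user_id from user_id_mapping where day = ?")) {
        //     ps.setFetchSize(1000);  // fetch 1000 rows per round trip
        //     ...
        // }
        System.out.println(URL.contains("useCursorFetch=true"));
    }
}
```

Both pieces are required: useCursorFetch=true on the connection and a positive fetchSize on the statement; with either missing, Connector/J falls back to materializing the full result set on the client.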
Conclusion
The architecture evolved from a Presto‑based wide‑table design to a bitmap‑enhanced Presto solution, and finally to a StarRocks deployment. This progression delivered faster tag onboarding, second‑level query latency, a 93% reduction in cluster resources, and stronger data‑security capabilities, meeting the demands of large‑scale, real‑time marketing analytics.
vivo Internet Technology
Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.
