
How Vivo Scaled Marketing Automation with Presto, Bitmap, and StarRocks

This case study details how Vivo’s marketing automation platform evolved its data‑driven architecture in three stages: a Presto‑based wide‑table design, a Bitmap optimization, and a StarRocks migration. Along the way the team addressed performance bottlenecks, reduced resource costs, and strengthened data security.

vivo Internet Technology

Business Background

To increase user lifetime value in a saturated market, fine‑grained, data‑driven marketing is required. The digital store platform collects three major data domains: user tags, event logs, and store LBS (location‑based service) data.

Infrastructure

Data volume is in the billions: tag records exceed 5 billion and store LBS records exceed 9 billion. MySQL cannot handle real‑time multi‑table calculations at this scale, so an OLAP engine (Presto) running on a Hive data warehouse was adopted.

Architecture consists of three layers:

Warehouse layer: Hive stores integrated tag, event, store LBS, and custom crowd packages.

Compute layer: Presto executes SQL against Hive for real‑time calculations.

Business layer: Provides audience selection, crowd solidification, and channel distribution logic.

Architecture Iteration

Key pain points of the Presto‑wide‑table solution:

Tag onboarding required a two‑day turnaround.

Wide tables grew beyond 300 columns, increasing maintenance cost.

Combined tag and store LBS queries exhibited minute‑level latency.

Bitmap Solution

Design: Each user is assigned a pre‑computed auto‑increment ID, which serves as that user's position in a bitmap. Tag column values are transformed into bitmap rows, each row storing a compressed bitmap of all users for a given tag.
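The design above can be illustrated with a minimal Python sketch. It is not the platform's implementation: the function names (`build_tag_bitmaps`, `count_users`, `in_bitmap`) and the sample users are hypothetical, and plain Python integers stand in for the compressed bitmaps that a production system would use.

```python
from collections import defaultdict


def build_tag_bitmaps(user_tags):
    """Build one bitmap per (tag, value) pair.

    Returns (id_of, bitmaps), where id_of maps each user to a sequential
    index and bitmaps maps (tag, value) to an int used as a bitset.
    """
    id_of = {}
    bitmaps = defaultdict(int)
    for user, tags in user_tags.items():
        # Assign the next sequential index the first time a user is seen.
        idx = id_of.setdefault(user, len(id_of))
        for tag, value in tags.items():
            # Set this user's bit in the bitmap for the tag value.
            bitmaps[(tag, value)] |= 1 << idx
    return id_of, dict(bitmaps)


def count_users(bitmap):
    """Count the users (set bits) in a bitmap."""
    return bin(bitmap).count("1")


def in_bitmap(bitmap, idx):
    """Return True if the user with index idx is in the bitmap."""
    return (bitmap >> idx) & 1 == 1


# Hypothetical sample data for illustration only.
users = {
    "u1": {"gender": "f", "city": "Shenzhen"},
    "u2": {"gender": "m", "city": "Shenzhen"},
    "u3": {"gender": "f", "city": "Beijing"},
}
id_of, bitmaps = build_tag_bitmaps(users)

# Audience selection becomes bitwise logic instead of a wide-table scan.
audience = bitmaps[("gender", "f")] & bitmaps[("city", "Shenzhen")]
print(count_users(audience))
```

Combining tags with `&` or `|` touches one bitmap per tag value rather than one column per tag, which is why adding a tag no longer requires altering a wide table.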

To integrate DMP tag services, a Presto connector plugin was built and a UDF plugin added a “Select In Bitmap” capability.

Example query:

select count(user_id)
from user_id_mapping
where day = '${day}'
  and user_rn in (select bitmap from dmp.virtual_table where rule = '#tag_rule');

Effect: Tag onboarding time reduced from 1.5 person‑days to 0.5 person‑days; query P90 improved to 38 s, and pure tag queries achieve millisecond response.

Limitations of Presto + Bitmap:

Complex multi‑table joins and aggregations strain performance and memory.

Presto cannot write encrypted Hive tables, blocking data‑security compliance.

StarRocks Migration

StarRocks was evaluated for its integrated compute‑storage architecture, vectorized execution engine, and built‑in security features. The migration was phased: simple workloads moved before complex ones, and read paths moved before write paths.

After migration, compute and warehouse are unified in a single StarRocks layer.

Effect:

Resource cost reduced by ~93% (53 Presto nodes → 3 StarRocks nodes).

Query P95 time dropped from 65 s to 6 s.

Data security improved: no HDFS files, encryption functions available, with future automatic encryption planned.

Post‑migration Issues and Resolutions

Issue 1 – Row limit: StarRocks’ sql_select_limit variable capped result sets at 1 million rows, which truncated the result sets used by marketing SMS tasks. The variable was increased and the cluster restarted, removing the limitation.

Issue 2 – Full GC: Queries that loaded entire result sets into memory triggered long Full GC pauses and request timeouts. Enabling streaming query mode via the MySQL JDBC driver (setting fetchSize and useCursorFetch=true) prevented full result‑set materialization and eliminated the GC spikes.
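The fix described above was applied in the MySQL JDBC driver. As a rough analogy only, the following Python sketch shows the same idea of consuming a result set in bounded batches instead of materializing it in full. It uses the standard-library `sqlite3` module as a stand-in database; the `stream_rows` helper, the `audience` table, and the batch size are hypothetical and are not taken from the platform's code.

```python
import sqlite3


def stream_rows(conn, sql, batch_size=1000):
    """Yield rows one at a time, fetching at most batch_size rows per round."""
    cur = conn.cursor()
    cur.execute(sql)
    while True:
        # fetchmany bounds how many rows are held in memory at once.
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


# In-memory stand-in for a large audience result set (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audience (user_id INTEGER)")
conn.executemany("INSERT INTO audience VALUES (?)", [(i,) for i in range(2500)])

total = 0
for (user_id,) in stream_rows(
    conn, "SELECT user_id FROM audience ORDER BY user_id", batch_size=1000
):
    # Each row is processed and then released, so memory use stays flat.
    total += 1
print(total)
```

Because each batch can be released before the next one is fetched, peak memory depends on the batch size rather than on the total number of rows, which is the property that removes the pressure behind long Full GC pauses.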

Conclusion

The architecture evolved from a Presto‑based wide‑table design to a bitmap‑enhanced Presto solution, and finally to a StarRocks deployment. This progression delivered faster tag onboarding, second‑level query latency, a 93% reduction in cluster resources, and stronger data‑security capabilities, meeting the demands of large‑scale, real‑time marketing analytics.
