How Cisco Migrated from Pinot to StarRocks and Boosted Query Performance by Up to 70%
This article details Cisco Webex's migration from a complex Pinot‑Trino OLAP stack to StarRocks. It covers the challenges of the legacy system, the step‑by‑step migration process (storage, compute, and SQL dialect transformation), and the resulting performance gains, cost reductions, and operational improvements.
Background and Motivation
Cisco Webex relied on an intricate OLAP stack built around Apache Pinot for low‑latency real‑time queries and Trino for complex joins and sub‑queries. The stack suffered from high maintenance costs, limited functionality in Pinot (no multi‑table joins, sub‑queries, or materialized views), poor data freshness, and a fragmented user experience.
Key Challenges of the Existing Stack
High operational overhead and complex monitoring due to dual engines (Pinot + Trino).
Pinot lacked support for joins, sub‑queries, and materialized views, forcing reliance on Trino.
Data back‑fill was difficult because Pinot did not support partitioning.
Limited DML capabilities (no INSERT/UPDATE/DELETE) made data correction cumbersome.
Inconsistent resource isolation and tenant management.
Migration Goals
Achieve superior query performance and support for complex SQL (joins, sub‑queries, materialized views).
Provide robust handling of semi‑structured data (Flat JSON, Variant).
Reduce storage costs and improve disk utilization.
Unify query experience across teams.
Enable automatic scaling and fine‑grained resource isolation.
Migration Path and Practices
The team adopted a two‑pronged approach, with both parts powered by StarRocks: a store‑compute‑separated architecture and, where needed, a store‑compute‑integrated deployment.
Store‑Compute Separation
StarRocks was deployed on Kubernetes with a Horizontal Pod Autoscaler (HPA) that monitors CPU and memory, allowing compute pods to scale dynamically. Resource isolation was achieved by using Rack labels to group nodes per business unit, so that heavy workloads in one service did not affect others.
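The scaling decision an HPA makes can be approximated with the replica formula from the Kubernetes documentation. The following is a minimal sketch, not Cisco's configuration: the utilization targets, replica bounds, and function name are assumptions for illustration, and the real controller also applies a tolerance band and stabilization windows that are omitted here.

```python
import math


def desired_replicas(current_replicas, metrics, min_replicas, max_replicas):
    """Approximate the HPA replica calculation.

    metrics is a list of (current_utilization, target_utilization) pairs,
    for example one pair for CPU and one for memory. Each metric proposes
    ceil(current_replicas * current / target); the largest proposal wins,
    then the result is clamped to the configured bounds.
    """
    proposals = [
        math.ceil(current_replicas * current / target)
        for current, target in metrics
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))


# Hypothetical numbers: 4 compute pods, CPU at 90% against a 60% target,
# memory at 50% against a 70% target.
print(desired_replicas(4, [(90, 60), (50, 70)], min_replicas=2, max_replicas=10))
```

Because the largest proposal wins, a pod group that is CPU‑bound scales out even when memory is comfortable, which matches the intent of monitoring both metrics.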
Store‑Compute Integration
For latency‑critical workloads, StarRocks' native MPP engine provided high‑performance query execution without the need for an external compute layer.
SQL Dialect Transformation
A custom Pinot Dialect Transformer automatically rewrote existing Pinot/Trino SQL to StarRocks syntax, covering over 70% of statements without manual changes. The transformer adjusts function names and argument orders, and is designed to be extended with further rules.
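To make the idea concrete, here is a minimal sketch of a rule‑based rewriter. It is not the team's transformer: the two rules (Pinot's DISTINCTCOUNT to COUNT(DISTINCT ...), and Trino's strpos(string, substring) to locate(substring, string)) are examples chosen for illustration, and the regex approach only handles simple, non‑nested arguments without commas inside string literals. A production transformer would more likely operate on a parsed syntax tree.

```python
import re

# Illustrative rules only; each entry is (compiled pattern, replacement).
RULES = [
    # Function rename with restructuring:
    #   DISTINCTCOUNT(col) -> COUNT(DISTINCT col)
    (
        re.compile(r"\bDISTINCTCOUNT\s*\(\s*([^()]+?)\s*\)", re.IGNORECASE),
        r"COUNT(DISTINCT \1)",
    ),
    # Argument reorder:
    #   strpos(string, substring) -> locate(substring, string)
    (
        re.compile(
            r"\bstrpos\s*\(\s*([^(),]+?)\s*,\s*([^(),]+?)\s*\)", re.IGNORECASE
        ),
        r"locate(\2, \1)",
    ),
]


def to_starrocks(sql):
    """Apply every rule in order; return (rewritten_sql, number_of_rewrites)."""
    total = 0
    for pattern, replacement in RULES:
        sql, count = pattern.subn(replacement, sql)
        total += count
    return sql, total


rewritten, rewrites = to_starrocks(
    "SELECT DISTINCTCOUNT(user_id) FROM meetings WHERE strpos(title, 'sync') > 0"
)
print(rewritten)
print(rewrites)
```

Keeping the rules in a plain list is what makes the design extensible: supporting another function means appending one more entry rather than changing the rewrite loop.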
Semi‑Structured Data Handling
StarRocks introduced Flat JSON and Variant data types. Flat JSON reduced disk usage by ~80% and, after table‑level configuration of sparsity and null factors, improved query latency. Variant provides efficient storage and querying of dynamic JSON schemas, separating metadata from values for fast access.
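One way to picture the null‑factor setting: a JSON key is worth extracting into its own column only if it is present in most rows. The sketch below illustrates that selection rule. It is a conceptual model, not StarRocks' implementation; the function name, the threshold parameter, and the sample rows are hypothetical.

```python
import json
from collections import Counter


def pick_flat_keys(rows, max_null_fraction=0.3):
    """Return top-level JSON keys dense enough to store as separate columns.

    rows is a list of JSON object strings. A key qualifies when the fraction
    of rows where it is missing or null does not exceed max_null_fraction.
    """
    docs = [json.loads(row) for row in rows]
    if not docs:
        return []
    present = Counter()
    for doc in docs:
        for key, value in doc.items():
            if value is not None:
                present[key] += 1
    total = len(docs)
    return sorted(
        key for key, count in present.items()
        if 1 - count / total <= max_null_fraction
    )


# Hypothetical event payloads: "user" is always set, "device" is set in
# three of four rows, "debug" in only one.
sample = [
    '{"user": "a", "device": "ios", "debug": "x"}',
    '{"user": "b", "device": "web"}',
    '{"user": "c", "device": null}',
    '{"user": "d", "device": "android"}',
]
print(pick_flat_keys(sample))
```

Keys that fail the threshold stay in the generic JSON representation, so rarely populated fields do not add mostly empty columns.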
Performance Improvements
~70% of queries run faster on StarRocks than on Trino.
Average query latency improved by ~50%; on cache‑hit runs the improvement was up to 21%.
Materialized view usage yielded >10× performance gains.
Flat JSON reduced storage footprint by ~80% and cut query latency by a similar margin after bucket‑key and sort‑key optimizations.
Operational Enhancements
Unified permission management via Apache Ranger and LDAP, integrated with UDP Auth.
Automatic backup & restore, and Sync Tool for cross‑cluster data migration.
Enhanced indexing: new tokenize function for debugging inverted indexes, and extended match operators (MATCH_ALL, MATCH_ANY) with push‑down support.
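The intended semantics of the two match operators can be illustrated with a small sketch. It assumes a simple tokenizer that lowercases text and splits on non‑alphanumeric characters; the actual analyzer behind the inverted index and the exact behavior of the tokenize function are not described in this article, so the function names and tokenization here are assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def match_any(document, query):
    """True when at least one query token appears in the document."""
    doc_tokens = set(tokenize(document))
    return any(token in doc_tokens for token in tokenize(query))


def match_all(document, query):
    """True when the query has tokens and every one appears in the document."""
    doc_tokens = set(tokenize(document))
    query_tokens = tokenize(query)
    return bool(query_tokens) and all(token in doc_tokens for token in query_tokens)


log_line = "Join failed: media-timeout"
print(tokenize(log_line))
print(match_any(log_line, "timeout crash"))
print(match_all(log_line, "timeout crash"))
print(match_all(log_line, "media timeout"))
```

Inspecting the token list first is the debugging step a tokenize function enables: when a match unexpectedly fails, the tokens show whether the text was split the way the query assumed.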
Future Roadmap
Query Insight: richer profiling and automated optimization suggestions.
Enhanced semi‑structured support: continue improving Variant shredding and Flat JSON indexing.
Text search optimization: expand inverted index capabilities, explore new engines (Tantivy, native StarRocks search).
StarRocks
StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
