Applying Apache Spark in Guanyuan Self-Service Analytics System: Architecture, Challenges, and Solutions
This presentation details how Guanyuan Data leverages Apache Spark within its self‑service analytics platform, covering product features, flexible deployment, resource isolation, performance challenges, architectural solutions, and future cloud‑native enhancements to support thousands of users and massive query workloads.
Introduction: In this talk, Zhou Xiang, an R&D engineer at Guanyuan Data, introduces the Guanyuan self‑service analytics product and its growing role for business users.
Product overview: Features include form filling, data ingestion, dashboards, portals, ETL, lightweight apps, visual analysis, and complex reports, with an emphasis on visual analysis and smart ETL for business users.
Architecture: The system uses Apache Spark as its core compute engine, with a control tower for task dispatch and Delta Lake for storage. It supports several deployment modes (single‑machine, SaaS, private, cloud) via Docker/K8s.
Challenges: Flexible deployment, multi‑tenant resource isolation, high‑performance low‑latency queries, Spark stability, optimizer overhead, join memory usage, shuffle resource pressure, task cancellation, and overall query experience.
Solutions: Containerized deployment, Spark‑based architecture, storage‑compute separation, support for multiple storage backends, dynamic resource isolation, engine segregation for slow queries, optimizer rule tuning, query validation, shuffle cleanup, and monitoring.
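As a rough illustration of engine segregation for slow queries, the sketch below routes each query to a "fast" or "slow" engine pool based on an estimated scan size. The names (Query, route, assign, SLOW_SCAN_ROWS, the pool labels) and the threshold are hypothetical; the talk summary does not say how Guanyuan actually classifies queries, so treat this as one possible approach rather than the platform's implementation.

```python
from dataclasses import dataclass

# Hypothetical threshold: queries expected to scan more rows than this
# are treated as slow. The real criterion is not given in the source.
SLOW_SCAN_ROWS = 10_000_000


@dataclass(frozen=True)
class Query:
    query_id: str
    tenant: str
    estimated_rows: int  # assumed to come from table statistics


def route(query: Query, threshold: int = SLOW_SCAN_ROWS) -> str:
    """Return the name of the engine pool that should run the query."""
    return "slow-pool" if query.estimated_rows > threshold else "fast-pool"


def assign(queries):
    """Group query ids by the pool they are routed to."""
    pools = {"fast-pool": [], "slow-pool": []}
    for q in queries:
        pools[route(q)].append(q.query_id)
    return pools
```

Keeping heavy queries in a separate pool means that a long-running join or shuffle cannot occupy the executors that interactive dashboard queries depend on, which is the intent behind the segregation described above.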
Performance: The platform serves up to 30 000 monthly active users, maintains 90 percent of queries under 2 seconds, processes over 300 000 daily tasks, and runs on clusters up to 20 000 cores.
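The latency figure above (90 percent of queries under 2 seconds) can be checked against a log of query durations with a few lines of arithmetic. This is a minimal sketch; the function names and the sample durations are made up for illustration and are not data from the talk.

```python
def fraction_under(latencies_s, threshold_s=2.0):
    """Share of queries that finished in less than threshold_s seconds."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    fast = sum(1 for x in latencies_s if x < threshold_s)
    return fast / len(latencies_s)


def meets_target(latencies_s, threshold_s=2.0, target=0.90):
    """True if at least `target` of the queries beat the threshold."""
    return fraction_under(latencies_s, threshold_s) >= target


# Illustrative durations in seconds (not measurements from the platform).
sample = [0.3, 0.5, 0.8, 1.1, 1.2, 1.5, 1.7, 1.9, 1.95, 4.2]
```

With the sample list, nine of the ten queries finish in under 2 seconds, so fraction_under(sample) is 0.9 and meets_target(sample) is True.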
Future outlook: Plans include more cloud‑native solutions, integration with other engines such as Databricks and ClickHouse, and continued contribution to the open‑source community.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.