ScopeDB: Real-Time Data Analytics Solution for the Cloud‑Native Era
ScopeDB is a cloud‑native, real‑time analytics database. It combines structured core columns with a flexible JSON column, adaptive indexing, a custom query language (ScopeQL), and compute‑storage separation, and it reports sub‑second query latency, high throughput, and up to 70% cost reduction compared with traditional big‑data stacks.
Introduction
ScopeDB is a next‑generation cloud‑native database for real‑time data analysis, presented by co‑founder tison (Apache Software Foundation board member). Its hallmark feature, “Schema On The Fly,” enables rapid adaptation to evolving business requirements.
Data Modeling Challenges and Breakthrough
Traditional relational databases require predefined schemas and DDL approvals, which become cumbersome in agile environments. Pure schemaless systems such as MongoDB and Elasticsearch avoid schema definition but suffer from inconsistent field naming and data‑swamp issues. ScopeDB balances these extremes by keeping stable core columns (e.g., timestamp, id, message) while providing a semi‑structured JSON column for flexible, evolving attributes.
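The split between stable core columns and a flexible JSON column can be pictured with a small Python sketch. This is only an illustration of the modeling idea, not ScopeDB's storage format; the `Event` class and the `attrs` column name are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    # Stable core columns (names follow the examples in the text).
    timestamp: int
    id: str
    message: str
    # Flexible, evolving attributes; "attrs" is a hypothetical column name.
    attrs: dict[str, Any] = field(default_factory=dict)

    def to_row(self) -> dict[str, Any]:
        """Flatten to a row: fixed columns plus one JSON-encoded column."""
        return {
            "timestamp": self.timestamp,
            "id": self.id,
            "message": self.message,
            "attrs": json.dumps(self.attrs, sort_keys=True),
        }


# An older producer sends no extra attributes; a newer one adds nested fields.
# Both fit the same table shape, so no DDL change is needed.
old = Event(1700000000, "e1", "login ok")
new = Event(1700000060, "e2", "login ok", {"region": "eu", "device": {"os": "ios"}})
rows = [old.to_row(), new.to_row()]
```

Both rows share the same column set, while new attributes land inside the JSON column without a schema migration.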
Flexible Schema and Adaptive Indexing
ScopeDB supports primitive types (Integer, String) and semi‑structured types (Array, Object). It introduces an adaptive indexing mechanism that can create indexes on arbitrary expressions inside JSON data, eliminating the need to parse entire documents at query time. The system offers expression indexes, materialized indexes, cache indexes, as well as point, range, and search indexes, and it can automatically recommend optimal indexes based on observed query patterns.
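To make the idea of an index on an expression inside JSON data concrete, here is a minimal Python sketch. It assumes an in-memory list of documents and a simple "count repeated filters" heuristic for recommendations; both are illustrative assumptions, and nothing here reflects ScopeDB's actual index structures or recommendation logic.

```python
from collections import Counter, defaultdict

docs = [
    {"id": 1, "attrs": {"user": {"country": "DE"}, "status": 200}},
    {"id": 2, "attrs": {"user": {"country": "US"}, "status": 500}},
    {"id": 3, "attrs": {"user": {"country": "DE"}, "status": 500}},
]


def country(doc):
    """The indexed expression: a path into the JSON column."""
    return doc["attrs"].get("user", {}).get("country")


def build_expression_index(rows, expr):
    """Evaluate expr once per row and map each value to row positions."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[expr(row)].append(pos)
    return index


def lookup(index, rows, value):
    """Answer an equality filter from the index without re-reading documents."""
    return [rows[pos]["id"] for pos in index.get(value, [])]


def recommend(observed_filters, min_hits=2):
    """Hypothetical heuristic: suggest expressions that are filtered on repeatedly."""
    counts = Counter(observed_filters)
    return [expr for expr, hits in counts.most_common() if hits >= min_hits]


country_index = build_expression_index(docs, country)
german_ids = lookup(country_index, docs, "DE")
suggested = recommend(["attrs.user.country", "attrs.status", "attrs.user.country"])
```

The cost of evaluating the expression is paid once at index build time, so an equality filter at query time becomes a dictionary lookup.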
Traditional Big‑Data Cloud‑Native Pain Points
Conventional pipelines route data through multiple stages—OLTP/Kafka → CDC/Fivetran → Flink/Iceberg → data warehouse—resulting in long latency, schema drift, and tightly coupled compute‑storage resources. ScopeDB redesigns the flow: backend services write directly to ScopeDB via HTTP API and the native query language ScopeQL, removing intermediate Kafka layers, preserving raw data, and reducing end‑to‑end latency to seconds.
True Cloud‑Native Architecture
Built from scratch on object storage (S3, GCS, Azure Blob Storage), ScopeDB separates compute from storage and reads from writes. Nodes are stateless and can be added in seconds without data rebalancing. Durability and multi‑AZ fault tolerance come from the object store itself, so ScopeDB does not need to run its own Raft or other replication protocol. The “no partition‑owner” design allows any node to serve any data, which avoids the load imbalance that fixed partition ownership can cause.
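The "no partition‑owner" idea can be sketched in a few lines of Python. A plain dictionary stands in for the object store, and `StatelessNode` and `run` are hypothetical names; the point is only that nodes hold no data of their own, so adding one requires no rebalancing.

```python
import itertools

# Stand-in for an object store such as S3: key -> immutable blob.
object_store = {
    "events/part-0001": [1, 2, 3],
    "events/part-0002": [4, 5],
}


class StatelessNode:
    def __init__(self, name, store):
        self.name = name
        self.store = store  # shared storage; the node owns no partitions

    def scan_sum(self, key):
        """Read an object from shared storage and aggregate it."""
        return sum(self.store[key])


def run(keys, nodes):
    """Assign work round-robin: any node may serve any object."""
    picker = itertools.cycle(nodes)
    return {key: (node.name, node.scan_sum(key)) for key, node in zip(keys, picker)}


nodes = [StatelessNode("n1", object_store), StatelessNode("n2", object_store)]
before = run(sorted(object_store), nodes)

# Scale out: the new node can serve every object immediately.
nodes.append(StatelessNode("n3", object_store))
after = run(sorted(object_store), nodes)
```

Results are identical before and after the scale-out, because correctness never depended on which node handled which object.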
Three‑Tier Compute Isolation
The architecture separates compute into three independent groups: an Ingestion Group (up to 10 GiB/s per node on a 4‑core, 16 GB instance), a Regular Group for routine dashboards and log queries, and a Temporary Group for bursty, heavyweight queries, whose resources are released once the queries complete. Compared with Apache Doris, ScopeDB is reported to deliver better performance while reducing total cost of ownership by roughly 70%.
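A routing rule for the three compute groups might look like the following Python sketch. The `Request` shape, the estimated scan size, and the 50 GiB threshold are all assumptions made for illustration; the source does not describe how ScopeDB classifies a query as heavyweight.

```python
from dataclasses import dataclass


@dataclass
class Request:
    kind: str                # "ingest" or "query"
    est_scan_bytes: int = 0  # hypothetical planner estimate for queries


# Hypothetical cutoff above which a query counts as heavyweight.
HEAVY_SCAN_BYTES = 50 * 2**30  # 50 GiB


def route(req: Request) -> str:
    """Pick a compute group so writes and heavy scans never share resources."""
    if req.kind == "ingest":
        return "ingestion"
    if req.est_scan_bytes >= HEAVY_SCAN_BYTES:
        return "temporary"  # provisioned on demand, released afterwards
    return "regular"


dashboard = route(Request("query", est_scan_bytes=2 * 2**30))
backfill = route(Request("query", est_scan_bytes=200 * 2**30))
write = route(Request("ingest"))
```

Keeping the decision in one function makes the isolation guarantee easy to audit: each request maps to exactly one group.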
Real‑Time Analysis and Unified Query Language
Most queries return within 10 seconds, and data freshness is measured in seconds, enabling use cases such as live observability, user‑behavior monitoring, and fraud detection. ScopeQL, a custom query language, follows a top‑down pipeline style, offers orthogonal composability, expression reuse, and unified read/write semantics while remaining grounded in relational algebra.
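The top‑down pipeline style can be illustrated with a Python analogy. This is not ScopeQL syntax; the `Pipeline` class and its method names are hypothetical, chosen to show how each step consumes the previous step's output and how a computed expression can be named once and reused.

```python
from collections import Counter


class Pipeline:
    """Each method returns a new Pipeline, so steps read top to bottom."""

    def __init__(self, rows):
        self.rows = list(rows)

    def extend(self, name, fn):
        """Compute an expression once and expose it as a named column."""
        return Pipeline({**r, name: fn(r)} for r in self.rows)

    def where(self, pred):
        """Relational selection: keep rows matching the predicate."""
        return Pipeline(r for r in self.rows if pred(r))

    def count_by(self, key):
        """Group by one column and count rows per group."""
        counts = Counter(r[key] for r in self.rows)
        return Pipeline({key: k, "count": n} for k, n in sorted(counts.items()))


logs = [
    {"service": "api", "latency_ms": 120},
    {"service": "api", "latency_ms": 900},
    {"service": "web", "latency_ms": 700},
]

result = (
    Pipeline(logs)
    .extend("slow", lambda r: r["latency_ms"] > 500)  # define the expression once
    .where(lambda r: r["slow"])                       # reuse it by name
    .count_by("service")
    .rows
)
```

Because every step takes rows in and gives rows out, steps can be reordered or inserted independently, which is the composability property the text refers to.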
Production Practice and Results
A production customer ingests over 100 billion events per day, sustaining a write throughput of 140 k rows/s (peak 180 k rows/s). Query latency statistics are P50 = 142 ms, P90 = 454 ms, P97 = 835 ms, with the majority of queries completing under one second. Intelligent routing directs different query types to dedicated compute slots, and dynamic elastic scaling releases temporary resources during low‑traffic periods, further cutting costs.
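For readers unfamiliar with percentile notation, the sketch below shows one common way to compute figures such as P50 and P90, using the nearest‑rank method. The latency samples are synthetic and unrelated to the production numbers above, and the source does not say which percentile method was used.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic query latencies in milliseconds (illustrative only).
latencies_ms = [100, 120, 140, 160, 200, 300, 400, 500, 800, 1500]

p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
p97 = percentile(latencies_ms, 97)

# Share of queries finishing in under one second.
sub_second = sum(1 for ms in latencies_ms if ms < 1000) / len(latencies_ms)
```

A P90 of 454 ms, for example, means nine out of ten queries finished in 454 ms or less, which is why tail percentiles are more informative than an average for interactive workloads.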
Summary of Advantages
ScopeDB’s five core benefits are: (1) elimination of ETL pipelines, (2) flexibility via Schema On The Fly, (3) a unified language (ScopeQL) that reduces learning and maintenance overhead, (4) strong isolation between ingestion and query workloads, and (5) cloud elasticity that provides pay‑as‑you‑go scaling and up to 70% cost savings. The solution exemplifies a pivotal direction for database evolution in the cloud‑native, real‑time analytics era.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
