
ScopeDB: Real-Time Data Analytics Solution for the Cloud‑Native Era

ScopeDB is a cloud-native, real-time analytics database. It combines structured core columns with a flexible JSON column, adaptive indexing, a custom query language (ScopeQL), and true compute-storage separation. The stated results are sub-second query latency, high throughput, and up to 70% lower cost than traditional big-data stacks.

DataFunSummit

Introduction

ScopeDB is a next‑generation cloud‑native database for real‑time data analysis, presented by co‑founder tison (Apache Software Foundation board member). Its hallmark feature, “Schema On The Fly,” enables rapid adaptation to evolving business requirements.

Data Modeling Challenges and Breakthrough

Traditional relational databases require predefined schemas and DDL approvals, which become cumbersome in agile environments. Pure schemaless systems such as MongoDB and Elasticsearch avoid schema definition but suffer from inconsistent field naming and data‑swamp issues. ScopeDB balances these extremes by keeping stable core columns (e.g., timestamp, id, message) while providing a semi‑structured JSON column for flexible, evolving attributes.
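
The split between stable core columns and a flexible JSON column can be pictured with a small in-memory model. This is only an illustrative sketch: the `Event` class, the `attrs` field name, the millisecond timestamp unit, and the `get_path` helper are assumptions made for this example, not ScopeDB's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    # Stable core columns named in the text above.
    timestamp: int  # epoch milliseconds (assumed unit)
    id: str
    message: str
    # Stand-in for the semi-structured JSON column (hypothetical name).
    attrs: dict[str, Any] = field(default_factory=dict)


def get_path(event: Event, path: str, default: Any = None) -> Any:
    """Look up a dotted path such as 'http.status' inside attrs."""
    node: Any = event.attrs
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# An older event with no extra attributes and a newer one with evolving
# fields share the same core columns, so no schema migration is needed.
old = Event(1, "a", "login")
new = Event(2, "b", "login", {"http": {"status": 200}, "region": "eu"})
```

Rows written before a new attribute existed simply return the default for that path, which is the behavior a reader would expect from a flexible column.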

Flexible Schema and Adaptive Indexing

ScopeDB supports primitive types (Integer, String) and semi‑structured types (Array, Object). It introduces an adaptive indexing mechanism that can create indexes on arbitrary expressions inside JSON data, eliminating the need to parse entire documents at query time. The system offers expression indexes, materialized indexes, cache indexes, as well as point, range, and search indexes, and it can automatically recommend optimal indexes based on observed query patterns.
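
To make the idea of an expression index concrete, the sketch below evaluates an expression over each JSON document once at write time and stores the result in a lookup table, so a later query does not need to parse every document. It is a toy model of the general technique, not ScopeDB's implementation; the `ExpressionIndex` class and its methods are hypothetical, and it assumes the expression yields a hashable scalar.

```python
import json
from collections import defaultdict
from typing import Any, Callable


class ExpressionIndex:
    """Maps the value of an expression over a JSON document to row ids."""

    def __init__(self, expr: Callable[[dict], Any]):
        self.expr = expr
        self.postings: dict[Any, list[int]] = defaultdict(list)

    def add(self, row_id: int, raw_json: str) -> None:
        doc = json.loads(raw_json)  # parsed once, at write time
        try:
            key = self.expr(doc)
        except (KeyError, TypeError):
            return  # expression does not apply to this row; leave it unindexed
        self.postings[key].append(row_id)

    def lookup(self, value: Any) -> list[int]:
        # Query time: a dictionary lookup instead of re-parsing documents.
        return list(self.postings.get(value, []))


rows = [
    '{"http": {"status": 500}}',
    '{"http": {"status": 200}}',
    '{"user": "x"}',
    '{"http": {"status": 500}}',
]
index = ExpressionIndex(lambda d: d["http"]["status"])
for row_id, raw in enumerate(rows):
    index.add(row_id, raw)

matching = index.lookup(500)
```

The trade-off is the usual one for indexes: extra work and storage on the write path in exchange for cheaper reads on the indexed expression.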

Pain Points of Traditional Big-Data Pipelines

Conventional pipelines route data through multiple stages (OLTP/Kafka → CDC/Fivetran → Flink/Iceberg → data warehouse), which leads to long latency, schema drift, and tightly coupled compute and storage resources. ScopeDB redesigns the flow: backend services write directly to ScopeDB through its HTTP API using the native query language ScopeQL. This removes the intermediate Kafka layers, preserves the raw data, and reduces end-to-end latency to seconds.
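
A direct write from a backend service might look roughly like the following. The article does not describe the HTTP API, so the URL, the `/ingest` path, and the `{"rows": [...]}` payload shape are all invented for illustration; only the standard-library calls are real. The request is built but deliberately not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; not ScopeDB's documented API.
INGEST_URL = "http://scopedb.example.invalid/ingest"


def build_ingest_request(events: list[dict]) -> urllib.request.Request:
    """Package a batch of events as one JSON POST (assumed payload shape)."""
    body = json.dumps({"rows": events}).encode("utf-8")
    return urllib.request.Request(
        INGEST_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ingest_request(
    [{"timestamp": 1, "id": "a", "message": "login", "attrs": {"region": "eu"}}]
)
# urllib.request.urlopen(req) would send the batch; it is omitted here
# because the endpoint above is made up.
```

The point of the sketch is the shape of the flow: the service batches raw events and posts them straight to the database, with no message queue or CDC stage in between.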

True Cloud‑Native Architecture

Built from scratch on object storage (S3, GCS, Azure Blob Storage), ScopeDB separates compute from storage and reads from writes. Nodes are stateless and can be added in seconds without data rebalancing. Durability and multi-AZ fault tolerance come from the object store itself, so ScopeDB needs no explicit Raft or other replication protocol. Under the "no partition-owner" design, any node can serve any data, which avoids the load imbalance caused by hot partitions.
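
The consequence of stateless nodes over shared storage can be shown with a toy model. In the sketch below a plain dictionary stands in for the object store, and the `Node` class and least-loaded `pick_node` policy are assumptions for illustration; the article does not say how ScopeDB actually selects a node.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    store: dict  # reference to the shared object store stand-in
    inflight: int = 0  # number of requests currently being served

    def read(self, key: str) -> bytes:
        # Every node reads from the same store; none owns a partition.
        return self.store[key]


def pick_node(nodes: list[Node]) -> Node:
    # With no partition owner, selection can depend on load alone.
    return min(nodes, key=lambda n: n.inflight)


store = {"seg-001": b"rows...", "seg-002": b"more rows"}
nodes = [Node("n1", store, inflight=3), Node("n2", store, inflight=1)]

# Scale-out: the new node only needs a handle to the store.
# Nothing is copied or rebalanced.
nodes.append(Node("n3", store))
chosen = pick_node(nodes)
```

In a partition-owner design the new node would first have to take over some partitions before it could help; here it is useful as soon as it starts.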

Three‑Tier Compute Isolation

The architecture separates compute into three independent groups: Ingestion Group (up to 10 GiB/s per node on a 4‑core, 16 GB instance), Regular Group for routine dashboards and log queries, and Temporary Group for bursty, heavy‑weight queries that are released after completion. Compared with Apache Doris, ScopeDB delivers superior performance while reducing total cost of ownership by roughly 70%.
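
Routing work to the three groups can be sketched as a simple classification step. The group names come from the text; the `route` function, the `kind` labels, and the 10 GiB scan threshold are hypothetical choices for this example, since the article does not state how ScopeDB decides that a query is heavy.

```python
from enum import Enum


class Group(Enum):
    INGESTION = "ingestion"
    REGULAR = "regular"
    TEMPORARY = "temporary"


# Assumed cutoff for a "heavy" query; purely illustrative.
HEAVY_SCAN_BYTES = 10 * 2**30


def route(kind: str, estimated_scan_bytes: int = 0) -> Group:
    """Pick a compute group for a unit of work."""
    if kind == "write":
        return Group.INGESTION
    if estimated_scan_bytes >= HEAVY_SCAN_BYTES:
        # Bursty, heavy queries go to capacity that is released afterwards.
        return Group.TEMPORARY
    return Group.REGULAR


dashboard = route("query", estimated_scan_bytes=50 * 2**20)
backfill = route("query", estimated_scan_bytes=200 * 2**30)
ingest = route("write")
```

Keeping the groups separate means a large ad hoc scan competes with neither ingestion nor routine dashboards for CPU.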

Real‑Time Analysis and Unified Query Language

Most queries return within 10 seconds, and data freshness is measured in seconds, enabling use cases such as live observability, user‑behavior monitoring, and fraud detection. ScopeQL, a custom query language, follows a top‑down pipeline style, offers orthogonal composability, expression reuse, and unified read/write semantics while remaining grounded in relational algebra.
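
The top-down pipeline style and expression reuse can be imitated in plain Python. This is not ScopeQL syntax, which the article does not show; the `extend`, `where`, `select`, and `pipeline` helpers are hypothetical stand-ins for relational operators applied in reading order.

```python
from functools import reduce
from typing import Any, Callable

Row = dict[str, Any]
Stage = Callable[[list[Row]], list[Row]]


def extend(name: str, fn: Callable[[Row], Any]) -> Stage:
    """Add a derived column that later stages can reuse by name."""
    return lambda rows: [{**r, name: fn(r)} for r in rows]


def where(pred: Callable[[Row], bool]) -> Stage:
    return lambda rows: [r for r in rows if pred(r)]


def select(*cols: str) -> Stage:
    return lambda rows: [{c: r[c] for c in cols} for r in rows]


def pipeline(rows: list[Row], *stages: Stage) -> list[Row]:
    # Stages run top to bottom, each consuming the previous stage's output.
    return reduce(lambda acc, stage: stage(acc), stages, rows)


rows = [
    {"id": 1, "latency_ms": 120, "status": 200},
    {"id": 2, "latency_ms": 900, "status": 500},
    {"id": 3, "latency_ms": 40, "status": 200},
]
result = pipeline(
    rows,
    extend("slow", lambda r: r["latency_ms"] > 500),  # defined once
    where(lambda r: r["slow"]),                       # reused here
    select("id", "slow"),                             # and here
)
```

Each stage is independent of the others, so stages can be reordered or added without rewriting the rest of the query, which is the composability the text refers to.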

Production Practice and Results

A production customer ingests over 100 billion events per day, sustaining a write throughput of 140 k rows/s (peak 180 k rows/s). Query latency statistics are P50 = 142 ms, P90 = 454 ms, P97 = 835 ms, with the majority of queries completing under one second. Intelligent routing directs different query types to dedicated compute slots, and dynamic elastic scaling releases temporary resources during low‑traffic periods, further cutting costs.
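
For readers unfamiliar with the P50/P90/P97 notation, the sketch below computes percentiles with the nearest-rank method. The article does not say which percentile definition was used, and the sample latencies are synthetic values chosen for the example, not the customer's data.

```python
import math


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latencies in milliseconds, for illustration only.
latencies_ms = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
p97 = percentile(latencies_ms, 97)
```

Reading the reported figures this way, P90 = 454 ms means that nine out of ten queries finished in 454 ms or less.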

Summary of Advantages

ScopeDB’s five core benefits are: (1) elimination of ETL pipelines, (2) flexibility via Schema On The Fly, (3) a unified language (ScopeQL) that reduces learning and maintenance overhead, (4) strong isolation between ingestion and query workloads, and (5) cloud elasticity that provides pay‑as‑you‑go scaling and up to 70% cost savings. The solution exemplifies a pivotal direction for database evolution in the cloud‑native, real‑time analytics era.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Cloud Native, Real-time analytics, database, adaptive indexing, schema on the fly, ScopeDB, ScopeQL