Implementing Real‑Time Materialized Views to Accelerate Large‑Scale Time‑Series Queries
This article explains how to implement real‑time materialized views to accelerate large‑scale time‑series data queries, covering the need for materialized views, their definition, storage, incremental updates, pre‑computation, query partitioning, performance testing, and future directions.
The article introduces real‑time materialized views as a technique for speeding up queries on massive time‑series datasets, motivated by the growing volume of data generated each day and the query‑performance problems that volume creates.
It explains why materialized views are needed, describing common query acceleration methods such as caching, parallel and distributed computing, data partitioning, indexing, and pre‑computation, with materialized views being a key form of pre‑computed results.
A materialized view is defined as a pre‑computed result set of complex or costly queries, allowing fast retrieval without re‑executing the original computation. The article highlights the difficulty of keeping materialized data up‑to‑date for ever‑growing time‑series data and proposes an incremental update approach that leverages the append‑only nature of such data.
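The incremental idea can be illustrated with a minimal sketch. It assumes the source is a strictly append-only log, so a refresh only has to fold in rows past a stored position. The class name `IncrementalView`, the `refresh` method, and the `(key, value)` row shape are hypothetical and are not taken from the product described in the article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # (key, value): hypothetical row shape


@dataclass
class IncrementalView:
    """Keeps per-key (sum, count) and only reads rows it has not seen yet."""
    watermark: int = 0  # number of log rows already folded into the view
    totals: Dict[str, Tuple[float, int]] = field(default_factory=dict)

    def refresh(self, log: List[Row]) -> int:
        """Fold rows appended since the last refresh; return how many were read."""
        new_rows = log[self.watermark:]  # append-only: older rows never change
        for key, value in new_rows:
            s, c = self.totals.get(key, (0.0, 0))
            self.totals[key] = (s + value, c + 1)
        self.watermark = len(log)
        return len(new_rows)


log: List[Row] = [("cpu", 1.0), ("cpu", 3.0)]
view = IncrementalView()
first = view.refresh(log)   # reads both existing rows
log.append(("mem", 5.0))
second = view.refresh(log)  # reads only the newly appended row
```

Because earlier rows are never rewritten, the second refresh touches one row instead of rescanning the whole log, which is the property the incremental approach relies on.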
The implementation details are presented for a product (炎凰) that handles observability data. Data is split into time‑based shards, each shard is pre‑computed into a materialized shard, and queries combine these shards using a Map‑Reduce‑like process: Map performs per‑shard pre‑computation, Reduce merges the partial results.
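The shard-then-merge flow can be sketched roughly as follows. This is an illustration under assumptions, not the product's implementation: the one-hour shard width, the `(timestamp, key, value)` event shape, and the function names `split_into_shards`, `map_shard`, and `reduce_partials` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str, float]          # (epoch seconds, key, value): assumed shape
Partial = Dict[str, Tuple[float, int]]  # key -> (sum, count)

SHARD_SECONDS = 3600  # assumed shard width of one hour


def split_into_shards(events: Iterable[Event]) -> Dict[int, List[Event]]:
    """Group events into time-based shards by integer shard id."""
    shards: Dict[int, List[Event]] = defaultdict(list)
    for ev in events:
        shards[ev[0] // SHARD_SECONDS].append(ev)
    return dict(shards)


def map_shard(events: Iterable[Event]) -> Partial:
    """Map step: pre-compute partial aggregates for a single shard."""
    acc: Partial = {}
    for _, key, value in events:
        s, c = acc.get(key, (0.0, 0))
        acc[key] = (s + value, c + 1)
    return acc


def reduce_partials(partials: Iterable[Partial]) -> Partial:
    """Reduce step: merge per-shard partial aggregates into one result."""
    merged: Partial = {}
    for partial in partials:
        for key, (s, c) in partial.items():
            ms, mc = merged.get(key, (0.0, 0))
            merged[key] = (ms + s, mc + c)
    return merged


events = [(10, "api", 2.0), (20, "api", 4.0), (3700, "api", 6.0), (3800, "db", 1.0)]
materialized = {sid: map_shard(evs) for sid, evs in split_into_shards(events).items()}
result = reduce_partials(materialized.values())
averages = {key: s / c for key, (s, c) in result.items()}
```

Each entry in `materialized` plays the role of a materialized shard; a query only runs the cheap Reduce step over shards that already exist.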
Key implementation points include storage design (structuring materialized tables and appropriate indexing), update strategies (periodic or event‑driven incremental updates), and pre‑computation logic that retains SUM and COUNT rather than a finished average, so that partial results can still be merged correctly at query time. Time‑bucket granularity is introduced to preserve temporal information in pre‑computed results.
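A small sketch shows why keeping SUM and COUNT per time bucket matters. It assumes rows of `(timestamp, value)` and fixed-width buckets; the helper names `bucket_start`, `precompute`, and `rollup` are hypothetical. With both components stored, hourly buckets can be rolled up into daily ones and still yield the exact average, whereas averaging the hourly averages does not.

```python
from typing import Dict, Iterable, Tuple

HOUR = 3600
DAY = 86400

Buckets = Dict[int, Tuple[float, int]]  # bucket start -> (sum, count)


def bucket_start(ts: int, width: int) -> int:
    """Truncate a timestamp to the start of its time bucket."""
    return ts - ts % width


def precompute(rows: Iterable[Tuple[int, float]], width: int) -> Buckets:
    """Pre-compute (sum, count) per time bucket from raw rows."""
    buckets: Buckets = {}
    for ts, value in rows:
        b = bucket_start(ts, width)
        s, c = buckets.get(b, (0.0, 0))
        buckets[b] = (s + value, c + 1)
    return buckets


def rollup(buckets: Buckets, width: int) -> Buckets:
    """Merge fine-grained buckets into coarser ones by adding sums and counts."""
    coarse: Buckets = {}
    for b, (s, c) in buckets.items():
        k = bucket_start(b, width)
        cs, cc = coarse.get(k, (0.0, 0))
        coarse[k] = (cs + s, cc + c)
    return coarse


rows = [(0, 10.0), (60, 20.0), (120, 30.0), (3600, 100.0)]
hourly = precompute(rows, HOUR)
daily = rollup(hourly, DAY)

true_avg = daily[0][0] / daily[0][1]                                # 160 / 4
avg_of_avgs = sum(s / c for s, c in hourly.values()) / len(hourly)  # (20 + 100) / 2
```

Here `true_avg` is 40.0 while `avg_of_avgs` is 60.0, because the first hour holds three rows and the second only one. Storing the pair keeps the merge exact at any coarser granularity.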
The query execution is divided into four parts (P1‑P4) based on whether data is fully materialized, partially materialized, or not materialized, and whether it falls within complete time buckets. Fully materialized and bucketed data are read directly, while non‑materialized or partially bucketed data require on‑the‑fly aggregation.
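The splitting of a query range can be sketched in simplified form. The exact definitions of P1 to P4 are not given here, so this sketch does not reproduce them: it only separates the bucket-aligned, already materialized span from the edges that need on-the-fly aggregation. The function `plan_query`, the `watermark` parameter (the time up to which materialization is assumed complete), and the segment labels are all hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (source, start, end), with end exclusive


def plan_query(start: int, end: int, bucket: int, watermark: int) -> List[Segment]:
    """Split [start, end) into a materialized span and raw-scan edges.

    Assumes data before `watermark` has been materialized in buckets of
    width `bucket`, and that everything else must be aggregated from raw data.
    """
    if start >= end:
        return []
    # First bucket boundary at or after the query start (ceiling division).
    aligned_start = -(-start // bucket) * bucket
    # Last bucket boundary at or before both the query end and the watermark.
    aligned_end = (min(end, watermark) // bucket) * bucket
    if aligned_start >= aligned_end:
        return [("raw", start, end)]  # no complete materialized bucket in range
    segments: List[Segment] = []
    if start < aligned_start:
        segments.append(("raw", start, aligned_start))  # leading partial bucket
    segments.append(("materialized", aligned_start, aligned_end))
    if aligned_end < end:
        segments.append(("raw", aligned_end, end))  # trailing or unmaterialized part
    return segments


plan = plan_query(start=1800, end=9000, bucket=3600, watermark=7200)
```

In this example the plan is a raw scan of 1800 to 3600, a direct read of the materialized span 3600 to 7200, and a raw scan of 7200 to 9000; the partial results from the three segments would then be merged as in any other aggregation.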
SQL DDL examples show how to create materialized views with parameters such as MATERIALIZED_ONLY and TIME_BUCKET, and how to use SHOW MATERIALIZED VIEW to monitor progress. The materialized data is stored in Parquet format for efficient I/O.
Performance tests demonstrate significant query speed improvements when using hourly or daily time‑bucketed materialized views, with daily buckets performing best for one‑day query windows.
The article concludes with future directions, including intelligent routing of queries to materialized views, hierarchical materialization, and treating materialized view maintenance as an ETL process, while noting the trade‑offs of storage overhead and maintenance cost.
A short Q&A addresses the applicability of materialized views to reporting, the proprietary implementation used, and constraints on aggregation functions.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.