Flink Table Store v0.2: Application Scenarios, Core Features, and Future Roadmap
This article introduces Flink Table Store v0.2: its four primary application scenarios (offline warehouse acceleration, partial update, pre‑aggregation rollup, and real‑time warehouse enhancement); its core features, including the lake‑storage architecture, bucket management, and append‑only mode; and the project's future roadmap and trade‑off considerations.
Introduction – Flink Table Store v0.2 is a lake‑storage solution that stores massive data cost‑effectively without requiring a long‑running service. It organizes data with Manifest files and LSM‑tree buckets, and integrates with Kafka to support both batch and streaming writes.
Application Scenarios
Offline data‑warehouse acceleration: supports Flink streaming writes and batch/OLAP queries from multiple engines (Hive, Spark, Trino). It offers real‑time updates for primary‑key tables, as well as non‑primary‑key and append‑only data.
Partial update: with the merge engine set to partial‑update, incoming non‑null column values overwrite the stored ones (COALESCE‑like semantics), so several streams can each maintain their own columns of a wide table keyed by primary key.
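The partial‑update behavior can be illustrated with a minimal Python sketch. This is not Table Store's API, just a simulation of the merge rule: rows sharing a primary key are combined, and later non‑null column values overwrite earlier ones.

```python
def partial_update_merge(rows):
    """Simulate partial-update merging: for rows with the same primary key,
    later non-None column values overwrite earlier ones (COALESCE-like)."""
    merged = {}
    for row in rows:  # rows arrive in write order
        current = merged.setdefault(row["pk"], dict(row))
        for col, val in row.items():
            if val is not None:
                current[col] = val
    return list(merged.values())

# Two writers each fill part of the same wide row for pk=1.
writes = [
    {"pk": 1, "name": "alice", "score": None},
    {"pk": 1, "name": None, "score": 95},
]
print(partial_update_merge(writes))
# → [{'pk': 1, 'name': 'alice', 'score': 95}]
```

Each writer only needs to supply its own columns; the merge produces the full row at read or compaction time.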
Pre‑aggregation rollup: with the merge engine set to aggregation, columns are automatically aggregated (e.g., SUM, MAX) during writes, similar to Flink's streaming aggregate functions.
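A small Python sketch of the aggregation merge engine's idea (the function table and row layout are illustrative assumptions, not the real configuration syntax): rows with the same primary key are folded together with a per‑column aggregate function.

```python
# Per-column aggregate functions, analogous to declaring SUM/MAX per field.
AGG_FUNCS = {
    "clicks": lambda a, b: a + b,  # sum
    "max_price": max,              # max
}

def aggregate_merge(rows, agg_funcs):
    """Fold rows sharing a primary key using per-column aggregates."""
    merged = {}
    for row in rows:
        pk = row["pk"]
        if pk not in merged:
            merged[pk] = dict(row)
        else:
            for col, fn in agg_funcs.items():
                merged[pk][col] = fn(merged[pk][col], row[col])
    return list(merged.values())

events = [
    {"pk": "item-1", "clicks": 3, "max_price": 9.5},
    {"pk": "item-1", "clicks": 2, "max_price": 12.0},
]
print(aggregate_merge(events, AGG_FUNCS))
# → [{'pk': 'item-1', 'clicks': 5, 'max_price': 12.0}]
```

Because the aggregation happens at write/merge time, queries read the rolled‑up result directly instead of re‑aggregating raw events.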
Real‑time warehouse enhancement: dual storage (lake + Kafka Log System) provides hybrid back‑fill reads and enables low‑latency streaming consumption while keeping data queryable by batch engines.
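The hybrid back‑fill read described above can be sketched as follows (a simplified model, assuming the lake snapshot records the log offset up to which it is complete): history is served from the lake, then consumption switches to the log from that offset.

```python
def hybrid_read(lake_snapshot, log, snapshot_log_offset):
    """Sketch of a hybrid back-fill read: serve history from the lake
    snapshot, then continue streaming from the log at the offset the
    snapshot already covers, so no record is lost or duplicated."""
    for record in lake_snapshot:              # batch back-fill phase
        yield ("backfill", record)
    for record in log[snapshot_log_offset:]:  # streaming phase
        yield ("stream", record)

snapshot = ["r1", "r2", "r3"]                # rows committed to lake storage
kafka_log = ["r1", "r2", "r3", "r4", "r5"]   # full changelog in the log system
result = list(hybrid_read(snapshot, kafka_log, snapshot_log_offset=3))
print(result)
# → [('backfill', 'r1'), ('backfill', 'r2'), ('backfill', 'r3'),
#    ('stream', 'r4'), ('stream', 'r5')]
```

This is why a streaming consumer can start with full history at low cost while batch engines keep querying the same lake data.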
Core Features
1. Lake‑storage structure – snapshot‑level transactions, scalable storage on object stores, and hierarchical Manifest management for TB‑ to PB‑scale data.
2. Partition internal structure – each partition contains multiple buckets, and each bucket is an updatable LSM tree, so updates avoid rewriting large single files.
3. Production‑grade improvements – Catalog support (metadata on filesystem or Hive Metastore) and ecosystem integration with Hive, Spark, Trino.
4. Bucket rescale – the bucket count can be raised for newly created partitions, and existing partitions can be batch‑rescaled, without rewriting old data.
5. Append‑Only mode – low write cost, Kafka‑like ordered reads, automatic compaction to merge small files.
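The bucket routing and rescale behavior (items 2 and 4 above) can be sketched in a few lines of Python. The class and hash choice are illustrative assumptions, not Table Store internals; the point is that each partition pins the bucket count it was created with, so raising the setting only affects new partitions until an explicit batch rescale.

```python
import zlib

def bucket_of(pk, num_buckets):
    """Route a primary key to a bucket; each bucket holds one LSM tree.
    crc32 is used here only as a stable stand-in hash."""
    return zlib.crc32(str(pk).encode()) % num_buckets

class Partition:
    """A partition remembers the bucket count it was created with."""
    def __init__(self, name, num_buckets):
        self.name = name
        self.num_buckets = num_buckets

    def route(self, pk):
        return bucket_of(pk, self.num_buckets)

old = Partition("dt=2022-08-01", num_buckets=4)  # created before the change
new = Partition("dt=2022-08-02", num_buckets=8)  # picks up the new setting

# The same key may land in different buckets per partition, which is fine:
# reads and compaction are scoped to a partition, so old data needs no rewrite.
print(old.route("user-42"), new.route("user-42"))
```

A batch rescale of an old partition would simply rewrite its data with a new `num_buckets`, leaving all other partitions untouched.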
Future Outlook
• Meet Flink SQL storage needs: message‑queue semantics, OLAP queryability, batch ETL, and dimension‑lookup support.
• Trade‑off balancing among freshness, cost, and query latency, allowing users to prioritize based on workload.
• Architectural vision: lake storage + DFS + Log System, with a future Service layer for accelerated streaming pipelines and long‑term batch queries.
• Dim‑Join capability: point‑lookup cache layered from memory to local disk to DFS, enabling compute‑storage separation.
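The tiered dim‑join lookup in the last bullet can be sketched as a fall‑through cache (a hypothetical model; the class and tier layout are assumptions, not the planned implementation): misses fall from memory to local disk to DFS, and hits are promoted back into the faster tiers.

```python
class TieredLookupCache:
    """Point lookup that falls through memory -> local disk -> DFS,
    caching values into the faster tiers on the way back."""
    def __init__(self, dfs):
        self.memory = {}      # tier 1: in-process cache
        self.local_disk = {}  # tier 2: local disk cache (modeled as a dict)
        self.dfs = dfs        # tier 3: authoritative remote storage

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        if key in self.local_disk:
            value = self.local_disk[key]
        else:
            value = self.dfs[key]          # remote DFS read
            self.local_disk[key] = value   # cache on local disk
        self.memory[key] = value           # promote to memory
        return value

cache = TieredLookupCache(dfs={"user-1": {"city": "SF"}})
print(cache.get("user-1"))  # first call: remote read, then cached
print(cache.get("user-1"))  # second call: served from memory
# → {'city': 'SF'} both times
```

Because the authoritative copy lives in DFS and the faster tiers are only caches, lookup nodes stay stateless, which is what enables compute‑storage separation.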
Project Information
Flink Table Store is an Apache Flink sub‑project. Source code: https://github.com/apache/flink-table-store . Documentation: https://nightlies.apache.org/flink/flink-table-store . Mailing lists: [email protected], [email protected], [email protected].
Q&A Highlights
Data permissions rely on underlying file‑system or Hive Metastore; no table‑level ACL yet.
Kerberos support is available via Flink.
Compared with Hudi/Iceberg, Table Store offers PK‑based and PK‑less updates, higher throughput, low‑cost real‑time updates, and stateless writes.
LSM snapshots enable incremental reads similar to Iceberg.
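A minimal sketch of snapshot‑based incremental reads (file names and the set‑difference model are illustrative, not the actual manifest format): a reader scans only the data files added between two snapshots rather than the whole table.

```python
def incremental_files(prev_snapshot, curr_snapshot):
    """Return only the data files added between two snapshots."""
    return sorted(set(curr_snapshot) - set(prev_snapshot))

snap_3 = {"f1.orc", "f2.orc"}
snap_4 = {"f1.orc", "f2.orc", "f3.orc"}
print(incremental_files(snap_3, snap_4))
# → ['f3.orc']
```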
Roadmap includes a Service version to provide fresh, integrated online‑offline capabilities.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.