Introduction to Apache Paimon: Architecture, Unified Storage, and Core Concepts
This article introduces Apache Paimon, an open-source table format that supports both batch and streaming reads and writes. It covers Paimon's architecture, its unified storage model, and its core concepts: file layout, snapshots, manifests, data files, partitions, and consistency guarantees.
1. Understanding Paimon
Apache Paimon is an emerging open‑source table format that supports both batch and streaming reads and writes, enabling OLAP queries on large‑scale data.
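For reads, Paimon can consume historical snapshots (batch mode), start from the latest offset (streaming mode), or read incremental snapshots in a hybrid of the two. For writes, it can synchronize database changelogs (CDC) as a stream or load offline data with batch inserts and overwrites.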
The ecosystem integrates with Apache Flink, Hive, Spark, Trino and other compute engines.
Internally, Paimon stores columnar files on a file system or object storage, keeps metadata in manifest files for efficient pruning, and uses an LSM‑tree for primary‑key tables to support high‑performance updates.
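To make the LSM-tree idea concrete, here is a minimal sketch of how sorted runs can be merged so that the newest write per primary key wins. It is a toy model, not Paimon's implementation: the record shape `(key, sequence, value)`, the use of `None` as a delete marker, and the function name `merge_sorted_runs` are all assumptions made for this illustration.

```python
import heapq


def merge_sorted_runs(runs):
    """Merge sorted runs of (key, sequence, value) records, newest value per key wins.

    Each run must be sorted by key. A higher sequence number means a newer write.
    A value of None marks a delete (hypothetical tombstone convention).
    """
    merged = {}
    # heapq.merge streams the runs in (key, sequence) order without re-sorting everything.
    for key, _seq, value in heapq.merge(*runs, key=lambda r: (r[0], r[1])):
        # Records for one key arrive in ascending sequence order, so the last one wins.
        merged[key] = value
    # Drop keys whose newest record is a delete marker.
    return {k: v for k, v in merged.items() if v is not None}


older_run = [(1, 1, "alice"), (2, 2, "bob"), (3, 3, "carol")]
newer_run = [(2, 4, "bobby"), (3, 5, None), (4, 6, "dave")]
table = merge_sorted_runs([older_run, newer_run])
```

Here key 2 is updated, key 3 is deleted, and key 4 is inserted, so `table` ends up holding keys 1, 2, and 4. Writes only ever append new runs; merging happens later, which is what makes updates cheap in this kind of structure.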
2. Unified Storage
For stream engines like Flink, three connector types are typical: message queues (e.g., Kafka) for low‑latency ingestion, OLAP systems (e.g., ClickHouse) for ad‑hoc queries, and batch stores (e.g., Hive) for traditional batch operations.
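Paimon provides a single table abstraction that covers these cases. In batch execution mode it behaves like a Hive table: a query sees the latest snapshot. In streaming execution mode it behaves like a message queue whose historical data never expires: a query reads the table as a changelog stream.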
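The dual behavior of one table can be sketched with a small in-memory model. This is only an illustration of the idea under simplifying assumptions (append-only rows, no primary keys); the class `ToyTable` and its methods are hypothetical names and not part of any Paimon API.

```python
class ToyTable:
    """In-memory stand-in for a table readable in batch or streaming mode."""

    def __init__(self):
        # commits[i] holds the rows added by snapshot i + 1.
        self.commits = []

    def commit(self, rows):
        """Append one batch of rows as a new snapshot and return its id."""
        self.commits.append(list(rows))
        return len(self.commits)

    def batch_read(self, snapshot_id=None):
        """Return every row visible at the given snapshot (default: the latest)."""
        upto = len(self.commits) if snapshot_id is None else snapshot_id
        return [row for commit in self.commits[:upto] for row in commit]

    def stream_read(self, after_snapshot=0):
        """Yield (snapshot_id, row) for changes committed after the given snapshot."""
        for index in range(after_snapshot, len(self.commits)):
            for row in self.commits[index]:
                yield index + 1, row


toy = ToyTable()
toy.commit(["a", "b"])
toy.commit(["c"])
latest_rows = toy.batch_read()            # batch view of the latest snapshot
old_rows = toy.batch_read(snapshot_id=1)  # batch view of an older snapshot
changes = list(toy.stream_read(after_snapshot=1))  # only what arrived later
replay = list(toy.stream_read())          # history never expires, so replay works
```

The same stored commits serve both access patterns: a batch reader asks for the state at a snapshot, while a streaming reader asks for the changes after one.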
3. Core Concepts
1. File Layout
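All files of a table are stored under one base directory and organized in layers. Starting from a snapshot file, a reader can recursively reach every other file of the table: the snapshot points to manifest lists, manifest lists point to manifests, and manifests point to data files.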
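As a rough picture of such a base directory, the sketch below creates an empty mock layout in a temporary folder and lists it back. The directory names `schema` and `bucket-0`, the partition key `dt`, and all file names are simplified assumptions for illustration; real file names are longer and the files are not empty.

```python
import os
import tempfile

# Hypothetical table with one partition key (dt) and one bucket per partition.
LAYOUT = [
    "snapshot/snapshot-1",
    "snapshot/snapshot-2",
    "manifest/manifest-list-0",
    "manifest/manifest-0",
    "schema/schema-0",
    "dt=2024-01-01/bucket-0/data-0.orc",
    "dt=2024-01-02/bucket-0/data-1.orc",
]


def create_layout(base_dir):
    """Create empty placeholder files for every path in LAYOUT."""
    for relative in LAYOUT:
        path = os.path.join(base_dir, *relative.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()


def list_tree(base_dir):
    """Return all file paths under base_dir, relative to it and sorted."""
    found = []
    for root, _dirs, files in os.walk(base_dir):
        for name in files:
            full = os.path.join(root, name)
            found.append(os.path.relpath(full, base_dir).replace(os.sep, "/"))
    return sorted(found)


base_dir = tempfile.mkdtemp(prefix="toy_table_")
create_layout(base_dir)
tree = list_tree(base_dir)
```

Printing `tree` gives one line per file, which makes the split between metadata directories and partition directories easy to see.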
2. Snapshot
All snapshot files are stored in the snapshot directory. Each one is a JSON file that records the schema in use and the manifest lists containing the changes of that snapshot, which enables point-in-time reads and time-travel queries.
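A time-travel lookup over such JSON files might look like the sketch below. The field names (`id`, `schemaId`, `manifestList`, `timeMillis`) and all values are simplified stand-ins chosen for this example, and `snapshot_as_of` is a hypothetical helper; check the actual snapshot files of your Paimon version for the real field set.

```python
import json

# Illustrative snapshot contents; field names and values are assumptions.
SNAPSHOT_FILES = {
    "snapshot-1": '{"id": 1, "schemaId": 0, "manifestList": "manifest-list-a", "timeMillis": 1000}',
    "snapshot-2": '{"id": 2, "schemaId": 0, "manifestList": "manifest-list-b", "timeMillis": 2000}',
    "snapshot-3": '{"id": 3, "schemaId": 1, "manifestList": "manifest-list-c", "timeMillis": 3000}',
}


def load_snapshots(files):
    """Parse every snapshot file and return the snapshots ordered by id."""
    return sorted((json.loads(text) for text in files.values()), key=lambda s: s["id"])


def snapshot_as_of(snapshots, timestamp_millis):
    """Return the newest snapshot committed at or before the timestamp, or None.

    Assumes commit times increase with snapshot ids.
    """
    visible = [s for s in snapshots if s["timeMillis"] <= timestamp_millis]
    return visible[-1] if visible else None


snapshots = load_snapshots(SNAPSHOT_FILES)
at_2500 = snapshot_as_of(snapshots, 2500)
before_first = snapshot_as_of(snapshots, 500)
```

A reader that resolves `at_2500` then uses that snapshot's schema and manifest list, and ignores anything committed later.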
3. Manifest Files
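Manifest lists and manifest files are stored in the manifest directory. A manifest list is a list of manifest file names. A manifest file records changes to LSM data files and changelog files, for example which data files a snapshot created and which it deleted.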
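One way to picture how a reader turns manifests into a set of data files is to replay their entries in order. The sketch below assumes each manifest entry is a simple `(kind, file_name)` pair with kinds `"ADD"` and `"DELETE"`; this representation and the function `live_data_files` are hypothetical, and real manifest files carry more metadata than shown here.

```python
def live_data_files(manifest_list, manifests):
    """Replay ADD/DELETE entries from each manifest, in order, to find live data files."""
    live = set()
    for manifest_name in manifest_list:
        for kind, data_file in manifests[manifest_name]:
            if kind == "ADD":
                live.add(data_file)
            elif kind == "DELETE":
                live.discard(data_file)
            else:
                raise ValueError(f"unknown manifest entry kind: {kind}")
    return sorted(live)


# Hypothetical manifests: the second one replaces data-0.orc with data-2.orc.
manifests = {
    "manifest-0": [("ADD", "data-0.orc"), ("ADD", "data-1.orc")],
    "manifest-1": [("DELETE", "data-0.orc"), ("ADD", "data-2.orc")],
}
files_at_snapshot_1 = live_data_files(["manifest-0"], manifests)
files_at_snapshot_2 = live_data_files(["manifest-0", "manifest-1"], manifests)
```

Because each snapshot names its own manifest list, the older snapshot still resolves to the old file set even after newer commits have replaced files.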
4. Data Files
Data files are grouped by partition and, within each partition, by bucket. They can be stored in ORC (default), Parquet, or Avro format.
5. Partitions
Paimon adopts the same partition concept as Apache Hive, allowing optional partition keys (e.g., date, city) to improve query efficiency.
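The efficiency gain comes from skipping whole directories that cannot match a filter. The sketch below assumes Hive-style `key=value` directory names and uses made-up paths; `parse_partition` and `prune` are hypothetical helpers written for this example, not engine code.

```python
def parse_partition(path):
    """Extract partition keys from a path such as 'dt=2024-01-01/city=paris/data-0.orc'."""
    spec = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            spec[key] = value
    return spec


def prune(paths, **wanted):
    """Keep only the paths whose partition values match every requested key."""
    kept = []
    for path in paths:
        spec = parse_partition(path)
        if all(spec.get(key) == value for key, value in wanted.items()):
            kept.append(path)
    return kept


paths = [
    "dt=2024-01-01/city=paris/data-0.orc",
    "dt=2024-01-01/city=tokyo/data-1.orc",
    "dt=2024-01-02/city=paris/data-2.orc",
]
one_day = prune(paths, dt="2024-01-01")
one_city_one_day = prune(paths, dt="2024-01-02", city="paris")
```

A query filtered on `dt` and `city` only has to open the files that survive pruning, so well-chosen partition keys reduce the data scanned.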
6. Consistency Guarantees
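Writers use a two-phase commit protocol to commit a batch of records atomically, and each commit produces at most two snapshots. Two writers that modify a table at the same time can commit in parallel, with serializable results, as long as they do not modify the same bucket (the subdivision of a partition that holds its data files). If they do modify the same bucket, only snapshot isolation is guaranteed: the final table state may mix the two commits, but no changes are lost.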
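The commit step can be pictured as an optimistic race to publish the next snapshot file. The sketch below is a simplified model under a loud assumption: that publishing a snapshot amounts to atomically creating the next numbered file, with the data files already written in an earlier phase. The functions `try_publish` and `commit` and the JSON payload are hypothetical, and conflict checking between the changes themselves is left out.

```python
import json
import os
import tempfile


def list_snapshot_ids(snapshot_dir):
    """Return the ids of all published snapshots in ascending order."""
    ids = []
    for name in os.listdir(snapshot_dir):
        if name.startswith("snapshot-"):
            ids.append(int(name.split("-", 1)[1]))
    return sorted(ids)


def try_publish(snapshot_dir, snapshot_id, new_files):
    """Create snapshot-<id> atomically; return False if another writer got there first."""
    path = os.path.join(snapshot_dir, f"snapshot-{snapshot_id}")
    try:
        # O_EXCL makes creation fail if the file already exists, so only one writer wins.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        json.dump({"id": snapshot_id, "files": new_files}, handle)
    return True


def commit(snapshot_dir, new_files, max_attempts=10):
    """Publish files written in an earlier phase, retrying with a new id on conflict."""
    for _ in range(max_attempts):
        ids = list_snapshot_ids(snapshot_dir)
        next_id = ids[-1] + 1 if ids else 1
        if try_publish(snapshot_dir, next_id, new_files):
            return next_id
    raise RuntimeError("gave up after repeated commit conflicts")


snapshot_dir = tempfile.mkdtemp(prefix="toy_snapshots_")
first = commit(snapshot_dir, ["data-0.orc"])
second = commit(snapshot_dir, ["data-1.orc"])
lost_race = try_publish(snapshot_dir, second, ["data-2.orc"])  # id already taken
third = commit(snapshot_dir, ["data-2.orc"])                    # retry gets a new id
```

Readers never see half-written state in this model, because data files stay invisible until a snapshot that references them has been published.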
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
