How TeleDB Evolved from Centralized to Native Distributed Architecture

TeleDB’s journey from a centralized MySQL/PostgreSQL‑based system to a native distributed HTAP database introduced a share‑nothing architecture, columnar storage, vectorized execution, Remote Data Access, global caching, and distributed deadlock detection. Together these improve query performance, storage efficiency, and scalability.

ITPUB

Background and Motivation

Real‑time analytics on petabyte‑scale data exposed the limits of traditional databases: complex queries exhaust resources, execution engines are inefficient, and storage costs keep rising. TeleDB was built as a share‑nothing HTAP database to address these challenges.

Architecture Evolution

TeleDB progressed through three stages:

Centralized stage – built on optimized MySQL/PostgreSQL for rapid adoption.

Middleware‑based distributed stage – introduced a distributed layer to overcome single‑node bottlenecks.

Native distributed stage – a fully share‑nothing architecture that supports both OLTP scalability and complex analytical queries with high compatibility and low migration cost.

Core Technical Innovations

Share‑nothing distributed execution – eliminates single‑point bottlenecks and scales horizontally across nodes.

Remote Data Access (RDA) – a transport layer that streams binary tuples via shared memory and forwarder processes, reducing the number of processes per node from M×N to 1 for a query with M join operators on N data nodes.

Distributed Deadlock Detection (DDS) – detects and resolves cross‑node deadlocks within seconds, improving resource utilization.

Global Cache – unified caching for execution plans and metadata, cutting overall memory usage by 53% and increasing system stability.
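
Cross‑node deadlock detection is commonly modeled as cycle detection over a wait‑for graph merged from all nodes. The sketch below illustrates that general idea only and is not TeleDB's implementation: the (waiter, holder) edge format and the function name find_deadlock are assumptions made for this example.

```python
from collections import defaultdict


def find_deadlock(wait_edges):
    """Return one cycle of transaction ids from a wait-for graph, or None.

    wait_edges: iterable of (waiter, holder) pairs, assumed to be
    collected from every data node and merged into one global view.
    """
    graph = defaultdict(list)
    for waiter, holder in wait_edges:
        graph[waiter].append(holder)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = defaultdict(int)
    path = []

    def visit(txn):
        color[txn] = GREY
        path.append(txn)
        for nxt in graph[txn]:
            if color[nxt] == GREY:
                # nxt is already on the current path, so we closed a cycle.
                return path[path.index(nxt):]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(graph):
        if color[txn] == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


# T1 waits on T2 (node A), T2 on T3 (node B), T3 on T1 (node C).
edges = [(1, 2), (2, 3), (3, 1), (4, 1)]
cycle = find_deadlock(edges)
```

Once a cycle is found, a detector typically aborts one transaction in it to break the wait chain. Which transaction is chosen as the victim is a policy decision that the source does not describe.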

Columnar Storage (Pax Access Method)

TeleDB extends PostgreSQL’s tableam interface with a Pax Access Method (Pax AM). The storage stack consists of:

Pax Write/Read State Machines for columnar writes, reads, indexing and expression push‑down.

Pax File interface that provides read/write access to columnar files.

Pax Meta Table that maps logical columnar tables to their physical files.

BatchTupleSlot, which batches tuples to minimise engine intrusion.

Small‑File Merging and Cold‑Hot Data Management

A background worker called File Meta Manager periodically merges tiny columnar files based on metadata in the Pax Meta Table and user‑defined policies. After merging, it updates the meta table and removes obsolete files. The File Storage Manager moves merged files that satisfy cold‑data criteria to external object storage, while hot data remains on local disks. An XBlock cache abstracts differences among object‑storage protocols to provide low‑latency access.
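
The merge and tiering decisions described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the FileMeta record, the size thresholds, and the last‑access rule for cold data are hypothetical and do not reflect TeleDB's actual metadata schema or policies.

```python
from dataclasses import dataclass


@dataclass
class FileMeta:
    """Hypothetical stand-in for one row of the Pax Meta Table."""
    path: str
    size_bytes: int
    last_access_ts: float  # seconds since epoch


def plan_merges(files, small_limit, target_size):
    """Group files smaller than small_limit into batches of at most target_size.

    Returns a list of batches; each batch holds two or more files to merge.
    """
    batches, current, current_size = [], [], 0
    for f in sorted(files, key=lambda f: f.size_bytes):
        if f.size_bytes >= small_limit:
            continue  # large enough already, leave it alone
        if current and current_size + f.size_bytes > target_size:
            if len(current) > 1:
                batches.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_bytes
    if len(current) > 1:
        batches.append(current)
    return batches


def is_cold(f, now, cold_after_seconds):
    """Assumed cold-data rule: not accessed for cold_after_seconds."""
    return now - f.last_access_ts >= cold_after_seconds


files = [
    FileMeta("a", 10, 0.0),
    FileMeta("b", 20, 0.0),
    FileMeta("c", 500, 0.0),
    FileMeta("d", 30, 0.0),
]
batches = plan_merges(files, small_limit=100, target_size=50)
```

A real worker would then write each batch into a new file, update the meta table, and delete the old files, in that order, so that readers never see a missing mapping.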

Vectorized Execution Engine

TeleDB introduces a Vector Executor that implements vectorized operators such as Sort, Agg, Filter and Project. The executor works with both Pax AM (columnar) and Heap AM (row‑based) through a unified interface, allowing gradual replacement of classic operators. SIMD instructions on ARM and x86 accelerate hash‑agg and group‑by workloads, delivering order‑of‑magnitude speedups.
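
To make the batch‑at‑a‑time idea concrete, here is a minimal sketch of a filter feeding a hash aggregation over column batches. It shows only the control flow: plain Python cannot demonstrate SIMD, and the batch size, function names, and selection‑vector representation are assumptions for this example rather than details of TeleDB's Vector Executor.

```python
from collections import defaultdict

BATCH_SIZE = 1024  # assumed batch size for illustration


def scan_batches(columns, batch_size=BATCH_SIZE):
    """Yield column batches: dicts mapping column name to a slice of values."""
    n = len(next(iter(columns.values())))
    for start in range(0, n, batch_size):
        yield {name: col[start:start + batch_size]
               for name, col in columns.items()}


def filter_op(batch, column, predicate):
    """Return a selection vector: positions in the batch that pass."""
    return [i for i, v in enumerate(batch[column]) if predicate(v)]


def hash_agg_sum(batches, key_col, value_col, filter_col, predicate):
    """SUM(value_col) GROUP BY key_col over rows passing the filter."""
    totals = defaultdict(int)
    for batch in batches:
        sel = filter_op(batch, filter_col, predicate)
        keys, values = batch[key_col], batch[value_col]
        for i in sel:  # only selected rows reach the aggregation
            totals[keys[i]] += values[i]
    return dict(totals)


columns = {
    "region": ["a", "b", "a", "c"],
    "amount": [10, 20, 30, 40],
}
result = hash_agg_sum(
    scan_batches(columns, batch_size=2),
    key_col="region",
    value_col="amount",
    filter_col="amount",
    predicate=lambda v: v >= 20,
)
```

Because each operator handles a whole batch per call, the per‑row interpretation overhead of a classic tuple‑at‑a‑time executor is paid once per batch, and the tight inner loops are the part a native engine can hand to SIMD instructions.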

Additional Optimizations

MinMax indexes to prune data early.

Dynamic partition pruning to avoid scanning irrelevant partitions.

Runtime filters for hash joins to reduce data movement.
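
Two of these optimizations can be illustrated with a short sketch. The block layout (a dict with "min" and "max" keys) and the use of an exact key set as the runtime filter are assumptions made for clarity. Production systems often use a Bloom filter instead, and the source does not say which form TeleDB uses.

```python
def prune_blocks(blocks, lo, hi):
    """MinMax pruning: keep blocks whose [min, max] overlaps [lo, hi]."""
    return [b for b in blocks if b["max"] >= lo and b["min"] <= hi]


def build_runtime_filter(build_rows, key):
    """Collect join keys from the build side of a hash join."""
    return {row[key] for row in build_rows}


def probe_scan(probe_rows, key, runtime_filter):
    """Drop probe rows that cannot match before they are sent to the join."""
    return [row for row in probe_rows if row[key] in runtime_filter]


blocks = [
    {"id": 0, "min": 1, "max": 10},
    {"id": 1, "min": 11, "max": 20},
    {"id": 2, "min": 21, "max": 30},
]
# Predicate: value BETWEEN 12 AND 22, so block 0 is never read.
kept = prune_blocks(blocks, 12, 22)

build = [{"k": 1}, {"k": 3}]
probe = [{"k": 1, "v": "x"}, {"k": 2, "v": "y"}, {"k": 3, "v": "z"}]
surviving = probe_scan(probe, "k", build_runtime_filter(build, "k"))
```

In a distributed plan the benefit comes from applying the filter at the probe‑side scan on each data node, so that non‑matching rows are never shipped across the network.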

Performance Highlights

Combining columnar storage, vectorized execution, RDA, global caching and DDS yields:

Complex analytical queries run orders of magnitude faster.

Overall memory usage drops by 53% thanks to the global cache.

Process/thread count during distributed joins is dramatically reduced, lowering CPU context‑switch overhead.

Future Roadmap

TeleDB will evolve into a multi‑engine autonomous HTAP platform with AI‑driven operations, vector storage for AI workloads, and striped storage for higher elasticity. Planned enhancements include seamless load‑balancing, read‑write separation, and deeper integration with external data lakes via FDW plugins.

