Inside PostgreSQL: How the Database Kernel Works and Can Be Extended
This article walks through PostgreSQL's kernel architecture, from client connection handling through parsing, rewriting, planning, and execution. It illustrates scan and join algorithms, shows how to extend the kernel and build clustered deployments, and introduces Foreign Data Wrappers for integrating external data sources.
Speaker
Jiang Ruihai, Chief Kernel Architect at Shandong Hango Database, has nearly 20 years of experience in database kernel and cluster development and previously worked at Lucent and IBM.
Presentation Overview
The talk is divided into three parts: the development of China's domestic databases, an analysis of PostgreSQL kernel internals, and extending PostgreSQL with new features.
Getting Started with PostgreSQL Development
Useful resources include the official source repository, documentation, and mailing lists. PostgreSQL is written in C; tools such as Source Insight, cscope, Eclipse, Flex, Bison, and gdb are recommended. Linux distributions like CentOS or Ubuntu provide a convenient build and debugging environment.
PostgreSQL Kernel Architecture
When a client connects, the postmaster process creates a backend process. The query then passes through four core modules:
SQL Parser – lexical and syntax analysis.
Query Rewriter – rewrites views and applies rules.
Planner – cost‑based optimizer that selects scan and join strategies.
Executor – carries out the chosen plan.
Scan methods include Sequential Scan and Index Scan; the planner estimates the cost of each candidate and chooses the cheapest.
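The cost comparison can be sketched with a toy model. The five constants below are PostgreSQL's default values for the planner settings of the same names; the two cost formulas and the function names are simplifying assumptions for illustration and are much cruder than the real planner's estimates.

```python
import math

# Default values of PostgreSQL's planner cost settings.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005
CPU_OPERATOR_COST = 0.0025


def seq_scan_cost(pages, rows):
    """Read every page sequentially and evaluate the filter on every row."""
    return pages * SEQ_PAGE_COST + rows * (CPU_TUPLE_COST + CPU_OPERATOR_COST)


def index_scan_cost(pages, rows, selectivity):
    """Toy assumption: one random heap page fetch per matching row, capped at
    the table size, plus per-tuple CPU work on the index and the heap."""
    matching = rows * selectivity
    heap_pages = min(pages, math.ceil(matching))
    io = heap_pages * RANDOM_PAGE_COST
    cpu = matching * (CPU_INDEX_TUPLE_COST + CPU_TUPLE_COST)
    return io + cpu


def choose_scan(pages, rows, selectivity):
    """Return the name of the cheaper scan and the cost of both candidates."""
    costs = {
        "Seq Scan": seq_scan_cost(pages, rows),
        "Index Scan": index_scan_cost(pages, rows, selectivity),
    }
    return min(costs, key=costs.get), costs


# A highly selective predicate favors the index; a broad one favors the full scan.
print(choose_scan(pages=1000, rows=100_000, selectivity=0.0001)[0])
print(choose_scan(pages=1000, rows=100_000, selectivity=0.5)[0])
```

Under these assumptions the crossover comes from random page reads being priced four times higher than sequential ones, so an index only pays off when it avoids touching most of the table.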
The talk demonstrates two join algorithms, Nested Loop Join and Hash Join, with examples of two-table and three-table joins.
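As a rough illustration of the two algorithms, the sketch below joins in-memory lists of dictionaries. The table names, columns, and function signatures are hypothetical and are not taken from the talk's own examples.

```python
from collections import defaultdict


def nested_loop_join(outer, inner, key_outer, key_inner):
    """For every outer row, rescan the whole inner input (inner must be a list)."""
    for o in outer:
        for i in inner:
            if o[key_outer] == i[key_inner]:
                yield {**o, **i}


def hash_join(outer, inner, key_outer, key_inner):
    """Build a hash table on the inner input once, then probe it per outer row."""
    table = defaultdict(list)
    for i in inner:
        table[i[key_inner]].append(i)
    for o in outer:
        for i in table.get(o[key_outer], []):
            yield {**o, **i}


customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 2},
    {"order_id": 12, "cust_id": 1},
]
items = [{"order_id": 10, "sku": "A"}, {"order_id": 12, "sku": "B"}]

# Two-table join: both algorithms produce the same rows.
two_table_nl = list(nested_loop_join(orders, customers, "cust_id", "cust_id"))
two_table_hash = list(hash_join(orders, customers, "cust_id", "cust_id"))

# Three-table join: the output of one join becomes the outer input of the next.
three_table = list(
    hash_join(
        hash_join(orders, customers, "cust_id", "cust_id"),
        items,
        "order_id",
        "order_id",
    )
)
```

The nested loop does work proportional to the product of the input sizes, while the hash join reads each input once at the price of holding the inner side in memory, which is the kind of trade-off a cost-based planner weighs.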
SQL Request Processing Steps
1. Parser – converts SQL text into a raw parse tree (e.g., SelectStmt).
2. Rewriter – expands views into subqueries and applies rewrite rules, for example a rule that logs changes to another table.
3. Planner – builds a query‑plan tree, selecting scan and join algorithms based on cost estimates.
4. Executor – walks the plan tree, pulling rows from leaf operators (sequential or index scans) and feeding them to join operators (nested‑loop, merge, hash) until the final result is produced.
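The pull-based execution described in step 4 can be sketched with Python generators, where each operator requests rows from its children only when its parent asks for one. The operator names, the dictionary standing in for an index, and the sample data are all assumptions made for illustration; PostgreSQL's executor is written in C and is considerably more involved.

```python
def seq_scan(table, predicate=lambda row: True):
    """Leaf operator: yield every row that passes the filter."""
    for row in table:
        if predicate(row):
            yield row


def index_scan(index, key):
    """Leaf operator: fetch matching rows through a dict used as a stand-in index."""
    yield from index.get(key, [])


def nested_loop(outer, make_inner, join_key):
    """Join operator: pull one outer row, then pull its matching inner rows."""
    for o in outer:
        for i in make_inner(o[join_key]):
            yield {**o, **i}


def project(child, columns):
    """Top operator: keep only the requested columns."""
    for row in child:
        yield {c: row[c] for c in columns}


orders = [{"order_id": 10, "cust_id": 1}, {"order_id": 11, "cust_id": 2}]
customers_by_id = {
    1: [{"cust_id": 1, "name": "Ann"}],
    2: [{"cust_id": 2, "name": "Bob"}],
}

# The plan tree: Project -> Nested Loop -> (Seq Scan on orders, Index Scan on customers).
plan = project(
    nested_loop(
        seq_scan(orders, lambda r: r["order_id"] > 10),
        lambda key: index_scan(customers_by_id, key),
        "cust_id",
    ),
    ["order_id", "name"],
)

# Nothing runs until the consumer pulls rows from the root of the tree.
result = list(plan)
```

Because every operator is lazy, a consumer that stops early (as with a LIMIT) never forces the leaf scans to read the rest of their input.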
Extending PostgreSQL Kernel
Beyond understanding the internals, developers can add new functionality, such as custom data types, operators, or planner hooks, either by modifying the kernel itself or by plugging into its extension points.
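The planner hook is a useful example of an extension point. In PostgreSQL it is a global function pointer named planner_hook: a C extension saves the previous value when it loads, installs its own function, and that function delegates to the previous hook or to standard_planner. The Python sketch below mimics only this chaining pattern; the dictionary, the simplified signatures, and the logging behavior are assumptions, not PostgreSQL's actual C interface.

```python
# Stand-in for the global hook pointer; None means "no extension installed".
hooks = {"planner": None}


def standard_planner(query):
    """Placeholder for the built-in planner (hypothetical return value)."""
    return {"query": query, "plan": "Seq Scan"}


def planner(query):
    """Entry point: call the installed hook if there is one, else the built-in planner."""
    hook = hooks["planner"]
    if hook is not None:
        return hook(query)
    return standard_planner(query)


def install_logging_hook(log):
    """What an extension would do at load time: save the old hook, install a new one."""
    prev_hook = hooks["planner"]

    def logging_planner(query):
        log.append(query)  # the extension's extra behavior
        if prev_hook is not None:
            return prev_hook(query)  # keep earlier extensions working
        return standard_planner(query)

    hooks["planner"] = logging_planner


seen = []
install_logging_hook(seen)
plan = planner("SELECT 1")
```

Saving and calling the previous hook is what lets several extensions share the same extension point without overwriting each other.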
PostgreSQL Cluster Architecture
Clustered PostgreSQL achieves horizontal scalability by adding coordinator nodes and data nodes. Coordinators handle client sessions, generate distributed query plans that consider data placement, and dispatch sub‑plans to data nodes, which execute locally and return results.
The planner on the coordinator is extended to account for data distribution, and the executor can run both local and remote sub‑plans.
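A minimal sketch of this coordinator and data node split follows. It assumes rows are placed by taking an integer distribution key modulo the number of nodes; the class names, the placement rule, and the single-node shortcut are illustrative assumptions rather than the design of any particular clustered PostgreSQL product.

```python
class DataNode:
    def __init__(self):
        self.rows = []

    def execute(self, predicate):
        """Run the sub-plan (here just a filtered scan) on local rows only."""
        return [r for r in self.rows if predicate(r)]


class Coordinator:
    def __init__(self, nodes, dist_key):
        self.nodes = nodes
        self.dist_key = dist_key

    def _node_for(self, value):
        # Assumed placement rule: integer key modulo the node count.
        return self.nodes[value % len(self.nodes)]

    def insert(self, row):
        self._node_for(row[self.dist_key]).rows.append(row)

    def select(self, predicate, key_equals=None):
        """Dispatch the sub-plan and merge results; also report nodes contacted."""
        # If the query pins the distribution key, only one node can hold matches.
        if key_equals is not None:
            targets = [self._node_for(key_equals)]
        else:
            targets = self.nodes
        results = []
        for node in targets:
            results.extend(node.execute(predicate))
        return results, len(targets)


coordinator = Coordinator([DataNode(), DataNode(), DataNode()], dist_key="id")
for i in range(9):
    coordinator.insert({"id": i})

point_rows, point_nodes = coordinator.select(lambda r: r["id"] == 4, key_equals=4)
range_rows, range_nodes = coordinator.select(lambda r: r["id"] >= 6)
```

The point lookup touches a single node while the range query fans out to all of them, which is the data-placement awareness the extended planner has to provide.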
Foreign Data Wrappers (FDW)
An FDW lets PostgreSQL treat an external data source as a foreign table. By implementing the FDW API, especially its planner-related callbacks, developers can integrate virtually any external system; dozens of existing wrappers are available.
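The shape of the API can be sketched as follows. The real FDW interface is a set of C callbacks, including GetForeignRelSize on the planner side and BeginForeignScan, IterateForeignScan, and EndForeignScan on the executor side. The Python class below only mirrors those roles over an in-memory CSV string; its method names, signatures, and row-count estimate are assumptions for illustration.

```python
import csv
import io


class CsvForeignTable:
    """Hypothetical wrapper exposing CSV text through FDW-like callbacks."""

    def __init__(self, text):
        self.text = text
        self._reader = None

    def get_rel_size(self):
        """Planner-side role (cf. GetForeignRelSize): estimate the row count."""
        return max(len(self.text.strip().splitlines()) - 1, 0)

    def begin_scan(self):
        """Executor-side role (cf. BeginForeignScan): open the external source."""
        self._reader = csv.DictReader(io.StringIO(self.text))

    def iterate_scan(self):
        """Cf. IterateForeignScan: return the next row, or None when exhausted."""
        return next(self._reader, None)

    def end_scan(self):
        """Cf. EndForeignScan: release resources."""
        self._reader = None


def foreign_scan(table):
    """Drive the callbacks the way a scan node would, yielding one row at a time."""
    table.begin_scan()
    try:
        while (row := table.iterate_scan()) is not None:
            yield row
    finally:
        table.end_scan()


people = CsvForeignTable("id,name\n1,Ann\n2,Bob\n")
estimated_rows = people.get_rel_size()
rows = list(foreign_scan(people))
```

Splitting the interface into a size estimate and a begin, iterate, end cycle lets the planner cost a foreign table like any other relation while the executor pulls rows from it through the usual scan protocol.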
About DTCC
The China Database Technology Conference (DTCC) is the largest domestic event for database and big‑data technologies, gathering experts to discuss MySQL, NoSQL, Oracle, caching, cloud databases, AI, security, data governance, and more.
