Unlock PostgreSQL’s Multi‑Model Power: Graph, Vector, Full‑Text, JSONB & GIS in Practice
PostgreSQL’s extensible architecture lets it evolve from a relational DB into a true multi‑model system supporting graphs, vectors, full‑text, time‑series, GIS and key‑value data, with detailed architecture explanations, practical code demos, cross‑model queries, extension selection, and production‑grade best‑practice tips.
Why PostgreSQL Is a Multi‑Model Database
PostgreSQL is renowned as a powerful open‑source relational database, but its extensible ecosystem allows it to become a genuine multi‑model system that simultaneously supports relational, graph, vector search, full‑text, time‑series, spatial‑geographic and key‑value data types.
Core Architecture Enabling Multi‑Modality
The core storage engine is built on heap tables and MVCC (Multi‑Version Concurrency Control). Because any table can define its own Access Method , PostgreSQL can store arbitrary data structures, providing the architectural foundation for multi‑model capabilities.
Multi‑Model Capability Overview
Relational – built‑in heap tables; indexes: B‑Tree, Hash
Graph – Apache AGE extension; index: B‑Tree
Vector – pgvector extension; indexes: IVFFlat, HNSW
Full‑Text – built‑in tsvector/tsquery; indexes: GIN, GiST
Document (JSONB) – JSONB column; index: GIN
Key‑Value (hstore) – hstore column; index: GIN
Time‑Series – TimescaleDB extension; indexes: BRIN, B‑Tree
Spatial GIS – PostGIS extension; index: GIST (R‑Tree)
The core advantage is a single SQL engine and query syntax that can operate on N data models without synchronising or converting data between separate databases.
Detailed Extension Walk‑Throughs
3.1 Graph – Apache AGE
Positioning: Provides a property‑graph model on PostgreSQL, allowing Cypher queries alongside SQL for social‑network, recommendation and risk‑analysis scenarios that require multi‑hop traversals.
Principle: AGE stores graph data as edge and vertex tables; the cypher() function bridges SQL and Cypher, using recursive CTEs or native graph algorithms for multi‑hop queries. Compared with Neo4j, AGE integrates seamlessly with the PostgreSQL ecosystem and enables cross‑modal queries (graph + vector + time‑series).
-- Load extension
CREATE EXTENSION IF NOT EXISTS age;
LOAD 'age';
SET search_path = ag_catalog, "$user", public;
-- Create a graph
SELECT create_graph('social_graph');
-- Insert vertices (users)
SELECT * FROM cypher('social_graph', $$
CREATE (alice:Person {id: 1, name: 'Alice', age: 30, city: '北京'}),
(bob:Person {id: 2, name: 'Bob', age: 28, city: '上海'}),
(charlie:Person {id: 3, name: 'Charlie', age: 35, city: '北京'}),
(david:Person {id: 4, name: 'David', age: 32, city: '深圳'}),
(emma:Person {id: 5, name: 'Emma', age: 29, city: '北京'});
$$) AS (result agtype);
-- Insert edges (FOLLOWS)
SELECT * FROM cypher('social_graph', $$
MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
CREATE (a)-[:FOLLOWS {since: '2026-01-01'}]->(b);
$$) AS (result agtype);
-- Single‑hop query: who does Alice follow?
SELECT * FROM cypher('social_graph', $$
MATCH (alice:Person {name: 'Alice'})-[:FOLLOWS]->(p:Person)
RETURN p.name, r.since;
$$) AS (name TEXT, since TEXT);
-- Two‑hop query: friends of friends
SELECT * FROM cypher('social_graph', $$
MATCH (alice:Person {name: 'Alice'})-[:FOLLOWS*2]->(fof:Person)
RETURN alice.name AS user_name, fof.name AS friend_of_friend;
$$) AS (user_name TEXT, friend_of_friend TEXT);
-- Common‑friend query
SELECT * FROM cypher('social_graph', $$
MATCH (alice:Person {name: 'Alice'})-[:FOLLOWS]->(p),
(bob:Person {name: 'Bob'})-[:FOLLOWS]->(p)
RETURN p.name;
$$) AS (common_friend TEXT);3.2 Full‑Text Search – tsvector / tsquery
Positioning: Built‑in full‑text engine, ideal for search, log analysis and content recommendation where precise keyword matching and relevance ranking are required.
Principle: Uses an inverted index. Documents are tokenised into tsvector, queries are expressed with tsquery, and relevance is scored via ts_rank() or ts_rank_cd().
-- Clean old config
DROP TEXT SEARCH CONFIGURATION IF EXISTS chinese CASCADE;
-- Install Chinese parser
CREATE EXTENSION IF NOT EXISTS zhparser;
-- Create Chinese text search configuration
CREATE TEXT SEARCH CONFIGURATION chinese (PARSER = zhparser);
ALTER TEXT SEARCH CONFIGURATION chinese ADD MAPPING FOR n, v, a, i, e, l WITH simple;
-- Create test table with generated tsvector column
CREATE TABLE articles (
id serial PRIMARY KEY,
title text,
content text,
author text,
created_at timestamptz DEFAULT now(),
search_vector tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('chinese', coalesce(title, '')), 'A') ||
setweight(to_tsvector('chinese', coalesce(content, '')), 'B') ||
setweight(to_tsvector('chinese', coalesce(author, '')), 'C')
) STORED
);
-- Create GIN index on the vector
CREATE INDEX idx_articles_search ON articles USING gin (search_vector);
-- Insert sample data
INSERT INTO articles (title, content, author) VALUES
('PostgreSQL 多模数据库实战', '本文介绍 PostgreSQL 如何通过扩展支持向量检索、全文检索、时序数据、图数据等多模态能力。', 'DBA专家'),
('MySQL 高可用架构指南', 'MySQL 主从复制、MGR 集群、Galera Cluster 等高可用方案对比。', '数据库架构师');
-- AND query (both terms must appear)
SELECT id, title, author, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('chinese', 'PostgreSQL & 多模 & 扩展') query
WHERE search_vector @@ query
ORDER BY rank DESC;
-- OR query (any term)
SELECT id, title, author, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('chinese', 'PostgreSQL | MySQL') query
WHERE search_vector @@ query
ORDER BY rank DESC;
-- Prefix match example
SELECT id, title, author, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('chinese', '数据:*') query
WHERE search_vector @@ query
ORDER BY rank DESC;
-- Highlight matching keywords
SELECT id, title, ts_headline('chinese', content, query) AS snippet
FROM articles, to_tsquery('chinese', 'PostgreSQL & 扩展') query
WHERE search_vector @@ query;
-- Term frequency statistics
SELECT word, ndoc
FROM ts_stat('SELECT search_vector FROM articles')
ORDER BY ndoc DESC
LIMIT 10;3.3 Vector Search – pgvector
Positioning: The core extension for AI/LLM workloads, providing a native vector(n) type and three index algorithms.
Principle & Index Algorithms: IVFFlat – approximate inverted index, fast build, suitable for millions of rows, recall ~95%, high QPS. HNSW – navigable small‑world graph, higher recall (~97%) and extreme QPS, but slower build; best for high‑dimensional data.
-- Install extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create a vector table (3‑dim for demo)
CREATE TABLE embeddings (
id bigserial PRIMARY KEY,
content text,
embedding vector(3)
);
-- Insert example vectors
INSERT INTO embeddings (content, embedding) VALUES
('苹果是一种水果', '[0.1,0.22,0.33]'),
('香蕉也是水果', '[0.2,0.33,0.44]'),
('我喜欢吃苹果', '[0.11,0.23,0.34]'),
('今天天气真好', '[0.9,0.8,0.7]'),
('数据库性能优化', '[0.5,0.6,0.7]');
-- Similarity search (cosine distance)
SELECT id, content,
1 - (embedding <=> '[0.1,0.22,0.33]') AS similarity
FROM embeddings
ORDER BY embedding <=> '[0.1,0.22,0.33]'
LIMIT 5;
-- Create HNSW index (production recommendation)
CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);⚠️ Production note: ef_construction defaults to 64; use 128‑256 for better recall. m defaults to 16; increase to 32‑64 for high‑dimensional data.
3.4 Document Store – JSONB
Positioning: Schema‑less semi‑structured storage with fast GIN‑indexed path queries.
Principle: JSONB stores JSON in binary form, builds an internal key‑value structure, and supports GIN indexes for efficient queries. Compared with plain JSON, JSONB writes are slightly slower but queries are 3‑5× faster.
-- Create a table with a JSONB column
CREATE TABLE events (
id serial PRIMARY KEY,
event_type text,
payload jsonb,
created_at timestamptz DEFAULT now()
);
-- Insert JSONB documents
INSERT INTO events (event_type, payload) VALUES
('user_login', '{"user_id": 1001, "ip": "192.168.1.1", "device": "iOS"}'::jsonb),
('order_create', '{"order_id": "ORD20240315", "amount": 299.00, "items": 3}'::jsonb);
-- Create GIN index for path queries
CREATE INDEX idx_events_payload ON events USING gin (payload);
-- Containment query (fast, uses index)
SELECT * FROM events WHERE payload @> '{"device": "iOS"}';💡 JSONB vs JSON: JSON writes are marginally faster because no parsing is needed, but JSONB queries are dramatically faster. For any query‑heavy workload, prefer JSONB.
3.5 Key‑Value Store – hstore
Positioning: Simple key‑value pairs stored as text, offering higher write performance and smaller indexes than JSONB for flat structures.
-- Enable extension
CREATE EXTENSION IF NOT EXISTS hstore;
-- Create a table with hstore column
CREATE TABLE products (
id serial PRIMARY KEY,
name text,
attributes hstore
);
-- Insert key‑value rows
INSERT INTO products (name, attributes) VALUES
('iPhone 17', 'brand=>Apple, color=>蓝色, storage=>256GB, price=>6999'),
('Mate 80', 'brand=>华为, color=>黑色, storage=>512GB, price=>6499');
-- Query by attribute
SELECT * FROM products WHERE attributes @> 'brand=>Apple'::hstore;3.6 Time‑Series – TimescaleDB
Principle: Implements a hypertable abstraction that automatically partitions data by time (chunks) and provides continuous aggregates (materialised views) for fast range queries.
-- Install TimescaleDB
CREATE EXTENSION IF NOT EXISTS timescaledb;
-- Create a regular table and convert to hypertable
CREATE TABLE sensor_data (
time timestamptz NOT NULL,
sensor_id text,
temperature double precision,
humidity double precision
);
SELECT create_hypertable('sensor_data', 'time', chunk_time_interval => interval '1 day', migrate_data => true);
-- Continuous aggregate (hourly average)
CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS hour,
sensor_id,
avg(temperature) AS avg_temp,
avg(humidity) AS avg_humidity
FROM sensor_data
GROUP BY 1,2;
-- Retention policy (keep 30 days of raw data)
SELECT add_retention_policy('sensor_data', interval '30 days');
-- Insert simulated data (200 rows)
INSERT INTO sensor_data (time, sensor_id, temperature, humidity)
SELECT generate_series(now() - interval '5 days', now(), interval '10 minutes') AS time,
'sensor_' || floor(random()*3 + 1)::int,
round((20 + random()*15)::numeric,1),
round((40 + random()*40)::numeric,1)
FROM generate_series(1,200);3.7 Spatial GIS – PostGIS
Principle: Implements OGC standards on top of PostgreSQL, supporting points, lines, polygons and R‑Tree (GIST) indexes for spatial calculations.
-- Install PostGIS (requires GDAL, GEOS, PROJ)
CREATE EXTENSION IF NOT EXISTS postgis;
-- Create a table with geometry column
CREATE TABLE locations (
id serial PRIMARY KEY,
name text,
geom geometry(Point, 4326), -- WGS84 lat/lon
geom_webmercator geometry(Point, 3857)
);
-- Insert points of interest
INSERT INTO locations (name, geom) VALUES
('北京天安门', ST_SetSRID(ST_MakePoint(116.397, 39.904), 4326)),
('上海外滩', ST_SetSRID(ST_MakePoint(121.490, 31.235), 4326)),
('广州塔', ST_SetSRID(ST_MakePoint(113.319, 23.110), 4326));
-- Create spatial index
CREATE INDEX idx_locations_geom ON locations USING gist (geom);
-- Distance query: places within 100 km of Tiananmen
SELECT l.name,
ST_Distance(l.geom::geography, b.geom::geography) / 1000 AS distance_km
FROM locations l, locations b
WHERE b.name = '北京天安门'
AND ST_DWithin(l.geom::geography, b.geom::geography, 100000)
ORDER BY distance_km;⚠️ SRID note: 4326 is WGS84 (lat/lon), 3857 is Web Mercator. Convert with ST_Transform before cross‑SRID calculations.
Cross‑Model Query Practice
Scenario: a social‑e‑commerce platform where users, products and stores are linked through social relationships, purchases, vector similarity, full‑text relevance, JSONB attributes and geographic proximity.
Data Model Initialization
-- Enable required extensions (AGE 1.6+ for PG17)
CREATE EXTENSION IF NOT EXISTS age;
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS postgis;
-- Optional Chinese full‑text parser
CREATE EXTENSION IF NOT EXISTS zhparser;
DO $$ BEGIN IF NOT EXISTS (SELECT 1 FROM pg_ts_config WHERE cfgname='chinese') THEN EXECUTE 'CREATE TEXT SEARCH CONFIGURATION chinese (PARSER = zhparser)'; EXECUTE 'ALTER TEXT SEARCH CONFIGURATION chinese ADD MAPPING FOR n,v,a,i,e,l WITH simple'; END IF; END $$;
-- Load AGE and set search_path
LOAD 'age';
SET search_path = ag_catalog, "$user", public;
-- Create graph for social relationships
SELECT create_graph('social_commerce');
-- Insert user vertices
SELECT * FROM cypher('social_commerce', $$
CREATE (alice:Person {id:1, name:'Alice', city:'北京', interest:'数码产品'}),
(bob:Person {id:2, name:'Bob', city:'上海', interest:'摄影'}),
(charlie:Person {id:3, name:'Charlie', city:'北京', interest:'数码产品'}),
(david:Person {id:4, name:'David', city:'深圳', interest:'户外'}),
(emma:Person {id:5, name:'Emma', city:'北京', interest:'美妆'});
$$) AS (result agtype);
-- Insert product vertices (must exist before PURCHASED edges)
SELECT * FROM cypher('social_commerce', $$
CREATE (p1:Product {id:1, name:'iPhone 16 Pro Max', category:'手机', price:11999}),
(p2:Product {id:2, name:'小米15 Ultra', category:'手机', price:7999}),
(p3:Product {id:3, name:'华为Pura 80 Ultra', category:'手机', price:8999}),
(p4:Product {id:4, name:'vivo X200 Ultra', category:'手机', price:6999}),
(p5:Product {id:5, name:'OPPO Find X8 Ultra',category:'手机', price:6499});
$$) AS (result agtype);
-- Social FOLLOW edges
SELECT * FROM cypher('social_commerce', $$
MATCH (a:Person {name:'Alice'}), (b:Person {name:'Bob'})
CREATE (a)-[:FOLLOWS {since:'2026-01-01', trust_level:5}]->(b);
$$) AS (result agtype);
-- (repeat for other FOLLOW relationships as in the source)
-- PURCHASE edges
SELECT * FROM cypher('social_commerce', $$
MATCH (a:Person {name:'Alice'}), (p:Product {name:'iPhone 16 Pro Max'})
CREATE (a)-[:PURCHASED {at:'2026-01-15', rating:5}]->(p);
$$) AS (result agtype);
-- (repeat for other purchases)Product table combines vector, JSONB attributes, generated full‑text vector and stock information:
CREATE TABLE products (
id bigserial PRIMARY KEY,
name text NOT NULL,
category text,
price numeric(10,2),
embedding vector(4),
attributes jsonb DEFAULT '{}',
description text,
search_vector tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('chinese', coalesce(name, '')), 'A') ||
setweight(to_tsvector('chinese', coalesce(description, '')), 'B')
) STORED,
stock integer DEFAULT 0,
sales_count integer DEFAULT 0,
created_at timestamptz DEFAULT now()
);
-- Indexes for vector, JSONB and full‑text
CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops) WITH (m=16, ef_construction=128);
CREATE INDEX ON products USING gin (attributes);
CREATE INDEX ON products USING gin (search_vector);
CREATE INDEX ON products (category);
-- Insert sample products (vector values are illustrative)
INSERT INTO products (name, category, price, embedding, attributes, description, stock, sales_count) VALUES
('iPhone 16 Pro Max', '手机', 11999.00, '[0.12,-0.05,0.08,0.03]',
'{"brand":"Apple","color":"沙漠色钛金属","storage":"256GB","screen":"6.9寸","network":"5G"}'::jsonb,
'Apple iPhone 16 Pro Max,A19 Pro 芯片,6.9寸超视网膜 XDR 显示屏,钛金属边框,拍照能力业界领先,支持卫星求救', 120, 3200),
('小米15 Ultra', '手机', 7999.00, '[0.14,-0.03,0.11,0.05]',
'{"brand":"小米","color":"钛灰色","storage":"512GB","screen":"6.73寸","network":"5G"}'::jsonb,
'小米15 Ultra,骁龙8 Gen4领先版处理器,徕卡光学全明星四摄,2K AMOLED 屏幕,120W 有线+80W 无线快充,影像旗舰', 200, 4800);
-- (additional products omitted for brevity)Store table stores geographic points for GIS queries:
CREATE TABLE stores (
id bigserial PRIMARY KEY,
name text NOT NULL,
brand text,
geom geometry(Point,4326),
city text,
address text,
capacity integer
);
CREATE INDEX ON stores USING gist (geom);
CREATE INDEX ON stores (city);
INSERT INTO stores (name, brand, geom, city, address, capacity) VALUES
('Apple Store 三里屯', 'Apple', ST_SetSRID(ST_MakePoint(116.447,39.923),4326), '北京', '朝阳区三里屯路19号', 500),
('Apple Store 王府井', 'Apple', ST_SetSRID(ST_MakePoint(116.415,39.918),4326), '北京', '东城区王府井大街138号', 400);
-- (more stores omitted)Cross‑Modal Query
WITH
-- CTE 1: Graph friend purchases (1‑2 hops from Alice)
graph_friend_purchases AS (
SELECT (cypher_result.product_id::text)::bigint AS product_id,
cypher_result.product_name AS product_name,
cypher_result.category AS category,
avg((cypher_result.rating)::numeric) AS friend_avg_rating,
count(*) AS friend_purchase_count
FROM cypher('social_commerce', $$
MATCH (alice:Person {name:'Alice'})-[:FOLLOWS*1..2]->(friend:Person)-[r:PURCHASED]->(p:Product)
RETURN p.id AS product_id, p.name AS product_name, p.category AS category, r.rating AS rating
$$) AS cypher_result(product_id agtype, product_name agtype, category agtype, rating agtype)
GROUP BY cypher_result.product_id, cypher_result.product_name, cypher_result.category
),
-- CTE 2: Vector similarity for "拍照好的手机"
vector_similar AS (
SELECT id, name,
1 - (embedding <=> '[0.12,-0.04,0.10,0.05]') AS similarity
FROM products
ORDER BY embedding <=> '[0.12,-0.04,0.10,0.05]'
LIMIT 20
),
-- CTE 3: Full‑text match for keywords "手机" or "拍照"
fulltext_matched AS (
SELECT id, ts_rank(search_vector, query) AS text_rank
FROM products, to_tsquery('chinese', '手机 | 拍照') query
WHERE search_vector @@ query
)
SELECT p.id, p.name, p.category, p.price, p.attributes, p.stock,
COALESCE(vs.similarity,0) AS vector_score,
COALESCE(ft.text_rank,0) AS text_score,
COALESCE(gfp.friend_avg_rating,0) AS friend_score,
-- Nearby stores in Beijing for the product brand
(SELECT jsonb_agg(jsonb_build_object(
'store_name', s.name,
'brand', s.brand,
'distance_km', ROUND((ST_Distance(s.geom::geography,
ST_SetSRID(ST_MakePoint(116.470,39.920),4326)::geography) / 1000)::numeric,1),
'address', s.address))
FROM stores s
WHERE s.city = '北京'
AND s.brand = p.attributes->>'brand'
AND ST_DWithin(s.geom::geography,
ST_SetSRID(ST_MakePoint(116.470,39.920),4326)::geography,
15000)) AS nearby_stores,
-- Composite score (weights sum to 1)
(COALESCE(vs.similarity,0) * 0.35 +
COALESCE(ft.text_rank,0) * 0.20 +
COALESCE(gfp.friend_avg_rating,0) / 5.0 * 0.25 +
CASE WHEN p.stock > 0 THEN 0.10 ELSE 0.0 END +
CASE WHEN p.attributes->>'network' = '5G' THEN 0.10 ELSE 0.0 END) AS total_score
FROM products p
LEFT JOIN vector_similar vs ON p.id = vs.id
LEFT JOIN fulltext_matched ft ON p.id = ft.id
LEFT JOIN graph_friend_purchases gfp ON p.id = gfp.product_id
WHERE p.category = '手机'
AND p.stock > 0
AND p.attributes @> '{"network": "5G"}'
ORDER BY total_score DESC
LIMIT 10;💡 Cross‑modal query optimisation tips: each CTE walks its own index, use ST_DWithin before ST_Distance , normalise scores (e.g., friend rating ÷ 5), and keep graph depth ≤ 3 to avoid exponential blow‑up.
Best‑Practice Extension Selection
AI / Vector Search → pgvector (native integration, HNSW performance)
Graph Traversal → Apache AGE (Cypher, multi‑hop analysis)
Full‑Text Keyword Search → Built‑in tsvector/tsquery (no extra extension, Chinese parser optional)
Semi‑structured Documents → JSONB (high‑performance GIN index)
Simple Key‑Value → hstore (faster writes, smaller indexes)
Time‑Series Monitoring → TimescaleDB (automatic partitioning, continuous aggregates)
Spatial / LBS → PostGIS (OGC‑compliant, mature ecosystem)
Production‑Grade Tips
Graph: create B‑Tree indexes on vertex properties (e.g., MATCH (p:Person {name:'Alice'})), limit multi‑hop depth to 1‑3.
Full‑Text Chinese: install zhparser, keep the dictionary updated, choose tokenisation granularity based on document length.
pgvector: tune ef_construction (128‑256 for production) and m (16‑32 for high‑dimensional vectors).
PostGIS: always set correct SRID, batch insert with DISABLE TRIGGER ALL then rebuild indexes; prefer ST_DWithin over ST_Distance for filtering.
TimescaleDB: set chunk_time_interval to 1 hour or 1 day for fast writes; align continuous‑aggregate refresh frequency with business cycles.
Summary
Apache AGE brings graph capabilities to PostgreSQL, enabling intuitive Cypher queries and multi‑hop relationship analysis.
Full‑Text Search (tsvector/tsquery) offers built‑in inverted indexes and relevance scoring for precise keyword retrieval.
pgvector makes PostgreSQL a first‑class vector engine for AI workloads; HNSW indexes deliver millisecond‑level queries on tens of millions of vectors.
JSONB / hstore handle semi‑structured and flat key‑value data without a separate NoSQL system.
TimescaleDB automates time‑partitioning and continuous aggregates, boosting write and query performance for time‑series by an order of magnitude.
PostGIS provides the most mature open‑source GIS stack, fully compliant with OGC standards.
One database, many data models, a unified SQL interface – that is the core value of PostgreSQL’s multi‑model architecture. Cross‑modal queries combine the strengths of each model, unlocking richer insights for modern applications.
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
