Databases 24 min read

Why Vector Databases Are Needed, PgVector Installation, Usage, and Operational Practices in PostgreSQL

This article explains the necessity of vector databases for AI workloads, reviews the PostgreSQL ecosystem, compares vector database options, provides detailed PgVector installation and usage steps, shares operational best‑practices, performance tuning tips, and real‑world deployment cases at Qunar and Tujia.

Qunar Tech Salon
Qunar Tech Salon
Qunar Tech Salon
Why Vector Databases Are Needed, PgVector Installation, Usage, and Operational Practices in PostgreSQL

With the rapid growth of AI and large language models, massive unstructured data requires efficient storage and retrieval, prompting the emergence of vector databases that store embedding vectors and support similarity search.

The PostgreSQL ecosystem, enriched by extensions such as PgVector, offers a powerful platform for vector data, leveraging PostgreSQL's extensibility and popularity.

Vector databases are classified into dedicated vector stores and traditional DBMSs with vector capabilities; PgVector was selected for Qunar and Tujia due to its open‑source nature, SQL compatibility, and integration with existing PostgreSQL tools.

PgVector Installation and Basic Usage

Before installing, match PgVector versions with PostgreSQL releases (e.g., PgVector 0.7.x supports PG12‑PG17). Installation on Linux/Mac:

cd /tmp
git clone --branch v0.7.2 https://github.com/pgvector/pgvector.git
cd pgvector
make
make install # may need sudo

Create the extension:

postgres=# create extension vector;
CREATE EXTENSION

Define a table with a vector column and add comments:

create table my_img_emb(
  id bigserial primary key,
  img_uuid varchar(64),
  img_name varchar(256),
  img_annotation_name varchar(256),
  img_type varchar(64),
  img_embedding vector(512),
  create_time timestamptz default now() not null,
  update_time timestamptz default now() not null
);
comment on table my_img_emb is '图片embedding表';

Build indexes, e.g., a unique index on img_uuid and an HNSW index on img_embedding :

create unique index concurrently on my_img_emb(img_uuid);
create index concurrently on my_img_emb using hnsw (img_embedding vector_cosine_ops);

Operational Practices

Production deployment began in April 2023; key lessons include choosing vector types (vector, halfvec, bit, sparsevec), preferring HNSW over IVFFlat for higher recall and QPS, and tuning parameters such as hnsw.ef_search and ef_construction to balance index build time, memory, and query performance.

When recall is insufficient, increase hnsw.ef_search (e.g., from 40 to 1000) or adjust index options (larger m and ef_construction ). SQL rewrite to replace the similarity operator with the corresponding function can also improve recall at the cost of speed.

Performance Tuning

Vector workloads generate significant WAL; setting full_page_writes = off and wal_compression = on reduces WAL volume. Monitoring shows WAL drops after these changes, especially during bulk loads and backups.

Real‑World Cases

Qunar and Tujia use PgVector for image‑based house search, AI‑augmented customer service, travel itinerary recommendation, and ticket‑sale pre‑AI assistants, demonstrating the end‑to‑end pipeline from embedding extraction to vector retrieval.

Conclusion and Outlook

The article provides a comprehensive guide for DBAs and developers to adopt PgVector, covering why vector databases are needed, ecosystem fit, installation, best‑practice operations, performance optimization, and successful production deployments, and it anticipates broader adoption across more business lines.

AIRAGVector DatabasePerformance TuningPostgreSQLpgvector
Qunar Tech Salon
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.