Lessons Learned from Hosting PostgreSQL at Scale and Using It as a Job Queue
This article shares a detailed case study of how a startup managed a massive cinema‑ticket dataset with PostgreSQL, evaluated cloud and self‑hosted options, refined materialized‑view strategies, built a reliable job queue, and distilled practical takeaways for large‑scale database operations.
Background
The author, co‑founder of Applaudience, needed to store and process a cinema‑ticket dataset that records over a billion rows per month across 3,200+ theatres in Europe and the US. The PostgreSQL database became the core of the business, serving both as the source of truth and as a job queue.
Common Mistakes
Many teams over‑engineer data pipelines with unnecessary message queues, caching layers, and complex ETL processes. The author argues that a well‑designed single database, with atomic transactions and minimal external services, avoids most of these pitfalls.
Goals
The aim was to keep the architecture simple: no separate message‑queue service, no cache, no dedicated data‑warehouse replica. Instead, the team focused on low latency, appropriate hardware, and careful schema design to eliminate bottlenecks.
Dataset Overview
The dataset combines showtimes, ticket prices, and seat‑level data with auxiliary information from YouTube, Twitter, and weather feeds, resulting in a comprehensive time‑series view of movie releases for forecasting purposes.
Where to Host the Database
Google Cloud SQL for PostgreSQL
Amazon RDS for PostgreSQL
Aiven.io (managed PostgreSQL)
Self‑hosted on rented hardware
Google Cloud SQL
Received $100k in credits, but the service lagged behind the latest PostgreSQL version and contained a known bug that caused data corruption. The bug was fixed only in newer releases, making Google Cloud SQL a risky choice.
Amazon RDS
Provided up‑to‑date PostgreSQL versions, but lacked support for the TimescaleDB extension needed for partitioning, and Amazon’s own time‑series service (Timestream) did not meet the project’s requirements.
Aiven.io
Managed PostgreSQL with all required extensions, including TimescaleDB, and offered good support, but did not grant super‑user access, leading to operational issues such as missing auto_explain and logical replication problems.
Update 2019‑02‑05: Aiven added auto_explain support in version 1.2.
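For context, auto_explain is enabled through server settings; a typical configuration (threshold values here are illustrative) looks like:

```sql
-- Load the module for the current session (requires superuser),
-- or add it to shared_preload_libraries in postgresql.conf.
LOAD 'auto_explain';

-- Log the plan of any query that runs longer than 500 ms.
SET auto_explain.log_min_duration = '500ms';

-- Include actual row counts and timings in the logged plans.
SET auto_explain.log_analyze = on;
```

Without super-user access, the LOAD and preload steps are exactly what a managed provider must expose on the customer's behalf.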
Self‑Hosted
By renting dedicated hardware, the team avoided vendor lock‑in, achieved lower costs (about 30% cheaper than cloud), and could hire a freelance DBA for daily health checks.
Takeaways
Cloud providers prioritize proprietary solutions (BigQuery, Redshift). For a simple database that will not grow to billions of rows, a managed instance with near‑real‑time scaling, cross‑region migration, point‑in‑time recovery, and built‑in monitoring is often sufficient.
Performance Bonus
Benchmarking showed GCP’s PostgreSQL outperformed AWS by roughly 2× in the author’s tests.
Materialized Views
Initially, the team used a single large materialized view (MV) for venue data. As the dataset grew, refresh times ballooned from seconds to hours, prompting a series of refinements.
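For reference, the refresh that ballooned from seconds to hours is a single statement that re-executes the view's full defining query:

```sql
-- Each refresh re-runs the view's defining query over the whole
-- dataset, so refresh time grows with the underlying tables.
REFRESH MATERIALIZED VIEW venue_view;
```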
First Attempt
Created a single MV venue_view that duplicated the base table and added indexes.
CREATE MATERIALIZED VIEW venue_view AS
WITH
auditorium_with_future_events AS (
SELECT e1.venue_id, e1.auditorium_id
FROM event e1
WHERE e1.start_time > now() - INTERVAL '30 day'
AND e1.auditorium_id IS NOT NULL
GROUP BY e1.venue_id, e1.auditorium_id
),
...
WITH NO DATA;

Second Attempt – Divide and Conquer
Split the large MV into several smaller MVs (e.g., auditorium_with_future_events_view, venue_auditorium_seat_count_view) so each could be refreshed independently.
CREATE MATERIALIZED VIEW auditorium_with_future_events_view AS
SELECT venue_id, auditorium_id
FROM event
WHERE start_time > now() - INTERVAL '30 day'
AND auditorium_id IS NOT NULL
GROUP BY venue_id, auditorium_id
WITH NO DATA;

This approach reduced transaction size and allowed index‑driven joins, but increased the number of MVs to manage.
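Each smaller view can then be refreshed on its own schedule; a sketch of the mechanics (the unique index is required before REFRESH ... CONCURRENTLY can be used):

```sql
-- A unique index makes non-blocking concurrent refreshes possible.
CREATE UNIQUE INDEX ON auditorium_with_future_events_view (venue_id, auditorium_id);

-- The first refresh must be a plain (blocking) one to populate the view.
REFRESH MATERIALIZED VIEW auditorium_with_future_events_view;

-- Subsequent refreshes allow reads during the rebuild, at the cost of
-- extra work to diff the old and new contents.
REFRESH MATERIALIZED VIEW CONCURRENTLY auditorium_with_future_events_view;
```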
Third Attempt – Abstract Sub‑sets
Created domain‑specific MVs (e.g., last_week_event , future_event ) to avoid scanning billions of rows in a single view.
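A sketch of what such a domain‑specific MV might look like (the exact definition is assumed, not taken from the original):

```sql
-- Only events from the last 7 days; downstream views join against this
-- small set instead of scanning the full event table.
CREATE MATERIALIZED VIEW last_week_event AS
SELECT id, venue_id, auditorium_id, start_time
FROM event
WHERE start_time > now() - INTERVAL '7 day'
WITH NO DATA;
```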
Fourth Attempt – Materialized Columns
Added a materialized_at timestamp column directly to the base tables, eliminating the need for separate MVs for many use‑cases. A background job populates these columns when they are NULL.
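A minimal sketch of the column, plus an index that lets the background job find pending rows cheaply (the partial index is an assumption, not shown in the original):

```sql
-- Track when derived values were last computed; NULL means "pending".
ALTER TABLE event_seat_state_change
  ADD COLUMN materialized_at timestamptz,
  ADD COLUMN seat_count smallint;

-- Partial index: the background job only ever scans pending rows,
-- so the index stays tiny regardless of table size.
CREATE INDEX ON event_seat_state_change (id)
  WHERE materialized_at IS NULL;
```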
CREATE OR REPLACE FUNCTION materialize_event_seat_state_change()
RETURNS void AS $$
BEGIN
WITH event_seat_state_count AS (
SELECT essc1.id,
count(*)::smallint AS seat_count,
...
FROM event e1
LEFT JOIN event_seat_state_change essc1 ON essc1.event_id = e1.id
...
GROUP BY essc1.id
)
UPDATE event_seat_state_change essc1
SET materialized_at = now(),
seat_count = essc2.seat_count,
...
FROM event_seat_state_count essc2
WHERE essc1.id = essc2.id;
END
$$ LANGUAGE plpgsql;

Using the Database as a Job Queue
By leveraging SELECT … FOR UPDATE SKIP LOCKED, the team built a simple, reliable, and high‑throughput concurrent work queue without external services.
CREATE OR REPLACE FUNCTION schedule_cinema_data_task()
RETURNS table(cinema_data_task_id int) AS $$
DECLARE scheduled_id int;
BEGIN
UPDATE cinema_data_task_queue
SET attempted_at = now()
WHERE id = (
SELECT id FROM cinema_data_task_queue
WHERE attempted_at IS NULL
ORDER BY id ASC
LIMIT 1
FOR UPDATE SKIP LOCKED
)
RETURNING cinema_data_task_id INTO scheduled_id;
UPDATE cinema_data_task SET last_attempted_at = now() WHERE id = scheduled_id;
RETURN QUERY SELECT scheduled_id;
END
$$ LANGUAGE plpgsql;

This pattern kept queue scans fast, allowed scaling to over 2,000 concurrent aggregation agents, and ensured that the scheduling query executed in a few milliseconds.
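On the worker side, claiming a task is a single call; the cleanup step shown here is a hypothetical addition, not part of the original function:

```sql
-- Atomically claim the next pending task; because of SKIP LOCKED,
-- concurrent workers never receive the same row.
SELECT cinema_data_task_id FROM schedule_cinema_data_task();

-- Hypothetical cleanup once the task succeeds, keeping the queue
-- table small enough for the scheduling scan to stay fast.
DELETE FROM cinema_data_task_queue WHERE cinema_data_task_id = 123;
```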
Key Lessons
PostgreSQL materialized views are excellent for small to medium datasets but require careful planning as data grows.
Fine‑grained MV refreshes and materialized columns provide near‑real‑time data enrichment without heavy data‑warehouse pipelines.
Using the database itself as a job queue works well when the queue table stays reasonably sized and the scheduling query stays fast (a few milliseconds at most).
Latency between application and database matters; colocating them can multiply throughput.
Proper index ordering, vacuum settings, and monitoring (pg_stat_statements, pg_stat_user_tables, pg_stat_activity) are essential for large‑scale PostgreSQL deployments.
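For example, pg_stat_statements makes the most expensive queries easy to find (column names are from PostgreSQL 13+; older versions use total_time and mean_time):

```sql
-- Top 10 queries by cumulative execution time; requires the
-- pg_stat_statements extension to be installed and preloaded.
SELECT query,
       calls,
       total_exec_time,
       mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```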
Miscellaneous Tips
Adjust from_collapse_limit and join_collapse_limit from the default of 8 to higher values to avoid sub‑optimal plans.
Consider pg_repack or pg_squeeze for table bloat.
Continuously monitor query performance and index usage.
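The planner settings mentioned above can be raised per session or in postgresql.conf; the values here are illustrative:

```sql
-- Allow the planner to consider reordering joins across larger
-- FROM lists; the default for both settings is 8.
SET from_collapse_limit = 16;
SET join_collapse_limit = 16;
```

Raising these increases planning time, so it is a trade‑off worth measuring rather than a blanket recommendation.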
Bonus: Slonik PostgreSQL Client
The team switched from node‑postgres to Slonik, a type‑safe PostgreSQL client with built‑in logging and assertions that helps prevent SQL injection and integrates with auto_explain.
For more details, see the original article at http://jiagoushi.pro/node/1177 .