How Linear’s Biggest Outage Happened: A PostgreSQL Truncate Disaster and Recovery Walkthrough
This article covers Linear's most severe outage in five years, caused by an accidental TRUNCATE ... CASCADE on a PostgreSQL table. It walks through the minute‑by‑minute timeline, explains why CI checks missed the error, and outlines the recovery steps and post‑mortem lessons for safer database operations.
Background
Linear, a fast‑growing project‑management tool used by companies like Vercel and Arc, suffered its largest incident in five years when production data was unintentionally wiped. The root cause was a TRUNCATE new_table CASCADE command that emptied not only the test table but also production tables linked to it by foreign keys.
Timeline of the Outage
Jan 24 04:47 – Full backup completed (pre‑incident).
Jan 24 07:01 – Problematic change merged into main branch.
Jan 24 07:20 – Change applied.
Jan 24 07:52 – Anomaly detected, internal investigation started.
Jan 24 08:10 – Severe incident plan activated, more engineers called.
Jan 24 08:36 – Status page and X (Twitter) updated about data access issues.
Jan 24 09:20 – Further status updates.
Jan 24 09:56 – Linear entered maintenance mode and began restoring from backup.
Jan 24 10:48 – Database restored to the 04:47 backup; service became reachable.
Jan 24 11:09 – Status page set to “observing”.
Jan 24 11:30 – Started restoring data between 04:47 and 09:56.
Jan 24 13:50 – Notified users who had created workspaces between 04:47 and 09:56 that those workspaces could not be rebuilt.
Jan 24 15:35 – Sent email to all affected users and workspace admins with incident details and recovery instructions.
Jan 25 14:00 – Admins released a dedicated data‑restore page.
Jan 25 14:25 – Fixed a bug on the restore page that caused API load failures.
Jan 25 16:40 – Started trial data‑recovery run.
Jan 25 17:49 – Official data‑recovery began.
Jan 25 19:48 – Restored 98 % of affected workspaces.
Jan 25 23:20 – Restored 99 % of affected workspaces.
Jan 26 07:37 – All workspaces except one were restored.
Jan 26 08:39 – Completed restoration of the final workspace.
Root Cause Analysis
Linear uses PostgreSQL (confirmed via snaplet.dev). The destructive command was a TRUNCATE t CASCADE which, because of foreign‑key relationships, also cleared data from dependent tables. The SQL script lived in the code repository and was merged after CI and manual review, but the CI pipeline lacked checks for dangerous TRUNCATE operations.
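The missing CI check could look something like the following minimal sketch. It is not Linear's actual tooling: the keyword list, the function names, and the sample migration (including the issues table) are assumptions made for illustration. The scan is deliberately naive and would be confused by semicolons or comment markers inside string literals.

```python
import re

# Statement types treated as destructive in this sketch (an assumed rule set).
DANGEROUS = re.compile(r"\b(TRUNCATE|DROP\s+TABLE|DROP\s+SCHEMA)\b", re.IGNORECASE)


def strip_sql_comments(sql: str) -> str:
    """Remove /* block */ and -- line comments so commented-out SQL is ignored."""
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)
    return re.sub(r"--[^\n]*", " ", sql)


def find_dangerous_statements(sql: str) -> list[str]:
    """Return every statement that contains a destructive keyword."""
    flagged = []
    for statement in strip_sql_comments(sql).split(";"):
        statement = " ".join(statement.split())  # collapse whitespace
        if statement and DANGEROUS.search(statement):
            flagged.append(statement)
    return flagged


if __name__ == "__main__":
    # Hypothetical migration file content.
    migration = """
        -- clean up after testing
        TRUNCATE new_table CASCADE;
        ALTER TABLE issues ADD COLUMN note text;
    """
    for hit in find_dangerous_statements(migration):
        print("blocked:", hit)
```

A pipeline step could fail the build whenever the returned list is non-empty and require an explicit override from a reviewer with database expertise.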
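During development of a new feature, engineers created a test copy of a production table. After testing, they tried to clean up the test table with TRUNCATE ... CASCADE. TRUNCATE empties a table rather than dropping it, and because the test table was still tied to production tables through foreign keys, the CASCADE option extended the truncation to those tables (PostgreSQL truncates every table that holds a foreign‑key reference to the target, directly or transitively). Production data was removed as well.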
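To see how far a cascade can reach, the sketch below computes the set of tables that a TRUNCATE ... CASCADE would empty, following PostgreSQL's documented rule that CASCADE also truncates tables holding foreign-key references to the target, and then tables referencing those. The function name and the example schema (issues, comments, teams) are hypothetical and do not describe Linear's real schema.

```python
from collections import deque


def truncate_cascade_targets(target: str, foreign_keys: list[tuple[str, str]]) -> set[str]:
    """Return the tables emptied by TRUNCATE <target> CASCADE.

    foreign_keys holds (referencing_table, referenced_table) pairs.
    """
    # Map each table to the tables that hold a foreign key pointing at it.
    referenced_by: dict[str, set[str]] = {}
    for referencing, referenced in foreign_keys:
        referenced_by.setdefault(referenced, set()).add(referencing)

    affected = {target}
    queue = deque([target])
    while queue:
        table = queue.popleft()
        for dependent in referenced_by.get(table, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


if __name__ == "__main__":
    # Hypothetical schema: issues -> new_table, comments -> issues, new_table -> teams.
    fks = [("issues", "new_table"), ("comments", "issues"), ("new_table", "teams")]
    print(sorted(truncate_cascade_targets("new_table", fks)))
```

In this example the cascade reaches issues and comments but leaves teams alone, because teams is only referenced by new_table and does not reference it.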
Investigation and Recovery
Multiple cache layers delayed error visibility. Linear’s event‑sourcing architecture recorded every user action, allowing replay from the 04:47 backup. However, some operations conflicted and required manual intervention.
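The replay idea can be sketched as follows. The source only states that recorded actions were replayed and that some conflicted, so the Event shape and the version-based conflict rule here are assumptions for illustration, not Linear's actual mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: int      # e.g. seconds since the backup was taken
    entity_id: str
    base_version: int   # entity version the client saw when it made the change
    payload: str


def replay(state: dict[str, tuple[int, str]], events: list[Event]):
    """Apply events in time order and return (new_state, conflicts).

    state maps entity_id -> (version, payload). An event whose base_version
    does not match the current version is set aside for manual review.
    """
    new_state = dict(state)
    conflicts = []
    for event in sorted(events, key=lambda e: e.timestamp):
        version, _ = new_state.get(event.entity_id, (0, ""))
        if event.base_version != version:
            conflicts.append(event)
            continue
        new_state[event.entity_id] = (version + 1, event.payload)
    return new_state, conflicts


if __name__ == "__main__":
    restored = {"ISSUE-1": (1, "open")}  # state taken from the backup
    log = [
        Event(10, "ISSUE-1", 1, "in progress"),
        Event(20, "ISSUE-1", 1, "closed"),   # stale: based on version 1
        Event(30, "ISSUE-2", 0, "new issue"),
    ]
    state, conflicts = replay(restored, log)
    print(state)
    print("needs manual review:", conflicts)
```

Setting conflicting events aside instead of forcing them through mirrors the manual intervention described above.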
Although PostgreSQL Point‑in‑Time Recovery (PITR) was enabled, the team had never practiced it, so they restored from the 04:47 full backup instead of rolling forward to the moment just before the destructive change was applied (merged at 07:01, applied at 07:20).
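A rough calculation from the timeline shows what that choice cost. The sketch assumes all timestamps are on the same day and in the same time zone, and that user writes stopped when maintenance mode began at 09:56. The helper name is hypothetical.

```python
from datetime import datetime, timedelta


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta / timedelta(minutes=1))


BACKUP = "04:47"          # last full backup
CHANGE_APPLIED = "07:20"  # destructive change applied
MAINTENANCE = "09:56"     # maintenance mode, restore begins
REACHABLE = "10:48"       # service reachable again

# Window of user activity that had to be replayed after each kind of restore.
replay_from_backup = minutes_between(BACKUP, MAINTENANCE)        # 309 minutes
replay_from_pitr = minutes_between(CHANGE_APPLIED, MAINTENANCE)  # 156 minutes
downtime = minutes_between(MAINTENANCE, REACHABLE)               # 52 minutes

print(replay_from_backup, replay_from_pitr, downtime)
```

Under these assumptions, a rehearsed PITR restore to just before 07:20 would have left about 2.6 hours of activity to reconcile instead of about 5.2 hours.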
Impact
All users experienced roughly one hour of downtime while the system restored from backup. Within 36 hours, Linear recovered 99 % of the lost data; the remaining 1 % could not be automatically restored due to conflicts.
Post‑mortem Reflections and Improvements
Revoke TRUNCATE privileges from all production users.
Introduce stricter database change controls, including DBA‑level review and separation of code and DB change pipelines.
Enhance pre‑release testing for database migrations.
Build and regularly rehearse PITR recovery procedures.
Improve internal incident‑response processes.
Implement automated data‑integrity checks.
Add a read‑only mode for Linear to keep the UI functional during write‑side outages.
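One simple form of automated data-integrity check is to compare per-table row counts between two consecutive samples and alert on a sudden collapse. The sketch below is an illustration under assumed names and an assumed 50 % threshold. It does not describe the checks Linear implemented.

```python
def find_suspicious_drops(previous: dict[str, int], current: dict[str, int],
                          max_drop_ratio: float = 0.5) -> list[str]:
    """Return tables whose row count fell by more than max_drop_ratio."""
    suspicious = []
    for table, before in previous.items():
        after = current.get(table, 0)  # a missing table counts as empty
        if before > 0 and (before - after) / before > max_drop_ratio:
            suspicious.append(table)
    return sorted(suspicious)


if __name__ == "__main__":
    # Hypothetical row counts sampled a few minutes apart.
    earlier = {"issues": 1000, "comments": 5000, "teams": 40}
    now = {"issues": 0, "comments": 4990, "teams": 40}
    print(find_suspicious_drops(earlier, now))
```

A check like this, run every few minutes against production, could shorten the gap between a destructive change and its detection.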
Linear published the full post‑mortem publicly, providing a valuable reference for other engineering teams on handling high‑risk database operations.
dbaplus Community