What Caused Linear’s Massive Data Loss and How They Recovered It
Linear, the SaaS project‑management tool, suffered a catastrophic data loss when a TRUNCATE CASCADE command unintentionally wiped production tables, prompting a detailed post‑mortem that outlines the timeline, root cause, recovery steps, impact, and a set of concrete preventive measures.
The incident was Linear's most severe outage in five years. A TRUNCATE CASCADE run against a test table also emptied the production tables linked to it, exposing gaps in the company's change-management and CI processes.
Timeline of the incident
04:47 : Full backup completed (pre‑incident).
07:01 : Change that caused data loss merged into main branch.
07:20 : Change applied.
07:52 : Anomaly detected; investigation started.
08:10 : Severe incident plan activated, more engineers called.
08:36 : Status page and X (Twitter) updated to note data‑access investigation.
09:20 : Further status updates.
09:56 : System entered maintenance mode; restoration from backup began.
10:48 : Database restored to the 04:47 backup; service became reachable again.
11:09 : Status page set to “observing”.
11:30 : Started restoring data between 04:47 and 09:56.
13:50 : Notified users that workspaces created between 04:47 and 09:56 could not be rebuilt.
15:35 : Sent email to all affected users and workspace admins with incident details and recovery instructions.
January 25 – Recovery progress
14:00 : Admins released a dedicated data‑restore page.
14:25 : Fixed a bug on the restore page that caused API loading issues.
16:40 : Started a trial data‑restore run.
17:49 : Official data‑restore began.
19:48 : Restored 98 % of affected workspaces.
23:20 : Restored 99 % of affected workspaces.
January 26 – Final restoration
07:37 : All workspaces except one were restored.
08:39 : Completed restoration of the last workspace.
Root cause
Linear uses PostgreSQL (confirmed by the snaplet.dev listing). The destructive command was:
TRUNCATE new_table CASCADE;
The CASCADE keyword caused all tables with foreign-key references to new_table to be emptied as well. The operation was performed on a test table that still had foreign-key links to production tables, so the TRUNCATE inadvertently wiped production data.
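To make the cascade behaviour concrete, here is a minimal Python sketch that models which tables a TRUNCATE ... CASCADE would empty. PostgreSQL applies CASCADE transitively, so tables that reference an already affected table are emptied too. The schema below (issue and comment) is hypothetical and is not taken from Linear's post-mortem; only new_table appears in the source.

```python
from collections import deque


def truncate_cascade_targets(table, referenced_by):
    """Return every table that TRUNCATE <table> CASCADE would empty.

    referenced_by maps a table name to the tables that hold foreign keys
    pointing at it. The walk is transitive, mirroring PostgreSQL's CASCADE.
    """
    seen = {table}
    queue = deque([table])
    while queue:
        current = queue.popleft()
        for child in referenced_by.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical schema: issue has a foreign key to new_table,
# and comment has a foreign key to issue.
referenced_by = {
    "new_table": ["issue"],
    "issue": ["comment"],
}

affected = truncate_cascade_targets("new_table", referenced_by)
print(sorted(affected))  # ['comment', 'issue', 'new_table']
```

In this model a statement that names a single table ends up emptying three, which is why the blast radius is hard to judge from the statement text alone.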
Why the safety checks missed it
Linear follows a trunk-based development model where SQL migration scripts live in the code repository. Each change undergoes CI-based automated checks and manual review before merging. However, their CI lacked a rule to flag TRUNCATE statements, and because the statement named only the test table, its effect on production tables was not apparent in the change itself. As a result, neither automated checks nor human reviewers caught the issue.
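A CI rule of the kind that was missing could look roughly like the sketch below. It is a naive text scan, not a SQL parser, and it is not Linear's actual tooling: the keyword list, the function names, and the sample migration are all assumptions made for illustration. It strips SQL comments first so that a keyword mentioned in a comment does not trigger a finding.

```python
import re

# Assumed keyword list; a real rule set would likely be broader.
DANGEROUS = re.compile(r"\b(TRUNCATE|DROP)\b", re.IGNORECASE)


def strip_sql_comments(sql):
    """Blank out /* ... */ and -- comments while keeping line numbers stable."""
    sql = re.sub(
        r"/\*.*?\*/",
        lambda m: "\n" * m.group(0).count("\n"),
        sql,
        flags=re.DOTALL,
    )
    return re.sub(r"--[^\n]*", " ", sql)


def find_dangerous_statements(sql):
    """Return (line_number, keyword, line_text) for each flagged line."""
    findings = []
    for number, line in enumerate(strip_sql_comments(sql).splitlines(), start=1):
        match = DANGEROUS.search(line)
        if match:
            findings.append((number, match.group(1).upper(), line.strip()))
    return findings


# Hypothetical migration file contents.
migration = """\
CREATE TABLE new_table (id bigint PRIMARY KEY);
-- TRUNCATE mentioned in a comment is ignored
TRUNCATE new_table CASCADE;
"""

for number, keyword, text in find_dangerous_statements(migration):
    print(f"line {number}: {keyword}: {text}")
```

A CI job could fail the build, or require an extra approval, whenever the returned list is non-empty. Because the scan does not understand string literals or dynamic SQL, it should be treated as a first filter rather than a guarantee.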
Investigation and recovery strategy
Because Linear caches data at multiple layers, the loss was not immediately visible to end users. The system records every user action in a log that is stored separately from the database, which allowed the team to replay operations performed after the 04:47 backup. Some operations could not be replayed automatically due to conflicts and required manual intervention.
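The replay idea can be sketched as follows. The source does not describe Linear's log format or how conflicts were detected, so everything here is an assumption: the Action fields, the version-based conflict check, and the sample records are hypothetical and only illustrate the general pattern of applying logged actions on top of a restored snapshot and setting conflicting ones aside.

```python
from dataclasses import dataclass


@dataclass
class Action:
    timestamp: str     # "HH:MM", zero-padded so string comparison sorts correctly
    record_id: str
    base_version: int  # version of the record the user was editing
    new_value: str


def replay(snapshot, actions, backup_time):
    """Apply logged actions newer than the backup; return (state, conflicts)."""
    state = {key: dict(value) for key, value in snapshot.items()}
    conflicts = []
    for action in sorted(actions, key=lambda a: a.timestamp):
        if action.timestamp <= backup_time:
            continue  # already contained in the backup
        record = state.get(action.record_id)
        current_version = record["version"] if record else 0
        if action.base_version != current_version:
            conflicts.append(action)  # set aside for manual resolution
            continue
        state[action.record_id] = {
            "version": current_version + 1,
            "value": action.new_value,
        }
    return state, conflicts


# Hypothetical snapshot restored from the 04:47 backup.
snapshot = {"ISSUE-1": {"version": 1, "value": "Draft"}}
log = [
    Action("04:30", "ISSUE-1", 0, "Draft"),        # before the backup, skipped
    Action("05:10", "ISSUE-1", 1, "In progress"),  # applies cleanly
    Action("06:30", "ISSUE-1", 1, "Done"),         # stale base version, conflict
    Action("07:05", "ISSUE-2", 0, "New issue"),    # creates a new record
]

state, conflicts = replay(snapshot, log, backup_time="04:47")
print(state)
print([c.timestamp for c in conflicts])
```

In this sketch the 06:30 action lands in the conflict list because it was based on a version that the 05:10 action had already replaced, which is one plausible shape for the cases that needed manual intervention.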
Linear's PostgreSQL instance had Point-in-Time Recovery (PITR) enabled, but the team had never tested it, so they did not use PITR during the incident. Had they done so, they could have restored to the moment just before the faulty change was applied (07:20; it was merged at 07:01) and reduced the amount of replay needed.
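A quick calculation from the timeline shows how much replay a PITR restore might have saved. The sketch below assumes the restore target would have been the 07:20 apply time from the timeline, and that replay covers everything up to the start of maintenance mode at 09:56; both are assumptions made for this estimate, not figures from the post-mortem.

```python
from datetime import datetime


def minutes_between(start, end):
    """Minutes from start to end, both given as same-day "HH:MM" strings."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


full_backup = "04:47"        # timeline: full backup completed
change_applied = "07:20"     # timeline: change applied (assumed PITR target)
maintenance_start = "09:56"  # timeline: system entered maintenance mode

replay_from_backup = minutes_between(full_backup, maintenance_start)
replay_from_pitr = minutes_between(change_applied, maintenance_start)

print(replay_from_backup)  # 309
print(replay_from_pitr)    # 156
```

Under these assumptions the replay window shrinks from 309 minutes to 156 minutes, roughly half.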
Impact
All users experienced roughly one hour of unavailability while the system restored from backup. Within 36 hours, 99 % of the mistakenly deleted data was recovered; the remaining 1 % required manual resolution. Linear publicly disclosed the full timeline and post‑mortem, providing a valuable reference for other engineering teams.
Post‑mortem reflections and improvement actions
Revoke TRUNCATE privileges from all users on production databases.
Separate database‑change creation from code review, adding dedicated DBA approval for high‑risk operations.
Introduce automated CI checks that detect dangerous SQL commands such as TRUNCATE and DROP.
Enhance pre‑production testing of database migrations, including realistic data sets.
Implement regular drills for PITR‑based recovery and document the process.
Refine internal incident‑response procedures.
Add data‑integrity validation steps after migrations.
Provide a read‑only mode for the application so users can still access the system when the database is write‑blocked.
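As one illustration of the data-integrity validation item above, a post-migration check could compare per-table row counts taken before and after a migration and flag any table that shrinks sharply. This is a minimal sketch, not Linear's implementation: the function name, the 50 % threshold, and the sample counts are hypothetical, and in practice the counts would come from queries against the database.

```python
def find_suspicious_tables(before, after, max_drop_ratio=0.5):
    """Return (table, old_count, new_count) for tables that lost too many rows.

    before and after map table names to row counts. A table missing from
    after is treated as having zero rows.
    """
    suspicious = []
    for table, old_count in before.items():
        if old_count == 0:
            continue  # nothing to lose
        new_count = after.get(table, 0)
        drop_ratio = (old_count - new_count) / old_count
        if drop_ratio > max_drop_ratio:
            suspicious.append((table, old_count, new_count))
    return suspicious


# Hypothetical row counts captured around a migration.
before = {"new_table": 10, "issue": 120_000, "comment": 450_000}
after = {"new_table": 0, "issue": 0, "comment": 449_800}

for table, old_count, new_count in find_suspicious_tables(before, after):
    print(f"{table}: {old_count} -> {new_count} rows")
```

Here new_table and issue are flagged while comment, which lost only a handful of rows, is not. A migration pipeline could halt or page an engineer when the list is non-empty.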
Linear’s transparent handling and detailed post‑mortem serve as an exemplary “blameless” analysis for other development teams.
ITPUB
