Why Embracing Failure Accelerates Growth: Lessons from Intuit and PayPal
The article explains how organizations can achieve rapid growth by openly acknowledging failures, creating lightweight post‑mortem processes, and continuously learning from mistakes, illustrated through Intuit’s SaaS transition, PayPal’s rollback challenges, and practical rules for QA and architecture.
Rule 27 – Failure Is the Mother of Success
Long‑term research shows that more is learned from failure than from success, but only when an open, honest communication environment and lightweight processes are in place to turn failures into lessons.
Intuit’s Growth Story
Intuit, a company valued at $260 billion with more than 8,500 employees, moved from desktop software to SaaS and mobile. A 2010 data-center power-unit failure caused a 24-hour outage, highlighting gaps in knowledge about SaaS infrastructure.
The incident, which occurred outside the peak tax season, prompted deep postmortem discussions and a shift toward an active/active high-availability architecture. It also made clear that the team lacked sufficient SaaS experience.
To address skill gaps, Intuit retrained desktop engineers into mobile developers and hired new senior architects, rapidly improving their ability to handle SaaS and mobile challenges.
Postmortem Process
Intuit adopted a three‑stage postmortem process:
Stage 1 – Timeline: Record every event leading to the incident without analysis.
Stage 2 – Discover Issues: Review the timeline, ask “why” repeatedly (at least five times) to uncover root causes across people, process, and architecture.
Stage 3 – Define Actions: Assign concrete, measurable, time‑bound actions (using SMART criteria) to each identified issue.
The process stresses that true root causes are rarely single; multiple factors must be addressed before the analysis is complete.
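The three stages above can be sketched as a small data model. This is only an illustration, not Intuit's actual tooling: the class names, the `open_problems` check, and the rule that at least two issues must be found are assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import date, datetime

CATEGORIES = {"people", "process", "architecture"}
MIN_WHYS = 5  # the text asks for at least five rounds of "why"


@dataclass
class TimelineEvent:
    """Stage 1: a timestamped fact, recorded without analysis."""
    at: datetime
    description: str


@dataclass
class Action:
    """Stage 3: a follow-up item; `due` makes it time-bound by construction."""
    description: str
    owner: str
    success_metric: str
    due: date

    def is_smart(self) -> bool:
        # Simplified, hypothetical check: no required text field is blank.
        return all(s.strip() for s in
                   (self.description, self.owner, self.success_metric))


@dataclass
class Issue:
    """Stage 2: a problem uncovered by asking "why" repeatedly."""
    summary: str
    category: str
    whys: list = field(default_factory=list)
    actions: list = field(default_factory=list)


@dataclass
class Postmortem:
    timeline: list = field(default_factory=list)
    issues: list = field(default_factory=list)

    def open_problems(self) -> list:
        """Return the reasons this postmortem is not yet complete."""
        problems = []
        if not self.timeline:
            problems.append("timeline is empty")
        if len(self.issues) < 2:  # assumed threshold for this sketch
            problems.append("fewer than two issues found")
        for issue in self.issues:
            if issue.category not in CATEGORIES:
                problems.append(f"{issue.summary}: unknown category")
            if len(issue.whys) < MIN_WHYS:
                problems.append(
                    f"{issue.summary}: asked 'why' only {len(issue.whys)} time(s)")
            if not issue.actions:
                problems.append(f"{issue.summary}: no action assigned")
            for action in issue.actions:
                if not action.is_smart():
                    problems.append(f"{issue.summary}: action is not SMART")
        return problems
```

A review would be treated as finished only when `open_problems()` returns an empty list, which forces each issue to carry its chain of "why" answers and at least one owned, measurable, dated action.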
Rule 28 – Don’t Rely Solely on QA
QA can lower delivery cost and increase throughput, but it does not improve quality by itself. Effective QA should surface recurring defects, guide engineers toward better practices, and support rapid growth, especially in high‑velocity organizations.
Techniques such as code reviews, test‑driven development (TDD), and automated testing help catch defects early, reducing reliance on QA as a safety net.
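As a minimal sketch of the test-first habit mentioned above, the tests below would be written before the function they exercise. The function `order_total_cents` and its discount rule are hypothetical, chosen only to keep the example small.

```python
def order_total_cents(item_cents, discount_percent=0):
    """Sum item prices (in cents) and apply a whole-percent discount."""
    if not 0 <= discount_percent <= 100:
        raise ValueError("discount_percent must be between 0 and 100")
    subtotal = sum(item_cents)
    # Integer arithmetic avoids floating-point rounding surprises.
    return subtotal - subtotal * discount_percent // 100


# In TDD these tests come first and fail until the function exists.
def test_total_without_discount():
    assert order_total_cents([500, 250]) == 750


def test_total_with_discount():
    assert order_total_cents([1000], discount_percent=10) == 900


def test_rejects_invalid_discount():
    try:
        order_total_cents([1000], discount_percent=150)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a 150% discount")
```

Test functions named this way can be collected by a runner such as pytest and run automatically on every commit, so a regression is caught by the engineer who introduced it instead of by QA.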
Rule 29 – Never Roll Out Un‑rollbackable Code
Every release must be rollback-capable. PayPal's 2004 outage demonstrated the danger of shipping code that cannot be reverted: the lack of rollback capability prolonged a service disruption that lasted three days.
Key practices for rollback readiness include:
Make only additive database schema changes; never drop columns or tables until a subsequent release has removed every dependency on them.
Script all database changes with accompanying rollback scripts and test them in QA or pre‑production environments.
Avoid ambiguous SQL (e.g., SELECT *); explicitly list columns in UPDATE statements.
Use feature flags or beta configurations to enable/disable functionality without full rollbacks.
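The practices in this list can be rehearsed together in a few lines. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table names, the `FEATURE_FLAGS` dictionary, and the `rehearse_release` helper are assumptions made for illustration and do not describe PayPal's actual systems.

```python
import sqlite3

# Additive change: a new table, so existing code keeps working unchanged.
UPGRADE_SQL = """
CREATE TABLE user_nicknames (
    user_id  INTEGER PRIMARY KEY REFERENCES users(id),
    nickname TEXT NOT NULL
)
"""
# Every scripted change ships with its scripted rollback.
ROLLBACK_SQL = "DROP TABLE user_nicknames"

# Hypothetical in-process flag store; real systems often use a config service.
FEATURE_FLAGS = {"nicknames": False}


def table_names(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]


def display_name(conn, user_id):
    # Columns are listed explicitly; no SELECT *.
    name = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()[0]
    if FEATURE_FLAGS["nicknames"]:
        row = conn.execute(
            "SELECT nickname FROM user_nicknames WHERE user_id = ?",
            (user_id,)).fetchone()
        if row is not None:
            return row[0]
    return name


def rehearse_release():
    """Apply the upgrade, toggle the flag, roll back, and compare schemas."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Ada"))
    schema_before = table_names(conn)

    conn.execute(UPGRADE_SQL)
    conn.execute(
        "INSERT INTO user_nicknames (user_id, nickname) VALUES (?, ?)",
        (1, "ada99"))

    FEATURE_FLAGS["nicknames"] = True
    with_flag = display_name(conn, 1)
    FEATURE_FLAGS["nicknames"] = False  # disable without touching the schema
    without_flag = display_name(conn, 1)

    conn.execute(ROLLBACK_SQL)
    schema_restored = table_names(conn) == schema_before
    conn.close()
    return schema_restored, with_flag, without_flag
```

Running a rehearsal like this in QA or pre-production checks two things at once: that the rollback script really restores the previous schema, and that the old code path still works while the new table exists, so switching the flag off is enough to back out the feature.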
Key Takeaways
Organizations that continuously learn from failures—whether from customers, internal incidents, or industry case studies—gain a competitive edge. Building a culture of transparent postmortems, empowering QA as a learning tool, and ensuring every change is reversible are essential for high‑availability, scalable systems.
Editor’s note: The excerpt is taken from Architecture Classics: Design Principles for Internet Technology Architecture (2nd edition) by Martin Abbott and Michael Fisher, translated by Chen Bin.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
