How a Single Deployment Mistake Cost Knight Capital $460 Million
A botched software deployment at Knight Capital left an outdated code path active on one server. On August 1, 2012, that server sent millions of erroneous orders, wiping out $460 million in about 45 minutes and exposing critical failures in testing, monitoring, and incident response.
The loss pushed Knight Capital to the brink of bankruptcy; the firm survived only through an emergency investor rescue and was later acquired.
The incident involved a large, unmaintained codebase with significant technical debt, illustrating a severe DevOps failure.
To enable clients to participate in the NYSE Retail Liquidity Program (RLP), Knight made several changes to its order-handling system ahead of the program's launch on August 1, 2012, including deploying new code to the SMARS router.
SMARS is an automated high‑speed algorithm router that receives parent orders, splits them into child orders based on available liquidity, and sends them to the market.
During deployment, the new RLP code was intended to replace unused “Power Peg” code, a feature that had not been used for years.
Although dormant, the Power Peg code remained callable, and the new RLP code reused a flag originally used to activate Power Peg. Knight expected the old code to be removed so that when the flag was true, only the new RLP code would run.
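The danger of reusing the flag can be shown with a minimal sketch. Everything here is hypothetical: the `Order` type, the `flag` field, and the two routing functions are invented for illustration and are not Knight's actual code. The point is only that the same flag value means different things depending on which build a server runs.

```python
from dataclasses import dataclass


@dataclass
class Order:
    symbol: str
    quantity: int
    flag: bool  # hypothetical name for the repurposed flag


def route_old_build(order: Order) -> str:
    """Build that still contains Power Peg: the flag activates it."""
    return "power_peg" if order.flag else "default"


def route_new_build(order: Order) -> str:
    """Updated build: the same flag now selects the RLP path."""
    return "rlp" if order.flag else "default"


order = Order(symbol="XYZ", quantity=1000, flag=True)
print(route_new_build(order))  # rlp
print(route_old_build(order))  # power_peg
```

An identical order is therefore handled correctly or incorrectly depending purely on which server receives it, which is why a partial rollout was so hazardous.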
Power Peg contained a cumulative‑quantity function that stopped sending child orders after the parent order was fully executed.
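Knight stopped using Power Peg in 2003. In 2005 it moved the cumulative-quantity tracking logic to a different place within SMARS, but it never retested Power Peg to confirm that the legacy code still worked after the change.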
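Why an untested change to the tracking logic matters can be illustrated with a simplified model. This sketch assumes that, after the tracking logic was relocated, the legacy path no longer saw fills being counted; the function name, parameters, and the `max_children` cap are hypothetical and exist only so the demo terminates.

```python
def send_child_orders(parent_qty, child_qty, fills_are_tracked, max_children=10):
    """Return how many child orders are sent for one parent order.

    Assumes every child order fills completely. max_children is a
    safety cap for this demo only; without it the untracked case
    would loop forever.
    """
    filled = 0
    sent = 0
    while filled < parent_qty and sent < max_children:
        sent += 1
        if fills_are_tracked:
            filled += child_qty  # the cumulative quantity is updated
    return sent


print(send_child_orders(300, 100, fills_are_tracked=True))   # 3
print(send_child_orders(300, 100, fills_are_tracked=False))  # 10
```

With tracking in place the loop stops once the parent order is complete. Without it, the stop condition is never met and only the artificial cap ends the loop.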
Starting on July 27, 2012, Knight rolled out the new RLP code to its eight SMARS servers over several days, but one of the eight never received the update.
No second technician reviewed the deployment, and no written procedure required such a review, so nobody noticed that the eighth server still carried the old Power Peg code.
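An automated post-deployment check is one way to catch a server that was skipped. The following is a minimal sketch, not a description of any tooling Knight had: the server names and build contents are made up, and a real check would read the deployed artifact from each host rather than from an in-memory dictionary.

```python
import hashlib


def fingerprint(artifact: bytes) -> str:
    """SHA-256 digest of a build artifact."""
    return hashlib.sha256(artifact).hexdigest()


def find_stale_servers(expected: bytes, deployed: dict) -> list:
    """Return names of servers whose artifact differs from the expected build."""
    want = fingerprint(expected)
    return sorted(
        name for name, blob in deployed.items() if fingerprint(blob) != want
    )


# Hypothetical fleet: seven servers updated, one left behind.
new_build = b"smars-with-rlp"
old_build = b"smars-with-power-peg"
servers = {f"smars-{i}": new_build for i in range(1, 8)}
servers["smars-8"] = old_build

print(find_stale_servers(new_build, servers))  # ['smars-8']
```

A deployment script could refuse to mark the rollout complete while this list is non-empty, removing the dependence on a manual second review.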
On August 1, Knight received RLP‑authorized client orders. Seven servers processed them correctly; the eighth server, using the reused flag, triggered the defective Power Peg code, sending unintended child orders to a specific exchange.
Earlier that morning, while pre-market orders were being handled, an internal system generated automated email alerts about a “Power Peg disabled” error. It sent 97 such emails before the market opened, but they were not treated as alarms, and staff largely ignored them.
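Repeated error messages like these can be promoted to actionable alarms with very little logic. The sketch below is illustrative only: the threshold, the function name, and the "heartbeat ok" message are assumptions, and a real system would page an on-call engineer rather than return a dictionary.

```python
from collections import Counter


def alerts_to_escalate(messages, threshold=3):
    """Return each message text seen at least `threshold` times, with its count."""
    counts = Counter(messages)
    return {text: n for text, n in counts.items() if n >= threshold}


# Hypothetical pre-market inbox: 97 identical errors among routine messages.
inbox = ["Power Peg disabled"] * 97 + ["heartbeat ok"] * 2

print(alerts_to_escalate(inbox))  # {'Power Peg disabled': 97}
```

Even a crude repetition threshold would have surfaced the error well before the market opened.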
Knight lacked an incident‑response monitoring process, relying on the technical team to discover problems while the system continued to emit millions of child orders.
In an attempt to fix the issue, Knight removed the new RLP code from the seven correctly updated servers. This made matters worse: with the RLP code gone but the flag still set on incoming parent orders, those servers began triggering their own leftover Power Peg code, just as the eighth server had.
The lesson drawn from the incident is that new processes are needed to prevent similar disasters, with emphasis on proper deployment scripts, testing, and production monitoring.
The company ultimately faced a $12 million fine, and an audit revealed it had also sent unsecured short‑sell orders.
21CTO
