The Norris Constant and the 20,000 Line Bottleneck

The article explains the Norris constant, the roughly 2,000-line point (originally estimated at 1,500) at which a novice programmer's code becomes unmanageable, and the further bottlenecks at 20,000 and 200,000 lines. It argues that disciplined design, simplicity, and a willingness to say "no" are essential to break these limits and sustain large-scale software.

Baidu Tech Salon

Oct 9, 2014

In 2011, John D. Cook wrote in a blog post:

My friend Clift Norris discovered a basic constant, which I call the Norris constant, the average amount of code a novice programmer can write before hitting a bottleneck. Clift estimated this value to be 1500 lines. Beyond this number, the code becomes so chaotic that even the programmer cannot debug or modify it easily.

I have also seen the next bottleneck in a programmer's career: it occurs at 20,000 lines. I round the Norris constant up to 2,000 so that this next bottleneck is exactly ten times larger.

In my first job after university, my colleagues of similar age and I ran into the 20,000-line bottleneck. At DreamWorks, we had 950 programs for animators, and the largest of them ranged from 20,000 to 25,000 lines. Beyond that size, no amount of effort could add new features.

In 1996, I led the development of DreamWorks' lighting tool with two other programmers, knowing it would exceed 20,000 lines. I changed my programming approach, and the tool shipped a year later at about 200,000 lines of code. (It was retired in 2013 after 16 years of daily use on 21 films.) I have since written other programs in the 100,000 to 200,000 line range, and I can feel the next bottleneck approaching.

The hardest part is discussing technical solutions with people who haven't broken through these bottlenecks. Breaking through requires different trade-offs, especially decisions that seem unreasonable in the short term but pay off in the long term. It's hard to win the argument when the short-term benefits are obvious and I can't convince anyone that a seemingly harmless change now could break existing code later.

Edsger Dijkstra wrote in 1969:

A one-year-old child crawls at a certain speed, say one mile per hour. But a speed of one thousand miles per hour is that of a supersonic jet. As moving objects, the two are incomparable: whatever one can do the other cannot, and vice versa.

A novice programmer, like the one Clift describes, learns to crawl, then toddle, walk, run, and sprint. He thinks, "At this rate of acceleration, I can catch up to a supersonic jet!" But he hits the 2,000-line limit because his skills don't scale. To go faster he must change his approach and learn to drive. He starts slowly and speeds up, but then hits the 20,000-line limit. Driving skills don't turn into flying skills.

My friend Brad Grantham explains this by saying that the novice "brute-forces" the problem. I believe this is correct: when the code is under 2,000 lines, you can write any tangled mess and rely on your memory to save you. Thoughtful decomposition into classes and packages will then let you scale to 20,000 lines.

The key to breaking past the 20,000-line bottleneck is keeping things simple. Refuse to add any feature or any line of code unless it is absolutely necessary right now. I emphasized this in "Every Line Is a Potential Bug" (and, before that, in "Simple is Good"). DreamWorks' chief VFX architect understood this:

For me, the success of the lighting tool lies in choosing a small set of features that are easy to use and maintain, yet powerful enough to make a great lighting tool.

As a technical leader, I know my main contribution is saying "no" to features that colleagues consider important but cannot justify. The real trick is knowing whether a new feature adds linear complexity (only its own) or exponential complexity (because it interacts with other features). Both should be avoided, but the latter requires a far more convincing justification.
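The difference between the two kinds of growth can be sketched with a little counting. The following is an illustration only and does not come from the original post: it assumes a hypothetical program with n features and compares the number of things to reason about when features are independent, when any pair may interact, and when any combination may be active together. The function names are invented for this sketch.

```python
from math import comb


def independent_cost(n_features: int) -> int:
    """Linear case: each feature only has to be understood on its own."""
    return n_features


def pairwise_interactions(n_features: int) -> int:
    """Every pair of features may interact: n choose 2."""
    return comb(n_features, 2)


def feature_combinations(n_features: int) -> int:
    """Worst case: any subset of features may be active at once: 2**n."""
    return 2 ** n_features


for n in (5, 10, 20):
    print(n, independent_cost(n), pairwise_interactions(n), feature_combinations(n))
# 5 5 10 32
# 10 10 45 1024
# 20 20 190 1048576
```

Under these assumptions, twenty self-contained features cost twenty units of attention, while twenty freely interacting ones open up over a million combinations, which is why an interacting feature needs a much stronger case.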

For example, in 2012 the Linux kernel had 15 million lines of code, but 75% of that was linear complexity (drivers, file systems, processor-specific code). There may be many video drivers, but they barely interact with one another. The remaining 25%, roughly 3.75 million lines, has more dependencies.

Dijkstra found it hard to teach these advanced methods because they only make sense for programs of 20,000 or 200,000 lines. Any class or textbook must limit its examples to a few hundred lines, and at that size brute force works just as well. You would really need a 30,000-line sample program to show that it is not too complex for new features to be added easily. But a sample that large cannot fit in a class.

I don't know what changes I need to make to break the 200,000-line bottleneck. I recently switched to a purer functional style with less mutable state, which might help.
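As a minimal sketch of what "less mutable state" can mean in practice, the example below uses an immutable record and a function that returns a new value instead of modifying its argument. The `Light` class and `dim` function are hypothetical names chosen for illustration; the post does not show any code or say which language was used.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Light:
    """Immutable record: fields cannot be reassigned after construction."""
    name: str
    intensity: float


def dim(light: Light, factor: float) -> Light:
    """Pure function: returns a new Light and leaves the input untouched."""
    return replace(light, intensity=light.intensity * factor)


key = Light("key", 1.0)
dimmed = dim(key, 0.5)
print(key.intensity, dimmed.intensity)  # 1.0 0.5
```

Because `dim` cannot change `key`, any other code holding a reference to `key` is unaffected by the call, which removes one source of the hidden interactions that make large programs hard to change.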

And I wonder what code at 2 million lines will look like.

Somewhere between three and four million lines of code there seems to be an invisible wall: regardless of how many people (hundreds) or years (decades) go into growing the code, its growth rate drops significantly. - Dan Wexler
