Why Lines of Code and Velocity Are Poor Metrics for Software Development Productivity
Using a true story from the 1982 Apple Lisa team and observations of agile practice, this article explains why lines of code, iteration velocity, and full staff utilization are unreliable productivity metrics, and argues instead for managing software development with global outcome metrics paired with leading indicators.
Let's begin with that true story from 1982.
In early 1982, the Lisa software team aimed to launch the product within six months. Some managers thought tracking each engineer’s weekly lines of code was a good way to monitor progress, so they created a form to be filled out every Friday with a field for the number of lines written.
Bill Atkinson, the author of Quickdraw and a key Lisa implementer, considered lines of code a foolish productivity metric, believing his goal was to write programs as small and fast as possible, and that measuring lines encouraged sloppy, bloated code.
While optimizing Quickdraw’s region calculation, he rewrote the region engine with a simpler, more generic algorithm, making region operations about six times faster and saving roughly 2,000 lines of code as a side effect.
When he first filled out the management form, he had just finished the optimization and, after a brief thought, entered "-2000" in the lines‑of‑code field.
The managers’ reaction is unknown, but a few weeks later they stopped asking Bill to fill out the form.
(See https://www.folklore.org/StoryView.py?story=Negative_2000_Lines_Of_Code.txt)
☞ Lines of code and iteration velocity cannot be used as comparative metrics
For decades the industry has used “lines of code”, “development velocity”, and “resource utilization” to measure IT team effectiveness, but these metrics are increasingly seen as inadequate, as illustrated by the lines‑of‑code example.
Since agile became popular, many teams use iteration velocity, yet this metric has two major problems when used for assessment.
First, velocity is a relative metric tied to a specific team's context, not an absolute measure. Because each team's context (domain, codebase, estimation habits) differs significantly, story points are not convertible between teams: you cannot compare Team A's velocity of X with Team B's velocity of Y on the same scale.
Second, once velocity is used to evaluate teams, they inevitably optimize for it. They inflate estimates and focus on closing as many stories as possible, even at the expense of helping other teams, because time spent on cross‑team collaboration lowers their own velocity while raising someone else's, making their team look worse. Treating velocity as a target thus undermines its intended usefulness and hampers inter‑team collaboration.
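The scale problem above can be made concrete with a small sketch. The numbers below are hypothetical: two teams deliver the same work each iteration, but because their story‑point scales differ, their computed velocities differ by a factor of three.

```python
# Hypothetical illustration: velocity is not comparable across teams,
# because story-point scales are team-local conventions.

def velocity(points_per_iteration):
    """Average story points completed per iteration."""
    return sum(points_per_iteration) / len(points_per_iteration)

# Assume both teams delivered the same four features each iteration,
# but Team B's estimation scale runs roughly 3x coarser.
team_a = [8, 10, 9]    # points per iteration on Team A's scale
team_b = [24, 30, 27]  # the same output on Team B's scale

print(velocity(team_a))  # 9.0
print(velocity(team_b))  # 27.0 -- looks "3x faster", output is identical
```

Nothing about Team B's process is faster; only its unit of measurement changed, which is exactly why velocities cannot be placed on a common scale.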
☞ Staff planned utilization should not aim for 100%
Many organizations treat staff utilization as a proxy for productivity and schedule every hour of everyone's time into planned tasks.
The problem is that once planned utilization exceeds a certain level, no buffer capacity remains to absorb unplanned work. When plans change or improvement work arises, there is no slack; work queues up behind managerial re‑coordination, and delivery cycles lengthen.
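Queueing theory makes this effect vivid. The sketch below uses the textbook M/M/1 waiting‑time formula, wait = ρ / (1 − ρ), purely as an illustration; a development team is of course not a single‑server queue, but the shape of the curve (delay exploding as utilization approaches 100%) is the point.

```python
# Rough illustration via the M/M/1 queueing formula: as utilization
# (rho) approaches 1, expected waiting time grows without bound.
# This is a simplified model, not a claim about any specific team.

def relative_wait_time(utilization):
    """Mean queue wait relative to task service time: rho / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1 - utilization)

for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    wait = relative_wait_time(rho)
    print(f"{rho:.0%} utilized -> wait = {wait:.1f}x service time")
```

At 50% utilization, work waits about as long as it takes to do; at 99%, it waits roughly 99 times as long, which is why "fully booked" plans produce long delivery cycles.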
Additionally, this approach requires highly accurate time estimates for each planned task. In software development, design and coding are inseparable brain work; each developer produces different code, and design itself contains much ambiguity and uncertainty, conflicting with the need for precise estimates and strict execution times.
☞ Recognize the creative nature of software development while reducing unnecessary variability
Years ago, many advocated a "software blue‑collar" strategy: separate analysis and design (brain work) from coding and testing (manual work), as in construction. Software development, however, remains a craft. Design and coding cannot be separated; coding is itself a creative design activity.
Nevertheless, not all work in software production is creative; some tasks are repetitive, have relatively fixed cycle times, are predictable, and exhibit low variability. Many such activities—such as compilation, packaging, testing, and deployment—can be standardized and automated.
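The low‑variability activities above lend themselves to being expressed as a fixed, ordered pipeline. The sketch below is illustrative only: the stage names are placeholders, and a real setup would invoke actual build tools (make, a test runner, a packager) rather than stub functions.

```python
# Hedged sketch: standardizing repetitive, predictable activities
# (compile, test, package, deploy) as an ordered, automated pipeline.
# Stage implementations here are placeholders for real build commands.

def run_pipeline(stages):
    """Run (name, step) pairs in order; stop at the first failure.

    Returns (completed_stage_names, failed_stage_name_or_None).
    """
    completed = []
    for name, step in stages:
        if not step():
            return completed, name  # pipeline halts at the failed stage
        completed.append(name)
    return completed, None

# Placeholder steps standing in for real tools such as `make` or `pytest`.
stages = [
    ("compile", lambda: True),
    ("test",    lambda: True),
    ("package", lambda: True),
    ("deploy",  lambda: True),
]

done, failed = run_pipeline(stages)
print(done, failed)  # ['compile', 'test', 'package', 'deploy'] None
```

Encoding the sequence once and running it on every change is what removes the variability: the machine executes the same predictable steps each time, freeing people for the creative work.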
☞ Use global “outcome” metrics as much as possible
When evaluating a product team's effectiveness, we should use global outcome metrics whenever possible. Figure 1 lists the observation metric set suggested by Continuous Delivery 2.0. These metrics influence and constrain one another, so expecting the effort of any single role to dramatically improve them is unrealistic.
Figure 1: Product R&D effectiveness outcome metrics
☞ Use leading indicators to improve outcome metrics
The above metrics are outcome metrics, which are lagging. To improve them, we must identify corresponding leading indicators and improve those.
Because many factors influence each outcome metric, we should avoid simple one‑cause‑one‑effect thinking. Given this multi‑causal nature, we should look for leading indicators with short causal paths that are both observable and open to intervention, so that feedback on improvement efforts arrives faster.
Figure 2: Extract from Continuous Delivery 2.0, Chapter 4