The Future of Software Engineering: Insights from a ThoughtWorks Closed‑Door Retreat

A 2026 ThoughtWorks retreat explored how AI is reshaping software engineering, revealing a shift of rigor from code to specifications, the emergence of a new supervisory "middle loop," evolving roles, security gaps, and the need for new organizational and technical foundations.

Architecture Musings

Rigor Shifts When AI Writes Code

Engineers agreed that code‑level rigor does not disappear when large language models generate code; it migrates to earlier artefacts and to risk‑based controls. Five concrete shifts were identified:

Specification‑first review: Teams replace traditional code review with pre‑implementation specification reviews. Structured formats such as EARS (Easy Approach to Requirements Syntax), state‑machine diagrams, and decision tables are adopted because they give the AI precise constraints. A poorly written spec would otherwise propagate defects at scale.

Test suites as first‑class prompts: Test‑Driven Development becomes a form of prompt engineering. Tests are written before any code and act as deterministic validators, preventing the model from "passing" by generating tests that merely confirm its own erroneous behaviour. Practitioners reported that when the test suite is correct and the generated code passes, the code is accepted regardless of its style.
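The test-first workflow described above can be sketched in a few lines. The function name `apply_discount` and its behaviour are invented for illustration; the point is that the tests exist before any implementation and act as the deterministic acceptance gate for whatever the model generates.

```python
# Hypothetical target: apply_discount(price, percent). The tests below are
# authored first and serve as the acceptance gate for any AI-generated body.

def apply_discount(price: float, percent: float) -> float:
    # Stand-in body so the example runs; in practice the model would
    # generate this implementation against the pre-written tests.
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

# Spec-derived tests (pytest style), written before the implementation:
def test_basic_discount():
    assert apply_discount(200.0, 25) == 150.0

def test_zero_discount_is_identity():
    assert apply_discount(99.99, 0) == 99.99

def test_out_of_range_percent_rejected():
    try:
        apply_discount(100.0, 110)
        assert False, "expected ValueError"
    except ValueError:
        pass
```

Because the tests are deterministic and independent of the generated code, the model cannot "pass" by adjusting the tests to match its own bugs.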

Strong type systems and formal constraints: Languages are being leveraged as guardrails. By separating the specification (what must change) from constraints (what must never change), organisations make it impossible for the model to emit code that violates type safety or domain invariants.
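One way to picture the specification/constraint split is to encode the invariants as types the generated code must go through. The domain types below (`AccountId`, `Money`, `credit`) are invented for illustration, not from the report:

```python
from dataclasses import dataclass

# Constraint layer: invariants that must never change, encoded as types.

@dataclass(frozen=True)
class AccountId:
    value: str
    def __post_init__(self):
        if not self.value.startswith("ACC-"):
            raise ValueError("AccountId must start with 'ACC-'")

@dataclass(frozen=True)
class Money:
    cents: int  # integer cents: no floating-point rounding drift
    def __post_init__(self):
        if self.cents < 0:
            raise ValueError("Money cannot be negative")

# Specification layer: the behaviour the model is asked to implement,
# expressed only in terms of the constrained types above.
def credit(account: AccountId, balance: Money, amount: Money) -> Money:
    return Money(balance.cents + amount.cents)
```

Any generated implementation that tries to produce a negative balance or a malformed identifier fails at the type boundary rather than in production.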

Risk‑based grading: Code is classified by business impact—internal tools, external services, safety‑critical systems. The classification determines whether human review is required or whether automated verification (static analysis, runtime checks) is sufficient.
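Such a grading scheme amounts to a small policy table. The grades mirror the three classes named above; the specific controls attached to each grade are assumptions for illustration, not a prescribed standard:

```python
from enum import Enum

class RiskGrade(Enum):
    INTERNAL_TOOL = 1
    EXTERNAL_SERVICE = 2
    SAFETY_CRITICAL = 3

# Illustrative policy: which checks each grade requires, and whether a
# human must review the change at all.
REVIEW_POLICY = {
    RiskGrade.INTERNAL_TOOL: {
        "human_review": False, "checks": ["static_analysis"]},
    RiskGrade.EXTERNAL_SERVICE: {
        "human_review": False, "checks": ["static_analysis", "runtime_checks"]},
    RiskGrade.SAFETY_CRITICAL: {
        "human_review": True, "checks": ["static_analysis", "runtime_checks", "formal_review"]},
}

def requires_human_review(grade: RiskGrade) -> bool:
    return REVIEW_POLICY[grade]["human_review"]
```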

Continuous understanding: Traditional line‑by‑line code review cannot keep up with AI‑generated change velocity. Alternatives such as weekly architecture reviews, ensemble programming (multiple engineers working on the same change simultaneously), and AI‑assisted system‑overview tools are being trialled.

“If AI takes over coding, where does engineering go? No one has the same answer, but everyone agrees the question is urgent.”

The Emerging “Middle Loop”

Software development has long been described with two loops: the personal inner loop (write‑test‑debug) and the outer CI/CD loop (build‑deploy‑operate). Participants coined a third, supervisory loop that sits between them. This “middle loop” focuses on guiding, evaluating, and correcting AI output.

Success in this role requires:

Delegating work to agents rather than implementing it directly.

Strong mental models of system architecture.

Rapid quality assessment without line‑by‑line reading.

Managing multiple parallel AI workstreams while preserving architectural consistency.

“Pair programming solves these problems. If understanding the system matters, keep doing it continuously rather than breaking it into stages.”

Agent Topology and Enterprise Architecture

Extending the Team Topologies framework, the concept of “agent topology” was introduced. Intelligent agents become first‑class participants, so organisational communication structures must accommodate their fluid replication, specialisation, and drift.

Speed mismatch: Agents can clear backlogs in days, exposing bottlenecks in cross‑team dependencies, architecture reviews, and human decision‑making.

Agent drift: An agent trained on e‑commerce back‑end data behaves differently from the same agent operating on ERP data, mirroring human team drift but at accelerated speed.

Decision‑fatigue bottleneck: When agents generate work faster than managers can approve, approval becomes the new limiting factor, raising the question of whether traditional middle management is still viable.

Self‑Healing and Self‑Improving Systems

Two ambition levels were distinguished:

Self‑healing: Return the system to a known good state after a failure.

Self‑improvement: Actively evolve non‑functional qualities such as performance or reliability.

Prerequisites identified include:

A clear change ledger that agents can read.

An agent operating system with identity and permission controls.

Robust rollback and feature‑flag mechanisms that work without code changes.

Fitness functions expressed in terms agents can evaluate (e.g., latency < 100 ms, error rate < 0.1%).
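The fitness functions in the last prerequisite are the easiest to make concrete: machine-checkable predicates over observed metrics. The thresholds below mirror the examples in the text; the metric names and structure are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A fitness function is a named, machine-checkable predicate over metrics.
@dataclass
class FitnessFunction:
    name: str
    check: Callable[[Dict[str, float]], bool]

FITNESS = [
    FitnessFunction("latency", lambda m: m["p99_latency_ms"] < 100),   # < 100 ms
    FitnessFunction("errors", lambda m: m["error_rate"] < 0.001),      # < 0.1%
]

def evaluate(metrics: Dict[str, float]) -> Dict[str, bool]:
    """Return each fitness function's verdict; an agent could trigger
    a rollback or remediation when any verdict is False."""
    return {f.name: f.check(metrics) for f in FITNESS}
```

For example, `evaluate({"p99_latency_ms": 80, "error_rate": 0.0005})` returns `{"latency": True, "errors": True}`, a verdict an agent can act on without human interpretation.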

Current practice still lacks these foundations; most organisations treat code changes as a last‑resort fix rather than a first‑class event.

Human Roles, Skills, and Experience

AI reshuffles, rather than eliminates, human work. Key observations:

Productivity/experience paradox: AI can boost output while degrading developer experience (higher cognitive load, reduced flow). Some participants suggested redefining "developer experience" as "agent experience" because the conditions that help agents (clear specs, deterministic tests) also benefit humans.

Staff engineers: A study of 500 companies showed staff engineers use AI tools less frequently than junior engineers, but when they do, they save up to 20 hours per week. Their friction comes from spending disproportionate time on coordination rather than technical supervision.

Junior engineers: Contrary to the "junior‑obsolete" narrative, AI accelerates their ramp‑up. They have no entrenched resistance to new tools and can become high‑value "productivity options".

Mid‑level engineers: Many lack deep fundamentals needed for AI‑augmented work. Apprenticeship models, rotation programs, and lifelong‑learning pathways were discussed as mitigations, though no organisation has yet solved the scale problem.

Technical Foundations: Languages, Semantics, and Agent OS

Current programming languages are human‑centric. The group debated what a language designed for agents would look like: strong static typing, limited expressiveness, and built‑in formal constraints that make invalid code impossible to emit.

Semantic layers, knowledge graphs, and domain ontologies—once niche—are resurfacing as essential foundations. One team built a telecom domain ontology with roughly 286 concepts, enabling agents to generate event‑storming artefacts that humans then validate, compressing weeks of discovery into days.
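A domain ontology of this kind can be represented minimally as a set of typed relations between concepts that an agent traverses. The concept names below are invented for illustration; the real telecom ontology in the text is far larger (~286 concepts):

```python
from typing import Set, Tuple

# Minimal knowledge-graph sketch: (source concept, relation, target concept).
# All names are illustrative, not from the report's telecom ontology.
ONTOLOGY: Set[Tuple[str, str, str]] = {
    ("Subscriber", "holds", "Contract"),
    ("Contract", "covers", "Tariff"),
    ("Tariff", "prices", "DataPlan"),
    ("Subscriber", "uses", "SIMCard"),
}

def relations_of(concept: str) -> Set[Tuple[str, str]]:
    """Outgoing (relation, target) pairs an agent can traverse from a concept."""
    return {(rel, dst) for src, rel, dst in ONTOLOGY if src == concept}
```

Given such a graph, an agent proposing an event-storming artefact can be checked for whether every entity and relationship it names actually exists in the validated ontology.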

An “agent operating system” sketch includes:

Identity and permission management for each agent.

Context‑window handling to keep prompts within model limits.

A work ledger that records tasks, SLOs, cost constraints, and required skills.

Governance paths linking agent capabilities to compliance requirements.
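The work ledger in the sketch above is essentially a queue of machine-readable task records. The field names below are assumptions derived from the list (tasks, SLOs, cost constraints, required skills), not a defined schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative work-ledger entry for an agent operating system.
@dataclass
class LedgerEntry:
    task_id: str
    description: str
    slo: str                      # e.g. "complete within 4 hours"
    cost_ceiling_usd: float       # budget the agent may not exceed
    required_skills: List[str] = field(default_factory=list)
    assigned_agent: Optional[str] = None

ledger: List[LedgerEntry] = []

def submit(entry: LedgerEntry) -> None:
    """Record a task; agents poll the ledger rather than being pushed work."""
    ledger.append(entry)

def unassigned() -> List[LedgerEntry]:
    """Tasks still awaiting an agent with the required skills and budget."""
    return [e for e in ledger if e.assigned_agent is None]
```

Identity, permission, and governance checks would then sit between `unassigned()` and the agent that claims a task.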

Security, Governance, and Agile Evolution

Security was identified as dangerously lagging. Granting an agent email access can enable full account takeover; granting full machine access lets an agent perform any operation without oversight.

Recommendations:

Adopt default‑secure platform engineering (secure‑by‑default configurations).

Develop industry‑wide interoperable security standards for agents.

Deploy AI‑augmented defenses that match the speed of AI‑driven attacks.

Agile is not dead but evolving. Some teams compress sprints to one week with AI‑generated demos; others revive XP practices (pair programming, continuous integration) to maintain tight feedback loops. The real threat is governance: fast AI‑enabled teams still hit approval bottlenecks unless governance evolves in parallel.

Increasing batch sizes via AI‑generated large changes risks reversing DORA findings that smaller batches improve stability. Early signs of regression were noted, prompting a call for industry monitoring.

Open Questions

Participants surfaced more questions than answers, including:

How to design career paths for engineers whose work shifts to supervisory loops?

What organisational designs can keep pace with agent speed without collapsing under decision‑fatigue?

Can a world be reached where test suites and formal constraints provide sufficient verification to eliminate human code review?

How to capture and reuse tacit knowledge that senior engineers hold (e.g., error‑code symptom mappings) in an "agent subconscious" knowledge graph?

“The retreat didn’t produce a roadmap. It produced a shared realization that the map is being redrawn, and the only people qualified to draw it are those who admit how much they don’t know.”

Source: https://www.thoughtworks.com/content/dam/thoughtworks/documents/report/tw_future%20_of_software_development_retreat_%20key_takeaways.pdf

ThoughtWorks retreat diagram
Tags: risk management, AI, software engineering, DevOps, team dynamics, future of work
Written by Architecture Musings

As the AI wave arrives, it feels as though we have reached the frontier of technology. Here, an architect records observations and reflections on technology, industry, and the future amid the upheaval.
