
Can AI Legally Re‑License Open‑Source Code? The chardet Rewrite Controversy

The recent AI‑driven rewrite of the Python encoding detector chardet sparked a heated debate over licensing, clean‑room development, and whether a completely new implementation can legitimately switch from LGPL to MIT, highlighting the broader challenges of AI‑generated open‑source software.

dbaplus Community

Background

chardet is a widely used Python library for detecting text encoding (UTF‑8, GBK, ISO‑8859‑1, etc.). It was created in 2006 by Mark Pilgrim and released under the LGPL license. Over the years it became a core dependency for many projects, including the popular requests library, with annual downloads exceeding 850 million.
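To make the library's job concrete: an encoding detector guesses the codec of a raw byte stream. The toy sketch below (standard library only, not chardet's actual statistical algorithm) simply tries candidate codecs in order; names like guess_encoding are illustrative, not part of chardet's API.

```python
def guess_encoding(data: bytes, candidates=("ascii", "utf-8", "gbk", "iso-8859-1")):
    # Return the first candidate codec that decodes the bytes cleanly.
    # chardet itself uses trained statistical models rather than trial
    # decoding -- this only illustrates the problem the library solves.
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

print(guess_encoding("héllo".encode("utf-8")))  # utf-8
print(guess_encoding("你好".encode("gbk")))      # gbk
```

Trial decoding breaks down quickly in practice (ISO-8859-1 accepts any byte sequence, for example), which is why chardet relies on character-frequency models instead.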

Original Maintainer Takes Over

After Pilgrim stepped away in 2011, Dan Blanchard became the primary maintainer in July 2012, contributing nearly 700 commits. The project continued under his stewardship while the original author remained silent.

AI‑Assisted Complete Rewrite

In early 2026 Blanchard announced chardet 7.0, claiming it was a "complete rewrite" generated with Claude Code and released under the permissive MIT license. The new version retains the same public API, runs faster, improves accuracy, supports Python 3.10+, has no runtime dependencies, and works on PyPy.

Original Author’s Protest

Two days after the release, a GitHub user identifying as Mark Pilgrim posted that the MIT‑licensed 7.0 version constituted an illegal relicensing of LGPL‑covered code. He demanded the project revert to its original license and quoted the LGPL requirement that derivative works must retain the same license.

His full comment included:

"I am Mark Pilgrim, original author of chardet. The maintainer claims they have the right to re‑license, but they do not. According to LGPL, any modified and redistributed code must continue to use LGPL. The rewrite is not a clean‑room implementation because the maintainer had extensive exposure to the original code."

Clean‑Room Debate

The term "clean‑room" refers to a development process where one team writes code without ever seeing the original source, ensuring the new code is not a derivative work. Blanchard admitted he had maintained chardet for over a decade, meaning he did not meet strict clean‑room isolation.

To support his claim of independence, Blanchard ran a similarity analysis with JPlag, finding a maximum similarity of 1.29% between files in version 7.0 and version 6.0, compared with up to 80% similarity between versions 5.2 and 6.0.
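JPlag is a Java tool that compares source trees at the token level. As a rough illustration of the underlying idea (not JPlag's actual algorithm), Python's standard difflib can produce a pairwise similarity score between two files:

```python
import difflib

def similarity(a: str, b: str) -> float:
    # Percentage of matching subsequences between two source texts,
    # loosely analogous to the pairwise scores JPlag reports.
    return difflib.SequenceMatcher(None, a, b).ratio() * 100

old = "def detect(data):\n    return model.score(data)\n"
new = "class Detector:\n    def feed(self, chunk):\n        self.state.update(chunk)\n"
print(f"{similarity(old, new):.2f}% similar")
```

A score near zero, as Blanchard reported for 6.0 vs. 7.0, is evidence the new code was not copied from the old, though it does not by itself settle the derivative-work question.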

AI’s Role in the Rewrite

Blanchard described his workflow with Claude:

Generated a design document using Claude’s “super‑powers brainstorming”.

Ensured API compatibility with the original library.

Kept the name chardet so the rewrite could replace the old implementation.

Avoided any GPL/LGPL code.

Targeted comparable detection accuracy on test data.

Included optional language detection where it was easy to add.

Prioritized high performance, low memory usage, and multi‑core CPU utilization.

Removed all runtime dependencies.

Supported both PyPy and CPython.

Kept the codebase clean and modern.

Used Hugging Face load_dataset for any statistical models.

Cached training data locally for rapid iteration.

Performed regular performance benchmarks.

Avoided large literal dictionaries that slow imports in CPython 3.12.

He started from an empty repository, instructed Claude not to rely on any GPL/LGPL code, and iteratively reviewed, tested, and refined each generated component. While he did not hand‑type every line, he was deeply involved in architecture design, code review, and iteration.

Controversy Over AI‑Generated Code

Critics point out that Claude was trained on publicly available code, possibly including earlier chardet versions, raising questions about whether the output is truly independent. Some community members argue that feeding copyleft code to a large model and then releasing the generated result under a permissive license could effectively relicense any copyleft project, a practice they deem risky.

Others, like Armin Ronacher, liken the situation to the philosophical “Ship of Theseus”: if you replace every part of a work, is it still the same? The Free Software Foundation’s Zoë Kooyman warned that AI models absorb the original code, making a “clean” rewrite impossible.

Open Questions

Who decides the licensing of a project when the original author disappears and a single maintainer rewrites the code with AI assistance? Is the MIT‑licensed chardet 7.0 a genuinely new work, or does it violate the LGPL? The debate highlights a gap in legal frameworks for AI‑generated open‑source software.

Tags: AI code generation, open-source, Claude, software licensing, MIT license, chardet, clean room
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
