Can AI Legally Rewrite Open‑Source Code? Inside the chardet License Controversy
The Python encoding detector chardet, originally released under LGPL, was rewritten in five days using Claude AI and re‑licensed to MIT, sparking a heated debate over copyright, clean‑room development, and whether AI‑generated code can bypass original open‑source licenses.
Background
The chardet library is a widely used Python package for automatic character‑encoding detection of byte streams (e.g., UTF‑8, GBK, ISO‑8859‑1). It was created in 2006 by Mark Pilgrim and originally released under the LGPL. The library is a core dependency of many projects, most notably requests, and receives hundreds of millions of downloads per year.
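To make the detection problem concrete, here is a minimal sketch that guesses an encoding by trial decoding with the standard library alone. It is not how chardet works internally (this article does not describe its algorithm); the helper name `guess_encoding` and the candidate list are assumptions chosen for illustration.

```python
def guess_encoding(data: bytes, candidates=("ascii", "utf-8", "gbk", "iso-8859-1")):
    """Return the first candidate codec that decodes `data` without error.

    Hypothetical helper for illustration only: real detectors such as
    chardet do far more than trial decoding.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            # This codec cannot represent the byte sequence; try the next one.
            continue
        return name
    return None


samples = {
    "plain ASCII": b"hello",
    "UTF-8 text": "h\u00e9llo".encode("utf-8"),
    "GBK text": "\u4e2d\u6587".encode("gbk"),
}
for label, raw in samples.items():
    print(label, "->", guess_encoding(raw))
```

Because ISO-8859-1 assigns a character to every byte value, it always decodes successfully, so it only works as a last-resort fallback here. That weakness is why a dedicated detector is useful. With chardet installed, the equivalent call is `chardet.detect(data)`, which returns a dictionary that includes `encoding` and `confidence` keys.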
License Change and AI‑Assisted Rewrite
In early 2026, Dan Blanchard published chardet 7.0, stating that the release is a “complete rewrite” generated with the help of Claude Code and re‑licensed under the permissive MIT license. The new version targets Python 3.10+, has no runtime dependencies, claims higher speed and accuracy, and is intended to replace the previous 5.x/6.x releases while keeping the public API unchanged.
Original Author’s Protest
Two days after the release, a user identifying as Mark Pilgrim opened a GitHub issue demanding that the license be reverted to LGPL, arguing that the LGPL requires derivative works to retain the same license.
Technical Evidence
Blanchard ran the JPlag code‑similarity tool and reported a maximum similarity of 1.29% between the files of chardet 7.0 and the previous 6.0 release. By contrast, similarity between versions 5.2 and 6.0 reaches up to 80%. Blanchard presents this gap as evidence that 7.0 is a substantial rewrite rather than an incremental revision of the earlier code.
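The kind of comparison described above can be sketched as follows. This is not JPlag, which uses token‑based matching; the sketch swaps in Python's `difflib.SequenceMatcher` on raw source text, and the function name `max_similarity` and the sample file contents are hypothetical.

```python
from difflib import SequenceMatcher


def max_similarity(old_files: dict[str, str], new_files: dict[str, str]) -> float:
    """Return the highest pairwise similarity ratio (0.0 to 1.0) between two file sets.

    Each argument maps a file name to its source text. Illustrative stand-in
    for a real plagiarism detector, not a reimplementation of JPlag.
    """
    best = 0.0
    for old_src in old_files.values():
        for new_src in new_files.values():
            ratio = SequenceMatcher(None, old_src, new_src).ratio()
            best = max(best, ratio)
    return best


# Hypothetical file contents, made up for this example.
old_release = {"detector.py": "def detect(data):\n    return probe(data)\n"}
new_release = {"engine.py": "class Engine:\n    pass\n"}

print(f"max similarity: {max_similarity(old_release, new_release):.2%}")
```

A character‑level ratio like this is cruder than token‑based matching: renaming identifiers lowers it even when the structure is unchanged, so it only illustrates what "maximum similarity between files" means.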
AI‑Assisted Development Process
The requirements set for the rewrite were:
Maintain full API compatibility with previous releases.
Keep the project name chardet so that the new implementation can be a drop‑in replacement.
Do not reuse any GPL/LGPL‑licensed code.
Match the original detection accuracy on the existing test suite.
Optionally add language detection if it can be implemented easily.
Achieve high performance and memory efficiency, leveraging multi‑core CPUs.
Have no runtime dependencies.
Support both PyPy and CPython.
Produce clean, modern Python code.
If statistical models are used, load training data via Hugging Face load_dataset.
Cache training data locally to speed up iterative development.
Run regular performance benchmarks.
Avoid large dictionary literals that slow down import in CPython 3.12.
Blanchard started from an empty repository, instructed Claude not to reference any GPL/LGPL code, and iteratively reviewed, tested, and refined each generated snippet.
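The first requirement in the list, API compatibility, is the kind of goal that can be checked mechanically. The sketch below compares the public callables of two namespaces by name and signature. It is an assumption about how such a check could look, not Blanchard's actual tooling; `public_api`, `api_differences`, and the stand‑in `detect` functions are all hypothetical.

```python
import inspect
from types import SimpleNamespace


def public_api(namespace) -> dict[str, str]:
    """Map each public callable's name to its signature rendered as a string."""
    return {
        name: str(inspect.signature(obj))
        for name, obj in vars(namespace).items()
        if callable(obj) and not name.startswith("_")
    }


def api_differences(old, new) -> dict[str, tuple]:
    """Return {name: (old_signature, new_signature)} for every mismatch.

    A missing callable shows up with None on the side where it is absent.
    """
    old_api, new_api = public_api(old), public_api(new)
    return {
        name: (old_api.get(name), new_api.get(name))
        for name in sorted(old_api.keys() | new_api.keys())
        if old_api.get(name) != new_api.get(name)
    }


# Stand-in implementations; in practice these would be two imported modules.
def _detect_v1(byte_str): ...
def _detect_v2(byte_str): ...
def _detect_changed(data, strict=False): ...

old_pkg = SimpleNamespace(detect=_detect_v1)
compatible_pkg = SimpleNamespace(detect=_detect_v2)
incompatible_pkg = SimpleNamespace(detect=_detect_changed)

print(api_differences(old_pkg, compatible_pkg))
print(api_differences(old_pkg, incompatible_pkg))
```

An empty result means the two namespaces expose the same callables with the same signatures. Matching signatures do not guarantee matching behavior, which is why the requirements also call for running the existing test suite.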
Clean‑Room Reimplementation Discussion
A “clean‑room” rewrite traditionally requires two isolated teams: one that studies the original implementation and another that writes new code without seeing the original source. Blanchard acknowledges that he has maintained chardet for over a decade and therefore has extensive knowledge of the original codebase, which raises questions about whether the isolation criteria were truly met.
Because AI models are trained on publicly available code, the FSF argues that a truly clean‑room rewrite may be impossible: the model may have internalized copyrighted patterns, making the generated code a potential derivative work.
Legal Debate
Under the LGPL, modified versions of the library must continue to be distributed under the same license. The original author’s claim is that the MIT‑licensed 7.0 release violates this requirement. Blanchard counters that the similarity analysis shows the new code is independent, and therefore the MIT license is permissible.
The controversy highlights broader questions about how generative AI interacts with copyleft licenses, especially when the AI has been trained on the very code that is being “rewritten.”
References
Repository: https://github.com/chardet/chardet
GitHub issue (author protest): https://github.com/chardet/chardet/issues/327#issuecomment-4005195078
Shift Magazine analysis: https://shiftmag.dev/license-laundering-and-the-death-of-clean-room-8528/
The Register article: https://www.theregister.com/2026/03/06/ai_kills_software_licensing/
Ars Technica discussion: https://arstechnica.com/ai/2026/03/ai-can-rewrite-open-source-code-but-can-it-rewrite-the-license-too/