Can AI Legally Rewrite Open‑Source Code? Inside the chardet License Controversy

The Python encoding detector chardet, originally released under the LGPL, was rewritten in five days using Claude AI and re‑licensed under MIT, sparking a heated debate over copyright, clean‑room development, and whether AI‑generated code can bypass original open‑source licenses.

Java Tech Enthusiast

Background

The chardet library is a widely used Python package for automatically detecting the character encoding of byte streams (e.g., UTF‑8, GBK, ISO‑8859‑1). It was created in 2006 by Mark Pilgrim and originally released under the LGPL. It has long been a dependency of many projects, most notably requests (which, since version 2.26, defaults to charset_normalizer but still uses chardet when it is installed), and it receives hundreds of millions of downloads per year.
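A byte string carries no label saying which encoding produced it, so a detector has to infer one. The sketch below is a deliberately naive illustration of that task and is not how chardet works. Earlier chardet releases, ported from Mozilla's universal charset detector, combine coding state machines with statistical models. This sketch first checks for a byte‑order mark. It then returns the first candidate encoding that decodes the input without errors. The function name guess_encoding and the candidate list and order are assumptions made for this example.

```python
import codecs

# Byte-order marks that identify an encoding outright. UTF-32 BOMs are
# checked before UTF-16 because the UTF-32-LE BOM starts with the UTF-16-LE one.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Hypothetical candidate order. ISO-8859-1 maps every byte to a character,
# so it always decodes successfully and acts as the last-resort fallback.
_CANDIDATES = ["ascii", "utf-8", "gbk", "iso-8859-1"]


def guess_encoding(data: bytes) -> str:
    """Naive trial decoding: return the first encoding that decodes `data`."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    for name in _CANDIDATES:
        try:
            data.decode(name)  # strict mode: raises on any invalid byte
        except UnicodeDecodeError:
            continue
        return name
    return "iso-8859-1"  # not reached, since ISO-8859-1 never fails above
```

Trial decoding breaks down quickly. Many byte sequences are valid in several encodings at once, so the result depends on the order of _CANDIDATES. Real detectors resolve this ambiguity by scoring candidates statistically rather than accepting the first one that decodes.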

License Change and AI‑Assisted Rewrite

In early 2026 Dan Blanchard published chardet 7.0, stating that the release is a “complete rewrite” generated with the help of Claude Code and re‑licensed under the permissive MIT license. The new version targets Python 3.10+, has no runtime dependencies, claims higher speed and accuracy, and is intended to replace the previous 5.x/6.x releases while keeping the public API unchanged.

Original Author’s Protest

Two days after the release, a user identifying as Mark Pilgrim opened a GitHub issue demanding that the license be reverted to LGPL, arguing that the LGPL requires derivative works to retain the same license.

Technical Evidence

Blanchard ran the JPlag code‑similarity tool and reported a maximum similarity of 1.29% between the files of chardet 7.0 and the previous 6.0 release. By contrast, similarity between versions 5.2 and 6.0 reaches up to 80%, which suggests that 7.0 is a substantial rewrite rather than an incremental revision.
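JPlag does not compare raw text. It converts each file into a stream of tokens and matches those streams, so renaming variables or reformatting code does not hide copied structure. The sketch below approximates that idea for Python sources using only the standard library. It is a loose stand‑in, not JPlag. JPlag uses its own language front ends and a Greedy String Tiling algorithm. This example substitutes difflib.SequenceMatcher, so its percentages will not reproduce the figures reported above. The helper names normalized_tokens and token_similarity are hypothetical.

```python
import difflib
import io
import keyword
import tokenize

# Token types that carry layout or comments rather than program structure.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalized_tokens(source: str) -> list[str]:
    """Tokenize Python source, replacing identifiers and literals with
    placeholders so that renaming or changing constants has no effect."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME:
            # Keep keywords (def, return, ...), collapse all other names.
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        else:
            out.append(tok.string)  # operators and punctuation
    return out


def token_similarity(a: str, b: str) -> float:
    """Similarity in percent: 2 * matched tokens / total tokens * 100."""
    ta, tb = normalized_tokens(a), normalized_tokens(b)
    if not ta and not tb:
        return 100.0
    matcher = difflib.SequenceMatcher(None, ta, tb, autojunk=False)
    return matcher.ratio() * 100
```

Renaming identifiers or changing literal values leaves the score at 100%, while structurally different code scores low. This property makes a low token‑level score more informative than a plain text diff.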

AI‑Assisted Development Process

Blanchard set the following requirements for the rewrite:

Maintain full API compatibility with previous releases.

Keep the project name chardet so that the new implementation can be a drop‑in replacement.

Do not reuse any GPL/LGPL‑licensed code.

Match the original detection accuracy on the existing test suite.

Optionally add language detection if it can be implemented easily.

Achieve high performance and memory efficiency, leveraging multi‑core CPUs.

Have no runtime dependencies.

Support both PyPy and CPython.

Produce clean, modern Python code.

If statistical models are used, load training data via Hugging Face load_dataset.

Cache training data locally to speed up iterative development.

Run regular performance benchmarks.

Avoid large dictionary literals that slow down import in CPython 3.12.
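The last requirement concerns import time. A module that defines a very large dict literal rebuilds that dict every time the package is imported, whether or not the table is ever used. One common alternative is to keep such tables in data files and load each one lazily, caching it after first use. This sketch is hypothetical and is not necessarily what chardet 7.0 does. The function name load_model_table and the JSON file format are assumptions.

```python
import functools
import json


@functools.lru_cache(maxsize=None)
def load_model_table(path: str) -> dict[str, float]:
    """Load a (hypothetical) JSON model table on first use.

    Nothing is parsed at import time. The first call reads the file, and
    every later call with the same path returns the same cached dict.
    """
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Because every caller shares the cached dict, callers should treat it as read‑only. A production version might prefer an immutable structure or a more compact binary format.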

Blanchard started from an empty repository, instructed Claude not to reference any GPL/LGPL code, and iteratively reviewed, tested, and refined each generated snippet.

Clean‑Room Reimplementation Discussion

A “clean‑room” rewrite traditionally requires two isolated teams. One team studies the original implementation and writes a functional specification. The other implements that specification without ever seeing the original source. Blanchard acknowledges that he has maintained chardet for over a decade and therefore knows the original codebase extensively, which raises questions about whether the isolation criteria were truly met.

Because AI models are trained on publicly available code, the FSF argues that a truly clean‑room rewrite may be impossible: the model may have internalized copyrighted patterns, making the generated code a potential derivative work.

Legal Debate

Under the LGPL, modified versions of the library must continue to be distributed under the same license. The original author claims that the MIT‑licensed 7.0 release violates this requirement. Blanchard counters that the similarity analysis shows the new code is independent, and that the MIT license is therefore permissible.

The controversy highlights broader questions about how generative AI interacts with copyleft licenses, especially when the AI has been trained on the very code that is being “rewritten.”

References

Repository: https://github.com/chardet/chardet

GitHub issue (author protest): https://github.com/chardet/chardet/issues/327#issuecomment-4005195078

Shift Magazine analysis: https://shiftmag.dev/license-laundering-and-the-death-of-clean-room-8528/

The Register article: https://www.theregister.com/2026/03/06/ai_kills_software_licensing/

Ars Technica discussion: https://arstechnica.com/ai/2026/03/ai-can-rewrite-open-source-code-but-can-it-rewrite-the-license-too/

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
