How AI Rewrote chardet: License Battles, Massive Speedup, and What It Means for Open‑Source

The interview explores how Dan Blanchard used an LLM to completely rewrite the popular Python library chardet, the resulting performance boost, the contentious license change that sparked community backlash, and the broader legal and ethical questions surrounding AI‑generated open‑source code.

Java Tech Enthusiast

chardet is a Python library that detects the character encoding of text. It averages more than 130 million downloads per month. Mark Pilgrim created it in 2006 and later withdrew from the project.

After maintaining the project alone for years, Dan Blanchard rebuilt the entire codebase in five days with the help of the Claude large language model. He published the new version under a more permissive license and reports up to a 48-fold speedup along with improved accuracy.

The rewrite ignited a heated debate: Pilgrim resurfaced on GitHub and objected to the license change, while some community members argued that a name change was required to avoid confusion, and others defended the rewrite as a necessary evolution.

Dan explains the licensing nuances, contrasting the LGPL with the MIT license, and describes why relicensing the existing code was so difficult: it would have required sign-off from every contributor. He instead took a "clean-room" approach and used JPlag, a source-code similarity checker, to verify that the new code shares less than 1.3% similarity with the old version. He also considered moving to a public-domain-like license such as Zero-Clause BSD (0BSD).
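To make the idea of a code-similarity score concrete, here is a minimal sketch of a token-based comparison between two Python sources. It is not JPlag and does not reproduce its algorithm or the 1.3% figure. It swaps in the standard library's `difflib.SequenceMatcher` over a normalized token stream, and the function names `token_stream` and `similarity` are hypothetical.

```python
import difflib
import io
import keyword
import tokenize

# Token types that carry layout or comments rather than program structure.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT,
    tokenize.ENCODING, tokenize.ENDMARKER,
}


def token_stream(source: str) -> list[str]:
    """Reduce Python source to a normalized list of tokens (hypothetical helper)."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # Collapse identifiers so that renaming alone does not hide copying.
            tokens.append("NAME")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            tokens.append("LITERAL")
        else:
            tokens.append(tok.string)
    return tokens


def similarity(old_source: str, new_source: str) -> float:
    """Return a 0.0 to 1.0 similarity ratio between two normalized token streams."""
    matcher = difflib.SequenceMatcher(
        None, token_stream(old_source), token_stream(new_source), autojunk=False
    )
    return matcher.ratio()
```

Because identifiers and literals are collapsed, two functions that differ only in names, constants, and comments score 1.0, while structurally unrelated code scores lower. A real audit would compare whole codebases file by file and rely on a purpose-built tool rather than this sketch.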

chardet works by statistically analyzing byte patterns to infer both language and encoding, and it has long been a dependency of the widely used requests library. The rewrite reportedly brought unit-test coverage to 100%, including code that previously had no tests, alongside substantial performance and accuracy gains.
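The general idea of scoring candidate encodings against the bytes can be sketched in a few lines. This is a toy illustration only, not chardet's algorithm: the candidate list, the "plausible character" heuristic, and the names `score` and `guess_encoding` are all assumptions made for the example.

```python
# Illustrative subset of encodings to try; a real detector covers many more.
CANDIDATES = ["utf-8", "utf-16-le", "cp1252"]

# Punctuation treated as ordinary text by this toy heuristic.
_PUNCTUATION = ".,;:!?'\"-"


def score(data: bytes, encoding: str) -> float:
    """Fraction of decoded characters that look like ordinary text.

    Returns 0.0 when the bytes are not valid in the given encoding.
    """
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        return 0.0
    if not text:
        return 0.0
    plausible = sum(
        ch.isalpha() or ch.isspace() or ch in _PUNCTUATION for ch in text
    )
    return plausible / len(text)


def guess_encoding(data: bytes) -> tuple[str, float]:
    """Return the best-scoring candidate and its score.

    Ties resolve to the candidate listed first in CANDIDATES.
    """
    scores = {enc: score(data, enc) for enc in CANDIDATES}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Calling `guess_encoding("Grüße aus München".encode("cp1252"))` picks `cp1252`, because the same bytes are invalid as UTF-8 and have an odd length that UTF-16 rejects. Production detectors replace the crude plausibility count with per-language frequency models and return a calibrated confidence rather than a raw ratio.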

Beyond the project, the discussion touches on the broader impact of AI‑generated code: questions of authorship, copyright eligibility, and the potential disruption of SaaS business models. Dan cites Bruce Perens’ warning that software licensing may be entering a period of upheaval and notes that AI‑produced code might be deemed uncopyrightable in many jurisdictions.

Dan emphasizes that he seeks no profit, is open to renaming the package to ease migration, and believes the faster, better‑tested library will benefit the Python ecosystem, even as legal uncertainties remain.

Tags: Python, AI, software engineering, character encoding, License, open-source, chardet