How Anthropic’s Claude Was Distilled at Scale and the Open‑Source DataClaw Response
Anthropic accused DeepSeek, Moonshot AI and MiniMax of running a massive distillation attack on Claude using over 24,000 fake accounts and 16 million interactions, prompting community member POM to release 155,000 Claude Code logs and the open‑source DataClaw tool for safe dataset creation.
Background
Anthropic reported that three AI labs—DeepSeek, Moonshot AI, and MiniMax—performed an industrial‑scale distillation attack on its Claude model. The operation allegedly used more than 24,000 fabricated user accounts to interact with Claude over 16 million times, extracting model behavior for training their own models.
Distillation Attack Overview
In a distillation attack, an adversary queries a target model extensively, collects the input‑output pairs, and then trains a surrogate model to imitate the target’s capabilities. The large volume of queries enables the surrogate to approximate the original model’s performance, potentially bypassing licensing or usage restrictions.
Community Response – DataClaw
In reaction to the alleged data harvesting, a community member released an open‑source project called DataClaw . DataClaw is a utility that converts Claude Code (Opus 4.5) and Codex conversation logs into a structured dataset, automatically redacts confidential information and personally identifiable information (PII), and publishes the resulting dataset to Hugging Face.
Key Features
Parses Claude Code and Codex chat histories into tabular JSON/CSV format.
One‑command publishing to Hugging Face.
Built‑in anonymization of sensitive content.
Installation and Usage
Install the package from PyPI and authenticate with the Hugging Face CLI:
pip install dataclaw
huggingface-cli login --token YOUR_TOKENAfter authentication, run the tool (replace PATH_TO_LOGS with the directory containing the raw conversation logs):
dataclaw --input PATH_TO_LOGS --output ./dataclaw_datasetThe command extracts logs, sanitizes them, and creates a ready‑to‑use dataset. To upload the dataset to Hugging Face, use:
dataclaw upload --dataset-name peteromallet/dataclaw-peteromalletRepository and Dataset Links
Source code: https://github.com/peteromallet/dataclaw/tree/main
Published dataset: https://huggingface.co/datasets/peteromallet/dataclaw-peteromallet
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
