How Anthropic’s Claude Was Distilled at Scale and the Open‑Source DataClaw Response

Anthropic accused DeepSeek, Moonshot AI and MiniMax of running a massive distillation attack on Claude using over 24,000 fake accounts and 16 million interactions, prompting community member POM to release 155,000 Claude Code logs and the open‑source DataClaw tool for safe dataset creation.

PaperAgent
PaperAgent
PaperAgent
How Anthropic’s Claude Was Distilled at Scale and the Open‑Source DataClaw Response

Background

Anthropic reported that three AI labs—DeepSeek, Moonshot AI, and MiniMax—performed an industrial‑scale distillation attack on its Claude model. The operation allegedly used more than 24,000 fabricated user accounts to interact with Claude over 16 million times, extracting model behavior for training their own models.

Distillation Attack Overview

In a distillation attack, an adversary queries a target model extensively, collects the input‑output pairs, and then trains a surrogate model to imitate the target’s capabilities. The large volume of queries enables the surrogate to approximate the original model’s performance, potentially bypassing licensing or usage restrictions.

Community Response – DataClaw

In reaction to the alleged data harvesting, a community member released an open‑source project called DataClaw . DataClaw is a utility that converts Claude Code (Opus 4.5) and Codex conversation logs into a structured dataset, automatically redacts confidential information and personally identifiable information (PII), and publishes the resulting dataset to Hugging Face.

Key Features

Parses Claude Code and Codex chat histories into tabular JSON/CSV format.

One‑command publishing to Hugging Face.

Built‑in anonymization of sensitive content.

Installation and Usage

Install the package from PyPI and authenticate with the Hugging Face CLI:

pip install dataclaw
huggingface-cli login --token YOUR_TOKEN

After authentication, run the tool (replace PATH_TO_LOGS with the directory containing the raw conversation logs):

dataclaw --input PATH_TO_LOGS --output ./dataclaw_dataset

The command extracts logs, sanitizes them, and creates a ready‑to‑use dataset. To upload the dataset to Hugging Face, use:

dataclaw upload --dataset-name peteromallet/dataclaw-peteromallet

Repository and Dataset Links

Source code: https://github.com/peteromallet/dataclaw/tree/main

Published dataset: https://huggingface.co/datasets/peteromallet/dataclaw-peteromallet

large language modelsopen-sourceAI securityClaudeindustry insightsDataClawDistillation Attack
PaperAgent
Written by

PaperAgent

Daily updates, analyzing cutting-edge AI research papers

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.