
GPT’s Father Sends AI Back to 1930: An AI That Writes Python Without Seeing Code

Alec Radford’s team released Talkie, a 13‑billion‑parameter LLM trained exclusively on pre‑1931 texts (2.6 trillion tokens). Surprisingly, given a few examples it can generate correct Python programs, suggesting genuine reasoning rather than mere memorisation. The article details its experiments, data‑quality challenges, comparative performance, and ambitious scaling roadmap.

IT Services Circle

May 1, 2026


Model and Data

Talkie is a 13‑billion‑parameter language model trained on 2.6 trillion tokens, all published on or before 31 December 1930. The corpus consists of books, newspapers, scientific journals, patents and legal cases from the public‑domain era; no modern code, internet text, or post‑1930 material appears in the training set.
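The article does not describe how the cutoff was enforced. Below is a minimal sketch of a date filter, assuming each document carries a publication date. The `Document` record, its fields and the `filter_corpus` helper are hypothetical, and undated items are dropped here as a conservative choice that the team may or may not have made.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical cutoff: keep only text dated on or before the last day of 1930.
CUTOFF = date(1930, 12, 31)


@dataclass
class Document:
    # Hypothetical record; Talkie's actual corpus schema is not published.
    title: str
    kind: str                  # e.g. "book", "newspaper", "patent", "legal case"
    published: Optional[date]  # None when no reliable date is known
    text: str


def within_cutoff(doc: Document, cutoff: date = CUTOFF) -> bool:
    """True only for documents with a known date no later than the cutoff."""
    return doc.published is not None and doc.published <= cutoff


def filter_corpus(docs: List[Document]) -> List[Document]:
    """Drop anything published after the cutoff or lacking a date."""
    return [d for d in docs if within_cutoff(d)]


docs = [
    Document("Patent specification", "patent", date(1927, 3, 1), "..."),
    Document("Newspaper issue", "newspaper", date(1931, 1, 2), "..."),
    Document("Undated pamphlet", "book", None, "..."),
]
print([d.title for d in filter_corpus(docs)])  # ['Patent specification']
```

One limitation of this sketch is that it trusts the date field. A later reprint of a pre‑1931 work carries a post‑1930 date and would be dropped, while modern material placed inside a correctly dated record would slip through.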

Programming Capability Test (HumanEval)

The team evaluated Talkie on the HumanEval benchmark by supplying a few Python function examples as few‑shot context and asking the model to solve new tasks. Talkie correctly generated simple one‑line programs, such as a function that adds two numbers. In one notable case, given an encoding function that shifts each letter forward by 5, Talkie produced the matching decoding function by changing the shift to −5, suggesting that it grasped the idea of an inverse function.

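```python
# Reference encoder from the few-shot context: shift each letter 5 places forward
# (input is lowercased and non-letters are dropped)
def encode_shift(s):
    return "".join(chr(((ord(c)-97+5)%26)+97) for c in s.lower() if c.isalpha())

# Talkie's inferred decoder: the same expression with the shift reversed to -5
def decode_shift(s):
    return "".join(chr(((ord(c)-97-5)%26)+97) for c in s.lower() if c.isalpha())
```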
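HumanEval scores a completion by executing it against test code rather than by comparing text. Below is a minimal sketch of that kind of check for the decoder, assuming a round‑trip test on random lowercase strings. The benchmark's exact harness is not reproduced here. `passes_round_trip` is a hypothetical helper, and `exec` provides no sandboxing, so this pattern is only suitable for trusted code.

```python
import random
import string

# Reference encoder, same as in the example above.
def encode_shift(s: str) -> str:
    return "".join(chr(((ord(c) - 97 + 5) % 26) + 97) for c in s.lower() if c.isalpha())

# Candidate completion as a model would emit it: plain source text.
CANDIDATE = '''
def decode_shift(s):
    return "".join(chr(((ord(c)-97-5)%26)+97) for c in s.lower() if c.isalpha())
'''

def passes_round_trip(candidate_src: str, trials: int = 100, seed: int = 0) -> bool:
    """Execute the candidate source, then check decode_shift(encode_shift(w)) == w
    on random lowercase words. No sandboxing: only run trusted code this way."""
    namespace: dict = {}
    exec(candidate_src, namespace)
    decode = namespace["decode_shift"]
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(1, 20)
        word = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        if decode(encode_shift(word)) != word:
            return False
    return True

print(passes_round_trip(CANDIDATE))  # True
```

Changing the candidate's `-5` back to `+5` makes the check fail, because every letter then ends up shifted by 10 instead of returning to its original position.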

Reasoning vs. Memorisation

To isolate the effect of data recency, the researchers trained a modern twin model (talkie‑web‑13b‑base) on contemporary web data with the same compute budget. Talkie matched the twin on core language‑understanding and mathematical‑reasoning tasks but lagged on general‑knowledge evaluations, indicating that the temporal scope and quality of the training data affect performance.

Data‑Quality Experiment

Two parallel models were trained on the same set of historical texts: one on the output of a conventional OCR pipeline and the other on manually transcribed versions. The OCR‑based model achieved only 30% of the learning efficiency of the transcribed baseline. Applying simple regex cleaning raised this to roughly 70%, which still leaves a substantial quality gap and motivated the team to develop a specialised “retro‑OCR” system for pre‑1931 documents.
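The article does not list the regex rules the team applied. Below is a minimal sketch of cleanup that is common for OCR output, assuming three hypothetical rules: rejoin words hyphenated across line breaks, drop lines that contain only a page number, and collapse runs of spaces and blank lines.

```python
import re

# Hypothetical cleaning rules; the team's actual regexes are not published.
HYPHEN_BREAK = re.compile(r"(\w)-\n(\w)")                             # "implemen-\ntation"
PAGE_NUMBER_LINE = re.compile(r"^[ \t]*\d{1,4}[ \t]*$", re.MULTILINE)  # a bare "42" line
MULTI_SPACE = re.compile(r"[ \t]+")
MULTI_NEWLINE = re.compile(r"\n{3,}")


def clean_ocr(text: str) -> str:
    text = HYPHEN_BREAK.sub(r"\1\2", text)   # rejoin words split at line ends
    text = PAGE_NUMBER_LINE.sub("", text)    # blank out page-number-only lines
    text = MULTI_SPACE.sub(" ", text)        # collapse runs of spaces and tabs
    text = MULTI_NEWLINE.sub("\n\n", text)   # keep at most one blank line
    return text.strip()


sample = "The implemen-\ntation of the\n\n\n42\n\nengine   was  sound."
print(repr(clean_ocr(sample)))  # 'The implementation of the\n\nengine was sound.'
```

The first two rules can misfire. A genuine compound split as "well-\nknown" is joined into "wellknown", and a line holding only a year or a table figure is deleted as if it were a page number. This is one reason simple regex cleaning may recover only part of the gap.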

Post‑Training Pipeline

Supervised Fine‑Tuning (SFT): Constructed a “retro‑textbook” dataset from etiquette manuals, letter‑writing guides, recipes, encyclopedia entries and poetry, yielding instruction‑response pairs for the first SFT stage.

Direct Preference Optimisation (DPO): Conducted an online DPO phase in which Claude Sonnet 4.6 acted as the preference model, improving Talkie’s instruction‑following behaviour.

Dialogue‑Generation SFT: Used Claude Opus 4.6 to generate multi‑turn dialogues, which were then used for a final round of SFT.
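DPO trains directly on preference pairs and does not need a separately trained reward model. Below is a minimal sketch of the standard per‑pair DPO loss, assuming summed response log‑probabilities from the policy being trained and from a frozen reference copy. `beta = 0.1` is a common default, not a value reported for Talkie. The online part is left out: sampling responses from Talkie and having the judge model pick a winner.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a whole response under
    either the policy being trained or the frozen reference model.
    """
    chosen_margin = policy_chosen - ref_chosen        # how far the policy moved toward the winner
    rejected_margin = policy_rejected - ref_rejected  # how far it moved toward the loser
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # log 2, about 0.693
print(dpo_loss(-5.0, -12.0, -10.0, -10.0))   # smaller: the chosen response gained likelihood
```

When the policy and reference agree, the loss equals log 2. It falls as the policy raises the chosen response's likelihood, relative to the reference, faster than the rejected one's. `beta` controls how sharply that margin is rewarded.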

This three‑stage process raised Talkie’s instruction‑following score from 2.0 to 3.4 out of a maximum of 5.

Temporal Awareness Test

The team fed Talkie 5,000 historical “On This Day” entries from the New York Times and measured the model’s surprise (negative log‑likelihood) for each event. Surprise values were low for pre‑1930 events, increased for later years, and peaked in the 1950s‑60s before stabilising, illustrating that the model’s knowledge is anchored to its 1930 cutoff.
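Below is a minimal sketch of how per‑event surprise could be aggregated by decade. It assumes the model's log‑probability for each token of an event description is already available. The article does not say whether it averages per token or sums per event, so averaging per token is an assumption here, chosen so that long and short entries stay comparable. Both function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def event_surprise(token_logprobs: List[float]) -> float:
    """Mean negative log-likelihood per token (in nats) for one event description."""
    return -sum(token_logprobs) / len(token_logprobs)


def surprise_by_decade(events: Iterable[Tuple[int, List[float]]]) -> Dict[int, float]:
    """events: (year, token_logprobs) pairs.
    Returns {decade_start: mean surprise}, ordered by decade."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for year, logprobs in events:
        buckets[(year // 10) * 10].append(event_surprise(logprobs))
    return {decade: mean(values) for decade, values in sorted(buckets.items())}
```

Plotting the returned dictionary against decade would give the kind of curve the article describes, with the rise after 1930 showing where the model's knowledge runs out.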

Roadmap

Talkie’s current version has 13 billion parameters. The team plans to release a GPT‑3‑scale retro model by summer 2026 and then expand the corpus beyond one trillion tokens to train a GPT‑3.5‑level model comparable to early ChatGPT: in effect, a ChatGPT frozen in 1930.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
