Can AI Learn to Write Like a Chinese Novelist? Exploring Deep Learning in Literature

This article examines how deep‑learning‑based AI models, from symbolic and statistical NLP methods to Karpathy's recurrent network, progressively learn to generate Chinese wuxia novels, poetry, and web fiction, revealing both their surprising advances and inherent limitations.

21CTO

Jul 5, 2017

Robot Writing Approach

Language distinguishes humans from machines, and the Turing test uses language to assess intelligence.

NLP

Natural Language Processing (NLP) studies how machines can understand and produce human text. Two main approaches exist: the symbolic method, which builds explicit grammatical and lexical rules, and the statistical method, which lets machines discover patterns by ingesting large text corpora.
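
The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration, not code from any system the article discusses: the rule table, the function names, and the tiny English corpus are all made up for the example. The symbolic function answers from rules a person typed in, while the statistical one answers from counts it collected itself.

```python
from collections import Counter, defaultdict

# Symbolic approach: a person writes the rules down explicitly.
# This rule table is hypothetical and covers only two words.
HAND_WRITTEN_RULES = {"the": "cat", "cat": "sat"}

def symbolic_next_word(prev_word):
    # Returns None for anything the rule author did not anticipate.
    return HAND_WRITTEN_RULES.get(prev_word)

# Statistical approach: count which word follows which in a corpus.
def train_bigrams(text):
    words = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def statistical_next_word(counts, prev_word):
    followers = counts.get(prev_word)
    if not followers:
        return None
    # Pick the follower seen most often after prev_word.
    return followers.most_common(1)[0][0]

# Made-up corpus, used only to have something to count.
corpus = "the cat sat on the mat and the cat slept"
counts = train_bigrams(corpus)
print(statistical_next_word(counts, "the"))
```

Extending the symbolic version means editing the rule table by hand; extending the statistical version only means feeding it more text.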

Historically, limited computing power favored symbolic methods, but the rise of deep learning in the past decade has enabled statistical methods to dominate.

Karpathy Model

Andrej Karpathy’s 2015 recurrent neural‑network model, released on GitHub, is a compact yet powerful example. With only a few thousand lines of code and no predefined grammar or vocabulary, it learns from raw text alone: it models which written character is likely to follow the preceding ones, then generates new sequences one character at a time.
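
To make the idea concrete, here is a minimal pure-Python sketch of the forward pass and sampling loop of a plain character-level recurrent network. It is not Karpathy's code. The stand-in corpus, the hidden size, the weight names, and the omission of bias terms are assumptions made for brevity, and training (backpropagation through time) is left out entirely, so the weights stay random and the sampled text is noise. What it does show is that the vocabulary comes from the text itself and that generation proceeds one character at a time.

```python
import math
import random

# Stand-in corpus; a real run would read a whole novel from disk.
text = "the quick brown fox jumps over the lazy dog"
chars = sorted(set(text))            # the vocabulary is derived from the text itself
char_to_ix = {c: i for i, c in enumerate(chars)}
vocab_size = len(chars)
hidden_size = 8                      # arbitrary size chosen for this sketch

rnd = random.Random(0)

def rand_matrix(rows, cols):
    return [[rnd.gauss(0, 0.01) for _ in range(cols)] for _ in range(rows)]

# Untrained weights (bias terms omitted to keep the sketch short).
Wxh = rand_matrix(hidden_size, vocab_size)   # input character -> hidden state
Whh = rand_matrix(hidden_size, hidden_size)  # previous hidden -> hidden state
Why = rand_matrix(vocab_size, hidden_size)   # hidden state -> next-character scores

def step(h, ix):
    """Consume one character index; return the new hidden state and
    a probability distribution over the next character."""
    # The input is one-hot, so Wxh times the input is just column ix of Wxh.
    new_h = [
        math.tanh(Wxh[j][ix] + sum(Whh[j][k] * h[k] for k in range(hidden_size)))
        for j in range(hidden_size)
    ]
    logits = [
        sum(Why[i][j] * new_h[j] for j in range(hidden_size))
        for i in range(vocab_size)
    ]
    # Softmax, shifted by the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return new_h, [e / total for e in exps]

def sample(seed_char, n):
    """Generate n characters after seed_char, feeding each sampled
    character back in as the next input."""
    h = [0.0] * hidden_size
    ix = char_to_ix[seed_char]
    out = [seed_char]
    for _ in range(n):
        h, probs = step(h, ix)
        ix = rnd.choices(range(vocab_size), weights=probs)[0]
        out.append(chars[ix])
    return "".join(out)

print(sample("t", 20))
```

Training would adjust the three weight matrices so that the probability assigned to the character that actually comes next in the corpus goes up.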

Robot’s Path to Becoming a Novelist

"All‑Mediocre" Learns Jin Yong

The robot first studies Jin Yong’s The Legend of the Condor Heroes (≈800 k characters). After a few seconds of training it produces gibberish, then gradually improves: recognizing frequent characters, adding punctuation, and eventually forming simple sentences with correct word order and basic subject‑verb‑object structure.

Even after millions of repetitions, the output remains limited, resembling the speech of a one‑year‑old child.

"Ji‑Long" Learns Gu Long

Training on Gu Long’s complete works (≈17 M characters) yields more fluent text. The robot captures the ways his style differs from Jin Yong’s:

Gu Long’s paragraphs are shorter, often a single sentence.

His style features more psychological description and modern language.

He uses concise, “cool” dialogue.

Although the robot lacks true understanding, it reproduces these subtle stylistic cues.
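
The first of these cues, paragraph length, is simple enough to measure directly. The sketch below is one hypothetical way to do it and is not taken from the original article: the function name, the choice of sentence terminators, and the two-line sample are assumptions, and the sample is invented text rather than a quotation from either author.

```python
import re

# Chinese and Western sentence terminators; an assumption, not an exhaustive list.
SENTENCE_END = re.compile(r"[。！？!?.]+")

def paragraph_stats(text):
    """Return paragraph count, average paragraph length in characters,
    and the share of paragraphs that consist of a single sentence."""
    paragraphs = [p.strip() for p in text.split("\n") if p.strip()]
    if not paragraphs:
        raise ValueError("text contains no paragraphs")
    sentence_counts = [
        len([s for s in SENTENCE_END.split(p) if s.strip()])
        for p in paragraphs
    ]
    n = len(paragraphs)
    return {
        "paragraphs": n,
        "avg_chars": sum(len(p) for p in paragraphs) / n,
        "single_sentence_share": sum(1 for c in sentence_counts if c == 1) / n,
    }

# Invented two-paragraph sample, used only to exercise the function.
sample_text = "他来了。\n他走了。天黑了。"
print(paragraph_stats(sample_text))
```

Running the same function over two authors' corpora would put numbers on the difference; a character-level model picks up the same regularity implicitly, because the newline is just another character it learns to predict.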

Robot Poet

When trained on the Complete Tang Poems, the robot generates verses that follow tonal patterns and exhibit reasonable imagery, though rhyme is often missing.

Robot Writes Web Novels

Training on the popular web novel Dou Po Cang Qiong (≈6 M characters) produces relatively coherent prose, because the language is simple and repetitive, which suits statistical learning.
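
One rough way to quantify "simple and repetitive" is the entropy of the character distribution: the fewer distinct characters a text relies on, and the more unevenly it uses them, the lower the value. The sketch below is an assumption about how one might measure this, not something the original article computes, and single-character entropy is only a crude proxy because it ignores word order.

```python
import math
from collections import Counter

def char_entropy(text):
    """Shannon entropy, in bits per character, of the character
    frequency distribution of text."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy strings: the more repetitive one scores lower.
print(char_entropy("abab"), char_entropy("abcd"))
```

Comparing this number across corpora of similar size would give a first indication of which one is easier for a statistical model to imitate.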

Robot Limitations and Human Trainers

The Karpathy model demonstrates the ceiling of purely statistical learning: it can mimic surface style but cannot convey deep meaning or emotion. Real‑world commercial language models are far more complex.

Effective language acquisition requires interactive feedback. The Microsoft chatbot Tay, trained only on one‑way user input, quickly devolved into profanity, illustrating the need for balanced, bidirectional training.

Large tech companies leverage massive user bases to provide continuous feedback, turning everyday users into inadvertent trainers for ever‑improving AI.

Author: _dailu_ – Original title: When AI Talks About Writing, What Is It Actually Saying?

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
