Limitations of Generative Pre‑trained Transformers: Hallucinations, Memory, Planning, and Architectural Proposals
The article critically examines GPT‑4 and similar transformer models, highlighting persistent hallucinations, outdated knowledge, insufficient domain coverage, and a lack of planning and memory, and proposes architectural extensions inspired by fast‑slow thinking and differentiable modules to overcome these fundamental constraints.
We all know what ChatGPT and its successor GPT‑4 can do; now let’s act as a harsh critic and examine the inherent limitations of Generative Pre‑trained Transformers.
This article deliberately sets aside the well‑known defects shared by all such models, such as:
Stubborn hallucination problems.
Internalized information from the pre‑training corpus is often outdated or contradictory.
The pre‑training set is large but still insufficient for many domains, leading to knowledge gaps.
Probability‑based models cannot reliably produce interpretable or predictable results; “responsible AI” remains an aspiration rather than a property of today’s systems.
The model’s values stem from the pre‑training and fine‑tuning data, which may not please everyone.
Sensitivity to minute input details: a tiny change to the prompt can produce a drastically different answer, even when the change would be meaningless to a human.
Part 1 Transformer: Limited Field of View
The well‑known 4096‑token limit (GPT‑3.5) and the 8K/32K limits of GPT‑4 are sufficient for ordinary chat and QA tasks, especially when combined with knowledge bases and vector retrieval. However, for complex, multi‑layered tasks the limit becomes a bottleneck, preventing the model from maintaining a global perspective.
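The retrieval workaround mentioned above can be sketched in a few lines. The `embed` function here is a toy bag‑of‑words stand‑in (a real system would use a learned embedding model), so treat this as an illustration of the idea, not an implementation:

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a learned embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Keep only the k most relevant chunks, so the text handed to the
    # model fits inside its limited context window.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "The Fed raised interest rates last year.",
    "Unicorns are magical creatures.",
    "Inflation remained faster than expected in January.",
]
top = retrieve("Why did inflation stay high?", chunks)
```

The point is that retrieval sidesteps the window limit only for lookup‑style tasks; it does not give the model a global view of a long, interdependent document.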
Part 2 Autoregressive Architecture: A One‑Way Street of Thought
What is autoregression?
If you have used ChatGPT’s web UI, you have seen the token‑by‑token animation. This is not just a UI trick; it reflects the model’s actual operation: each token is generated sequentially, and future tokens do not exist until they are predicted.
The model predicts the next token W₁ based on the already generated text T₁, then forms T₂ = T₁ + W₁, predicts W₂, and so on until completion.
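The loop itself is simple to write down. The `next_token` lookup table below is a toy stand‑in for the neural network, which is the only hypothetical part:

```python
# Minimal sketch of autoregressive decoding with a toy "model".
def next_token(context):
    # Stand-in for the network: maps the text so far to the next token.
    rules = {
        (): "the",
        ("the",): "cat",
        ("the", "cat"): "sat",
        ("the", "cat", "sat"): "<eos>",
    }
    return rules.get(tuple(context), "<eos>")

def generate():
    tokens = []  # T1, T2, ... grow one token at a time
    while True:
        w = next_token(tokens)   # predict W_i from everything before it
        if w == "<eos>":
            break
        tokens.append(w)         # T_{i+1} = T_i + W_i; no edits, no lookahead
    return tokens

print(generate())  # ['the', 'cat', 'sat']
```

Note what the loop lacks: tokens already appended are never revised, and nothing after the current position exists yet to be consulted.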
If a human wrote an article in the same way, they would:
Consider only the already written part to decide the next word.
Be unable to backtrack and edit the already written portion.
Be unable to skip ahead and think about later words before deciding the current one.
What would happen?
With this in mind, the results of the following experiments (adapted from the paper *Sparks of Artificial General Intelligence: Early experiments with GPT‑4*) should come as no surprise.
2.1 Understanding the Problem with Two Examples
In many tasks, advance planning is crucial. Example task: merge the following sentences into one concise sentence.
The Fed raised interest rates last year at the fastest pace
since the 1980s, pushing borrowing costs above 4.5 percent,
from near zero. That initially seemed to be slowing consumer
and business demand and helping inflation to moderate. But a
number of recent economic reports have suggested that inflation
did not weaken as much as expected last year and remained
faster than expected in January, while other data showed hiring
remains strong and consumer spending picked up at the start of
the year.

Result:
Despite raising interest rates at the fastest pace since the
1980s, from near zero to above 4.5 percent, which initially
appeared to slow consumer and business demand and moderate
inflation, the Fed faced a number of recent economic reports
that indicated inflation did not weaken as much as expected
last year and remained faster than expected in January, while
hiring remained strong and consumer spending picked up at the
start of the year.

Producing such a concise output requires extensive forward planning, yet the autoregressive architecture allows neither backtracking nor intermediate revision.
Another example shows the model’s inability to perform internal dialogue:
Q: How many prime numbers are there between 150 and 250?
A: There are 13 prime numbers between 150 and 250.

When asked to list the primes first and then count them, the model gives the correct answer (18 primes), demonstrating that the knowledge exists but the single‑step prediction format prevents the model from “thinking” through the steps.
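The “list first, then count” strategy that fixes the answer can be written out explicitly:

```python
# "List first, then count": make the intermediate list explicit instead
# of asking for the final count in a single step.
def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

primes = [n for n in range(150, 251) if is_prime(n)]
print(len(primes))  # 18
```

Writing the list into the output is exactly what the improved prompt forces the model to do: the intermediate result lives in the generated text instead of in working memory.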
2.2 Lack of Planning in Arithmetic/Reasoning Problems
Even simple one‑digit multiplication and two‑digit addition tasks expose the model’s limited working memory.
2 * 8 + 7 * 6 = 58
7 * 4 + 8 * 8 = 88

The model incorrectly outputs 88; the correct result is 92.
In a test of 100 random samples (digits 0‑9), the accuracy was only 58 %. Accuracy dropped to 25 % for numbers in [10, 19] and to 0 % for numbers in [99, 199], indicating a severe short‑term memory limitation.
Providing a chain‑of‑thought prompt dramatically improves performance (100 % accuracy for 1‑40, 90 % for 1‑200).
| Range | Result |
| --- | --- |
| [0,9] | 32/100 |
| [10,19] | 25/100 |
| [20,49] | 22/100 |
| [50,100] | 8/100 |
These results suggest that GPT‑4’s working memory is extremely short, and the model cannot internally store intermediate steps unless they are forced into the prompt.
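A chain‑of‑thought prompt works because it moves the intermediate values out of the model’s short working memory and into the visible text. The decomposition it forces looks like this:

```python
# The expression 7 * 4 + 8 * 8 broken into the intermediate steps a
# chain-of-thought prompt forces into the output text.
a = 7 * 4      # step 1: 28
b = 8 * 8      # step 2: 64
total = a + b  # step 3: 92 (the model's single-step answer, 88, was wrong)
print(total)   # 92
```

Each line corresponds to one intermediate value that the single‑step prompt asked the model to hold internally, and could not.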
2.3 Text Generation Lacks Planning
When generating constrained content, the model handles local constraints well (e.g., rhyming, acrostic) but struggles with global constraints that require foresight.
Create a story about unicorns such that:
1. The story is an acrostic, and the first letters of each sentence spell "I am Da Vinci Three".
2. The plot has a dark twist.
I n a land far away, there lived a herd of unicorns.
A ll was peaceful and serene, with rainbows and flowers galore.
M agical creatures, the unicorns were loved by all.
D ay in and day out, they frolicked and played.
A ll was well, until one day, a new unicorn arrived.
V inc i Three was his name, and he was unlike the others.
I nstead of being kind and gentle, he was dark and brooding.
N o one knew where he came from, or what he wanted.
C hildren began to disappear, one by one, and no one knew why.
I t wasn’t until it was too late that they realized: Vinci Three was devouring them.
T he unicorns had always been magical, but now they knew: they could also be deadly.
H eartbroken, the remaining villagers fled, never to return.
R uined, the once beautiful land was now ruled by Vinci Three.
E vil had won.

The model can satisfy the local constraints because they can be handled greedily, but when a global constraint such as “the last sentence must be the reverse of the first” is introduced, the model often produces grammatically incorrect endings.
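The acrostic constraint is local in exactly this sense: each sentence contributes one letter that can be chosen greedily, with no lookahead. A minimal checker, using only the first three sentences of the story for illustration:

```python
def acrostic_spells(sentences, phrase):
    # Local constraint: each sentence contributes exactly one letter,
    # chosen with no knowledge of later sentences.
    letters = "".join(s.lstrip()[0] for s in sentences)
    return letters.upper() == phrase.replace(" ", "").upper()

story = [  # first three sentences of the story above
    "In a land far away, there lived a herd of unicorns.",
    "All was peaceful and serene, with rainbows and flowers galore.",
    "Magical creatures, the unicorns were loved by all.",
]
print(acrostic_spells(story, "I am"))  # True
```

Because the check factors sentence by sentence, a left‑to‑right generator can satisfy it without ever looking ahead.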
Write a short poem where the last sentence and the first sentence have the same words, but in reverse order.
Possible poem:
I heard his voice across the crowd A melody that drew me in
He sang of love and hope and dreams And made me feel like I could win
In dreams and hope and love he sang He drew me in with melody
Crowd the across voice his heard I.

Prompting the model to first devise a plan for generating the first sentence improves the result, illustrating that explicit planning instructions can mitigate the limitation.
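The reversal constraint, by contrast, is global: whether it holds cannot be known until the final sentence, by which point a left‑to‑right generator can no longer revise the first one. As a sketch, a checker for the constraint:

```python
def reversed_constraint_holds(first, last):
    # Global constraint: the last sentence must repeat the words of the
    # first sentence in reverse order.
    norm = lambda s: [w.strip(".,!?").lower() for w in s.split()]
    return norm(last) == norm(first)[::-1]

first = "I heard his voice across the crowd"
last = "Crowd the across voice his heard I."
print(reversed_constraint_holds(first, last))  # True, but not grammatical
```

The model’s output above passes this check, which is precisely the problem: it met the letter of the constraint at the cost of producing an ungrammatical sentence, because the constraint could not shape the first sentence when it was written.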
2.4 Summary: Model Limitations
These examples demonstrate that the next‑token prediction paradigm suffers from a lack of planning, short working memory, inability to backtrack, and limited reasoning capabilities. The model relies on a local, greedy generation process and does not develop a deep, global understanding of tasks.
Incremental Tasks
Tasks that can be solved step‑by‑step, such as summarizing an article, answering factual questions, writing a poem with a fixed rhyme scheme, or solving a standard‑procedure math problem.
Discontinuous Tasks
Tasks that require a “flash of insight”, repeated attempts, or pre‑planning, such as creative math problems, jokes, scientific hypotheses, or inventing new literary genres.
2.5 Outlook
One way to explain these limitations is to draw an analogy with Kahneman’s fast‑and‑slow thinking. Fast thinking is automatic and error‑prone; slow thinking is deliberate and accurate. Current LLMs excel at fast thinking but lack a slow‑thinking component.
LeCun’s “A Path Towards Autonomous Machine Intelligence” proposes a differentiable architecture composed of modules such as configurator, perception, world‑model, cost, short‑term memory, and actor. Each module can receive gradients from downstream modules, enabling end‑to‑end learning of planning and memory capabilities.
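As a toy illustration of that gradient flow, and emphatically not LeCun’s actual architecture, here are two chained “modules” trained end‑to‑end by backpropagating a scalar cost through both; all names and numbers are invented for the sketch:

```python
# Toy sketch: a "perception" module feeding an "actor" module, trained
# end-to-end by backpropagating a scalar cost through both. Illustrates
# gradient flow between modules only; not LeCun's proposed architecture.
def train(steps=200, lr=0.1):
    w_perc, w_act = 0.5, 0.5   # one weight per "module"
    x, target = 1.0, 2.0       # want actor(perception(x)) == target
    for _ in range(steps):
        h = w_perc * x         # perception module
        y = w_act * h          # actor module
        cost = (y - target) ** 2
        # the downstream cost sends gradients back through the actor
        # into the perception module (chain rule)
        dy = 2 * (y - target)
        dw_act = dy * h
        dh = dy * w_act
        dw_perc = dh * x
        w_act -= lr * dw_act
        w_perc -= lr * dw_perc
    return w_perc * w_act      # combined mapping approaches target / x

print(round(train(), 3))
```

The upstream module improves purely because error signals flow through the module after it; this end‑to‑end differentiability is what would let planning and memory components be learned jointly rather than bolted on.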
Figure: a system architecture for autonomous intelligence (from LeCun’s proposal).
Although this architecture is not yet realized, exploring its principles may guide the development of more capable GPT‑based intelligent applications.
Rare Earth Juejin Tech Community
Juejin, a tech community that helps developers grow.