
Do New AI Reasoning Models Really Think? Unpacking the Debate

The article examines whether the latest AI models that claim to perform true reasoning—by breaking problems into steps and using chain‑of‑thought—actually reason like humans, presenting skeptical and supportive expert viewpoints, and offering practical guidance on how to use such models responsibly.


The rapid pace of AI development makes it easy to get lost among dazzling new products. OpenAI released a new model, followed by China’s startup DeepSeek, and then OpenAI launched another. While each model matters, focusing on any single one can cause you to miss the larger story of the past six months.

The big story is that AI companies now claim their models can perform genuine reasoning—mirroring how humans solve problems.

The crucial question is whether this claim is true, because the answer will shape how individuals and governments should (and should not) seek AI assistance.

Unlike ChatGPT, which is designed for quick answers, the newest “reasoning models” such as OpenAI’s o1 or DeepSeek’s r1 are built to “think” before answering, breaking large problems into smaller steps—a process known as chain‑of‑thought reasoning.
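The difference is easiest to see in how the model is prompted. Below is a minimal sketch, in plain Python strings, of a direct prompt versus a chain-of-thought prompt; the `complete()` call these templates would feed is a hypothetical stand-in for whatever model API you use.

```python
# Hedged sketch: the same question, phrased for a quick-answer model
# versus a reasoning model. Only the prompt text differs; `complete()`
# (not shown) is a hypothetical model call, not a real API.

def direct_prompt(question: str) -> str:
    # Quick-answer style: ask for the result immediately.
    return f"Question: {question}\nAnswer:"

def chain_of_thought_prompt(question: str) -> str:
    # Reasoning style: steer the model to decompose the problem first.
    return (
        f"Question: {question}\n"
        "Break the problem into smaller steps, solve each step in order, "
        "then state the final answer.\n"
        "Step 1:"
    )

print(chain_of_thought_prompt("A farmer must ferry a goat across a river..."))
```

Reasoning models such as o1 effectively bake this step-by-step scaffolding into training, rather than relying on the user to write it.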

These models have achieved impressive feats: solving complex logic puzzles, acing math tests, and even writing perfect code on the first try, yet they still fail on very simple questions.

What Is Reasoning?

AI companies define reasoning as decomposing a problem into smaller sub‑problems, solving them step by step, and arriving at a better overall solution. However, this definition is narrower than many expect. Scientists acknowledge multiple types of reasoning—deductive, inductive, analogical, causal, common‑sense, and more.

Humans excel when they can break a complex math problem into incremental steps; similarly, chain‑of‑thought reasoning can be a powerful tool for tackling truly hard problems, though it is not the whole picture of reasoning.
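To make the decomposition idea concrete, here is a toy illustration (my own example, not from the article): a multi-step calculation solved the way chain-of-thought asks a model to solve it, with each intermediate result made explicit instead of jumping straight to the answer.

```python
# Illustrative sketch: explicit intermediate steps for a small word problem,
# mirroring the decomposition that chain-of-thought prompting elicits.

def total_cost(unit_price: float, quantity: int, tax_rate: float) -> float:
    subtotal = unit_price * quantity   # step 1: pre-tax total
    tax = subtotal * tax_rate          # step 2: tax on that subtotal
    return subtotal + tax              # step 3: combine the parts

print(total_cost(2.50, 4, 0.10))  # 4 items at $2.50 with 10% tax -> 11.0
```

Each line is individually checkable, which is exactly what makes stepwise answers easier to audit than a single opaque number.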

A key aspect of reasoning is the ability to infer rules or patterns from limited data and apply them to novel situations—a capability children demonstrate by abstracting from few examples, as noted by Professor Melanie Mitchell.

Skeptics’ View

Philosopher of technology Shannon Vallor describes the new models as performing "meta-imitation": they mimic the human process of arriving at statements rather than reasoning genuinely. She points out that models like o1 can produce plausible answers by recalling similar training examples, yet still fail on simple tasks such as variants of the classic puzzle of getting a person and a goat across a river.

Mitchell observes that the latest reasoning model, o3, shows impressive performance but relies on massive computation whose inner workings remain opaque. She cites research suggesting that the intermediate tokens such models generate add raw computational capacity without guaranteeing that genuine reasoning is taking place.

Mitchell argues the models often rely on heuristics, quick psychological shortcuts, rather than authentic reasoning, illustrating this with an AI vision system that judged skin-cancer lesions by the presence of a ruler in the image rather than by the actual pathology.

Supporters’ View

Redwood Research’s chief scientist Ryan Greenblatt believes the models are indeed performing a form of reasoning, albeit less flexible than humans and more dependent on memory and knowledge.

He notes that these models can solve problems beyond their training distribution, and that their stumbles on variants of classic logic puzzles stem from pattern-matching the familiar version seen in training rather than from an inability to reason at all.

Open Philanthropy analyst Ajeya Cotra agrees that the models are improving at tasks humans label as “reasoning,” though she cautions against dismissing them as mere meta‑imitation. She uses an analogy of students who combine memorized formulas with selective reasoning to illustrate how AI blends memory and inference.

AI Systems Have “Jagged Intelligence”

Researchers describe a “jagged intelligence” pattern: state‑of‑the‑art models excel at highly complex tasks while struggling with very simple ones, creating peaks and valleys in performance.

Rather than comparing AI directly to human intelligence, it may be more useful to view AI as different—strong in some domains, weak in others.

The practical takeaway is to know where AI is smart and where it is not, and to use it accordingly. The ideal use cases are tasks where you can verify the AI's answer yourself, such as code generation or website content creation. In areas lacking objective answers, or where the stakes are high, treat AI as a thinking partner rather than a definitive authority.
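One way to act on that advice, sketched below with a hypothetical `ai_generated_sort` standing in for code a model produced: treat model output as untrusted until it passes checks you wrote yourself.

```python
# Sketch of "use AI where you can verify": run model-written code against
# your own test cases before trusting it. `ai_generated_sort` is a
# hypothetical placeholder for a function body returned by a model.

def ai_generated_sort(items):
    # Imagine this implementation came back from a model.
    return sorted(items)

def verify(fn) -> bool:
    """Return True only if fn matches expected output on every case."""
    cases = [([3, 1, 2], [1, 2, 3]), ([], []), ([5], [5])]
    return all(fn(inp) == expected for inp, expected in cases)

assert verify(ai_generated_sort)  # only accept code that passes your checks
```

The same principle applies beyond code: prefer tasks where a wrong answer is cheap to detect, and keep a human in the loop everywhere else.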

Tags: Large Language Models · Chain-of-Thought · AI safety · AI reasoning · cognitive modeling
Written by

Code Mala Tang

Read source code together, write articles together, and enjoy spicy hot pot together.
