Artificial Intelligence 8 min read

Why OpenAI’s New o1 Model Outperforms Its Rivals

The article examines OpenAI’s newly released o1 model, highlighting its superior performance in complex reasoning tasks such as math, programming, and science, and explains how model‑level chain‑of‑thought optimization and product‑level UI design give it an edge over competitors like Claude.

CSS Magic

Sep 14, 2024

Why OpenAI’s New o1 Model Outperforms Its Rivals

Recently, rumors that “OpenAI is losing its edge” have circulated, but OpenAI responded by launching the o1 large model, instantly restoring its leadership position.

Model‑Level Innovations

The o1 model, dubbed the “Strawberry” model by Sam Altman, focuses on complex general‑reasoning scenarios and surpasses previous models such as GPT‑4o in mathematics, programming, and scientific domains. According to publicly available information, o1 improves its chain‑of‑thought (CoT) capability through reinforcement learning, enabling more effective reasoning on intricate problems. Benchmark results cited in the article demonstrate the model’s superior scores.

The key term is “chain‑of‑thought”. CoT is a prompting technique that asks the model to “think step by step” with a few examples of reasoning, significantly boosting reasoning performance.

Because o1 strengthens its internal CoT ability, it differs from other large models: before producing a final answer, it runs an internal reasoning phase, as illustrated below.

Large models generate tokens sequentially without a genuine thinking process, often yielding unreliable answers to complex questions. Prompt engineers introduced CoT to make models decompose problems, producing more accurate responses.

o1 takes this further: users simply ask a question, and the model internally performs the reasoning, delivering a high‑quality answer without the user needing to craft a CoT prompt.

Product‑Level Breakthroughs

ChatGPT Interface

ChatGPT Plus and Team subscribers can now try o1 via the “o1‑preview” and “o1‑mini” options in the model selector.

When asked the classic “How many ‘r’ letters are in the word ‘Strawberry’?”, o1 does not immediately output a result. Instead, it enters a multi‑second internal reasoning phase, shown in the following screenshots.

After a few seconds, o1 returns the correct answer, and users can expand the displayed reasoning to see the step‑by‑step thought process, which is fully in English.

Because this example is simple, the displayed reasoning appears verbose; for truly multi‑step problems, the reasoning would be richer and may include self‑correction.

Introducing the reasoning phase lengthens the waiting time, but the UI updates dynamically, allowing users to perceive progress—a design praised as an interaction‑design exemplar.

API Differences

The o1 API is publicly available, but it provides a “special‑edition” version of the model: the internal reasoning is omitted from the response, and streaming output is not supported.

The omission is a product decision: the reasoning is intended for the model’s own use, and excluding it simplifies multi‑turn conversations because the reasoning does not need to be fed back into the dialogue context.

The multi‑turn conversation flow is illustrated below; the Reasoning step is not included in the next turn’s input.

The current lack of streaming is likely a temporary limitation, and the article expects future releases to enable it.

Summary

This piece covered the technical and product innovations that make o1 a strong contender in complex reasoning tasks, and previewed upcoming discussions on its current limitations and impact on developers.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

ChatGPT OpenAI reasoning AI evaluation Chain-of-Thought o1 model

Written by

CSS Magic

Learn and create, pioneering the AI era.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.