Can China’s GLM‑Zero‑Preview Beat OpenAI’s o3? A Deep Dive into Inference Model Tests

The article evaluates the Chinese GLM‑Zero‑Preview inference model by subjecting it to a wide range of math, logic, language, coding, and multimodal questions, compares its token efficiency and reasoning style to other models, and discusses its current strengths, limitations, and public availability.

Baobao Algorithm Notes

Dec 31, 2024

During a recent discussion about large‑model progress, the author notes that the most talked‑about model abroad is OpenAI’s o3, while Chinese companies are rapidly catching up.

Zhipu, the Chinese vendor, has released an inference‑focused model called GLM‑Zero‑Preview (formerly GLM‑Zero‑Prev), which is trained with reinforcement learning to handle deep reasoning tasks.

To assess its capabilities, the author designed a comprehensive test suite covering arithmetic, geometry, logical reasoning, programming, multimodal image input, and even chemistry calculations.

Sample Questions and Observations

Geometry question: "In a right triangle with legs 5 cm and 12 cm, what is the length of the median to the hypotenuse?" The model's correct answer fit on a single page and used far fewer tokens than competing systems. This suggests a concise generation strategy that does not rely on the traditional pipeline of Monte Carlo tree search (MCTS) plus a process reward model (PRM).
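The geometry answer can be checked independently. The snippet below is a minimal sketch and not the model's own reasoning: it places the right angle at the origin with the legs along the axes (an assumed coordinate setup), then measures the distance from the right-angle vertex to the midpoint of the hypotenuse. The function name median_to_hypotenuse is introduced here for illustration only.

```python
import math

def median_to_hypotenuse(leg_a, leg_b):
    """Length of the median drawn from the right-angle vertex to the hypotenuse.

    Assumed setup: right angle at (0, 0), legs along the x and y axes,
    so the hypotenuse runs from (leg_a, 0) to (0, leg_b).
    """
    mid_x, mid_y = leg_a / 2, leg_b / 2  # midpoint of the hypotenuse
    return math.hypot(mid_x, mid_y)      # distance from the origin to the midpoint

hypotenuse = math.hypot(5, 12)
median = median_to_hypotenuse(5, 12)
print(f"hypotenuse = {hypotenuse:.1f} cm, median = {median:.1f} cm")
```

For legs of 5 cm and 12 cm the hypotenuse is 13 cm and the median is 6.5 cm, which matches the general fact that the median to the hypotenuse of a right triangle is half the hypotenuse.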

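Language ambiguity: The sentence “他一把把把把住了” (roughly, "He grabbed hold of the handle in one go") contains four instances of the character “把”, each playing a different role: part of the adverbial “一把”, the object marker, the noun "handle", and the verb in “把住” ("to grip"). The model correctly distinguished the meaning of each, a task at which many other models (e.g., o1) fail.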

Chemistry calculation: Dissolve 100 g of copper(II) sulfate pentahydrate (CuSO4·5H2O) in water to make 500 mL of solution, then compute the molar concentration. The author provides the full Python script used for the calculation:

# Define atomic masses
Cu = 63.55  # copper
S = 32.07   # sulfur
O = 16.00   # oxygen
H = 1.01    # hydrogen

# Molar mass of CuSO4·5H2O
molar_mass = Cu + S + 4*O + 5*(2*H + O)

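mass = 100  # g of CuSO4·5H2O dissolved
volume_l = 500 / 1000  # solution volume: 500 mL expressed in litres
moles = mass / molar_mass  # amount of substance, mol
concentration = moles / volume_l  # molar concentration, mol/L
print(f"molar mass = {molar_mass:.2f} g/mol")
print(f"moles = {moles:.4f} mol")
print(f"concentration = {concentration:.4f} mol/L")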

Algorithmic challenge: Given n arrays A0…An‑1, select one element from each so that the sum of absolute differences between consecutive selections is minimized. The author shares a dynamic‑programming solution and warns that greedy approaches can be misleading:

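def min_absolute_sum(A):
    """Minimum total |difference| between consecutive picks, one pick per array."""
    n = len(A)
    if n == 0:
        return 0
    # dp[i][j]: lowest cost of any selection that ends by picking A[i][j]
    dp = [[float('inf')] * len(A[i]) for i in range(n)]
    for j in range(len(A[0])):
        dp[0][j] = 0  # a single pick has no consecutive pair, so it costs nothing
    for i in range(1, n):
        for j in range(len(A[i])):
            for k in range(len(A[i-1])):
                dp[i][j] = min(dp[i][j], dp[i-1][k] + abs(A[i][j] - A[i-1][k]))
    return min(dp[n-1])

# Example
A = [[1,2,3],[4,5,6],[7,8,9]]
print(min_absolute_sum(A))  # 4, e.g. 3 -> 4 -> 7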

The model correctly identifies the pitfalls of the greedy sorting and pointer‑movement strategies that earlier models such as GPT‑4o sometimes suggest.
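To make the greedy pitfall concrete, here is a small sketch that compares an exhaustive search against one possible greedy rule. The greedy_nearest function is a hypothetical stand-in (always step to the closest value in the next array); it is not quoted from the source or from any model's output, and the input is a hand-picked counterexample.

```python
from itertools import product

def brute_force_min(A):
    """Try every combination of picks; only practical for tiny inputs."""
    return min(
        sum(abs(b - a) for a, b in zip(pick, pick[1:]))
        for pick in product(*A)
    )

def greedy_nearest(A):
    """Hypothetical greedy rule: from each start, always move to the nearest next value."""
    best = float("inf")
    for start in A[0]:
        current, cost = start, 0
        for arr in A[1:]:
            nxt = min(arr, key=lambda x: abs(x - current))  # locally cheapest step
            cost += abs(nxt - current)
            current = nxt
        best = min(best, cost)
    return best

A = [[0], [-1, 2], [4]]
print("brute force:", brute_force_min(A))
print("greedy:     ", greedy_nearest(A))
```

On this input the greedy rule steps from 0 to -1 because it is the nearer value, and then pays 5 to reach 4, for a total of 6. The optimal path 0, 2, 4 costs only 4, which is why a method that considers every predecessor, such as dynamic programming, is needed.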

Overall Assessment

GLM‑Zero‑Preview performs strongly on mathematical and logical tasks, and its reasoning reads as more “humanistic” than that of other inference models, which tend to be overly verbose or rigid. However, the author acknowledges a noticeable gap relative to OpenAI’s o3, especially in complex multimodal reasoning.

Future work will focus on enhancing reinforcement‑learning techniques to broaden deep‑thinking abilities from pure symbolic logic to more general problem solving, moving closer to AGI.

Availability

The model is currently accessible for free via the “Zero inference model” agent on the Zhipu Qingyan platform (chatglm.cn), supporting both text and image inputs with full reasoning traces. Developers can also call the model through the Zhipu Open Platform (bigmodel.cn) via API.
