Grok 4’s HLE Breakthrough & Why Group Message Read Receipts Are So Hard

This article examines Grok 4’s 45% HLE score, nearly double that of Gemini 2.5 Pro, and explains its significance for AI evaluation. It then turns to the technical challenges of implementing reliable read receipts for group messages, covering data storage and acknowledgment strategies.

Tencent Cloud Developer

01 Grok 4 Latest Technical Evaluation and Release Guide

Introduction: Elon Musk skipped Grok 3.5 and went straight to Grok 4, with the launch planned for after July 4 and a focus on optimizing the model for programming. The question is whether this aggressive iteration can turn the AI arms race in his favor.

Grok 4 scored 45% on the HLE (Humanity’s Last Exam) benchmark, almost twice the score of Gemini 2.5 Pro. Because HLE is a free‑response test on which random guessing yields only about 5% accuracy, each additional percentage point is extremely hard to gain.

This result means that Grok 4 can handle many of the obscure information‑retrieval tasks that define “Humanity’s Last Exam,” outperforming all current AI models on this challenging benchmark.

Core quote: “Humanity’s Last Exam” is not a joke: it contains many obscure retrieval tasks, and scoring 45% essentially “beats” all existing AI models.

02 Why Group Message Read Receipts Are So Hard

Introduction: In personal social apps like WeChat, users often want to know whether their messages have been read, but the platform offers no read‑receipt feature. Business tools like DingTalk, by contrast, enforce read receipts, so recipients can no longer pretend to be offline or quietly ignore a message.

Core Question 1: Should group messages be stored once or duplicated for each member?

Answer: Store a single copy and give each group its own message queue. Duplicating every message for each member would create massive redundancy, since a message sent to a group of N members would be written N times.
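The single-copy layout can be sketched as follows. This is a minimal in-memory illustration, not the design of any particular product: the names `Message`, `GroupMessageStore`, `send`, `fetch_after`, and `stored_copies` are hypothetical, and it assumes message IDs increase monotonically within each group.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Message:
    msgid: int   # assumed to increase monotonically within a group
    sender: str
    body: str


class GroupMessageStore:
    """Keeps one copy of each message in a per-group queue (hypothetical sketch)."""

    def __init__(self):
        # group_id -> ordered list of messages (the group's queue)
        self._queues = defaultdict(list)
        # group_id -> generator of msgids starting at 1
        self._next_id = defaultdict(lambda: count(1))

    def send(self, group_id, sender, body):
        msg = Message(next(self._next_id[group_id]), sender, body)
        # Stored once, no matter how many members the group has.
        self._queues[group_id].append(msg)
        return msg

    def fetch_after(self, group_id, last_seen_msgid):
        """Return the messages a member has not pulled yet."""
        return [m for m in self._queues[group_id] if m.msgid > last_seen_msgid]

    def stored_copies(self, group_id):
        return len(self._queues[group_id])


store = GroupMessageStore()
store.send("g1", "alice", "standup at 10")
store.send("g1", "bob", "ok")
```

Every member reads from the same queue through `fetch_after`, so storage grows with the number of messages rather than with messages times members.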

Core Question 2: If only one copy exists, how can we know which members have read which messages?

Answer: Use the ordering of messages within the group and record each member’s last_ack_msgid (or last_ack_time). Messages up to and including this ID are considered read and those after it are unread, so only a single value needs to be stored per member.

This approach yields a simple core data structure for group messaging, enabling efficient read‑receipt tracking without excessive storage overhead.
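A minimal sketch of that per-member marker is shown below. Only `last_ack_msgid` comes from the text above; the class `GroupReadState` and its methods are hypothetical names, and the sketch assumes message IDs start at 1 and increase by one within a group.

```python
class GroupReadState:
    """Tracks one last_ack_msgid per member (hypothetical sketch)."""

    def __init__(self, members):
        # 0 means "nothing acknowledged yet"; real msgids are assumed to start at 1.
        self.last_ack_msgid = {member: 0 for member in members}

    def ack(self, member, msgid):
        # Acks may arrive late or out of order, so never move the marker backwards.
        self.last_ack_msgid[member] = max(self.last_ack_msgid[member], msgid)

    def has_read(self, member, msgid):
        # Everything up to and including the marker counts as read.
        return msgid <= self.last_ack_msgid[member]

    def readers_of(self, msgid):
        """Members whose marker has reached msgid, i.e. the read-receipt list."""
        return sorted(m for m, ack in self.last_ack_msgid.items() if ack >= msgid)

    def unread_count(self, member, latest_msgid):
        # Valid only under the assumption that msgids are dense (1, 2, 3, ...).
        return max(0, latest_msgid - self.last_ack_msgid[member])


state = GroupReadState(["alice", "bob", "carol"])
state.ack("bob", 3)
state.ack("carol", 5)
state.ack("carol", 2)  # stale ack, leaves carol's marker at 5
```

Here `state.readers_of(4)` lists only carol, while `state.readers_of(3)` lists bob and carol. The trade-off of storing a single marker is that a member cannot be recorded as having read message 5 but not message 4.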

Tags: Artificial Intelligence, backend development, system design, Messaging, Read Receipts