Grok 4’s HLE Breakthrough & Why Group Message Read Receipts Are So Hard

This article examines Grok 4’s 45% HLE score, nearly double that of Gemini 2.5 Pro, and explains its significance for AI evaluation. It then turns to the technical challenges of implementing reliable read receipts for group messages, covering data storage and acknowledgment strategies.

Tencent Cloud Developer

Jul 16, 2025

01 Grok 4 Latest Technical Evaluation and Release Guide

Introduction: Elon Musk skipped Grok 3.5 and went straight to Grok 4, targeting a release after July 4 with a focus on optimizing the coding model. The question is whether this aggressive iteration can turn the AI arms race in his favor.

Grok 4 achieved an astonishing 45% score on the HLE benchmark, almost twice the score of Gemini 2.5 Pro. Because HLE is a free‑response test on which random guessing yields only about 5% accuracy, every additional percentage point is hard to earn.

This result means that Grok 4 can handle many of the obscure information‑retrieval tasks that define “Humanity’s Last Exam” (HLE), effectively outperforming all current AI models on this challenging benchmark.

02 Why Group Message Read Receipts Are So Hard

Introduction: In personal social apps like WeChat, users want to know whether their messages have been read, but the platform offers no built‑in read receipts. Business tools like DingTalk, by contrast, enforce read receipts, so recipients can no longer pretend to be offline or quietly ignore a message.

Core quote: “Humanity’s Last Exam” is no joke: it contains many obscure retrieval tasks, and a 45% score essentially “beats” every existing AI model.

Core Question 1: Should group messages be stored once or duplicated for each member?

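Answer: Store a single copy of each message and give every group its own message queue. Duplicating each message for every member would create massive redundancy (a 500‑member group would hold 500 copies of every message), so it is not a suitable design.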
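
The single‑copy layout can be sketched in a few lines of Python. This is only an illustration under stated assumptions: GroupStore, GroupMessage, and their methods are hypothetical names, the store is in memory, and message IDs are assumed to increase by one within each group. None of this comes from the original article.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GroupMessage:
    msg_id: int       # assumption: increases monotonically within a group
    sender_id: str
    body: str


@dataclass
class GroupStore:
    """Hypothetical in-memory store with one message queue per group."""
    queues: dict = field(default_factory=lambda: defaultdict(list))
    members: dict = field(default_factory=lambda: defaultdict(set))

    def join(self, group_id, user_id):
        self.members[group_id].add(user_id)

    def send(self, group_id, sender_id, body):
        queue = self.queues[group_id]
        msg = GroupMessage(msg_id=len(queue) + 1, sender_id=sender_id, body=body)
        queue.append(msg)  # written once, regardless of how many members the group has
        return msg

    def stored_copies(self, group_id):
        # Rows actually stored with the single-copy design.
        return len(self.queues[group_id])

    def copies_if_duplicated(self, group_id):
        # Rows a per-member duplication design would need, for comparison.
        return len(self.queues[group_id]) * len(self.members[group_id])
```

With three members and two messages, stored_copies returns 2 while copies_if_duplicated returns 6; the gap grows linearly with group size.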

Core Question 2: If only one copy exists, how can we know which members have read which messages?

Answer: Use the partial order of messages within the group and record each member’s last_ack_msgid (or last_ack_time). Messages up to and including that ID are considered read and later ones are unread, so only a single value is needed per user.

This approach yields a simple core data structure for group messaging, enabling efficient read‑receipt tracking without excessive storage overhead.
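
A minimal sketch of this bookkeeping follows. Only last_ack_msgid comes from the text; the class GroupReadState and its methods are hypothetical. The sketch assumes that message IDs increase monotonically within a group and that the acknowledged message itself counts as read.

```python
from dataclasses import dataclass, field


@dataclass
class GroupReadState:
    """Hypothetical per-group read tracker keyed on last_ack_msgid."""
    latest_msg_id: int = 0
    last_ack_msgid: dict = field(default_factory=dict)  # user_id -> last acked ID

    def add_member(self, user_id):
        self.last_ack_msgid.setdefault(user_id, 0)  # 0 means nothing read yet

    def new_message(self):
        self.latest_msg_id += 1
        return self.latest_msg_id

    def ack(self, user_id, msg_id):
        # Only move the marker forward, so a late or out-of-order ack
        # never turns already-read messages back into unread ones.
        current = self.last_ack_msgid.get(user_id, 0)
        capped = min(msg_id, self.latest_msg_id)
        self.last_ack_msgid[user_id] = max(current, capped)

    def has_read(self, user_id, msg_id):
        return msg_id <= self.last_ack_msgid.get(user_id, 0)

    def unread_count(self, user_id):
        return self.latest_msg_id - self.last_ack_msgid.get(user_id, 0)

    def readers_of(self, msg_id):
        # Members whose marker has reached this message.
        return sorted(u for u, acked in self.last_ack_msgid.items() if acked >= msg_id)
```

For example, after five messages, a member who acknowledged message 3 has two unread messages, and readers_of(4) lists only the members whose marker is at 4 or beyond. The per‑message reader list is derived by scanning one integer per member, so nothing is stored per message.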

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

