Grok 4’s HLE Breakthrough & Why Group Message Read Receipts Are So Hard
This article examines Grok 4’s reported 45% score on the HLE benchmark, nearly double that of Gemini 2.5 Pro, and explains why the result matters for AI evaluation. It then turns to the technical challenges of implementing reliable read receipts for group messages, covering data storage and acknowledgment strategies.
01 Grok 4 Latest Technical Evaluation and Release Guide
Introduction: Elon Musk skipped Grok 3.5 and moved straight to Grok 4, planning to release it after July 4 with a focus on optimizing the model for programming. The question is whether such aggressive iteration can turn the AI arms race in his favor.
Grok 4 achieved a 45% score on the HLE benchmark, almost twice the score of Gemini 2.5 Pro. Because HLE is a free‑response test on which random guessing yields only about 5% accuracy, each additional percentage point is extremely hard to gain.
This result means that Grok 4 can handle many of the obscure information‑retrieval tasks that define “Humanity’s Last Exam” (HLE), outperforming all current AI models on this challenging benchmark.
Core quote: “Humanity’s Last Exam” is not a joke. It contains many obscure retrieval tasks, and scoring 45% essentially “beats” all existing AI models.
02 Why Group Message Read Receipts Are So Hard
Introduction: In personal social apps like WeChat, users want to know whether their messages have been read, but the platform has no built‑in read‑receipt feature. Business tools like DingTalk enforce read receipts, so recipients can no longer appear offline or quietly ignore a message.
Core Question 1: Should group messages be stored once or duplicated for each member?
Answer: Store a single copy and assign a message queue to each group; duplicating data would create massive redundancy and is unsuitable.
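The single-copy design can be sketched as follows. This is a minimal in-memory illustration, not the storage layer of any real product: the names GroupMessage, GroupMessageStore, append, and messages_after are hypothetical, and the per-group counter that assigns increasing message IDs is an assumption made so that messages within a group have a clear order.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class GroupMessage:
    msg_id: int   # increasing within one group (assumption of this sketch)
    sender: str
    body: str


class GroupMessageStore:
    """Keeps exactly one copy of each message, in one queue per group."""

    def __init__(self):
        # group_id -> list of messages in send order
        self._queues = defaultdict(list)
        # group_id -> counter producing 1, 2, 3, ... for that group
        self._next_id = defaultdict(lambda: count(1))

    def append(self, group_id, sender, body):
        """Store the message once, no matter how many members the group has."""
        msg = GroupMessage(next(self._next_id[group_id]), sender, body)
        self._queues[group_id].append(msg)
        return msg

    def messages_after(self, group_id, msg_id):
        """Return the messages in a group whose ID is greater than msg_id."""
        return [m for m in self._queues.get(group_id, []) if m.msg_id > msg_id]


store = GroupMessageStore()
store.append("g1", "alice", "Stand-up at 10:00")
store.append("g1", "bob", "Got it")
print([m.body for m in store.messages_after("g1", 0)])
```

Storage grows with the number of messages rather than with messages multiplied by members, which is the redundancy the answer above is avoiding.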
Core Question 2: If only one copy exists, how can we know which members have read which messages?
Answer: Rely on the ordering of messages within a group and record each member’s last_ack_msgid (or last_ack_time). Messages up to and including this ID are considered read, and those after it are unread, so only a single value needs to be stored per user per group.
This approach yields a simple core data structure for group messaging, enabling efficient read‑receipt tracking without excessive storage overhead.
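A rough sketch of this bookkeeping, assuming message IDs increase within a group and that 0 means "nothing acknowledged yet". The class ReadTracker and its method names are hypothetical; only last_ack_msgid comes from the text above.

```python
class ReadTracker:
    """Tracks one last_ack_msgid per member of a single group."""

    def __init__(self, members):
        # member -> last_ack_msgid; 0 means no message acknowledged (assumption)
        self._last_ack = {member: 0 for member in members}

    def ack(self, member, msg_id):
        """Advance the member's watermark; late or duplicate acks are ignored."""
        if msg_id > self._last_ack[member]:
            self._last_ack[member] = msg_id

    def has_read(self, member, msg_id):
        """A message counts as read if it is at or before the watermark."""
        return msg_id <= self._last_ack[member]

    def readers_of(self, msg_id):
        """Members whose watermark has reached msg_id, sorted by name."""
        return sorted(m for m, ack in self._last_ack.items() if ack >= msg_id)

    def non_readers_of(self, msg_id):
        """Members whose watermark is still before msg_id, sorted by name."""
        return sorted(m for m, ack in self._last_ack.items() if ack < msg_id)


tracker = ReadTracker(["alice", "bob", "carol"])
tracker.ack("bob", 3)
tracker.ack("carol", 1)
print(tracker.readers_of(2), tracker.non_readers_of(2))
```

Because the watermark only moves forward, an acknowledgment that arrives out of order cannot mark a read message as unread again. The trade-off is that a member cannot be recorded as having read message 3 while having skipped message 2.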
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
