Tencent Technical Engineering
Apr 23, 2026 · Artificial Intelligence

Tencent Hunyuan Launches Hy3 Preview: Open‑Source Model Boosts Agent Performance

On April 23, Tencent released the open‑source Hy3 preview, a 295B‑parameter mixture‑of‑experts (MoE) model with 21B active parameters and a 256K context window. The model delivers substantial gains in complex reasoning, instruction following, code, and agent tasks, along with roughly 40% faster inference, lower costs, and strong benchmark results across Tencent's AI products.

Hy3-preview · Inference Efficiency · Large Language Model
9 min read
AntTech
Apr 23, 2026 · Artificial Intelligence

Ling-2.6-flash: Faster Response, Stronger Execution, and Higher Token Efficiency for Agent Workloads

Ling-2.6-flash is a 104B‑parameter Instruct model that uses a mixed‑linear architecture and token‑efficiency optimizations to achieve up to 340 tokens/s inference speed, 4× higher throughput than comparable models, and ten‑fold lower token consumption on Agent benchmarks, while maintaining SOTA performance.

Agent Optimization · Inference Efficiency · LLM
15 min read
AI Explorer
Mar 20, 2026 · Artificial Intelligence

Meta Agent Leak Triggers Zuckerberg’s Emergency Response and Signals New AI Strategy

Meta’s internal “Meta Agent” AI project was unexpectedly exposed, revealing a novel deep‑learning architecture focused on inference efficiency and multimodal understanding. The leak has sparked debate over whether it was an accident or a strategic signal in the escalating AI arms race, and has prompted Zuckerberg to respond swiftly.

AI · AI competition · Inference Efficiency
6 min read
Machine Learning Algorithms & Natural Language Processing
Mar 11, 2026 · Artificial Intelligence

Why LLMs Overthink: ICLR 2026 Study Reveals the Key Bottleneck in Inference Efficiency

The ICLR 2026 paper identifies reasoning miscalibration (overthinking easy steps while underthinking critical ones) as the root cause of runaway LLM inference costs. It proposes the Budget Allocation Model (BAM) and a training‑free Plan‑and‑Budget framework that distribute compute where it is needed, achieving up to 70% higher accuracy while cutting token usage by 39% and improving the paper's new E³ efficiency metric by 193.8%.

Budget Allocation Model · E3 Metric · Epistemic Uncertainty
12 min read
SuanNi
Feb 27, 2026 · Artificial Intelligence

Can Deep Thought Ratio Reveal the True Reasoning Power of LLMs?

This article introduces the Deep Thought Ratio (DTR) metric, explains how tracking token modifications across neural network layers quantifies genuine inference effort, and shows through extensive experiments that DTR predicts accuracy far better than token length while enabling a sampling strategy that halves computational cost.

AI metrics · Inference Efficiency · LLM evaluation
9 min read