Hy3 Preview: First Post‑Rebuild Model with Dramatically Boosted Agent Capabilities
Tencent has released and open‑sourced Hy3 preview, a 295‑billion‑parameter mixture‑of‑experts LLM supporting 256K context. Built on rebuilt pre‑training and RL infrastructure and guided by three principles (systematic capability, authentic evaluation, and cost efficiency), it delivers strong gains in complex reasoning, context learning, code, and agent tasks, and is already deployed across multiple Tencent products.
Model Overview
Hy3 preview is a mixture‑of‑experts (MoE) large language model with 295B total parameters, 21B activated parameters, and a maximum context length of 256K tokens. It is the first model released after a complete reconstruction of the Hy series infrastructure.
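The gap between 295B total and 21B activated parameters is what MoE routing buys: for each token, a gating network selects only a few experts per layer, so most weights sit idle on any given forward pass. Hy3's actual router is not described here; the following is a generic top‑k softmax gating sketch, with expert count and k chosen purely for illustration.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(gate_logits, k=2):
    """Pick the top-k experts for one token and renormalize their gate weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1;
    only these k experts' parameters participate in the token's forward pass.
    """
    topk = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in topk])
    return list(zip(topk, weights))

# Illustrative numbers only (not Hy3's configuration): 64 experts, 2 active per token.
random.seed(0)
logits = [random.gauss(0, 1) for _ in range(64)]
selected = route_token(logits, k=2)
print(selected)  # two (expert_index, weight) pairs, weights summing to 1
```

Because only k of the experts run per token, compute cost scales with activated parameters (here 21B) rather than total parameters (295B), which is also why the architecture pairs naturally with the cost‑efficiency principle below.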
Infrastructure Reconstruction and Design Principles
In February, the pre‑training and reinforcement‑learning pipelines were rebuilt. The development team defined three guiding principles aimed at practical usefulness:
Ability systematization: avoid narrow specialization by ensuring deep collaboration among reasoning, long‑text handling, instruction following, dialogue, code, and tool use.
Authentic evaluation: supplement public leaderboards with internal test sets, recent exams, human assessments, and product‑level crowdsourced testing.
Cost‑performance focus: co‑design the model architecture and inference framework to dramatically lower per‑task costs.
Benchmark Performance
Complex Reasoning
Hy3 preview achieves strong results on high‑difficulty scientific benchmarks, including FrontierScience Olympiad, IMO Answer Bench, Tsinghua University's spring math exam (2026), and the national high‑school biology competition (CHSBO 2025), demonstrating generalized reasoning strength.
Context Learning and Instruction Following
Newly introduced CL‑bench and CL‑bench‑Life evaluate the model’s ability to handle noisy, long contexts and obey complex, changing rules. Hy3 preview shows significant gains over previous generations.
Code Generation and Agent Capabilities
The rebuilt training framework and larger RL task scale yield competitive results on major code‑agent benchmarks such as SWE‑Bench Verified and Terminal‑Bench 2.0, as well as on the search‑agent benchmarks BrowseComp and WideSearch. Comprehensive agent evaluations (ClawEval, WildClawBench) confirm practical utility in complex agent workflows.
Internal Evaluation Suites
Additional internal benchmarks – Hy‑Backend, Hy‑Vibe Bench, and the high‑difficulty software‑engineering suite Hy‑SWE Max – show strong competitiveness across backend engineering, user interaction, and challenging software‑development tasks.
Performance Metrics in Production
Latency and success‑rate measurements on internal products report:
First‑token latency reduced by 54% and end‑to‑end latency reduced by 47%.
Success rate exceeding 99.99%.
Stable execution of agent workflows up to 495 steps.
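The two latency figures measure different things: first‑token latency is the time until the first streamed token arrives, while end‑to‑end latency runs until the full response completes. A minimal way to capture both from any streaming response (the generator below is a stand‑in for a real API stream, not Tencent's interface):

```python
import time

def measure_latency(token_stream):
    """Return (first_token_latency, end_to_end_latency, full_text) for a token stream."""
    start = time.monotonic()
    first = None
    tokens = []
    for tok in token_stream:
        if first is None:
            first = time.monotonic() - start   # time to first token (TTFT)
        tokens.append(tok)
    total = time.monotonic() - start           # end-to-end latency
    return first, total, "".join(tokens)

def fake_stream():
    """Stand-in for a streaming API response; real code would iterate over HTTP chunks."""
    for tok in ["Hello", ", ", "world"]:
        time.sleep(0.01)  # simulate network and decoding delay per token
        yield tok

ttft, e2e, text = measure_latency(fake_stream())
print(f"TTFT={ttft * 1000:.0f} ms, end-to-end={e2e * 1000:.0f} ms, text={text!r}")
```

First‑token latency dominates perceived responsiveness in chat UIs, which is why announcements typically report it separately from end‑to‑end time.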
Open‑Source Release and Inference Support
Model weights and code are publicly released on the following platforms:
GitHub: https://github.com/Tencent-Hunyuan/Hy3-preview
Hugging Face: https://huggingface.co/tencent/Hy3-preview
ModelScope: https://modelscope.cn/models/Tencent-Hunyuan/Hy3-preview
GitCode: https://ai.gitcode.com/tencent_hunyuan/Hy3-preview

The model is compatible with major inference engines such as vLLM and SGLang. Architecture, operator, and quantization optimizations reduce inference cost compared with the previous generation.
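Both vLLM and SGLang expose an OpenAI‑compatible HTTP endpoint when serving a model, so a chat request is an ordinary JSON payload. A sketch of building one is below; the model identifier is taken from the Hugging Face release name above, and the endpoint path/port are common serving defaults, not details confirmed by this announcement.

```python
import json

def build_chat_request(prompt, model="tencent/Hy3-preview", stream=True, max_tokens=1024):
    """Build a chat-completions payload for an OpenAI-compatible server (e.g. vLLM, SGLang)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,        # stream tokens, e.g. to observe first-token latency
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Summarize the MoE architecture in one sentence.")
print(json.dumps(payload, indent=2))
# POST this to http://<host>:8000/v1/chat/completions once a server is running.
```

Because the endpoint follows the OpenAI wire format, existing OpenAI SDK clients can point at the local server without code changes beyond the base URL.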
API Pricing
Tencent Cloud offers competitive API pricing and a customizable Token Plan; the personal tier starts at ¥28 per month.
Known Issues and Future Work
The team acknowledges remaining issues and invites community feedback to guide the upcoming official release. Ongoing efforts focus on scaling pre‑training and reinforcement‑learning data to further improve the model’s intelligence ceiling.
Tencent Cloud Developer
