How Model Fusion Cut LLM Chain‑of‑Thought Length by 40% Without Fine‑Tuning
The tech firm TNG (tngtech) has released DeepSeek-R1T-Chimera, an open-source model fusion that merges DeepSeek-R1's reasoning ability with the DeepSeek-V3-0324 checkpoint at the parameter level, using no fine-tuning, distillation, or prompt engineering. The fused model matches R1's intelligence while cutting output tokens by roughly 40% and speeding up inference.
Chain-of-thought (CoT) reasoning improves accuracy, but long CoT traces often degrade user experience. To address this, the deployment-focused team tngtech released an open-source model fusion called DeepSeek-R1T-Chimera. The model weights are available at https://huggingface.co/tngtech/DeepSeek-R1T-Chimera.
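For orientation, the checkpoint should load like any other DeepSeek-V3-architecture model from the Hub. The snippet below is a minimal sketch rather than an official quickstart: it assumes the standard transformers API, hardware that can hold a very large MoE, and DeepSeek-style remote modeling code (hence trust_remote_code=True).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal, illustrative load of the released weights; the repo id comes
# from the URL above, everything else is standard transformers usage.
model_id = "tngtech/DeepSeek-R1T-Chimera"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # keep the checkpoint's native precision
    device_map="auto",       # shard across available GPUs
    trust_remote_code=True,  # DeepSeek-V3-style modeling code
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```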
This fusion is a true parameter‑level merge: it involves no fine‑tuning, no distillation, and no prompt engineering.
The resulting model preserves the reasoning capability of DeepSeek‑R1 while running faster, producing considerably shorter CoT sequences, and reducing output token count by roughly 40%.
Construction of DeepSeek‑R1T‑Chimera
The method combines the inference pathways of the original R1 model with those of the V3-0324 checkpoint in two steps (a toy sketch follows the list):
The shared experts start from V3-0324, but each shared-expert weight tensor is replaced by the arithmetic mean of the corresponding R1 and V3 tensors, i.e., (R1 + V3) / 2. The optimal mixing ratio is still being investigated.
The routed experts of R1 and V3 are merged, allowing the fused model to route tokens through either parent's expert set during inference.
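To make the recipe concrete, below is a minimal, hypothetical sketch of such a parameter-level merge in PyTorch. The tensor-name patterns ("shared_experts", ".experts.") are assumptions modeled on DeepSeek-V3-style naming, and the linear interpolation applied to the routed experts is a stand-in: the post does not spell out the exact combination rule, so TNG's actual pipeline may well differ.

```python
import torch

def fuse_state_dicts(r1_sd: dict, v3_sd: dict,
                     shared_alpha: float = 0.5,
                     routed_alpha: float = 0.5) -> dict:
    """Hypothetical parameter-level fusion in the spirit of R1T-Chimera.

    - Shared-expert tensors: arithmetic mean of R1 and V3,
      i.e. (R1 + V3) / 2 when shared_alpha = 0.5.
    - Routed-expert tensors: combined from both parents; plain linear
      interpolation stands in for the unspecified merge rule.
    - All remaining tensors (attention, embeddings, routers, ...) are
      kept from the V3-0324 checkpoint.
    """
    fused = {}
    for name, v3_t in v3_sd.items():
        r1_t = r1_sd[name]
        if "shared_experts" in name:
            fused[name] = shared_alpha * r1_t + (1.0 - shared_alpha) * v3_t
        elif ".experts." in name:  # routed-expert weights
            fused[name] = routed_alpha * r1_t + (1.0 - routed_alpha) * v3_t
        else:
            fused[name] = v3_t
    return fused
```

Note that the whole construction is pure weight surgery: no gradient step, and therefore no training data, is involved at any point.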
Despite the parameter averaging, the merged model does not collapse; instead, its inference process becomes more compact and orderly compared with the original R1, which can be verbose.
Benchmark results
Evaluations on suites such as AIME24 and MT-Bench show that the fused model achieves higher throughput and lower latency while generating shorter CoT, with benchmark scores remaining comparable to those of the original R1.
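The roughly 40% figure refers to generated output tokens. An illustrative way to reproduce that kind of measurement, assuming a hypothetical generate_fn that returns only a model's continuation for a prompt, is to compare average output-token counts over a shared prompt set:

```python
def avg_output_tokens(generate_fn, tokenizer, prompts, max_new_tokens=8192):
    """Average number of generated tokens over a prompt set.

    generate_fn is a placeholder for whatever inference stack serves the
    model; it should return only the generated continuation as a string.
    """
    total = 0
    for prompt in prompts:
        completion = generate_fn(prompt, max_new_tokens=max_new_tokens)
        total += len(tokenizer(completion)["input_ids"])
    return total / len(prompts)

# Relative CoT-length reduction of the fused model vs. the original R1:
# reduction = 1 - avg_output_tokens(chimera_gen, tok, prompts) \
#               / avg_output_tokens(r1_gen, tok, prompts)
```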
Implications and next steps
Because the fused model is both shorter and faster, it leaves headroom for further inference-time extensions that could raise evaluation baselines. Researchers working on post-training techniques are encouraged to experiment with this fusion approach and report useful findings.
Baobao Algorithm Notes
Author of the BaiMian large model, offering technology and industry insights.