MindLab Unveils 749B Agent-Optimized Macaron‑V1‑Preview Model

MindLab released the 749B‑parameter Macaron‑V1‑Preview, a model engineered for deep Agent‑Harness post‑training. It was trained on fewer than 300 GPUs at less than 1% of the compute cost of peer models, and achieves SOTA results on several Agent‑centric benchmarks, including LivingBench, VitaBench and PinchBench.

Machine Learning Algorithms & Natural Language Processing

Over the past month, post‑training has emerged as a pivotal engine for boosting large‑model capabilities. Against this backdrop, the previously low‑profile Mind Lab open‑sourced Macaron‑V1‑Preview, a 749B‑parameter model (744B + 5B) built on GLM 5.1, with 40B activated parameters, tuned specifically for deep Agent‑Harness scenarios.

The model was developed on fewer than 300 GPUs, most of them not the latest Nvidia chips, at a training compute cost under 1% of that of models of comparable scale.

Benchmark evaluations show Macaron‑V1‑Preview achieving state‑of‑the‑art performance across a range of Agent‑focused tasks. It tops LivingBench and VitaBench, matches or exceeds the open‑source SOTA on the A2UI protocol, and scores 92.5 on the OpenClaw‑oriented PinchBench, the best result among open‑source models on that benchmark.

Technically, the model integrates recent advances such as dense LoRA updates, Agent‑Harness orchestration, and the generative UI (A2UI) protocol. It introduces a native Mixture‑of‑LoRA architecture that allows multiple LoRA adapters to coexist on a single base model, enabling multi‑serve capabilities, dynamic routing, and efficient task‑specific adaptation without retraining the full model.
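
To make the idea of several adapters sharing one frozen base more concrete, here is a minimal sketch of a single linear layer with named LoRA adapters and softmax routing weights. It is an illustration under stated assumptions, not Mind Lab's implementation: the class names `LoRAAdapter` and `MixtureOfLoRALinear`, the helper `softmax_route`, and the routing scores are all hypothetical, and the article does not say how Macaron‑V1‑Preview actually routes between adapters.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRAAdapter:
    """Low-rank update delta_W = (alpha / rank) * B @ A, kept as two small matrices."""

    def __init__(self, d_in, d_out, rank, alpha, rng):
        self.scale = alpha / rank
        # Usual LoRA initialisation: A is small random, B is zero,
        # so a freshly added adapter leaves the base output unchanged.
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def delta(self, x):
        return [self.scale * y for y in matvec(self.B, matvec(self.A, x))]


class MixtureOfLoRALinear:
    """One frozen base weight shared by any number of named adapters (hypothetical)."""

    def __init__(self, W):
        self.W = W            # frozen base weight, shape d_out x d_in
        self.adapters = {}

    def add_adapter(self, name, adapter):
        self.adapters[name] = adapter

    def forward(self, x, weights=None):
        y = matvec(self.W, x)
        # Add each selected adapter's low-rank correction, scaled by its routing weight.
        for name, w in (weights or {}).items():
            d = self.adapters[name].delta(x)
            y = [yi + w * di for yi, di in zip(y, d)]
        return y


def softmax_route(scores):
    """Turn per-adapter scores into routing weights that sum to 1."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


rng = random.Random(0)
layer = MixtureOfLoRALinear([[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("coding", LoRAAdapter(2, 2, rank=1, alpha=1.0, rng=rng))
layer.add_adapter("ui", LoRAAdapter(2, 2, rank=1, alpha=1.0, rng=rng))
layer.adapters["ui"].B = [[1.0], [0.0]]   # pretend the "ui" adapter has been trained

x = [1.0, 2.0]
base = layer.forward(x)                               # base model only
route = softmax_route({"coding": 0.0, "ui": 2.0})     # made-up routing scores
routed = layer.forward(x, route)                      # base plus weighted adapters
```

The base weight `W` is never modified: adding or swapping a task means adding or swapping two small matrices, which is why many adapters can be served from one copy of the base model.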

Mind Lab also released the MinT (MindLab Toolkit) infrastructure, which manages millions of LoRA adapters, cuts model loading latency nearly tenfold, and supports high‑throughput reinforcement‑learning‑driven post‑training. The system uses DSA sparse attention and multi‑token prediction (MTP) to keep training and inference costs low even at the 750B scale.
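
The article does not describe how MinT manages its adapter population or achieves the lower loading latency. As one generic illustration of the problem, the sketch below keeps a bounded set of adapters resident and evicts the least recently used one when the limit is exceeded. The `AdapterCache` class, its `load_fn` callback, and the adapter IDs are hypothetical and are not part of MinT's API.

```python
from collections import OrderedDict


class AdapterCache:
    """Keeps at most `capacity` adapters resident; evicts the least recently used."""

    def __init__(self, capacity, load_fn):
        self.capacity = capacity
        self.load_fn = load_fn          # hypothetical: fetches adapter weights from slow storage
        self.resident = OrderedDict()   # adapter_id -> adapter, oldest first
        self.loads = 0                  # number of slow loads performed

    def get(self, adapter_id):
        if adapter_id in self.resident:
            # Cache hit: mark as most recently used and skip the slow load.
            self.resident.move_to_end(adapter_id)
            return self.resident[adapter_id]
        self.loads += 1
        adapter = self.load_fn(adapter_id)
        self.resident[adapter_id] = adapter
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)   # drop the least recently used adapter
        return adapter


cache = AdapterCache(capacity=2, load_fn=lambda aid: {"id": aid})
for aid in ["a", "b", "a", "c", "b"]:
    cache.get(aid)
```

Because a LoRA adapter is small relative to the base model, swapping adapters in and out this way is far cheaper than reloading a full model per task.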

In a companion 43‑page paper titled “On the Scaling of PEFT,” the team presents a scaling law showing that the accuracy of collaborative model decisions grows logarithmically with the number of participating models, highlighting the benefits of a large population of diverse LoRA adapters for collective intelligence.
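
Read literally, logarithmic growth corresponds to a relation of the following form. The symbols $a$ and $b$ are placeholders introduced here for illustration; the article gives no fitted constants, and the paper's exact formulation may differ.

```latex
\mathrm{Acc}(N) \approx a + b \log N, \qquad b > 0
\quad\Longrightarrow\quad
\mathrm{Acc}(2N) - \mathrm{Acc}(N) \approx b \log 2
```

Under this form, each doubling of the number of participating models $N$ adds roughly the same fixed increment to accuracy, so gains continue as the adapter population grows but with diminishing returns per added model.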

Overall, Macaron‑V1‑Preview demonstrates how efficient infrastructure, Mixture‑of‑LoRA, and continuous learning can together enable large‑scale Agent models to evolve rapidly in real‑world settings, marking a practical validation of a new post‑training paradigm.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
