Machine Heart
Apr 2, 2026 · Artificial Intelligence

LongCat-Next: Turning Images, Audio, and Text into Tokens – What’s Next?

LongCat-Next is a 68.5‑billion‑parameter discrete‑native autoregressive multimodal model that tokenizes images, audio and text, challenges the belief that visual tokenization loses detail, matches specialized models on fine‑grained tasks, and demonstrates that joint understanding‑generation training can even improve generation quality.

LongCat-Next · audio synthesis · discrete tokenization
0 likes · 21 min read
Machine Learning Algorithms & Natural Language Processing
Mar 30, 2026 · Artificial Intelligence

Meituan’s Fully Discrete Multimodal Base (LongCat-Next) Shows All Physical Signals Can Converge to Tokens

LongCat-Next, a 3‑billion‑parameter multimodal model released by Meituan, adopts a pure discrete token‑based architecture (DiNA) trained with next‑token prediction. It outperforms same‑size rivals on OmniDocBench‑EN and CharXivRQ and matches QwenVL on visual tasks, while avoiding catastrophic forgetting and achieving a SWE‑Bench score of 43.0. The article supports these claims with extensive benchmarks and with experiments on receipt extraction, OCR, audio dialect reasoning, and image generation.

DiNA · LongCat-Next · OmniDocBench
0 likes · 10 min read