Tagged articles

native-model

1 article · Page 1 of 1
Machine Heart
Jun 24, 2026 · Artificial Intelligence

From Pixels to Words: A Native Vision-Language Model Unifies Images and Video

The paper introduces NEO-ov, a native vision-language model that drops the external visual encoder and feeds raw pixels directly into a single unified transformer. It reports competitive performance on image, multi-image, and video tasks, including fine-grained perception and spatial reasoning, and outlines the model's three-stage training pipeline and current limitations.

Qwen · benchmark · multimodal
0 likes · 13 min read