Google DeepMind Open‑Sources TIPSv2: State‑of‑the‑Art Patch‑Text Alignment at CVPR 2026
The DeepMind team unveils TIPSv2, a vision‑language pre‑training model that dramatically improves patch‑level image‑text alignment through iBOT++, Head‑only EMA, and multi‑granularity captions, achieving record‑breaking results on nine tasks across twenty datasets while remaining fully open‑source.
