New BGE Vector Models Set SOTA in Code and Multimodal Retrieval – What Makes Them So Powerful?

Three newly released BGE vector models—BGE‑Code‑v1, BGE‑VL‑v1.5, and BGE‑VL‑Screenshot—deliver state‑of‑the‑art performance on code, multimodal, and visual document retrieval benchmarks, are open‑source on Hugging Face and GitHub, and aim to boost retrieval‑augmented applications across languages and modalities.

AI Frontier Lectures

Retrieval‑augmented generation (RAG) and multimodal search rely on high‑quality vector encoders. Three new open‑source models have been released to cover code, general multimodal, and visual‑document retrieval scenarios.
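To make the retrieval pattern these encoders serve concrete, here is a minimal, generic sketch with toy vectors (not the BGE API): documents and queries are encoded into embeddings, and retrieval ranks documents by cosine similarity to the query embedding.

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Rank document vectors by cosine similarity to a query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                    # cosine similarity per document
    order = np.argsort(-scores)[:k]   # indices of the k best matches
    return order, scores[order]

# Toy 4-dimensional "embeddings" standing in for real encoder output.
docs = np.array([
    [0.90, 0.10, 0.0, 0.0],   # doc 0
    [0.00, 0.80, 0.2, 0.0],   # doc 1: unrelated to the query
    [0.85, 0.15, 0.0, 0.0],   # doc 2: similar to doc 0
])
query = np.array([1.0, 0.0, 0.0, 0.0])

idx, scores = top_k(query, docs)
print(idx)  # → [0 2]: docs 0 and 2 rank highest for this query
```

In a real pipeline the toy arrays would be replaced by vectors produced by one of the encoders below, with the same ranking step on top.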

BGE‑Code‑v1

This encoder is built on the Qwen2.5‑Coder‑1.5B backbone. Training combines the CoIR benchmark dataset with a large corpus of synthetic code‑text pairs, using curriculum learning. Additional retrieval and semantic‑textual‑similarity (STS) data from BGE‑gemma2‑multilingual are incorporated as auxiliary tasks. The model excels at code‑document search and cross‑language code retrieval, outperforming both commercial and open‑source baselines on CoIR and CodeRAG‑Bench.
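Decoder‑based embedders built on backbones like Qwen2.5‑Coder commonly derive a single vector from the hidden state of the last non‑padding token and then L2‑normalize it. The sketch below shows that pooling scheme on toy arrays; the pooling choice is a common convention assumed here for illustration, not a detail confirmed from the BGE‑Code‑v1 paper.

```python
import numpy as np

def last_token_embed(hidden_states, attention_mask):
    """Pool per-token hidden states into one embedding by taking the
    hidden state of the last non-padding token, then L2-normalizing.

    hidden_states: (seq_len, dim) array; attention_mask: (seq_len,) of 0/1.
    """
    last = int(attention_mask.sum()) - 1   # index of the final real token
    vec = hidden_states[last]
    return vec / np.linalg.norm(vec)

# Toy hidden states for a 4-position sequence with one padding position.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))
mask = np.array([1, 1, 1, 0])              # last real token is index 2

emb = last_token_embed(hidden, mask)
print(emb.shape)                           # (8,): one unit-length vector
```

The resulting unit vectors plug directly into cosine‑similarity ranking for code‑document search.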

Model hub: https://huggingface.co/BAAI/bge-code-v1
GitHub repository: https://github.com/FlagOpen/FlagEmbedding/tree/master/research/BGE_Coder
Paper: https://arxiv.org/abs/2505.12697

BGE‑VL‑v1.5

The multimodal encoder is based on LLaVA‑1.6 (7.57 B parameters). It is trained on 3 M image‑caption pairs from the MegaPairs collection and an additional 1 M natural‑plus‑synthetic samples covering image captioning, visual‑question‑answering, and image classification. This curriculum yields strong zero‑shot performance on the MMEB benchmark and a fine‑tuned state‑of‑the‑art score of 72.16.

Model hub: https://huggingface.co/BAAI/BGE-VL-v1.5-zs
GitHub repository: https://github.com/FlagOpen/FlagEmbedding/tree/master/research/BGE_VL
Paper: https://arxiv.org/abs/2412.14475

BGE‑VL‑Screenshot

This visual‑document encoder derives from Qwen2.5‑VL‑3B‑Instruct. Training data consists of more than 13 M screenshots and 7 M caption‑question pairs collected from news sites, e‑commerce platforms, academic papers, and project homepages. Evaluation uses the newly introduced MVRB benchmark (20 datasets across 4 task types). The model achieves a combined score of 60.61, establishing a new SOTA and demonstrating multilingual capability beyond English.

Model hub: https://huggingface.co/BAAI/BGE-VL-Screenshot
GitHub repository: https://github.com/FlagOpen/FlagEmbedding/tree/master/research/BGE_VL_Screenshot
Paper: https://arxiv.org/abs/2502.11431
MVRB leaderboard: https://huggingface.co/spaces/BAAI/MVRB_leaderboard

All three models are fully open‑source and provide a one‑stop solution for efficient vector representation and semantic search across code, text, and visual documents.

Tags: vector retrieval, open-source, AI Models, code search, BGE, multimodal embeddings
Written by AI Frontier Lectures, a leading AI knowledge platform.