How BGE’s New Code and Multimodal Vector Models Set New Retrieval Benchmarks
The article introduces three BGE vector models (BGE-Code-v1, BGE-VL-v1.5, and BGE-VL-Screenshot), detailing their architectures, open-source resources, benchmark results on CoIR, CodeRAG-Bench, MMEB, and MVRB, and their impact on code and multimodal retrieval research.
BGE-Code-v1: A Next‑Generation Code Embedding Model
Built on the Qwen2.5-Coder-1.5B base, BGE-Code-v1 targets code-centric retrieval tasks while retaining strong multilingual text understanding. It is trained on the CoIR corpus and large-scale synthetic code-text pairs, with curriculum learning that incorporates retrieval and STS data from BGE-gemma2-multilingual to preserve general text capability.
Model URL: https://huggingface.co/BAAI/bge-code-v1
Project repository: https://github.com/FlagOpen/FlagEmbedding/tree/master/research/BGE_Coder
Paper: https://arxiv.org/abs/2505.12697

On the CoIR benchmark (covering 14 programming languages and 8 sub-tasks) and CodeRAG-Bench, BGE-Code-v1 outperforms commercial and open-source rivals from Google, Voyage AI, Salesforce, and Jina, achieving state-of-the-art (SOTA) scores.
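As a usage illustration, the sketch below loads the checkpoint through Hugging Face transformers and scores a natural-language query against a code snippet. Last-token pooling, right padding, the 4096-token limit, and the absence of an instruction prefix are assumptions based on common practice for decoder-based embedders; the official FlagEmbedding wrapper may apply its own prompt template and pooling.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Sketch only: pooling and prompt handling are assumptions, not the official recipe.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-code-v1")
tokenizer.padding_side = "right"  # assumed, so last-token indexing below is valid
model = AutoModel.from_pretrained("BAAI/bge-code-v1")
model.eval()

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=4096, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state              # (B, T, H)
    # Last-token pooling: take the final non-padding position of each sequence.
    last = batch["attention_mask"].sum(dim=1) - 1              # (B,)
    pooled = hidden[torch.arange(hidden.size(0)), last]        # (B, H)
    return torch.nn.functional.normalize(pooled, dim=-1)

query = embed(["Find the function that parses a URL query string"])
code = embed(["def parse_qs(qs): return dict(p.split('=') for p in qs.split('&'))"])
print((query @ code.T).item())  # cosine similarity as relevance score
```

Because the vectors are L2-normalized, the dot product above is directly usable as a ranking score across a corpus of code snippets.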
BGE-VL-v1.5: General Multimodal Retrieval Model
Derived from LLaVA‑1.6 (7.57 B parameters), BGE‑VL‑v1.5 enhances image‑text understanding and retrieval capability. Training combines 3 M image‑caption pairs from MegaPairs with an additional 1 M natural and synthetic samples covering image captioning, visual QA, and classification.
Model URL: https://huggingface.co/BAAI/BGE-VL-v1.5-zs
Project repository: https://github.com/FlagOpen/FlagEmbedding/tree/master/research/BGE_VL
Paper: https://arxiv.org/abs/2412.14475

Zero-shot evaluation on the MMEB benchmark shows BGE-VL-v1.5-zs achieving the best zero-shot performance, while the fine-tuned BGE-VL-v1.5-MMEB reaches a score of 72.16, topping the leaderboard across image retrieval, multimodal matching, and cross-modal recommendation tasks.
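To make the composed retrieval setting concrete, here is a minimal ranking sketch over precomputed embeddings. The query and gallery vectors are random placeholders standing in for BGE-VL-v1.5 outputs (e.g., an image plus an editing instruction on the query side); only the normalize-and-rank logic is shown.

```python
import numpy as np

def rank_candidates(query_vec: np.ndarray, cand_vecs: np.ndarray, k: int = 5):
    """Return top-k candidate indices and scores by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    c = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    scores = c @ q                         # cosine similarity per candidate
    order = np.argsort(-scores)[:k]        # highest-scoring candidates first
    return order, scores[order]

# Placeholder embeddings: one composed query against a 1,000-image gallery.
rng = np.random.default_rng(0)
query = rng.normal(size=768)
gallery = rng.normal(size=(1000, 768))
top_idx, top_scores = rank_candidates(query, gallery)
print(top_idx, top_scores)
```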
BGE-VL‑Screenshot: Visual Document Embedding Model
Based on Qwen2.5‑VL‑3B‑Instruct, this model is trained on seven data sources (news, e‑commerce, papers, documents, project pages, etc.), amassing over 13 M screenshots and 7 M annotated screenshot‑QA pairs. It targets Vis‑IR scenarios where images, text, and graphical elements coexist.
Model URL: https://huggingface.co/BAAI/BGE-VL-Screenshot
Project repository: https://github.com/FlagOpen/FlagEmbedding/tree/master/research/BGE_VL_Screenshot
Paper: https://arxiv.org/abs/2502.11431

On the newly introduced MVRB benchmark (20 datasets covering screenshot retrieval, composite screenshot retrieval, screenshot QA, and open-category classification), BGE-VL-Screenshot attains an overall score of 60.61, setting a new SOTA and demonstrating strong multilingual performance after limited query-to-screenshot training.
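Benchmarks like MVRB ultimately reduce to ranking screenshot embeddings against query embeddings. As a reference point, here is a minimal recall@k computation over a query-by-screenshot similarity matrix; the scores and gold labels below are synthetic placeholders, not MVRB data or its official metric implementation.

```python
import numpy as np

def recall_at_k(scores: np.ndarray, gold: np.ndarray, k: int = 10) -> float:
    """scores: (num_queries, num_screenshots) similarity matrix.
    gold: index of the single relevant screenshot for each query."""
    topk = np.argsort(-scores, axis=1)[:, :k]        # top-k candidates per query
    hits = (topk == gold[:, None]).any(axis=1)       # was the gold item retrieved?
    return float(hits.mean())

# Synthetic placeholder data for illustration only.
rng = np.random.default_rng(0)
scores = rng.normal(size=(100, 5000))
gold = rng.integers(0, 5000, size=100)
print(f"Recall@10: {recall_at_k(scores, gold):.3f}")
```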
Overall Impact and Availability
All three models are fully open-source. The BGE series has collectively amassed over 600 million downloads, was the first Chinese model family to top Hugging Face's leaderboard, and was the platform's most downloaded model of 2023. The models are widely adopted in Retrieval-Augmented Generation (RAG), neural search, and other AI-driven retrieval pipelines.