How to Build a Multi‑Repo Semantic Code Q&A System with OpenViking
This guide explains the challenges of multi‑repository code retrieval, presents an experimental evaluation of OpenViking's semantic search, and provides step‑by‑step instructions for installing, configuring, importing repositories, and integrating the system into AI agents and chatbots.
Background and Challenges
Large enterprises and complex open‑source projects often split code across dozens or hundreds of independent Git repositories. This modularity creates three main problems for developers who need to understand or query code:
Missing context: An AI assistant that sees only the current repository cannot resolve cross‑repo calls and dependencies.
Inefficient semantic search: Traditional tools such as grep and glob rely on exact keyword matches and cannot capture the intent behind a concept such as "user authentication logic", which may be scattered across AuthService, verify_token, or user_session.
Information overload: Frequently occurring tokens (e.g., request) generate noisy results across many repositories, making it hard to locate the relevant code.
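The keyword-matching limitation can be illustrated with a small Python sketch. The repository names and identifier lists below are hypothetical toy data, and keyword_search is only a stand-in for a grep-style literal match, not part of OpenViking.

```python
import re

# Hypothetical toy corpus: repository name -> identifiers found in it.
CORPUS = {
    "repo-gateway": ["AuthService", "route_request"],
    "repo-accounts": ["verify_token", "load_profile"],
    "repo-web": ["user_session", "render_page"],
}


def keyword_search(query: str, corpus: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (repo, identifier) pairs whose identifier contains the literal query."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [
        (repo, ident)
        for repo, idents in corpus.items()
        for ident in idents
        if pattern.search(ident)
    ]


# The concept as a developer would phrase it matches nothing literally.
print(keyword_search("user authentication", CORPUS))
# A shorter keyword finds AuthService but still misses verify_token and user_session.
print(keyword_search("auth", CORPUS))
```

All three identifiers relate to authentication, yet a literal match finds at most one of them. Closing that gap is what a semantic index is for.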
Solution Overview
OpenViking is a private, multi‑repo semantic code‑question‑answering system. It aggregates any number of public (GitHub) or local repositories, automatically analyses, summarises, and vectorises the code to build a deep semantic index, and exposes an ov CLI that any AI agent can use as a skill or plugin for cross‑repo retrieval.
Experimental Evaluation
A real‑world evaluation used 157 repositories and 10 representative questions. Three groups were compared using the same GLM‑4.7 model:
Control group: Direct local workspace search via OpenCode.
Experiment 1: Semantic search through OpenCode with the OpenViking plugin.
Experiment 2: Native VikingBot built on OpenViking.
Rating distributions were:
Control: 40 % good, 30 % average, 30 % poor.
Experiment 1: 80 % good, 10 % average, 10 % poor.
Experiment 2: 90 % good, 10 % average, 0 % poor.
Both semantic setups at least doubled the share of good answers relative to the control group, and the native VikingBot (Experiment 2) produced no poor answers.
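Because the evaluation used only 10 questions, each answer accounts for 10 percentage points. The short Python sketch below converts the reported shares back into question counts. The helper name to_counts is illustrative only.

```python
QUESTIONS = 10  # number of evaluation questions reported above

# Rating shares in percent, as reported for each group.
RESULTS = {
    "Control": {"good": 40, "average": 30, "poor": 30},
    "Experiment 1": {"good": 80, "average": 10, "poor": 10},
    "Experiment 2": {"good": 90, "average": 10, "poor": 0},
}


def to_counts(shares: dict[str, int], total: int) -> dict[str, int]:
    """Convert percentage shares into whole-question counts."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100 %")
    return {rating: share * total // 100 for rating, share in shares.items()}


for group, shares in RESULTS.items():
    print(group, to_counts(shares, QUESTIONS))
```

The difference between Experiment 1 and Experiment 2 is therefore a single question, so the ranking of the two semantic setups should be read with that sample size in mind.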
Cost Estimation
Initial repository parsing consumes about 539 M tokens (≈300 M for embeddings, 239 M for VLM processing). Ongoing daily usage incurs token costs per query; the exact cost depends on query volume.
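To turn the token figures into a budget, multiply each token count by the provider's price per million tokens. The sketch below assumes placeholder prices (0.5 and 2.0 currency units per million tokens). These are hypothetical values chosen for illustration, not actual provider pricing.

```python
# Approximate token counts for the initial import, taken from the text above.
EMBEDDING_TOKENS = 300_000_000
VLM_TOKENS = 239_000_000


def estimate_cost(
    embedding_tokens: int,
    vlm_tokens: int,
    embedding_price_per_m: float,
    vlm_price_per_m: float,
) -> float:
    """Return the total cost given per-million-token prices for each model."""
    embedding_cost = embedding_tokens / 1_000_000 * embedding_price_per_m
    vlm_cost = vlm_tokens / 1_000_000 * vlm_price_per_m
    return embedding_cost + vlm_cost


# Hypothetical prices; substitute the rates of your own provider.
total = estimate_cost(EMBEDDING_TOKENS, VLM_TOKENS, 0.5, 2.0)
print(f"Estimated one-off indexing cost: {total:.2f}")
```

Since VLM tokens are usually priced higher than embedding tokens, the VLM share tends to dominate the one-off cost even though it accounts for fewer tokens.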
Installation
pip install openviking
Verify the installation:
ov --version
Server Configuration
Create ~/.openviking/ov.conf (JSON) with the following structure:
{
  "server": {
    "host": "127.0.0.1",
    "port": 1933,
    "root_api_key": "{your-key}",
    "cors_origins": ["*"]
  },
  "storage": {
    "workspace": "{your-data-dir}"
  },
  "embedding": {
    "dense": {
      "model": "{your-embedding-model}",
      "api_key": "{your-api-key}",
      "api_base": "{your-api-endpoint}",
      "dimension": 1024,
      "provider": "{your-provider}"
    }
  },
  "vlm": {
    "model": "{your-vlm-model}",
    "api_key": "{your-api-key}",
    "api_base": "{your-api-endpoint}",
    "provider": "{your-provider}"
  },
  "log": {
    "level": "INFO"
  }
}
Create the CLI configuration ~/.openviking/ovcli.conf:
{
  "url": "http://127.0.0.1:1933",
  "api_key": "{your-key}",
  "timeout": 60.0
}
Starting the Server
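Before the first start, it can help to confirm that the two configuration files agree. The Python sketch below compares the port and the key using only the field names shown above. It assumes the CLI authenticates with the server's root_api_key, which the matching {your-key} placeholders suggest but the text does not state outright. check_configs is an illustrative helper, not part of OpenViking.

```python
from urllib.parse import urlparse


def check_configs(server_conf: dict, cli_conf: dict) -> list[str]:
    """Return human-readable mismatches between ov.conf and ovcli.conf contents."""
    problems = []
    server = server_conf.get("server", {})

    # The CLI URL should point at the port the server listens on.
    cli_port = urlparse(cli_conf.get("url", "")).port
    if cli_port != server.get("port"):
        problems.append(
            f"port mismatch: CLI uses {cli_port}, server listens on {server.get('port')}"
        )

    # Assumption: the CLI uses the same key as server.root_api_key.
    if cli_conf.get("api_key") != server.get("root_api_key"):
        problems.append("api_key in ovcli.conf differs from server.root_api_key")
    return problems


# Sample values mirroring the configuration files above.
server_conf = {"server": {"host": "127.0.0.1", "port": 1933, "root_api_key": "{your-key}"}}
cli_conf = {"url": "http://127.0.0.1:1933", "api_key": "{your-key}", "timeout": 60.0}
print(check_configs(server_conf, cli_conf))
```

In practice the two dictionaries would be loaded with json.load from ~/.openviking/ov.conf and ~/.openviking/ovcli.conf. Note that starting the server with a custom --port requires the url in ovcli.conf to be updated to match.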
# Default configuration
openviking-server
# Custom configuration file
openviking-server --config /path/to/ov.conf
# Custom port
openviking-server --port 8000
# Run in background
nohup openviking-server > /data/log/openviking.log 2>&1 &
Check health:
ov system health
Expected output:
{"status":"ok"}
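For scripted deployments, the health check can be wrapped in a few lines of Python. This is a minimal sketch that assumes ov system health prints the JSON object shown above on standard output and exits with status 0 when healthy. The exit-code behaviour is an assumption, and both helper names are illustrative.

```python
import json
import subprocess


def is_healthy(output: str) -> bool:
    """Return True if the output is a JSON object whose status is "ok"."""
    try:
        return json.loads(output).get("status") == "ok"
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def check_server() -> bool:
    """Run `ov system health` and interpret its output (assumed to be JSON on stdout)."""
    try:
        result = subprocess.run(
            ["ov", "system", "health"],
            capture_output=True,
            text=True,
            timeout=60,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and is_healthy(result.stdout)


print(is_healthy('{"status":"ok"}'))
```

Calling check_server() in a retry loop after launching the server in the background gives a simple readiness gate before importing repositories.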
Importing Multiple Repositories
Use ov add-resource to import code from a GitHub URL or a local directory.
# Import a public GitHub repository
ov add-resource https://github.com/volcengine/OpenViking.git \
  --to viking://resources/volcengine/OpenViking --wait
# Import a local project
ov add-resource /path/to/my-project \
  --to viking://resources/internal/my-project --wait
Organise resources under viking://resources/ with meaningful sub‑directories (e.g., backend, frontend, internal, public) to improve scoped retrieval. For large repositories, extend the waiting period with --timeout (seconds). Enable periodic incremental updates with --watch-interval (seconds). A positive value registers a recurring update task; a non‑positive value removes it.
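When many repositories have to be imported, the commands can be generated from a list rather than typed by hand. The Python sketch below only assembles argument lists using the flags described above (--to, --wait, --timeout, --watch-interval). The helper name, the group/name layout, and the second repository path are hypothetical.

```python
import shlex
from typing import Optional


def build_import_command(
    source: str,
    group: str,
    name: str,
    wait: bool = True,
    timeout: Optional[int] = None,
    watch_interval: Optional[int] = None,
) -> list[str]:
    """Build an `ov add-resource` argument list targeting viking://resources/<group>/<name>."""
    target = f"viking://resources/{group}/{name}"
    cmd = ["ov", "add-resource", source, "--to", target]
    if wait:
        cmd.append("--wait")
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    if watch_interval is not None:
        cmd += ["--watch-interval", str(watch_interval)]
    return cmd


# (source, group, name) triples; the second entry is a hypothetical local project.
REPOS = [
    ("https://github.com/volcengine/OpenViking.git", "volcengine", "OpenViking"),
    ("/path/to/my-project", "internal", "my-project"),
]

for source, group, name in REPOS:
    print(shlex.join(build_import_command(source, group, name, timeout=1800)))
```

Each printed line can be reviewed and then executed, or the argument list can be passed directly to subprocess.run. Keeping the group name in one place makes it easier to apply a consistent directory layout across all imports.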
# Register hourly incremental updates
ov add-resource https://github.com/volcengine/OpenViking.git \
  --to viking://resources/volcengine/OpenViking --watch-interval 3600
Agent Integration
Register OpenViking as a skill/plugin for your AI agent (e.g., OpenCode) by adding the plugin name to the agent's configuration and restarting the agent:
{"plugin": ["openviking-opencode"]}
During a query, use ov find or ov search for semantic retrieval. If no result is found, fall back to local file‑system tools.
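The retrieve-then-fall-back behaviour can be sketched in Python as follows. The semantic search is passed in as a callable so the logic stays independent of the CLI. The ov_find wrapper assumes that ov find takes the query as a positional argument and prints one result per line, neither of which is confirmed by the text above. All function names are illustrative.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable


def ov_find(query: str) -> list[str]:
    """Assumption: `ov find <query>` prints one result per line on stdout."""
    try:
        result = subprocess.run(
            ["ov", "find", query], capture_output=True, text=True, timeout=60
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]


def local_search(root: str, needle: str) -> list[str]:
    """Case-insensitive substring search over text files under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle.lower() in line.lower():
                hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits


def find_context(
    query: str, semantic_search: Callable[[str], list[str]], root: str
) -> tuple[str, list[str]]:
    """Prefer semantic results; fall back to a local search when there are none."""
    hits = semantic_search(query)
    if hits:
        return "semantic", hits
    return "local", local_search(root, query)


# Demo with a stubbed semantic search that returns nothing.
with tempfile.TemporaryDirectory() as root:
    Path(root, "auth.py").write_text(
        "def verify_token(token):\n    return bool(token)\n", encoding="utf-8"
    )
    source, hits = find_context("verify_token", lambda q: [], root)
    print(source, len(hits))
```

In a real agent, ov_find (or an equivalent wrapper around ov search) would be passed as semantic_search, and root would be the current workspace.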
Optional Chatbot Integration (Feishu/Lark)
Add bot credentials to the server configuration:
{
  "bot": {
    "channels": [
      {
        "type": "feishu",
        "enabled": true,
        "appId": "{your-app-id}",
        "appSecret": "{your-app-secret}",
        "threadRequireMention": true
      }
    ]
  }
}
Start the server together with the bot:
openviking-server --with-bot
After deployment, mention the bot in a Feishu group to ask any code‑base question.
OpenViking Plugin 2.0 Upgrade
OpenViking Plugin 2.0 is built on the OpenClaw ContextEngine and requires OpenClaw >= v2026.3.7. It replaces the older memory-openviking plugin (compatible only with OpenClaw 2.10.x – 2026.3.6). The new plugin provides simplified installation, built‑in virtual‑environment setup, and more comprehensive verification steps.
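Whether an installation meets the Plugin 2.0 requirement can be checked by comparing version numbers component by component rather than as strings. The Python sketch below assumes OpenClaw versions are purely numeric and dot-separated, with an optional leading "v". Pre-release suffixes are not handled, and the helper names are illustrative.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse a version such as "v2026.3.7" into a tuple of integers."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


# Minimum OpenClaw version stated above for OpenViking Plugin 2.0.
MIN_PLUGIN_V2 = parse_version("v2026.3.7")


def supports_plugin_v2(openclaw_version: str) -> bool:
    """Return True if the given OpenClaw version meets the Plugin 2.0 minimum."""
    return parse_version(openclaw_version) >= MIN_PLUGIN_V2


for version in ("2026.3.6", "2026.3.7", "v2026.10.1"):
    print(version, supports_plugin_v2(version))
```

Tuple comparison matters here: as plain strings, "2026.10.1" would sort before "2026.3.7" and be rejected incorrectly.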
ByteDance SE Lab
Official account of ByteDance SE Lab, sharing research and practical experience in software engineering. Our lab unites researchers and engineers from various domains to accelerate the fusion of software engineering and AI, driving technological progress in every phase of software development.
