How to Build a Multi‑Repo Semantic Code Q&A System with OpenViking

This guide explains the challenges of multi‑repository code retrieval, presents an experimental evaluation of OpenViking's semantic search, and provides step‑by‑step instructions for installing, configuring, importing repositories, and integrating the system into AI agents and chatbots.

ByteDance SE Lab

Background and Challenges

Large enterprises and complex open‑source projects often split code across dozens or hundreds of independent Git repositories. This modularity creates three main problems for developers who need to understand or query code:

Missing context: An AI assistant that sees only the current repository cannot resolve cross‑repo calls and dependencies.

Inefficient semantic search: Traditional tools such as grep or glob match literal text or patterns, so they cannot capture the intent behind a concept such as "user authentication logic", whose implementation may be scattered across AuthService, verify_token, or user_session.

Information overload: Frequently occurring tokens (e.g., request) generate noisy results across many repositories, making it hard to locate the relevant code.
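The keyword-matching limitation can be illustrated with a few lines of Python. This is only a sketch: it uses the three identifiers named above as a stand-in for a code base, and the helper literal_hits is a hypothetical name that imitates a case-insensitive substring search rather than any real tool.

```python
# Identifiers taken from the example above, standing in for a code base.
identifiers = ["AuthService", "verify_token", "user_session"]
query = "user authentication logic"


def literal_hits(needle: str, names: list[str]) -> list[str]:
    """Case-insensitive substring match, roughly what `grep -i` would do."""
    needle = needle.lower()
    return [name for name in names if needle in name.lower()]


# Searching for the whole phrase, then for each word on its own.
phrase_hits = literal_hits(query, identifiers)
word_hits = {word: literal_hits(word, identifiers) for word in query.split()}

print(phrase_hits)
print(word_hits)
```

The phrase matches nothing, and only the word "user" matches a single identifier, even though all three identifiers belong to the authentication concept.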

Solution Overview

OpenViking is a private, multi‑repo semantic code question‑answering system. It aggregates any number of public (GitHub) or local repositories, automatically analyses, summarises, and vectorises the code to build a semantic index, and exposes an ov CLI that any AI agent can use as a skill or plugin for cross‑repo retrieval.

Experimental Evaluation

A real‑world evaluation used 157 repositories and 10 representative questions. Three groups were compared using the same GLM‑4.7 model:

Control group: Direct local workspace search via OpenCode.

Experiment 1: Semantic search through OpenCode with the OpenViking plugin.

Experiment 2: Native VikingBot built on OpenViking.

Rating distribution per group:

Control: 40% good, 30% average, 30% poor.

Experiment 1: 80% good, 10% average, 10% poor.

Experiment 2: 90% good, 10% average, 0% poor.

Both semantic configurations at least doubled the share of good answers relative to the control (from 40% to 80% and 90%). Poor answers fell from 30% to 10% in Experiment 1 and were eliminated only in Experiment 2.

Cost Estimation

Initial repository parsing consumes about 539 M tokens (≈300 M for embeddings, 239 M for VLM processing). Ongoing daily usage incurs token costs per query; the exact cost depends on query volume.
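To turn the token estimate into a budget figure, the arithmetic can be sketched as below. The token counts come from the estimate above; the per-million-token prices are hypothetical placeholders, not actual provider rates, and should be replaced with the rates of whichever embedding and VLM provider is configured.

```python
# Token counts from the estimate above (the embedding figure is approximate).
EMBEDDING_TOKENS = 300_000_000
VLM_TOKENS = 239_000_000

total_tokens = EMBEDDING_TOKENS + VLM_TOKENS


def indexing_cost(embedding_price_per_m: float, vlm_price_per_m: float) -> float:
    """One-off indexing cost, given prices per million tokens (same currency)."""
    embedding_cost = EMBEDDING_TOKENS / 1_000_000 * embedding_price_per_m
    vlm_cost = VLM_TOKENS / 1_000_000 * vlm_price_per_m
    return embedding_cost + vlm_cost


# HYPOTHETICAL prices, for illustration only.
print(f"total tokens: {total_tokens:,}")
print(f"estimated cost: {indexing_cost(0.1, 1.0):.2f}")
```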

Installation

pip install openviking

Verify the installation:

ov --version

Server Configuration

Create ~/.openviking/ov.conf (JSON) with the following structure:

{
  "server": {
    "host": "127.0.0.1",
    "port": 1933,
    "root_api_key": "{your-key}",
    "cors_origins": ["*"]
  },
  "storage": {
    "workspace": "{your-data-dir}"
  },
  "embedding": {
    "dense": {
      "model": "{your-embedding-model}",
      "api_key": "{your-api-key}",
      "api_base": "{your-api-endpoint}",
      "dimension": 1024,
      "provider": "{your-provider}"
    }
  },
  "vlm": {
    "model": "{your-vlm-model}",
    "api_key": "{your-api-key}",
    "api_base": "{your-api-endpoint}",
    "provider": "{your-provider}"
  },
  "log": {
    "level": "INFO"
  }
}

Create the CLI configuration ~/.openviking/ovcli.conf:

{
  "url": "http://127.0.0.1:1933",
  "api_key": "{your-key}",
  "timeout": 60.0
}
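Because both files are plain JSON, a short script can catch two common mistakes before the server is started: values still left as {your-...} placeholders, and a CLI URL that does not match the server's host and port. The sketch below is not part of OpenViking. The function names are hypothetical, and it assumes that the CLI api_key is meant to equal the server's root_api_key, which the shared {your-key} placeholder suggests but does not confirm.

```python
import json
import re

# Matches values such as "{your-key}" that were never filled in.
PLACEHOLDER = re.compile(r"^\{your-[a-z-]+\}$")


def find_placeholders(node, path=""):
    """Return dotted paths of values that are still {your-...} placeholders."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            found += find_placeholders(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found += find_placeholders(value, f"{path}[{index}]")
    elif isinstance(node, str) and PLACEHOLDER.match(node):
        found.append(path)
    return found


def cli_config_for(server_conf, timeout=60.0):
    """Derive an ovcli.conf dict from the "server" section of ov.conf.

    ASSUMPTION: the CLI key is the same as server.root_api_key.
    """
    server = server_conf["server"]
    return {
        "url": f"http://{server['host']}:{server['port']}",
        "api_key": server["root_api_key"],
        "timeout": timeout,
    }


# A partial ov.conf with one value still unfilled; the workspace path is made up.
example = {
    "server": {"host": "127.0.0.1", "port": 1933,
               "root_api_key": "{your-key}", "cors_origins": ["*"]},
    "storage": {"workspace": "/data/openviking"},
}

print(find_placeholders(example))
print(json.dumps(cli_config_for(example), indent=2))
```

In practice the dict would be read from ~/.openviking/ov.conf with json.load, and the script would refuse to continue while find_placeholders returns a non-empty list.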

Starting the Server

# Default configuration
openviking-server

# Custom configuration file
openviking-server --config /path/to/ov.conf

# Custom port
openviking-server --port 8000

# Run in background
nohup openviking-server > /data/log/openviking.log 2>&1 &

Check health:

ov system health

Expected output:

{"status":"ok"}
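When the server is started in the background, a script may need to wait until the health check passes before importing repositories. The following is a minimal sketch, assuming that ov system health prints exactly the JSON object shown above on standard output. The function names, retry count, and delay are illustrative choices, not OpenViking defaults.

```python
import json
import subprocess
import time


def run_health_check() -> str:
    """Run `ov system health` and return its stdout ("" if it cannot run)."""
    try:
        result = subprocess.run(
            ["ov", "system", "health"],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout


def is_healthy(output: str) -> bool:
    """True if the output parses as a JSON object with status == "ok"."""
    try:
        return json.loads(output).get("status") == "ok"
    except (json.JSONDecodeError, AttributeError):
        return False


def wait_until_healthy(check=run_health_check, attempts=10, delay=1.0) -> bool:
    """Poll the health check until it passes or the attempts run out."""
    for _ in range(attempts):
        if is_healthy(check()):
            return True
        time.sleep(delay)
    return False
```

Calling wait_until_healthy() right after launching openviking-server gives the server up to about ten seconds to come up. The check parameter exists so that the polling logic can be exercised without a running server.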

Importing Multiple Repositories

Use ov add-resource to import code from a GitHub URL or a local directory.

# Import a public GitHub repository
ov add-resource https://github.com/volcengine/OpenViking.git \
  --to viking://resources/volcengine/OpenViking --wait

# Import a local project
ov add-resource /path/to/my-project \
  --to viking://resources/internal/my-project --wait

Organise resources under viking://resources/ with meaningful sub‑directories (e.g., backend, frontend, internal, public) to improve scoped retrieval.

For large repositories, extend the waiting period with --timeout (seconds).

Enable periodic incremental updates with --watch-interval (seconds). A positive value registers a recurring update task; a non‑positive value removes it.

# Register hourly incremental updates
ov add-resource https://github.com/volcengine/OpenViking.git \
  --to viking://resources/volcengine/OpenViking --watch-interval 3600
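With dozens of repositories, typing each command by hand becomes error-prone. One possible approach is to generate the commands from a list, as sketched below. The script uses only the flags described above (--to, --wait, --timeout, --watch-interval). The helper name add_resource_cmd and the 600-second timeout are illustrative assumptions.

```python
import shlex


def add_resource_cmd(source, target, wait=False, timeout=None, watch_interval=None):
    """Build the argument list for one `ov add-resource` call."""
    cmd = ["ov", "add-resource", source, "--to", target]
    if wait:
        cmd.append("--wait")
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    if watch_interval is not None:
        cmd += ["--watch-interval", str(watch_interval)]
    return cmd


# (source, target) pairs; these are the two examples from this guide.
repos = [
    ("https://github.com/volcengine/OpenViking.git",
     "viking://resources/volcengine/OpenViking"),
    ("/path/to/my-project",
     "viking://resources/internal/my-project"),
]

for source, target in repos:
    # Print instead of executing, so the commands can be reviewed first.
    print(shlex.join(add_resource_cmd(source, target, wait=True, timeout=600)))
```

To execute rather than print, each list can be passed to subprocess.run(cmd, check=True). Passing a list avoids shell-quoting problems with unusual paths.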

Agent Integration

Register OpenViking as a skill/plugin for your AI agent (e.g., OpenCode) by adding the plugin name to the agent’s configuration and restarting the agent:

{"plugin": ["openviking-opencode"]}

During a query, use ov find or ov search for semantic retrieval. If no result is found, fall back to local file‑system tools.
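The "semantic first, local fallback" policy can be sketched independently of any particular agent. In the sketch below the semantic search is passed in as a callable, because the argument syntax of ov find and ov search is not covered in this guide. The names retrieve and local_search are hypothetical, and the fallback is a plain substring scan rather than a real agent tool.

```python
from pathlib import Path
from typing import Callable


def local_search(root: str, needle: str, suffixes=(".py",)) -> list[str]:
    """Fallback: return paths of files under root whose text contains needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            try:
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue
            if needle in text:
                hits.append(str(path))
    return hits


def retrieve(
    query: str,
    semantic_search: Callable[[str], list[str]],
    fallback: Callable[[str], list[str]],
) -> tuple[str, list[str]]:
    """Try semantic retrieval first; use the fallback only if it is empty."""
    results = semantic_search(query)
    if results:
        return "semantic", results
    return "local", fallback(query)
```

A real semantic_search would wrap the ov CLI, for example through subprocess, with its arguments checked against the CLI's own help output. Returning the source label alongside the results lets the agent tell the user which path produced the answer.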

Optional Chatbot Integration (Feishu/Lark)

Add bot credentials to the server configuration:

{
  "bot": {
    "channels": [
      {
        "type": "feishu",
        "enabled": true,
        "appId": "{your-app-id}",
        "appSecret": "{your-app-secret}",
        "threadRequireMention": true
      }
    ]
  }
}

Start the server together with the bot:

openviking-server --with-bot

After deployment, mention the bot in a Feishu group to ask any code‑base question.
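Since the bot section lives in the same JSON file as the server settings, it can be merged in programmatically instead of edited by hand. The following is a minimal sketch. The function name with_feishu_channel is hypothetical, and replacing any existing feishu entry (rather than adding a second one) is a design assumption, not documented OpenViking behaviour.

```python
import copy
import json


def with_feishu_channel(conf, app_id, app_secret, thread_require_mention=True):
    """Return a copy of conf with a single Feishu channel under bot.channels."""
    updated = copy.deepcopy(conf)  # leave the caller's dict untouched
    channels = updated.setdefault("bot", {}).setdefault("channels", [])
    # Drop any earlier feishu entry so that repeated runs stay idempotent.
    channels[:] = [c for c in channels if c.get("type") != "feishu"]
    channels.append({
        "type": "feishu",
        "enabled": True,
        "appId": app_id,
        "appSecret": app_secret,
        "threadRequireMention": thread_require_mention,
    })
    return updated


# Stand-in for the contents of ov.conf; the credentials are placeholders.
base = {"server": {"host": "127.0.0.1", "port": 1933}}
merged = with_feishu_channel(base, "{your-app-id}", "{your-app-secret}")
print(json.dumps(merged, indent=2))
```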

OpenViking Plugin 2.0 Upgrade

OpenViking Plugin 2.0 is built on the OpenClaw ContextEngine and requires OpenClaw >= v2026.3.7. It replaces the older memory-openviking plugin (compatible only with OpenClaw 2.10.x – 2026.3.6). The new plugin provides simplified installation, built‑in virtual‑environment setup, and more comprehensive verification steps.
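Before upgrading, the installed OpenClaw version can be compared against the minimum stated above. The sketch below assumes version strings are dotted integers with an optional leading "v" (as in v2026.3.7). How to obtain the installed version is not covered here, and the function names are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn "v2026.3.7" or "2026.3.7" into (2026, 3, 7)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


# Minimum OpenClaw version required by OpenViking Plugin 2.0 (from the text above).
MIN_OPENCLAW = parse_version("v2026.3.7")


def supports_plugin_v2(openclaw_version: str) -> bool:
    """True if the given OpenClaw version meets the Plugin 2.0 minimum."""
    return parse_version(openclaw_version) >= MIN_OPENCLAW


print(supports_plugin_v2("2026.3.6"))
print(supports_plugin_v2("2026.3.7"))
```

Comparing integer tuples rather than raw strings matters here: as strings, "2026.10.0" would sort before "2026.3.7".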

Written by

ByteDance SE Lab

Official account of ByteDance SE Lab, sharing research and practical experience in software engineering. Our lab unites researchers and engineers from various domains to accelerate the fusion of software engineering and AI, driving technological progress in every phase of software development.
