How VIDEX Brings Virtual Indexes to MySQL for Faster Query Optimization
VIDEX is an open‑source MySQL virtual‑index framework that enables cost‑based what‑if analysis without building real indexes, integrates with AI‑driven NDV and cardinality estimators, supports modular deployment, and has been validated on TPC‑H benchmarks to closely match InnoDB query plans.
Virtual Index Technology
Virtual index (hypothetical index) plays a key role in query optimization and index recommendation. It allows “what‑if” analysis without building real indexes, using only statistics.
In the AI era, machine‑learning based NDV and cardinality estimators are hard to inject into MySQL’s optimizer, limiting index recommendations.
VIDEX – Open‑source MySQL Virtual Index
ByteDance ByteBrain team open‑sourced VIDEX, bringing virtual index capability to MySQL. The project was accepted as a VLDB25 demo.
VIDEX is deployed in large‑scale production at ByteDance, serving thousands of users and hundreds of thousands of slow SQLs daily, and has received praise from industry experts.
Key Contributions
Filling MySQL’s virtual‑index gap : First open‑source, extensible, multi‑deployment MySQL virtual‑index solution.
Accurate MySQL cost modeling : Tested on TPC‑H, TPC‑H‑Skew, JOB benchmarks; with exact NDV and cardinality it can simulate InnoDB plans 100%.
Modular, multi‑deployment architecture : Separate optimizer plugin and statistic server; can run as a plugin or independent service, supporting hot‑updates.
Extensible experimental platform : Provides a clean JSON interface for researchers to plug in new NDV/cardinality algorithms in any language.
Deployment Modes
Install VIDEX‑Optimizer as a plugin in an existing MySQL instance.
Start an independent VIDEX instance that synchronizes statistics and runs analysis without affecting the live database.
Statistic Server Modes
Run together with VIDEX‑Optimizer (default).
Start a standalone statistic server and point the optimizer to it with SET @VIDEX_STATISTIC_SERVER='ip:port'.
Task Modes
Non‑task mode: users only specify source and virtual databases.
Task mode: users set SET @VIDEX_OPTIONS={'task_id':'abc'} to isolate multiple analysis jobs.
Algorithm Integration
Researchers can implement a new model by subclassing VidexModelInnoDB. The abstract base class VidexModelBase defines cardinality and ndv methods.
class VidexModelBase(ABC):
"""Abstract cost model class. VIDEX‑Statistic‑Server receives requests from VIDEX‑Optimizer for Cardinality and NDV estimates, parses them into structured data for ease use of developers.
@abstractmethod
def cardinality(self, idx_range_cond: IndexRangeCond) -> int:
"""Estimates the cardinality (number of rows matching a criteria) for a given index range condition."""
pass
@abstractmethod
def ndv(self, index_name: str, table_name: str, column_list: List[str]) -> int:
"""Estimates the number of distinct values (NDV) for specified fields within an index."""
raise NotImplementedError()Example server start:
startup_videx_server(VidexModelClass=VidexModelExample)Alternatively, any language can implement an HTTP service and be referenced via SET @VIDEX_STATISTIC_SERVER='ip:port'.
Running VIDEX on TPC‑H
Step 1: Launch the provided Docker image containing VIDEX‑Optimizer and VIDEX‑Statistic.
docker run -d -p 13308:13308 -p 5001:5001 --name videx kangrongme/videx:0.0.2Step 2: Prepare the TPCH database and synchronize metadata with the script:
python src/sub_platforms/sql_opt/videx/scripts/videx_build_env.py \
--target 127.0.0.1:13308:tpch_tiny:videx:password \
--videx 127.0.0.1:13308:videx_tpch_tiny:videx:passwordStep 3: Add a test index to both the real InnoDB and the virtual database. Because VIDEX creates a virtual index, the cost is O(1).
-- For InnoDB
ALTER TABLE tpch_tiny.orders ADD INDEX idx_o_orderstatus (o_orderstatus);
-- For VIDEX
ALTER TABLE videx_tpch_tiny.orders ADD INDEX idx_o_orderstatus (o_orderstatus);EXPLAIN output shows that VIDEX and InnoDB produce almost identical plans, and the row‑count error is only 0.56%.
Architecture Overview
VIDEX‑Optimizer‑Plugin : Front‑end that implements MySQL optimizer interfaces and forwards complex requests to the statistic server.
VIDEX‑Statistic‑Server : Back‑end that computes NDV and cardinality from collected statistics and returns results to the optimizer.
When a user selects a real database for “what‑if” analysis, VIDEX creates a virtual database with identical schema but a VIDEX engine, gathers necessary statistics, and allows free creation/deletion of virtual indexes for cost‑based plan simulation.
Volcano Engine Developer Services
The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
