How a Single SQL Query Powers Multi‑Modal AI Search with veDB‑Search
veDB‑Search extends MySQL compatibility to enable AI‑driven hybrid retrieval—combining vector, full‑text, and scalar searches—through a single SQL statement, simplifying architecture, boosting performance, and supporting in‑database embedding for seamless "search everything" capabilities.
veDB MySQL edition is Volcano Engine's cloud‑native relational database that is 100% compatible with native MySQL. Built on this foundation, veDB‑Search adds a one‑stop hybrid retrieval service, allowing users to store and query vector, full‑text, and scalar data using only standard SQL.
Why Traditional Multi‑Path Retrieval Is Painful
Complex architecture : Maintaining separate vector databases, full‑text engines (e.g., Elasticsearch), and relational databases, often with multiple embedding vectors per item.
Data synchronization : Requires intricate ETL or CDC pipelines to keep data consistent across systems.
Heavy business logic : Application code must invoke different SDKs, aggregate, deduplicate, and rank results, leading to tangled, hard‑to‑maintain code.
veDB‑Search’s Breakthrough
One‑SQL multi‑path recall : A single SQL statement can retrieve vector, text, and scalar matches simultaneously.
In‑DB Embedding : Embedding generation is pushed down into the database, making vectorization transparent to the business layer.
Performance guarantees : Built‑in optimizer selects optimal indexes and execution plans; distributed architecture ensures scalability.
Multi‑Path Recall Execution Flow
Parse the standard SQL.
Optimizer automatically chooses the best index and generates an optimal plan.
Scheduler pushes each retrieval path down to the corresponding vector index for filtering.
Scores from each path are fused according to weights and sorted.
Final result set is returned.
SQL Example for Three‑Way Recall
Step 1 – Create a hybrid index on the products table with three vector columns ( emb1, emb2, emb3) and a text column context:
ALTER TABLE products ADD VECTOR INDEX `hybrid_idx`(
`context`,
`emb1`, `emb2`, `emb3`
) SECONDARY_ENGINE_ATTRIBUTE = '{
"distance": "cosine",
"scalar_fields": "category, discount, price"
}';Step 2 – Execute a single SQL that performs:
Text‑vector top‑k + full‑text match on context.
Image‑vector top‑k with price and discount filters.
Detailed‑image vector top‑k.
Result fusion using RRF (Reciprocal Rank Fusion) or raw similarity scores.
WITH
description_recall AS (
SELECT id, name, 2.0 AS score, 'description' AS recall_type
FROM products
WHERE MATCH(context) AGAINST('户外远足装备')
ORDER BY similarity(emb1, [0.1,0.2,0.3,0.4]) DESC
LIMIT 50
),
hot_item_recall AS (
SELECT id, name, 1.5 AS score, 'item' AS recall_type
FROM products
WHERE price BETWEEN 200 AND 400 AND discount > 0.10
ORDER BY similarity(emb2, [0.1,0.2,0.3,0.4]) DESC
LIMIT 50
),
detailed_recall AS (
SELECT id, name, 0.8 AS score, 'detailed_item' AS recall_type
FROM products
ORDER BY similarity(emb3, [0.1,0.2,0.3,0.4]) DESC
LIMIT 50
),
combined AS (
SELECT * FROM description_recall
UNION ALL SELECT * FROM hot_item_recall
UNION ALL SELECT * FROM detailed_recall
)
SELECT id, name, score, recall_type
FROM combined
ORDER BY score DESC
LIMIT 100;RRF (Reciprocal Rank Fusion) Weighting
veDB‑Search supports RRF to combine scores from multiple recall paths:
RRF_score = (1 / (k + rank_in_list1)) + (1 / (k + rank_in_list2)) + ...Typical k is ≥ 60. This method balances the strengths of each path, improving relevance.
Filtering with Standard SQL WHERE
Complex filters can be expressed directly in SQL, and the optimizer decides whether to apply per‑filter or post‑filter strategies. Example:
WHERE MATCH(context) AGAINST('户外远足装备')
AND (price BETWEEN 200 AND 400 OR discount > 0.10)In‑DB Embedding Functions
Two functions enable embedding generation inside the database: to_embedding(model_id, model_key, user_input, model_type) – explicit call that returns a VECTOR column. embedding_pipeline(user_input, model_type) – defined in a hybrid index; the engine automatically generates embeddings during query execution.
Supported model_type values are content (text), image (image URL), and text (multimodal text).
Example: Text‑to‑Embedding and Storage
SELECT to_embedding('doubao-embedding-large', '$model_key', '三合一冲锋衣', 'content') AS title_vec;Creating a table that stores the embedding:
CREATE TABLE `document_asset` (
`id` BIGINT PRIMARY KEY AUTO_INCREMENT COMMENT '主键ID,自增唯一标识',
`content_vec` VECTOR(4096) NOT NULL COMMENT '文本内容,用于向量化',
ANN KEY `idx`(`content_vec`)
) ENGINE=InnoDB COMMENT='文档表';Embedding Pipeline Example for an E‑commerce Product Table
CREATE TABLE `products` (
`id` BIGINT PRIMARY KEY AUTO_INCREMENT COMMENT '主键ID',
`image_url` MEDIUMTEXT NOT NULL COMMENT '图片的url地址',
`type` INT NOT NULL DEFAULT 0 COMMENT '商品类型',
ANN KEY `idx`(`image_url`) SECONDARY_ENGINE_ATTRIBUTE = '{
"scalar_fields": "type",
"distance": "cosine",
"emb_pipeline": [{
"col_name": "image_url",
"model_id": "$model_id",
"auth_key": "$model_key",
"type": "image"
}]
}' ENGINE=InnoDB COMMENT='选品向量表';Querying top‑2 similar images without storing embeddings:
SELECT id FROM products
ORDER BY l2_distance(embedding_pipeline('https://example.com/iphone15.jpg', 'image'), image_url)
LIMIT 2;Practical E‑commerce Cases
Case 1 – Text‑to‑Image Search : Users type "三合一冲锋衣"; the system retrieves matching product images using the embedding pipeline and scalar filters (sales, price).
Case 2 – Image‑to‑Image Recommendation : Users upload a product photo; the system finds visually similar items, then re‑ranks by sales and price.
Best‑Practice Tips
Design recall paths based on core business dimensions (e.g., product image, description, tags).
Iteratively tune path weights via A/B testing to maximize relevance.
Include both embedding columns and frequently filtered scalar columns in the hybrid index for optimal query planning and elastic scaling.
veDB‑Search transforms a traditional relational database into an AI‑aware cognition engine, enabling "search everything" with a single SQL statement while dramatically reducing development and operational overhead.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Volcano Engine Developer Services
The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
