How a Single SQL Query Powers Multi‑Modal AI Search with veDB‑Search

veDB‑Search extends MySQL compatibility to enable AI‑driven hybrid retrieval—combining vector, full‑text, and scalar searches—through a single SQL statement, simplifying architecture, boosting performance, and supporting in‑database embedding for seamless "search everything" capabilities.

Volcano Engine Developer Services
Volcano Engine Developer Services
Volcano Engine Developer Services
How a Single SQL Query Powers Multi‑Modal AI Search with veDB‑Search

veDB MySQL edition is Volcano Engine's cloud‑native relational database that is 100% compatible with native MySQL. Built on this foundation, veDB‑Search adds a one‑stop hybrid retrieval service, allowing users to store and query vector, full‑text, and scalar data using only standard SQL.

Why Traditional Multi‑Path Retrieval Is Painful

Complex architecture : Maintaining separate vector databases, full‑text engines (e.g., Elasticsearch), and relational databases, often with multiple embedding vectors per item.

Data synchronization : Requires intricate ETL or CDC pipelines to keep data consistent across systems.

Heavy business logic : Application code must invoke different SDKs, aggregate, deduplicate, and rank results, leading to tangled, hard‑to‑maintain code.

veDB‑Search’s Breakthrough

One‑SQL multi‑path recall : A single SQL statement can retrieve vector, text, and scalar matches simultaneously.

In‑DB Embedding : Embedding generation is pushed down into the database, making vectorization transparent to the business layer.

Performance guarantees : Built‑in optimizer selects optimal indexes and execution plans; distributed architecture ensures scalability.

Multi‑Path Recall Execution Flow

Parse the standard SQL.

Optimizer automatically chooses the best index and generates an optimal plan.

Scheduler pushes each retrieval path down to the corresponding vector index for filtering.

Scores from each path are fused according to weights and sorted.

Final result set is returned.

SQL Example for Three‑Way Recall

Step 1 – Create a hybrid index on the products table with three vector columns ( emb1, emb2, emb3) and a text column context:

ALTER TABLE products ADD VECTOR INDEX `hybrid_idx`(
  `context`,
  `emb1`, `emb2`, `emb3`
) SECONDARY_ENGINE_ATTRIBUTE = '{
    "distance": "cosine",
    "scalar_fields": "category, discount, price"
}';

Step 2 – Execute a single SQL that performs:

Text‑vector top‑k + full‑text match on context.

Image‑vector top‑k with price and discount filters.

Detailed‑image vector top‑k.

Result fusion using RRF (Reciprocal Rank Fusion) or raw similarity scores.

WITH
  description_recall AS (
    SELECT id, name, 2.0 AS score, 'description' AS recall_type
    FROM products
    WHERE MATCH(context) AGAINST('户外远足装备')
    ORDER BY similarity(emb1, [0.1,0.2,0.3,0.4]) DESC
    LIMIT 50
  ),
  hot_item_recall AS (
    SELECT id, name, 1.5 AS score, 'item' AS recall_type
    FROM products
    WHERE price BETWEEN 200 AND 400 AND discount > 0.10
    ORDER BY similarity(emb2, [0.1,0.2,0.3,0.4]) DESC
    LIMIT 50
  ),
  detailed_recall AS (
    SELECT id, name, 0.8 AS score, 'detailed_item' AS recall_type
    FROM products
    ORDER BY similarity(emb3, [0.1,0.2,0.3,0.4]) DESC
    LIMIT 50
  ),
  combined AS (
    SELECT * FROM description_recall
    UNION ALL SELECT * FROM hot_item_recall
    UNION ALL SELECT * FROM detailed_recall
  )
SELECT id, name, score, recall_type
FROM combined
ORDER BY score DESC
LIMIT 100;

RRF (Reciprocal Rank Fusion) Weighting

veDB‑Search supports RRF to combine scores from multiple recall paths:

RRF_score = (1 / (k + rank_in_list1)) + (1 / (k + rank_in_list2)) + ...

Typical k is ≥ 60. This method balances the strengths of each path, improving relevance.

Filtering with Standard SQL WHERE

Complex filters can be expressed directly in SQL, and the optimizer decides whether to apply per‑filter or post‑filter strategies. Example:

WHERE MATCH(context) AGAINST('户外远足装备')
  AND (price BETWEEN 200 AND 400 OR discount > 0.10)

In‑DB Embedding Functions

Two functions enable embedding generation inside the database: to_embedding(model_id, model_key, user_input, model_type) – explicit call that returns a VECTOR column. embedding_pipeline(user_input, model_type) – defined in a hybrid index; the engine automatically generates embeddings during query execution.

Supported model_type values are content (text), image (image URL), and text (multimodal text).

Example: Text‑to‑Embedding and Storage

SELECT to_embedding('doubao-embedding-large', '$model_key', '三合一冲锋衣', 'content') AS title_vec;

Creating a table that stores the embedding:

CREATE TABLE `document_asset` (
  `id` BIGINT PRIMARY KEY AUTO_INCREMENT COMMENT '主键ID,自增唯一标识',
  `content_vec` VECTOR(4096) NOT NULL COMMENT '文本内容,用于向量化',
  ANN KEY `idx`(`content_vec`)
) ENGINE=InnoDB COMMENT='文档表';

Embedding Pipeline Example for an E‑commerce Product Table

CREATE TABLE `products` (
  `id` BIGINT PRIMARY KEY AUTO_INCREMENT COMMENT '主键ID',
  `image_url` MEDIUMTEXT NOT NULL COMMENT '图片的url地址',
  `type` INT NOT NULL DEFAULT 0 COMMENT '商品类型',
  ANN KEY `idx`(`image_url`) SECONDARY_ENGINE_ATTRIBUTE = '{
    "scalar_fields": "type",
    "distance": "cosine",
    "emb_pipeline": [{
      "col_name": "image_url",
      "model_id": "$model_id",
      "auth_key": "$model_key",
      "type": "image"
    }]
  }' ENGINE=InnoDB COMMENT='选品向量表';

Querying top‑2 similar images without storing embeddings:

SELECT id FROM products
ORDER BY l2_distance(embedding_pipeline('https://example.com/iphone15.jpg', 'image'), image_url)
LIMIT 2;

Practical E‑commerce Cases

Case 1 – Text‑to‑Image Search : Users type "三合一冲锋衣"; the system retrieves matching product images using the embedding pipeline and scalar filters (sales, price).

Case 2 – Image‑to‑Image Recommendation : Users upload a product photo; the system finds visually similar items, then re‑ranks by sales and price.

Best‑Practice Tips

Design recall paths based on core business dimensions (e.g., product image, description, tags).

Iteratively tune path weights via A/B testing to maximize relevance.

Include both embedding columns and frequently filtered scalar columns in the hybrid index for optimal query planning and elastic scaling.

veDB‑Search transforms a traditional relational database into an AI‑aware cognition engine, enabling "search everything" with a single SQL statement while dramatically reducing development and operational overhead.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Hybrid RetrievalveDB-SearchIn-DB EmbeddingMulti‑Path RecallSQL Vector Search
Volcano Engine Developer Services
Written by

Volcano Engine Developer Services

The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.