Building a Multimodal Hybrid Retrieval Agent on an Integrated AI Data Layer

This article explores why many enterprise AI projects fail to deliver value, analyzes the complexity of real‑world AI use cases, and presents a step‑by‑step demo that combines vector, keyword, numeric, and spatial queries using OceanBase as a unified multimodal data store.


Problem

Enterprise AI often requires mixed‑modal queries that combine image similarity, keyword matching, numeric filters, geographic constraints and personalized data. Traditional pipelines that chain a knowledge base, vector store and LLM cannot handle such complex requests efficiently.

Insight

The bottleneck is the fragmented data landscape: products have images, text, specs and videos; transactions have amounts, timestamps and locations; users have profiles and behavior logs. A unified storage that can handle structured fields, JSON, vectors and spatial types is needed.

Solution Architecture

Use OceanBase (OB Cloud) as a multimodal AI data layer. OceanBase supports SQL access to structured columns, JSON, vector columns and spatial types, eliminating separate RDBMS, vector DB and search engine components.
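As an illustration, a single product table on such a data layer might hold all four modalities side by side. The schema below is a hypothetical sketch (column names, vector dimensions, and exact type syntax vary by OceanBase version), not the demo's actual DDL:

```sql
-- Hypothetical schema: scalar, JSON, vector, and spatial columns in one table.
CREATE TABLE products (
    id                 BIGINT PRIMARY KEY,
    name               VARCHAR(255),
    style              VARCHAR(64),
    price              DECIMAL(10, 2),
    promotion_policy   JSON,            -- semi-structured promotion rules
    description_vector VECTOR(1024),    -- text embedding
    image_vector       VECTOR(1024),    -- image embedding
    address            POINT            -- store location
);
```

With everything in one table, the hybrid query shown later needs no cross-system joins or application-side result merging.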

Demo Scenario

A retail furniture chatbot must answer queries such as “show me a fabric sofa similar to this picture, priced under 10,000, suitable for a girl, near my location”. The demo stack includes:

Workflow engine: LangGraph for flexible orchestration.

AI data layer: OceanBase with vector support.

LLM: Alibaba Tongyi Qianwen.

Embedding model: Alibaba text‑embedding‑v3/multimodal‑embedding‑v1.

The system extracts intent, parses filter conditions, validates them, and generates a hybrid SQL statement that mixes scalar filters, JSON extraction, spatial predicates and cosine‑distance vector similarity.

Hybrid SQL Example

SELECT cosine_distance(p.image_vector, [ref_image_vec]) AS similar_score_image,
       cosine_distance(p.description_vector, [ref_text_vec]) AS similar_score_text,
       ...
FROM products p
WHERE style='fabric' AND price<=10000
  AND CAST(JSON_UNQUOTE(JSON_EXTRACT(promotion_policy,'$.discount')) AS DECIMAL(3,1))<9.0
  AND st_dwithin(address, ST_GeomFromText(...), 20)
ORDER BY similar_score_image ASC
LIMIT 3;

This query demonstrates a single statement that filters by style, price, discount, proximity and ranks results by image and text similarity.
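The ORDER BY ... ASC direction follows from the metric's definition: cosine distance is 1 minus cosine similarity, so 0 means identical direction and lower is more similar. A pure-Python rendering of the metric (my own sketch for intuition, not OceanBase's implementation):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity; lower means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Identical vectors have distance 0; orthogonal vectors have distance 1.
print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```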

LLM‑Driven Query Generation

Two approaches are possible: (1) few‑shot Text‑to‑SQL prompting, or (2) LLM conditional reasoning plus code assembly. The demo uses the latter: the LLM extracts structured conditions from natural language, validates them, and assembles the final SQL.
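A minimal sketch of the second approach, assuming the LLM has already returned its conditions as a JSON object (the field names and whitelist here are illustrative, not the demo's exact schema):

```python
import json

# Whitelist of filterable fields and how each maps to a SQL predicate.
# Anything the LLM invents outside this set is silently dropped.
ALLOWED_FILTERS = {
    "style":     lambda v: f"style = '{v}'",
    "max_price": lambda v: f"price <= {float(v)}",
}

def build_where_clause(llm_output: str) -> str:
    """Validate LLM-extracted conditions and assemble a WHERE clause."""
    conditions = json.loads(llm_output)   # e.g. the LLM's JSON reply
    predicates = []
    for key, value in conditions.items():
        if key not in ALLOWED_FILTERS:    # validation step: reject unknown fields
            continue
        predicates.append(ALLOWED_FILTERS[key](value))
    return " AND ".join(predicates)

clause = build_where_clause('{"style": "fabric", "max_price": 10000, "bogus": 1}')
print(clause)  # style = 'fabric' AND price <= 10000.0
```

Keeping the LLM out of literal SQL generation and confining it to structured condition extraction makes the output easier to validate than free-form Text-to-SQL.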

Python Vector Search Snippet

from typing import Any, Dict, List, Optional

def _vector_search(self, embedding: List[float],
                   filters: Optional[Dict[str, Any]] = None,
                   search_type: str = 'text') -> List[Dict]:
    # Choose which vector column to search: text description or image.
    vector_column = 'description_vector' if search_type == 'text' else 'image_vector'
    base_query = f"""
        SELECT id, name, description, material, style, price, ...,
               cosine_distance({vector_column}, '{embedding}') AS similarity
        FROM {self.table_name}
    """
    conditions = []
    if filters:
        # NOTE: interpolating values directly into SQL invites injection;
        # prefer parameterized queries in production code.
        if filters.get(self.material_name):
            conditions.append(f"material LIKE '%{filters[self.material_name]}%'")
        # add other filters (style, price range, spatial predicate) as needed
    if conditions:
        base_query += " WHERE " + " AND ".join(conditions)
    # cosine_distance: smaller means more similar, so sort ascending.
    base_query += f" ORDER BY similarity ASC LIMIT {self.topk}"
    return self.client.perform_raw_text_sql(base_query)

Workflow Design

A custom LangGraph workflow replaces a generic ReAct agent. Prompts are crafted for intent recognition and filter extraction, enabling the LLM to output a JSON‑like condition set that drives the SQL generator.
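The node-and-edge shape of such a workflow can be sketched without the LangGraph dependency as a chain of state-transforming functions; each function below stands in for a graph node, and the fixed ordering stands in for the graph's edges (all node bodies are illustrative placeholders):

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]

def recognize_intent(state: State) -> State:
    # Placeholder: a real node would prompt the LLM for the user's intent.
    state["intent"] = "product_search"
    return state

def extract_filters(state: State) -> State:
    # Placeholder: a real node would parse a JSON-like condition set.
    state["filters"] = {"style": "fabric", "max_price": 10000}
    return state

def generate_sql(state: State) -> State:
    f = state["filters"]
    state["sql"] = f"SELECT ... WHERE style='{f['style']}' AND price<={f['max_price']}"
    return state

# Nodes wired in a fixed order, mirroring graph.add_edge(a, b) in LangGraph.
PIPELINE: List[Callable[[State], State]] = [recognize_intent, extract_filters, generate_sql]

def run(query: str) -> State:
    state: State = {"query": query}
    for node in PIPELINE:
        state = node(state)
    return state

result = run("fabric sofa under 10000")
print(result["sql"])
```

LangGraph adds conditional edges and cycles on top of this basic shape, which is what makes it preferable to a generic ReAct loop for a fixed retrieval pipeline.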

Results and Extensions

The prototype returns product details via a Streamlit UI, supports spatial queries (POINT, POLYGON) and semi‑structured JSON data, and can be extended to use the same data layer for classic RAG document retrieval.

Benefits of a Unified AI Data Layer

Reduces component count and glue code.

Enables single‑statement hybrid retrieval, improving latency and relevance.

Provides transactional consistency across modalities.

Leverages enterprise‑grade features of OceanBase (high availability, scalability).

Source code:

https://github.com/pingcy/oceanbase-multimodal-demo
Written by

AI Large Model Application Practice

Focused on deep research and development of large-model applications. Authors of "RAG Application Development and Optimization Based on Large Models" and "MCP Principles Unveiled and Development Guide". Primarily B2B, with B2C as a supplement.
