Build a Multimodal AI Data Analysis and Retrieval System with Hologres 4.0
This guide walks through building an AI‑powered multimodal data analysis platform on Hologres 4.0. It covers PDF ingestion, vector and full‑text indexing, AI functions, stored procedures, and hybrid retrieval queries that together enable enterprise‑level search and insight extraction.
Overview
Hologres 4.0 provides an end‑to‑end multimodal AI data platform that can ingest unstructured files (PDF, image, PPT) from OSS, convert them into embeddings, and support both vector and full‑text retrieval using standard SQL. The platform introduces the HGraph vector index and enables incremental processing via Dynamic Tables.
Core Capabilities
Object Table reads unstructured files stored in OSS.
AI functions (ai_parse_document, ai_chunk, ai_embed) parse files, split them into chunks, and generate embeddings directly in SQL.
Dynamic Tables with incremental refresh process only new or changed files instead of the whole corpus.
Vector search and full‑text search can be combined in a single query.
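As a minimal sketch, the first three capabilities compose into a single query. The table name my_obj_table is hypothetical; the ai_* signatures follow the procedure definitions in the appendix:

```sql
-- Parse every OSS file, split the text into 300-character chunks with a
-- 50-character overlap, and embed each chunk -- all in standard SQL.
WITH parsed AS (
    SELECT object_uri, ai_parse_document(file, 'auto', 'markdown') AS doc
    FROM my_obj_table               -- hypothetical Object Table over OSS
),
chunked AS (
    SELECT object_uri, unnest(ai_chunk(doc, 300, 50)) AS chunk
    FROM parsed
)
SELECT object_uri, chunk, ai_embed(chunk) AS embedding_vector
FROM chunked;
```

This is exactly the shape of the Dynamic Table body that create_rag_corpus_from_oss generates in the appendix.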
Solution Advantages
A pure‑SQL pipeline (parse → chunk → embed → refresh → retrieve) eliminates external glue code.
Standard SQL interface lowers the learning curve for developers.
All data stays inside the warehouse, leveraging Hologres access controls for security.
Workflow
Upload a PDF dataset (e.g., 80 prospectus PDFs from ModelScope) to an OSS bucket.
Configure a RAM role with sts:AssumeRole and grant it AliyunOSSReadOnlyAccess and Hologres access.
Execute the stored procedure create_rag_corpus_from_oss to create an Object Table, parse, chunk, embed the PDFs, and build vector and full‑text indexes.
Manually refresh the tables with refresh_rag_corpus_table.
Run vector, full‑text, or hybrid retrieval functions to answer business questions.
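Condensed, the workflow is three SQL statements; the values below are taken from the examples later in this guide, and '...' marks a placeholder for your own question:

```sql
-- 1. Create the Object Table + Dynamic Table and build both indexes.
CALL create_rag_corpus_from_oss(
    oss_path     => 'oss://xxxx/bs_challenge_financial_14b_dataset/pdf',
    oss_endpoint => 'oss-cn-hangzhou-internal.aliyuncs.com',
    oss_role_arn => 'acs:ram::186xxxx:role/xxxx',
    corpus_table => 'public.dt_bs_challenge_financial'
);

-- 2. Parse, chunk, and embed the PDFs.
CALL refresh_rag_corpus_table('public.dt_bs_challenge_financial');

-- 3. Ask questions with any of the retrieval functions.
SELECT qa_hybrid_retrieval(
    question     => '...',                      -- your business question
    corpus_table => 'dt_bs_challenge_financial'
);
```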
Environment Setup
Provision a Hologres V4.0+ instance and create a database.
Allocate required AI resources (GPU/CPU) for the models.
Configure the RAM role with sts:AssumeRole and service hologres.aliyuncs.com.
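The trust policy that authorizes the Hologres service to assume the role should look roughly like this. This is a sketch of the standard RAM trust‑policy shape, not copied from the original guide; verify the exact fields in the RAM console:

```json
{
  "Version": "1",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Effect": "Allow",
      "Principal": {
        "Service": ["hologres.aliyuncs.com"]
      }
    }
  ]
}
```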
Model Deployment
to_doc (category: ds4sd/docling-models) – PDF‑to‑document conversion. CPU 20 cores, 100 GB RAM, 1 × 48 GB GPU, 1 replica.
chunk (category: recursive-character-text-splitter) – Text chunking for large PDFs. CPU 15 cores, 30 GB RAM, 0 GPU, 1 replica.
pdf_embed (category: BAAI/bge-base-zh-v1.5) – Document embedding. CPU 7 cores, 30 GB RAM, 1 × 48 GB GPU, 1 replica.
llm (category: Qwen/Qwen3-32B) – Large‑model inference for answer generation. CPU 7 cores, 30 GB RAM, 1 × 48 GB GPU, 1 replica.
Stored Procedures
The main procedure create_rag_corpus_from_oss is invoked as follows:
CALL create_rag_corpus_from_oss(
oss_path => 'oss://xxxx/bs_challenge_financial_14b_dataset/pdf',
oss_endpoint => 'oss-cn-hangzhou-internal.aliyuncs.com',
oss_role_arn => 'acs:ram::186xxxx:role/xxxx',
corpus_table => 'public.dt_bs_challenge_financial'
);
It creates an Object Table to read PDF metadata, refreshes it, selects the parsing, chunking, and embedding models, and builds a Dynamic Table together with its vector and full‑text indexes.
Refresh the tables after creation:
CALL refresh_rag_corpus_table('public.dt_bs_challenge_financial');
Retrieval Functions
Vector‑only Retrieval
SELECT qa_vector_search_retrieval(
question => '报告期内,湖南国科微电子股份有限公司2014年度、2015年度、2016年度营业收入和净利润分别较上年增长多大幅度?',
corpus_table => 'dt_bs_challenge_financial',
prompt => '请分析如下业绩走势是悲观还是乐观,并给出原因:${question}
参考信息:
${context}'
);
Full‑text Retrieval
SELECT qa_text_search_retrieval(
question => '报告期内,湖南国科微电子股份有限公司2014年度、2015年度、2016年度营业收入和净利润分别较上年增长多大幅度?',
corpus_table => 'dt_bs_challenge_financial',
prompt => '请分析如下业绩走势是悲观还是乐观,并给出原因:${question}
参考信息:
${context}'
);
Hybrid Vector + Full‑text Retrieval
SELECT qa_hybrid_retrieval(
question => '报告期内,湖南国科微电子股份有限公司2014年度、2015年度、2016年度营业收入和净利润分别较上年增长多大幅度?',
corpus_table => 'dt_bs_challenge_financial',
prompt => '请分析如下业绩走势是悲观还是乐观,并给出原因:${question}
参考信息:
${context}'
);
Hybrid Retrieval with Reciprocal Rank Fusion (RRF)
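RRF merges the vector and full‑text result lists without comparing their raw scores. With k the rrf_k parameter (default 60) and rank_l(c) the 1‑based rank of chunk c in list l, each chunk's fused score is:

```latex
\mathrm{score}(c) = \sum_{\ell \in \{\text{vector},\,\text{fulltext}\}} \frac{1}{k + \mathrm{rank}_\ell(c)}
```

Chunks that appear near the top of both lists accumulate two large terms and rise to the top; this is the SUM(1.0 / (k + rank_val)) aggregation in the appendix implementation.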
SELECT qa_hybrid_retrieval_rrf(
question => '报告期内,湖南国科微电子股份有限公司2014年度、2015年度、2016年度营业收入和净利润分别较上年增长多大幅度?',
corpus_table => 'dt_bs_challenge_financial',
prompt => '请分析如下业绩走势是悲观还是乐观,并给出原因:${question}
参考信息:
${context}'
);
Result Example
Each function returns a concise AI‑generated answer that analyses the company’s revenue and profit trends and judges the outlook as optimistic or pessimistic, together with supporting reasons.
Appendix: Procedure Definitions
PDF Processing Procedure
CREATE OR REPLACE PROCEDURE create_rag_corpus_from_oss(
oss_path TEXT,
oss_endpoint TEXT,
oss_role_arn TEXT,
corpus_table TEXT,
embedding_model TEXT DEFAULT NULL,
parse_document_model TEXT DEFAULT NULL,
chunk_model TEXT DEFAULT NULL,
chunk_size INT DEFAULT 300,
chunk_overlap INT DEFAULT 50,
overwrite BOOLEAN DEFAULT FALSE
) AS $$
DECLARE
corpus_schema TEXT;
corpus_name TEXT;
obj_table_name TEXT;
full_corpus_ident TEXT;
full_obj_ident TEXT;
embed_expr TEXT;
chunk_expr TEXT;
parse_expr TEXT;
embedding_dims INT;
BEGIN
-- Parse schema and table name
IF position('.' IN corpus_table) > 0 THEN
corpus_schema := split_part(corpus_table, '.', 1);
corpus_name := split_part(corpus_table, '.', 2);
ELSE
corpus_schema := 'public';
corpus_name := corpus_table;
END IF;
obj_table_name := corpus_name || '_obj_table';
full_corpus_ident := format('%I.%I', corpus_schema, corpus_name);
full_obj_ident := format('%I.%I', corpus_schema, obj_table_name);
-- Overwrite handling (drop existing tables if needed)
IF overwrite THEN
-- Cancel running refresh jobs, drop Dynamic Table and Object Table
EXECUTE format('DROP TABLE IF EXISTS %s;', full_corpus_ident);
EXECUTE format('DROP OBJECT TABLE IF EXISTS %s;', full_obj_ident);
END IF;
-- Create Object Table
EXECUTE format(
'CREATE OBJECT TABLE %s WITH (path => %L, oss_endpoint => %L, role_arn => %L);',
full_obj_ident, oss_path, oss_endpoint, oss_role_arn);
COMMIT;
-- Refresh Object Table to load metadata
EXECUTE format('REFRESH OBJECT TABLE %s;', full_obj_ident);
COMMIT;
-- Choose parsing expression
IF parse_document_model IS NULL OR length(trim(parse_document_model)) = 0 THEN
parse_expr := 'ai_parse_document(file, ''auto'', ''markdown'')';
ELSE
parse_expr := format('ai_parse_document(%L, file, ''auto'', ''markdown'')', parse_document_model);
END IF;
-- Choose chunk expression
IF chunk_model IS NULL OR length(trim(chunk_model)) = 0 THEN
chunk_expr := format('ai_chunk(doc, %s, %s)', chunk_size, chunk_overlap);
ELSE
chunk_expr := format('ai_chunk(%L, doc, %s, %s)', chunk_model, chunk_size, chunk_overlap);
END IF;
-- Choose embedding expression
IF embedding_model IS NULL OR length(trim(embedding_model)) = 0 THEN
embed_expr := 'ai_embed(chunk)';
SELECT array_length(ai_embed('dummy'), 1) INTO embedding_dims;
ELSE
embed_expr := format('ai_embed(%L, chunk)', embedding_model);
EXECUTE format('SELECT array_length(ai_embed(%L, ''dummy''), 1)', embedding_model) INTO embedding_dims;
END IF;
-- Create Dynamic Table with vector index
EXECUTE format(
'CREATE DYNAMIC TABLE %s (
CHECK(array_ndims(embedding_vector) = 1 AND array_length(embedding_vector,1) = %s)
) WITH (
vectors = ''{"embedding_vector":{"algorithm":"HGraph","distance_method":"Cosine","builder_params":{"base_quantization_type":"sq8_uniform","max_degree":64,"ef_construction":400,"precise_quantization_type":"fp32","use_reorder":true}}}'',
auto_refresh_mode = ''incremental'',
freshness = ''5 minutes'',
auto_refresh_enable = ''false''
) AS
WITH parsed_doc AS (
SELECT object_uri, etag, %s AS doc FROM %s
),
chunked_doc AS (
SELECT object_uri, etag, unnest(%s) AS chunk FROM parsed_doc
)
SELECT object_uri, etag, chunk, %s AS embedding_vector FROM chunked_doc;',
full_corpus_ident, embedding_dims, parse_expr, full_obj_ident, chunk_expr, embed_expr);
COMMIT;
-- Create full‑text index on chunk column
EXECUTE format('CREATE INDEX %I ON %s USING FULLTEXT (chunk);',
corpus_name || '_fulltext_idx', full_corpus_ident);
RAISE NOTICE 'Create RAG corpus success to table: %', corpus_table;
END;
$$ LANGUAGE plpgsql;
Refresh Procedure
CREATE OR REPLACE PROCEDURE refresh_rag_corpus_table(
corpus_table TEXT
) AS $$
DECLARE
corpus_schema TEXT;
corpus_name TEXT;
obj_table_name TEXT;
full_corpus_ident TEXT;
full_obj_ident TEXT;
BEGIN
IF position('.' IN corpus_table) > 0 THEN
corpus_schema := split_part(corpus_table, '.', 1);
corpus_name := split_part(corpus_table, '.', 2);
ELSE
corpus_schema := 'public';
corpus_name := corpus_table;
END IF;
obj_table_name := corpus_name || '_obj_table';
full_corpus_ident := format('%I.%I', corpus_schema, corpus_name);
full_obj_ident := format('%I.%I', corpus_schema, obj_table_name);
EXECUTE format('REFRESH OBJECT TABLE %s;', full_obj_ident);
EXECUTE format('REFRESH TABLE %s;', full_corpus_ident);
RAISE NOTICE 'Refresh complete for corpus table %', corpus_table;
END;
$$ LANGUAGE plpgsql;
Drop Procedure
CREATE OR REPLACE PROCEDURE drop_rag_corpus_table(
corpus_table TEXT
) AS $$
DECLARE
corpus_schema TEXT;
corpus_name TEXT;
obj_table_name TEXT;
full_corpus_ident TEXT;
full_obj_ident TEXT;
rec RECORD;
BEGIN
IF position('.' IN corpus_table) > 0 THEN
corpus_schema := split_part(corpus_table, '.', 1);
corpus_name := split_part(corpus_table, '.', 2);
ELSE
corpus_schema := 'public';
corpus_name := corpus_table;
END IF;
obj_table_name := corpus_name || '_obj_table';
full_corpus_ident := format('%I.%I', corpus_schema, corpus_name);
full_obj_ident := format('%I.%I', corpus_schema, obj_table_name);
-- Cancel running refresh jobs if any
FOR rec IN EXECUTE format(
'SELECT query_job_id FROM hologres.hg_dynamic_table_refresh_log(%L) WHERE status = ''RUNNING''',
corpus_table)
LOOP
IF hologres.hg_internal_cancel_query_job(rec.query_job_id::bigint) THEN
RAISE NOTICE 'Cancel job % succeeded.', rec.query_job_id;
ELSE
RAISE WARNING 'Cancel job % failed.', rec.query_job_id;
END IF;
END LOOP;
EXECUTE format('DROP TABLE IF EXISTS %s;', full_corpus_ident);
EXECUTE format('DROP OBJECT TABLE IF EXISTS %s;', full_obj_ident);
RAISE NOTICE 'Drop complete for corpus: %', corpus_table;
END;
$$ LANGUAGE plpgsql;
Vector Retrieval Function
CREATE OR REPLACE FUNCTION qa_vector_search_retrieval(
question TEXT,
corpus_table TEXT,
embedding_model TEXT DEFAULT NULL,
llm_model TEXT DEFAULT NULL,
ranking_model TEXT DEFAULT NULL,
prompt TEXT DEFAULT 'Please answer the following question in ${language} based on the reference information.
Question: ${question}
Reference information:
${context}',
language TEXT DEFAULT 'Chinese',
vector_recall_count INT DEFAULT 20,
rerank_recall_count INT DEFAULT 5,
vector_col TEXT DEFAULT 'embedding_vector'
) RETURNS TEXT AS $$
DECLARE
final_answer TEXT;
sql TEXT;
embedding_expr TEXT;
ai_rank_expr TEXT;
ai_gen_expr TEXT;
embedding_model_valid BOOLEAN;
llm_model_valid BOOLEAN;
ranking_model_valid BOOLEAN;
BEGIN
embedding_model_valid := embedding_model IS NOT NULL AND trim(embedding_model) <> '';
llm_model_valid := llm_model IS NOT NULL AND trim(llm_model) <> '';
ranking_model_valid := ranking_model IS NOT NULL AND trim(ranking_model) <> '';
IF embedding_model_valid THEN
embedding_expr := 'ai_embed(' || quote_literal(embedding_model) || ', ' || quote_literal(question) || ')';
ELSE
embedding_expr := 'ai_embed(' || quote_literal(question) || ')';
END IF;
IF ranking_model_valid THEN
ai_rank_expr := 'ai_rank(' || quote_literal(ranking_model) || ', ' || quote_literal(question) || ', chunk)';
ELSE
ai_rank_expr := 'ai_rank(' || quote_literal(question) || ', chunk)';
END IF;
IF llm_model_valid THEN
ai_gen_expr := 'ai_gen(' || quote_literal(llm_model) || ', replace(replace(replace(' || quote_literal(prompt) ||
', ''${question}'', ' || quote_literal(question) || '), ''${context}'', merged_chunks), ''${language}'', ' || quote_literal(language) || '))';
ELSE
ai_gen_expr := 'ai_gen(replace(replace(replace(' || quote_literal(prompt) ||
', ''${question}'', ' || quote_literal(question) || '), ''${context}'', merged_chunks), ''${language}'', ' || quote_literal(language) || '))';
END IF;
sql := 'WITH embedding_recall AS (
SELECT chunk, approx_cosine_distance(' || vector_col || ', ' || embedding_expr || ') AS distance
FROM ' || corpus_table || '
ORDER BY distance DESC
LIMIT ' || vector_recall_count || '
),
rerank AS (
SELECT chunk, ' || ai_rank_expr || ' AS score
FROM embedding_recall
ORDER BY score DESC
LIMIT ' || rerank_recall_count || '
),
concat_top_chunks AS (
SELECT string_agg(chunk, E''\n----\n'') AS merged_chunks FROM rerank
)
SELECT ' || ai_gen_expr || ' FROM concat_top_chunks;';
EXECUTE sql INTO final_answer;
RETURN final_answer;
END;
$$ LANGUAGE plpgsql;
Full‑text Retrieval Function
CREATE OR REPLACE FUNCTION qa_text_search_retrieval(
question TEXT,
corpus_table TEXT,
llm_model TEXT DEFAULT NULL,
ranking_model TEXT DEFAULT NULL,
prompt TEXT DEFAULT 'Please answer the following question in ${language} based on the reference information.
Question: ${question}
Reference information:
${context}',
language TEXT DEFAULT 'Chinese',
text_search_recall_count INT DEFAULT 20,
rerank_recall_count INT DEFAULT 5,
text_search_col TEXT DEFAULT 'chunk'
) RETURNS TEXT AS $$
DECLARE
final_answer TEXT;
sql TEXT;
ai_rank_expr TEXT;
ai_gen_expr TEXT;
llm_model_valid BOOLEAN;
ranking_model_valid BOOLEAN;
BEGIN
llm_model_valid := llm_model IS NOT NULL AND trim(llm_model) <> '';
ranking_model_valid := ranking_model IS NOT NULL AND trim(ranking_model) <> '';
IF ranking_model_valid THEN
ai_rank_expr := 'ai_rank(' || quote_literal(ranking_model) || ', ' || quote_literal(question) || ', chunk)';
ELSE
ai_rank_expr := 'ai_rank(' || quote_literal(question) || ', chunk)';
END IF;
IF llm_model_valid THEN
ai_gen_expr := 'ai_gen(' || quote_literal(llm_model) || ', replace(replace(replace(' || quote_literal(prompt) ||
', ''${question}'', ' || quote_literal(question) || '), ''${context}'', merged_chunks), ''${language}'', ' || quote_literal(language) || '))';
ELSE
ai_gen_expr := 'ai_gen(replace(replace(replace(' || quote_literal(prompt) ||
', ''${question}'', ' || quote_literal(question) || '), ''${context}'', merged_chunks), ''${language}'', ' || quote_literal(language) || '))';
END IF;
sql := 'WITH text_search_recall AS (
SELECT chunk
FROM ' || corpus_table || '
ORDER BY text_search(' || text_search_col || ', ' || quote_literal(question) || ') DESC
LIMIT ' || text_search_recall_count || '
),
rerank AS (
SELECT chunk, ' || ai_rank_expr || ' AS score
FROM text_search_recall
ORDER BY score DESC
LIMIT ' || rerank_recall_count || '
),
concat_top_chunks AS (
SELECT string_agg(chunk, E''\n----\n'') AS merged_chunks FROM rerank
)
SELECT ' || ai_gen_expr || ' FROM concat_top_chunks;';
EXECUTE sql INTO final_answer;
RETURN final_answer;
END;
$$ LANGUAGE plpgsql;
Hybrid Retrieval Function (Vector + Full‑text + Rank)
CREATE OR REPLACE FUNCTION qa_hybrid_retrieval(
question TEXT,
corpus_table TEXT,
embedding_model TEXT DEFAULT NULL,
llm_model TEXT DEFAULT NULL,
ranking_model TEXT DEFAULT NULL,
prompt TEXT DEFAULT 'Please answer the following question in ${language} based on the reference information.
Question: ${question}
Reference information:
${context}',
language TEXT DEFAULT 'Chinese',
text_search_recall_count INT DEFAULT 20,
vector_recall_count INT DEFAULT 20,
rerank_recall_count INT DEFAULT 5,
vector_col TEXT DEFAULT 'embedding_vector',
text_search_col TEXT DEFAULT 'chunk'
) RETURNS TEXT AS $$
DECLARE
final_answer TEXT;
sql TEXT;
embedding_expr TEXT;
ai_rank_expr TEXT;
ai_gen_expr TEXT;
embedding_model_valid BOOLEAN;
llm_model_valid BOOLEAN;
ranking_model_valid BOOLEAN;
BEGIN
embedding_model_valid := embedding_model IS NOT NULL AND trim(embedding_model) <> '';
llm_model_valid := llm_model IS NOT NULL AND trim(llm_model) <> '';
ranking_model_valid := ranking_model IS NOT NULL AND trim(ranking_model) <> '';
IF embedding_model_valid THEN
embedding_expr := 'ai_embed(' || quote_literal(embedding_model) || ', ' || quote_literal(question) || ')';
ELSE
embedding_expr := 'ai_embed(' || quote_literal(question) || ')';
END IF;
IF ranking_model_valid THEN
ai_rank_expr := 'ai_rank(' || quote_literal(ranking_model) || ', ' || quote_literal(question) || ', chunk)';
ELSE
ai_rank_expr := 'ai_rank(' || quote_literal(question) || ', chunk)';
END IF;
IF llm_model_valid THEN
ai_gen_expr := 'ai_gen(' || quote_literal(llm_model) || ', replace(replace(replace(' || quote_literal(prompt) ||
', ''${question}'', ' || quote_literal(question) || '), ''${context}'', merged_chunks), ''${language}'', ' || quote_literal(language) || '))';
ELSE
ai_gen_expr := 'ai_gen(replace(replace(replace(' || quote_literal(prompt) ||
', ''${question}'', ' || quote_literal(question) || '), ''${context}'', merged_chunks), ''${language}'', ' || quote_literal(language) || '))';
END IF;
sql := 'WITH embedding_recall AS (
SELECT chunk
FROM ' || corpus_table || '
ORDER BY approx_cosine_distance(' || vector_col || ', ' || embedding_expr || ') DESC
LIMIT ' || vector_recall_count || '
),
text_search_recall AS (
SELECT chunk
FROM ' || corpus_table || '
ORDER BY text_search(' || text_search_col || ', ' || quote_literal(question) || ') DESC
LIMIT ' || text_search_recall_count || '
),
union_recall AS (
SELECT chunk FROM embedding_recall
UNION
SELECT chunk FROM text_search_recall
),
rerank AS (
SELECT chunk, ' || ai_rank_expr || ' AS score
FROM union_recall
ORDER BY score DESC
LIMIT ' || rerank_recall_count || '
),
concat_top_chunks AS (
SELECT string_agg(chunk, E''\n----\n'') AS merged_chunks FROM rerank
)
SELECT ' || ai_gen_expr || ' FROM concat_top_chunks;';
EXECUTE sql INTO final_answer;
RETURN final_answer;
END;
$$ LANGUAGE plpgsql;
Hybrid Retrieval with RRF Function
CREATE OR REPLACE FUNCTION qa_hybrid_retrieval_rrf(
question TEXT,
corpus_table TEXT,
embedding_model TEXT DEFAULT NULL,
llm_model TEXT DEFAULT NULL,
ranking_model TEXT DEFAULT NULL,
prompt TEXT DEFAULT 'Please answer the following question in ${language} based on the reference information.
Question: ${question}
Reference information:
${context}',
language TEXT DEFAULT 'Chinese',
text_search_recall_count INT DEFAULT 20,
vector_recall_count INT DEFAULT 20,
rerank_recall_count INT DEFAULT 5,
rrf_k INT DEFAULT 60,
vector_col TEXT DEFAULT 'embedding_vector',
text_search_col TEXT DEFAULT 'chunk'
) RETURNS TEXT AS $$
DECLARE
final_answer TEXT;
sql TEXT;
embedding_expr TEXT;
ai_rank_expr TEXT;
ai_gen_expr TEXT;
embedding_model_valid BOOLEAN;
llm_model_valid BOOLEAN;
ranking_model_valid BOOLEAN;
BEGIN
embedding_model_valid := embedding_model IS NOT NULL AND trim(embedding_model) <> '';
llm_model_valid := llm_model IS NOT NULL AND trim(llm_model) <> '';
ranking_model_valid := ranking_model IS NOT NULL AND trim(ranking_model) <> '';
IF embedding_model_valid THEN
embedding_expr := 'ai_embed(' || quote_literal(embedding_model) || ', ' || quote_literal(question) || ')';
ELSE
embedding_expr := 'ai_embed(' || quote_literal(question) || ')';
END IF;
IF ranking_model_valid THEN
ai_rank_expr := 'ai_rank(' || quote_literal(ranking_model) || ', ' || quote_literal(question) || ', chunk)';
ELSE
ai_rank_expr := 'ai_rank(' || quote_literal(question) || ', chunk)';
END IF;
IF llm_model_valid THEN
ai_gen_expr := 'ai_gen(' || quote_literal(llm_model) || ', replace(replace(replace(' || quote_literal(prompt) ||
', ''${question}'', ' || quote_literal(question) || '), ''${context}'', merged_chunks), ''${language}'', ' || quote_literal(language) || '))';
ELSE
ai_gen_expr := 'ai_gen(replace(replace(replace(' || quote_literal(prompt) ||
', ''${question}'', ' || quote_literal(question) || '), ''${context}'', merged_chunks), ''${language}'', ' || quote_literal(language) || '))';
END IF;
sql := 'WITH embedding_recall AS (
SELECT chunk, approx_cosine_distance(' || vector_col || ', ' || embedding_expr || ') AS vec_score,
ROW_NUMBER() OVER (ORDER BY approx_cosine_distance(' || vector_col || ', ' || embedding_expr || ') DESC) AS rank_vec
FROM ' || corpus_table || '
ORDER BY vec_score DESC
LIMIT ' || vector_recall_count || '
),
text_search_recall AS (
SELECT chunk, text_search(' || text_search_col || ', ' || quote_literal(question) || ') AS text_score,
ROW_NUMBER() OVER (ORDER BY text_search(' || text_search_col || ', ' || quote_literal(question) || ') DESC) AS rank_text
FROM ' || corpus_table || '
WHERE text_search(' || text_search_col || ', ' || quote_literal(question) || ') > 0
ORDER BY text_score DESC
LIMIT ' || text_search_recall_count || '
),
rrf_scores AS (
SELECT chunk, SUM(1.0 / (' || rrf_k || ' + rank_val)) AS rrf_score
FROM (
SELECT chunk, rank_vec AS rank_val FROM embedding_recall
UNION ALL
SELECT chunk, rank_text AS rank_val FROM text_search_recall
) sub
GROUP BY chunk
),
top_chunks AS (
SELECT chunk FROM rrf_scores ORDER BY rrf_score DESC LIMIT ' || rerank_recall_count || '
),
concat_top_chunks AS (
SELECT string_agg(chunk, E''\n----\n'') AS merged_chunks FROM top_chunks
)
SELECT ' || ai_gen_expr || ' FROM concat_top_chunks;';
EXECUTE sql INTO final_answer;
RETURN final_answer;
END;
$$ LANGUAGE plpgsql;
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.