Boost LLM Evaluation with Semantic Enrichment and Vector Search
This article explains how semantic enrichment, vector and hybrid search, and clustering techniques can be applied to large language model logs to evaluate inputs and outputs, improve compliance auditing, and enhance model iteration across various business scenarios.
Challenges of Large Model Content Evaluation
Unlike traditional code, where fixed inputs produce deterministic outputs that can be fully tested, large language models (LLMs) accept natural‑language inputs that vary wildly. Developers need to debug model I/O, assess production quality, and audit all interactions for compliance.
Semantic Analysis: Multi‑Angle Understanding of LLM I/O
Effective log management for LLMs requires natural‑language search, processing, and analysis capabilities, including:
Semantic enrichment: extracting structured information such as user intent, topic, sentiment, keywords, questions, and entities.
Vector retrieval: one‑stop embedding and vector‑index support, enabling intent‑based search beyond keyword matching.
Hybrid retrieval: combining exact keyword matches with approximate vector matches across multiple fields.
Clustering: grouping natural‑language records to identify hotspots and outliers.
(1) Semantic Enrichment
In Retrieval‑Augmented Generation (RAG) scenarios, documents are converted to structured Markdown, chunked, and indexed as vectors. Traditional pipelines lose information, so multi‑modal feature extraction builds a multidimensional semantic space covering:
User intent (e.g., translation, technical query, legal advice).
Topic (e.g., education, cloud computing, law).
Summary (concise description of the conversation).
Sentiment (positive, negative, neutral).
Keywords.
Questions derived from the conversation.
Entity extraction (countries, names, locations).
LLM evaluation and vector indexing extract these structures from prompts and responses, visualize results, and support compliance audits to mitigate legal risks.
Alibaba Cloud Log Service (SLS) provides semantic processing APIs that can call hosted Qwen models or custom LLM endpoints for enrichment.
LLM Evaluation Architecture
Generic HTTP function: SPL syntax for HTTP calls with URL, body, headers.
Qwen model invocation: wrapper AIGC function passing endpoint, access‑key, system and user prompts.
System/custom Prompt library: built‑in Evaluation System Prompt templates or user‑defined prompts.
| extend "__tag__:__sls_qwen_user_tpl__" = replace(replace(replace(replace(replace(replace(replace(replace("__tag__:__sls_qwen_user_tpl__", '<INPUT_TEMPLATE>', "output.value"), '\\', '\\'), '"', '\"'), chr(8), '\b'), chr(12), '\f'), chr(10), '
'), chr(13), '\r'), chr(9), '\t')
| extend "__tag__:__sls_qwen_sys_tpl__" = replace(replace(replace(replace(replace(replace(replace("__tag__:__sls_qwen_sys_tpl__", '\\', '\\'), '"', '\"'), chr(8), '\b'), chr(12), '\f'), chr(10), '
'), chr(13), '\r'), chr(9), '\t')
| extend request_body = replace(replace("__tag__:__sls_qwen_body_tpl__", '<SYSTEM_PROMPT>', "__tag__:__sls_qwen_sys_tpl__"), '<USER_PROMPT>', "__tag__:__sls_qwen_user_tpl__")
| http-call -method='post' -headers='{"Authorization": "Bearer xxxxxx", "Content-Type": "application/json", "Host": "dashscope.aliyuncs.com", "User-Agent":"sls-etl-test"}' -timeout_millis=60000 -body='request_body' 'http://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation' as status, response_body
| extend tmp_content = json_extract_scalar(response_body, '$.output.choices.0.message.content')
| extend output_enrich = regexp_replace(regexp_replace(tmp_content, '^([^{]|\s)+{', '{'), '}([^}]|\s)+$', '}')
| project-away "__tag__:__sls_qwen_sys_tpl__", "__tag__:__sls_qwen_user_tpl__", "__tag__:__sls_qwen_body_tpl__", trimed_input, tmp_content, request_body, response_body(2) Vector Retrieval
Implementing vector retrieval requires embedding text, building indexes, and maintaining pipelines, which adds engineering complexity and cost. GPU resources are needed for embedding and indexing, and high‑dimensional vectors consume significant memory.
SLS offers a one‑stop vector retrieval service: after writing prompts/responses to SLS, it automatically embeds, indexes, and queries vectors. Users only need to provide text and a query.
Key syntax points:
Use similarity to express approximate similarity.
Specify the vector index key.
Provide the query string.
Set a distance threshold (0 = most similar, 1 = least similar).
similarity(Key,query) < distance(3) Hybrid Retrieval
When both exact field matches and approximate text similarity are required, hybrid retrieval combines keyword inverted indexes with vector indexes using and conditions.
uid:123 and similarity(key,query) < distance(4) Vector Clustering
Clustering transforms high‑dimensional vectors into groups to reveal hot topics and outliers. The SQL function clustering_centroids(samples, num_of_clusters) computes centroids, while t_sne(samples) reduces dimensions for visualization.
clustering_centroids(array(array(double)) samples, integer num_of_clusters) t_sne(array(array(double))Engineering Practice of LLM Prompt/Response Semantic Insight
After extracting semantic information, the following business goals are achieved:
Compliance and Auditing : Search for prohibited keywords using similarity with adjustable distance thresholds.
similarity("input_semantic.summary", "恶意关键词") < 0.4Topic and Sentiment Filtering : Query by extracted topic or sentiment, e.g., input_semantic.topic : database.
Content Clustering : Visualize clustered conversations to see topic relationships; each color represents a distinct cluster.
Conclusion
Semantic enrichment and search enable deeper understanding of LLM inputs and outputs, facilitating more intelligent applications. The same techniques can be extended to vertical scenarios such as user‑portrait construction, model iteration optimization, and compliance risk management.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
