Boosting LLM Evaluation with Semantic Enrichment and Vector Search
This article explains how semantic enrichment, vector retrieval, hybrid search, and clustering can be combined to evaluate large language model inputs and outputs, improve debugging, ensure compliance, and enhance user intent understanding in AI applications.
Challenges in Evaluating Large Language Models
Unlike traditional software, whose inputs and outputs are structured, LLM inputs and outputs are free-form natural language, producing diverse and unpredictable cases. Developers need to debug model behavior, assess production quality, and audit interactions for compliance.
Semantic Analysis for Multi‑Dimensional Insight
Semantic enrichment extracts structured information from LLM logs, including:
User intent (e.g., translation, technical query)
Conversation topic (e.g., education, cloud computing)
Summarized description
Sentiment (positive, negative, neutral)
Keywords
Derived questions
Entity extraction (countries, names, etc.)
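Concretely, a single enriched record could carry these fields alongside the raw text. The following is a sketch; the field names are illustrative, not the exact SLS schema:

```python
# Illustrative shape of a semantically enriched LLM log record.
# Field names are assumptions, not the exact SLS schema.
enriched_log = {
    "raw_input": "How do I translate 'hello' into French?",
    "intent": "translation",
    "topic": "education",
    "summary": "User asks for the French translation of 'hello'.",
    "sentiment": "neutral",
    "keywords": ["translate", "French", "hello"],
    "derived_questions": ["What is the French word for 'hello'?"],
    "entities": {"languages": ["French"]},
}

# Structured fields make logs filterable without re-reading raw text.
translation_requests = [
    log for log in [enriched_log] if log["intent"] == "translation"
]
print(len(translation_requests))  # → 1
```

Once every log carries these fields, questions like "show me all negative-sentiment translation requests" become simple structured filters.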
Key Capabilities
(1) Semantic Enrichment – Refines logs into structured fields such as intent, topic, sentiment, and entities.
(2) Vector Retrieval – Converts text to embeddings, builds a vector index, and supports intent‑based retrieval beyond keyword matching.
(3) Hybrid Retrieval – Combines exact keyword matching with approximate vector similarity using AND conditions.
(4) Vector Clustering – Groups vectors to discover hot topics and outliers via the SQL clustering_centroids function, and visualizes the results with t_sne.
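The retrieval capability (2) can be sketched end to end with a toy bag-of-words embedding standing in for a real embedding model and index. Everything here is illustrative, not SLS's implementation:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words counts. A real deployment would use
    # a learned embedding model; this only shows the retrieval flow.
    return Counter(text.lower().split())

def cosine_distance(a: Counter, b: Counter) -> float:
    # 0.0 = identical direction (most similar), 1.0 = no overlap.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (na * nb) if na and nb else 1.0

# "Index": embed every log once, up front.
logs = [
    "please translate hello to french",
    "my kubernetes pod keeps crashing",
    "translate good morning into spanish",
]
index = [(text, embed(text)) for text in logs]

# Intent-based retrieval: rank by vector distance, not keyword equality.
query = embed("translate greeting into spanish")
ranked = sorted(index, key=lambda item: cosine_distance(query, item[1]))
print(ranked[0][0])  # → translate good morning into spanish
```

The design point is that ranking happens in embedding space, so a query can match logs that share intent rather than exact wording.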
LLM Evaluation Architecture
Generic HTTP function for data processing.
Qwen model invocation via a wrapped AIGC function.
A system/custom prompt library that feeds prompts and responses into the evaluation engine.
| extend "__tag__:__sls_qwen_user_tpl__" = replace(...)

Engineering Challenges of Vector Retrieval
Embedding and index construction add complexity and cost.
Recall depends on embedding model and index type.
High GPU and memory requirements increase operational expense.
SLS Vector Query Syntax
Use similarity(Key, query) < distance to express approximate matching: Key is the vector field, query is the query text, and distance is the threshold (0 = most similar, 1 = least similar).
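As a local illustration of what the threshold means, here is the same kind of filter over made-up three-dimensional embeddings (the vectors and cutoff are invented for the example):

```python
import math

def cosine_distance(a, b):
    # 0.0 = identical direction (most similar), 1.0 = orthogonal.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

# Made-up embeddings standing in for the vector field "Key".
records = {
    "log-1": [0.9, 0.1, 0.0],
    "log-2": [0.0, 1.0, 0.0],
    "log-3": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

# Local analogue of: similarity(Key, query) < 0.2
hits = [rid for rid, vec in records.items()
        if cosine_distance(vec, query) < 0.2]
print(hits)  # → ['log-1', 'log-3']
```

Tightening the threshold toward 0 keeps only near-duplicates of the query; loosening it toward 1 admits increasingly unrelated records.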
similarity(Key, query) < distance

Hybrid Retrieval Example
uid:123 and similarity(key, query) < 0.2

Clustering Example
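As a local stand-in for clustering_centroids, a minimal k-means sketch over toy 2-D points shows how cluster centers emerge (deterministic initialization and invented data, purely for intuition):

```python
def kmeans(samples, num_of_clusters, iters=10):
    # Deterministic init: first num_of_clusters points as centroids.
    centroids = [list(p) for p in samples[:num_of_clusters]]
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared distance).
        groups = [[] for _ in centroids]
        for p in samples:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c))
                     for c in centroids]
            groups[dists.index(min(dists))].append(p)
        # Move each centroid to the mean of its assigned points.
        for i, g in enumerate(groups):
            if g:
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

# Two obvious topic clusters plus one faraway, outlier-ish point.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 4.9], [9.0, 0.0]]
centroids = kmeans(points, num_of_clusters=2)
print(centroids)  # one centroid near the origin cluster, one pulled right
```

In a real log analysis, dense clusters reveal hot topics, while points far from every centroid are candidates for outlier review.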
clustering_centroids(array(array(double)) samples, integer num_of_clusters)

Practical Use Cases
Compliance & audit: search for prohibited keywords using similarity queries.
Topic & sentiment filtering: classify conversations by extracted topics and emotions.
Content clustering: visualize related topics and identify outliers.
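The compliance sweep boils down to flagging any log that falls within a distance threshold of a prohibited phrase. A toy sketch, with token-overlap distance standing in for real embedding distance (phrases, logs, and threshold are all invented):

```python
def token_distance(a: str, b: str) -> float:
    # Toy stand-in for embedding distance: 1 - Jaccard token overlap.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return 1.0 - len(ta & tb) / len(ta | tb)

prohibited = ["share account password", "bypass content filter"]

logs = [
    "how do i bypass the content filter",
    "what is the weather in paris",
    "please share your account password with me",
]

# Flag logs close to any prohibited phrase (threshold is illustrative).
flagged = [log for log in logs
           if any(token_distance(log, p) < 0.7 for p in prohibited)]
print(flagged)  # → flags the bypass and password logs, not the weather one
```

The approximate match is what matters here: neither flagged log repeats a prohibited phrase verbatim, so plain keyword search would be brittle where similarity search is not.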
Conclusion
Semantic enrichment and search enable deeper understanding of LLM inputs and outputs, supporting user profiling, model iteration, and risk management across various verticals.
Alibaba Cloud Observability
Driving continuous progress in observability technology!
