RedParrot’s Semantic Cache Accelerates Enterprise NL‑to‑DSL Analytics by 3.6×

RedParrot introduces a query‑semantic‑caching framework that compresses the multi‑stage LLM NL‑to‑DSL workflow into a short‑chain process. On real‑world business data it achieves an average 3.6× inference speedup and an 8.26‑percentage‑point gain in execution accuracy, and it generalizes well to open NL‑to‑DSL benchmarks.

Xiaohongshu Tech REDtech

Jun 17, 2026

Background and Challenge

Business analysts increasingly want to analyze data interactively in natural language (e.g., “iPhone 17 shipments in the last 7 days”). Enterprise systems translate natural language into a controlled domain‑specific language (DSL) to enforce semantic consistency, permission control, and execution stability. Xiaohongshu's original NL‑to‑DSL pipeline had five stages: intent parsing, data retrieval, dimension generation, metric generation, and filter generation. It ran at a P90 latency above 30 s, consumed more than 26k tokens per request, and reached only 35% execution accuracy, because errors made in early stages propagated to later ones.

Analysis of real queries showed that many queries share the same underlying DSL skeleton despite differing entities or time windows, suggesting that historical query structures can be reused.
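To make the observation concrete, the following is a minimal rule-based sketch of skeleton masking. The regex patterns, the `<ENTITY>` and `<TIME>` placeholders, and the function name `to_skeleton` are hypothetical. The actual pipeline, described in the next section, also relies on LLM-generated representations and NER rather than regexes alone.

```python
import re

# Hypothetical time-expression rules; a production system would need many more.
TIME_PATTERNS = [
    r"\b(?:last|past)\s+\d+\s+(?:days?|weeks?|months?)\b",
    r"\b\d{4}-\d{2}-\d{2}\b",
    r"\b(?:yesterday|today|this week|last month)\b",
]


def to_skeleton(query: str, entities: list[str]) -> str:
    """Replace known entity mentions and time expressions with placeholders."""
    skeleton = query
    # Mask longer entities first so "iPhone 17 Pro" is not split by "iPhone 17".
    for entity in sorted(entities, key=len, reverse=True):
        skeleton = re.sub(re.escape(entity), "<ENTITY>", skeleton, flags=re.IGNORECASE)
    for pattern in TIME_PATTERNS:
        skeleton = re.sub(pattern, "<TIME>", skeleton, flags=re.IGNORECASE)
    return skeleton


known_entities = ["iPhone 17", "iPhone 17 Pro", "Pixel 10"]
a = to_skeleton("iPhone 17 shipments in the last 7 days", known_entities)
b = to_skeleton("Pixel 10 shipments in the past 30 days", known_entities)
print(a)       # <ENTITY> shipments in the <TIME>
print(a == b)  # True
```

The two queries differ in entity and time window but collapse to one skeleton. That shared skeleton is what makes a cached DSL template reusable.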

Long‑Chain Bottleneck

Measurements on an internal dataset showed that the five stages together incurred a P90 latency of 30.25 s, consumed 26,271 tokens per request, and reached only 35.00% execution accuracy. This suggests that the length of the chain, rather than model capability, is the primary limitation.
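A back-of-the-envelope model shows why chain length dominates. This model is an illustrative assumption and does not come from the source. Assume each of the five stages succeeds independently with probability p_i, so end-to-end accuracy is their product. Matching the measured 35% then requires each stage to be only about 81% accurate. Even 95% accuracy per stage would cap the chain near 77%.

```latex
\mathrm{ACC}_{\text{chain}} = \prod_{i=1}^{5} p_i,
\qquad
p_1 = \dots = p_5 = p \;\Rightarrow\; p = 0.35^{1/5} \approx 0.81,
\qquad
0.95^{5} \approx 0.77
```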

Query Skeleton Cache Construction

Offline, historical queries and their DSLs are turned into skeletons by stripping entities and timestamps, using a combination of LLM‑generated representations, named‑entity recognition (NER), and rule‑based filtering. Skeleton vectors are clustered with K‑means. Within each cluster, a similarity‑based graph partitions the skeletons into fine‑grained connected components. Representative, highly connected skeletons are then selected for the cache, which keeps the cache size bounded while preserving structural diversity.
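The graph-filtering step can be sketched as follows, assuming K-means labels have already been computed (for example, with scikit-learn's `KMeans`). Several choices here are illustrative assumptions, because the source does not state its exact selection criteria:

- "highly connected" is read as "highest degree within a connected component";
- the cosine threshold `sim_threshold`;
- the `min_size` cutoff.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def select_representatives(vectors, cluster_labels, sim_threshold=0.9, min_size=1):
    """Pick one highest-degree skeleton per connected component inside each cluster.

    vectors: skeleton embeddings; cluster_labels: K-means label per skeleton.
    Components smaller than min_size are dropped to keep the cache bounded.
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        clusters[label].append(idx)

    representatives = []
    for members in clusters.values():
        parent = {i: i for i in members}
        degree = {i: 0 for i in members}
        # Link every pair in the cluster whose similarity clears the threshold.
        for pos, a in enumerate(members):
            for b in members[pos + 1:]:
                if cosine(vectors[a], vectors[b]) >= sim_threshold:
                    degree[a] += 1
                    degree[b] += 1
                    parent[find(parent, a)] = find(parent, b)

        components = defaultdict(list)
        for i in members:
            components[find(parent, i)].append(i)
        for comp in components.values():
            if len(comp) < min_size:
                continue
            # Highest degree wins; ties go to the lowest index for determinism.
            representatives.append(max(comp, key=lambda i: (degree[i], -i)))
    return sorted(representatives)


vectors = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.05], [0.0, 1.0], [0.1, 0.99], [0.7, 0.7]]
labels = [0, 0, 0, 1, 1, 0]  # assumed output of a prior K-means step
print(select_representatives(vectors, labels))              # [0, 3, 5]
print(select_representatives(vectors, labels, min_size=2))  # [0, 3]
```

Cluster 0 splits into two components, {0, 1, 2} and the outlier {5}. This is the fine-grained partitioning that K-means alone would miss. Raising `min_size` trades structural coverage for a smaller cache.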

Entity‑Agnostic Embedding Model

Online inference avoids a separate skeleton‑extraction step by using a self‑supervised contrastive encoder. Positive pairs are queries from the same graph component. Hard negatives are queries from the same K‑means cluster but a different component, and ordinary negatives come from other clusters. Compared with a Qwen3‑Embedding‑0.6B baseline, the encoder improves HR@5 by 4.23 percentage points and FHR@5 by 12.47 percentage points across three enterprise datasets.
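The following sketch derives training sets from the offline cluster and component labels. It also shows an InfoNCE-style loss in which the positive competes against both hard and ordinary negatives. The source describes the positive/negative scheme but not the exact loss. InfoNCE, the 0.05 temperature, and the helper names are assumptions.

```python
import math


def build_contrastive_sets(anchor, cluster_of, component_of):
    """Split the other queries into positives, hard negatives, and ordinary negatives.

    cluster_of / component_of map query id -> K-means cluster / graph component id
    (component ids are assumed to be globally unique across clusters).
    """
    positives, hard_negatives, negatives = [], [], []
    for q in cluster_of:
        if q == anchor:
            continue
        if component_of[q] == component_of[anchor]:
            positives.append(q)       # same graph component
        elif cluster_of[q] == cluster_of[anchor]:
            hard_negatives.append(q)  # same cluster, different component
        else:
            negatives.append(q)       # different cluster
    return positives, hard_negatives, negatives


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor_vec, positive_vec, negative_vecs, temperature=0.05):
    """InfoNCE loss: negative log-softmax score of the positive among all candidates."""
    logits = [cosine(anchor_vec, positive_vec) / temperature]
    logits += [cosine(anchor_vec, n) / temperature for n in negative_vecs]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


cluster_of = {"q1": 0, "q2": 0, "q3": 0, "q4": 1}
component_of = {"q1": 10, "q2": 10, "q3": 11, "q4": 20}
print(build_contrastive_sets("q1", cluster_of, component_of))
# (['q2'], ['q3'], ['q4'])
```

Hard negatives share a cluster with the anchor but not its structure. They push the encoder to separate near-identical phrasings that need different DSL skeletons. Entity surface forms never define a pair, so the encoder is encouraged to ignore them.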

Multi‑Source Heterogeneous Knowledge Retrieval

Three knowledge sources complement the cache:

DSL configuration knowledge (syntax, parameter types, operator constraints).

Column‑value knowledge that maps natural‑language values to standardized enumeration values.

Enterprise domain knowledge (business‑specific terms, abbreviations, metric definitions such as DGMV, GMV, SOV).

The compact DSL configuration is injected directly into prompts. The larger column‑value and domain knowledge bases are queried through a hybrid BM25 + dense retrieval pipeline, and the two result lists are merged with reciprocal rank fusion (RRF).
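Reciprocal rank fusion scores each candidate by summing 1/(k + rank) over every ranking it appears in, so items that both BM25 and the dense retriever rank well rise to the top. The sketch below fuses two hypothetical candidate lists for a column-value lookup. The value k = 60 is the constant commonly used in the RRF literature, not a value confirmed by the source.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids; return ids sorted by fused score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical enumeration candidates for the user phrase "iPhone 17".
bm25_hits = ["iPhone 17 Pro", "Apple iPhone 17", "iPhone 16"]
dense_hits = ["Apple iPhone 17", "Huawei Mate 70", "iPhone 17 Pro"]
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
# ['Apple iPhone 17', 'iPhone 17 Pro', 'Huawei Mate 70', 'iPhone 16']
```

RRF uses only ranks, never raw scores, so BM25 and cosine scores need no calibration against each other.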

DSL Rewrite and Dual‑Path Execution

During online serving, the entity‑agnostic encoder retrieves the most similar skeleton and its historical DSL template. A structured prompt enriched with the retrieved knowledge guides the LLM to rewrite the DSL for the current query. When cache confidence exceeds a threshold, the short‑chain path is used; otherwise the system falls back to the full long‑chain to guarantee correctness, and successful long‑chain results are fed back to update the cache.
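The routing and feedback loop can be sketched as below. This sketch makes three simplifying assumptions:

- top-1 cosine similarity serves as the "cache confidence";
- the threshold is 0.85;
- the raw query vector is stored rather than an extracted skeleton.

`rewrite`, `long_chain`, and `is_valid` are hypothetical callables. They stand in for the LLM rewrite, the five-stage pipeline, and whatever success check the system applies.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def answer(query, query_vec, cache, rewrite, long_chain, is_valid, threshold=0.85):
    """Route a query to the short chain on a confident cache hit; otherwise fall back.

    cache: list of (vector, dsl_template) pairs, appended to on a successful fallback.
    """
    best_sim, best_template = -1.0, None
    for vec, template in cache:
        sim = cosine(query_vec, vec)
        if sim > best_sim:
            best_sim, best_template = sim, template

    if best_template is not None and best_sim >= threshold:
        return "short", rewrite(query, best_template)

    dsl = long_chain(query)
    if is_valid(dsl):
        cache.append((query_vec, dsl))  # feed the verified result back into the cache
    return "long", dsl


cache = [([1.0, 0.0], "TEMPLATE_A")]


def rewrite(q, template):
    return f"{template} <- {q}"


def long_chain(q):
    return f"FULL_DSL({q})"


def is_valid(dsl):
    return True


print(answer("q1", [1.0, 0.0], cache, rewrite, long_chain, is_valid))  # ('short', 'TEMPLATE_A <- q1')
print(answer("q2", [0.0, 1.0], cache, rewrite, long_chain, is_valid))  # ('long', 'FULL_DSL(q2)')
print(answer("q3", [0.0, 1.0], cache, rewrite, long_chain, is_valid))  # ('short', 'FULL_DSL(q2) <- q3')
```

The third call shows the feedback effect. A structure that first needed the long chain is served by the short chain once its verified result is cached.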

Experimental Setup

Evaluations were performed on three internal business domains (RED‑commerce, RED‑community, RED‑trading) with two dataset versions (‑095, ‑0916). They were also run on Spider‑DSL and BIRD‑DSL (Simple, Moderate, Challenging), which are the open text‑to‑SQL benchmarks Spider and BIRD transformed into DSL form. The metrics were execution accuracy (ACC), table selection accuracy (TB), dimension accuracy (DM), metric accuracy (MS), filter accuracy (FT), and P90 latency.
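The component metrics can be illustrated with a simple exact-match scorer. Two details are assumptions for illustration: the DSL shape (a dict with `table`, `dimensions`, `metrics`, and `filters` keys) and the order-insensitive set comparison. The source does not define its matching rules. ACC is omitted because it requires executing the DSL.

```python
def component_accuracy(preds, golds):
    """Per-component exact-match accuracy over paired predicted/gold DSLs."""
    fields = {"TB": "table", "DM": "dimensions", "MS": "metrics", "FT": "filters"}
    scores = {}
    for metric, key in fields.items():
        hits = 0
        for pred, gold in zip(preds, golds):
            if key == "table":
                hits += pred[key] == gold[key]
            else:
                # Lists are compared as sets so ordering differences don't count as errors.
                hits += set(pred[key]) == set(gold[key])
        scores[metric] = hits / len(golds)
    return scores


golds = [
    {"table": "orders", "dimensions": ["date"], "metrics": ["gmv"], "filters": ["brand = 'Apple'"]},
    {"table": "posts", "dimensions": ["city", "date"], "metrics": ["likes"], "filters": []},
]
preds = [
    {"table": "orders", "dimensions": ["date"], "metrics": ["gmv"], "filters": ["brand = 'Apple'"]},
    {"table": "notes", "dimensions": ["date", "city"], "metrics": ["views"], "filters": []},
]
print(component_accuracy(preds, golds))
# {'TB': 0.5, 'DM': 1.0, 'MS': 0.5, 'FT': 1.0}
```

Scoring each component separately shows which stage of the DSL is wrong, even when the end-to-end result fails.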

Results on Internal Datasets

RedParrot reduced P90 latency by 16.4 s (RED‑commerce‑095) and 21.3 s (RED‑community‑0916) relative to the long‑chain baseline while improving execution accuracy by 8.26 percentage points. Table selection accuracy rose to 85.99% (up 21.59 pp), and dimension, metric, and filter accuracies improved by 15.48, 11.65, and 5.14 pp, respectively.

Generalization to Open Benchmarks

On Spider‑DSL, overall accuracy increased from 47.9 % (ICL baseline) to 77.8 % (+29.9 pp). On BIRD‑DSL, accuracy rose from 25.8 % to 65.5 % (+39.7 pp), demonstrating strong transferability beyond the internal data.

Ablation and Cache Update

Removing the entity‑agnostic encoder decreased accuracy by 5.90 pp (RED‑commerce‑0916) and 8.20 pp (RED‑community‑0916). Excluding knowledge retrieval caused noticeable accuracy loss on larger datasets, confirming its role in handling new entities. Disabling the short‑chain cache increased runtime by at least threefold.

Cache update strategies were compared: full rebuild guarantees consistency but scales poorly; incremental updates that add only high‑confidence repeated patterns or novel structures achieve a 3.7× speedup, with up to 5.13× acceleration in RED‑commerce.
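The incremental admission rule can be sketched as a predicate evaluated on each verified long-chain result. All three thresholds are assumptions, and so is the use of nearest-neighbor cosine similarity as the novelty test. The source only says that high-confidence repeated patterns or novel structures are added.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def should_add_to_cache(candidate_vec, confidence, repeat_count, cache_vecs,
                        conf_threshold=0.9, min_repeats=3, novelty_threshold=0.8):
    """Decide whether a verified long-chain result enters the cache incrementally.

    Accept it if it is a repeated, high-confidence pattern, or if it is
    structurally novel (not close to any skeleton already cached).
    """
    nearest = max((cosine(candidate_vec, v) for v in cache_vecs), default=-1.0)
    is_novel = nearest < novelty_threshold
    is_repeated = confidence >= conf_threshold and repeat_count >= min_repeats
    return is_novel or is_repeated


cached = [[1.0, 0.0]]
print(should_add_to_cache([0.0, 1.0], 0.5, 1, cached))   # True: novel structure
print(should_add_to_cache([1.0, 0.0], 0.5, 1, cached))   # False: known and low confidence
print(should_add_to_cache([1.0, 0.0], 0.95, 4, cached))  # True: repeated, high confidence
```

Only results that pass this check touch the cache. This is why an incremental update can be much cheaper than rebuilding every cluster and graph from scratch.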

Engineering Deployment

The system forms a closed‑loop pipeline. Offline stages construct high‑quality templates, perform clustering, graph filtering, and train the entity‑agnostic model, storing skeletons in a Milvus vector store. Online, a workflow engine orchestrates short‑ and long‑chain paths, supports asynchronous monitoring, automatic fallback, and continuous cache evolution. The design yields reduced user‑perceived latency, lower token costs, fewer LLM calls, and mitigated error propagation, turning successful historical queries into reusable assets.
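The following minimal asyncio sketch shows the automatic-fallback behavior. The short chain runs under a deadline, and either a timeout or a cache miss hands the query to the long chain. The timeout value, the convention that `None` signals a miss, and the coroutine names are hypothetical. The source does not describe its workflow engine's API.

```python
import asyncio


async def serve(query, short_chain, long_chain, timeout_s=5.0):
    """Try the short chain under a deadline; fall back to the long chain on
    timeout or on a cache miss (signaled here by the short chain returning None)."""
    try:
        dsl = await asyncio.wait_for(short_chain(query), timeout=timeout_s)
        if dsl is not None:
            return "short", dsl
    except asyncio.TimeoutError:
        pass  # the short path took too long; wait_for has already cancelled it
    return "long", await long_chain(query)


async def fast_short(q):
    return f"short_dsl({q})"


async def slow_short(q):
    await asyncio.sleep(1.0)
    return "too late"


async def full_pipeline(q):
    return f"long_dsl({q})"


print(asyncio.run(serve("q", fast_short, full_pipeline)))                  # ('short', 'short_dsl(q)')
print(asyncio.run(serve("q", slow_short, full_pipeline, timeout_s=0.05)))  # ('long', 'long_dsl(q)')
```

Bounding the short path with a deadline means a slow cache lookup or rewrite cannot make latency worse than the long-chain baseline by more than the timeout.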

Paper link: https://arxiv.org/abs/2604.22758

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.