How We Slashed AI Token Costs by Up to 90% with Smart Pipeline Optimizations

This report details a systematic analysis of AI token consumption in a multilingual UI‑automation workflow and presents four concrete optimization techniques—prompt trimming, duplicate‑call avoidance, text deduplication, and placeholder‑based knowledge‑base integration—that together reduced monthly token usage by over 90% without harming detection accuracy.

Qunhe Technology Quality Tech

Background and Goal

Our UI‑automation platform calls an LLM for multilingual text compliance checks. Token usage from system prompts, user text, external knowledge bases, and chat history drove high monthly cost. Our goal: reduce token consumption by 30–50% while maintaining detection quality.

System Call Flow

Jenkins triggers the UI‑automation suite.

Each test case reports rendered page text to a backend service.

The backend forwards the text together with a system prompt and optional chat history to the LLM.

The LLM returns analysis results, which are stored for later review.
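The forwarding step above can be sketched in Python; the names (`build_payload`, `SYSTEM_PROMPT`) and the message layout are illustrative assumptions, not the production API:

```python
# Hypothetical sketch of the backend's forwarding step; the payload
# shape and names are assumptions, not the actual service interface.
SYSTEM_PROMPT = "Detect language compliance for the following text. Output JSON."

def build_payload(page_text, history=None):
    """Assemble the message list forwarded to the LLM."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages += history or []  # optional chat history from earlier calls
    messages.append({"role": "user", "content": page_text})
    return messages

payload = build_payload("Submit order")
```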

Token Consumption Sources

System Message: the task description, input/output format, and other standing directives.

Human Message: the page text to be checked.

External Knowledge Base: language‑specific terminology and rules.

Chat History: context carried over from previous calls.

Optimization Measures

Trim System Prompt: keep only essential directives. Example:

System Message = "Detect language compliance for the following text. Output JSON."

Detailed rules are moved out to the external knowledge base.

Avoid Duplicate Calls: before invoking the LLM, check the associated bug's status. If the bug is already Rejected or Completed, skip the call.

if bug.status in ("Rejected", "Completed"):
    return  # skip the LLM call; this text was already triaged

Text Aggregation & Deduplication: hash text + error_type to identify identical inputs. Call the LLM once per unique key and distribute the cached result to every duplicate.

key = hash(text + error_type)
if key not in seen:
    seen[key] = call_llm(text)   # one call per unique (text, error_type)
result = seen[key]               # duplicates reuse the cached result

Filter Meaningless Data: discard pure numbers, pure symbols, or strings made up only of digits and symbols, using regexes:

^\d+$                                      # pure numbers
^[^a-zA-Z0-9]+$                            # pure symbols
^(?=.*\d)(?=.*[^a-zA-Z0-9])[^\sa-zA-Z]+$   # digits + symbols only
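A minimal Python sketch applying these filters; the patterns mirror the rules above (the third excludes letters and whitespace from the body):

```python
import re

# Patterns for strings that carry no checkable language content:
# pure numbers, pure symbols, and digit-plus-symbol mixes.
MEANINGLESS = [
    re.compile(r"^\d+$"),                                 # pure numbers
    re.compile(r"^[^a-zA-Z0-9]+$"),                       # pure symbols
    re.compile(r"^(?=.*\d)(?=.*[^a-zA-Z0-9])[^\sa-zA-Z]+$"),  # digits + symbols
]

def is_meaningless(text):
    return any(p.match(text) for p in MEANINGLESS)

texts = ["12345", "!!!", "3.14%", "Order #42"]
kept = [t for t in texts if not is_meaningless(t)]
# only "Order #42" contains letters, so only it survives the filter
```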

Placeholder & Mapping Table: replace high‑frequency proper nouns with placeholders (e.g., [COMPANY_NAME]) before prompting the LLM, then restore them after the response. This reduces prompt length.
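The mask/restore round trip can be sketched as follows; the mapping entry used here (`Kujiale` → `[COMPANY_NAME]`) is an illustrative assumption, not the real table:

```python
# Illustrative mapping table; the production table of high-frequency
# proper nouns is maintained elsewhere in the pipeline.
MAPPING = {"Kujiale": "[COMPANY_NAME]"}

def mask(text, mapping=MAPPING):
    """Swap proper nouns for placeholders before prompting the LLM."""
    for noun, placeholder in mapping.items():
        text = text.replace(noun, placeholder)
    return text

def unmask(text, mapping=MAPPING):
    """Restore the original proper nouns in the LLM's response."""
    for noun, placeholder in mapping.items():
        text = text.replace(placeholder, noun)
    return text

masked = mask("Welcome to Kujiale")   # "Welcome to [COMPANY_NAME]"
restored = unmask(masked)             # "Welcome to Kujiale"
```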

Instrumentation

Each processing stage records token count before and after execution, enabling a dashboard that compares estimated vs. actual token usage per method.
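One way to sketch this per-stage accounting in Python; `count_tokens` here is a crude word-count stand-in for whatever tokenizer the service actually uses (an assumption):

```python
# Per-stage token accounting sketch. count_tokens is a word-count
# proxy, NOT the real tokenizer used in production.
def count_tokens(text):
    return len(text.split())

class StageMeter:
    """Records before/after token counts for each pipeline stage."""

    def __init__(self):
        self.records = []

    def measure(self, stage, before_text, after_text):
        before, after = count_tokens(before_text), count_tokens(after_text)
        self.records.append({"stage": stage, "before": before,
                             "after": after, "saved": before - after})
        return after

meter = StageMeter()
meter.measure("trim_prompt", "a long verbose system prompt here", "short prompt")
```

The accumulated records can then feed a dashboard comparing estimated against actual usage per method.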

Results

After applying these measures, monthly token consumption dropped from roughly 20 million tokens to roughly 1 million (a reduction of more than 90%). The false‑positive rate remained stable at 10–20%.

Conclusion and Outlook

The optimizations effectively cut LLM costs without harming detection quality. Future work includes assigning unique identifiers to knowledge‑base entries and linking them directly in detection reports to further streamline bug triage.
