How We Slashed AI Token Costs by Up to 90% with Smart Pipeline Optimizations
This report details a systematic analysis of AI token consumption in a multilingual UI‑automation workflow and presents four concrete optimization techniques—prompt trimming, duplicate‑call avoidance, text deduplication, and placeholder‑based knowledge‑base integration—that together reduced monthly token usage by over 90% without harming detection accuracy.
Background and Goal
Our UI‑automation platform calls an LLM for multilingual text compliance checks. Token usage from system prompts, user text, external knowledge bases and chat history caused high cost. Goal: reduce token consumption by 30‑50% while keeping detection quality.
System Call Flow
Jenkins triggers the UI‑automation suite.
Each test case reports rendered page text to a backend service.
The backend forwards the text together with a system prompt and optional chat history to the LLM.
The LLM returns analysis results, which are stored for later review.
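The flow above can be sketched as a minimal message-assembly step on the backend side. The names `build_messages` and `SYSTEM_PROMPT` are illustrative, not the actual service code:

```python
# Minimal sketch of step 3: the backend bundles the system prompt,
# optional chat history, and the reported page text into one LLM request.

SYSTEM_PROMPT = "Detect language compliance for the following text. Output JSON."

def build_messages(page_text, history=None):
    """Assemble the chat-style message list for one page report."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(history or [])  # prior turns, if any
    messages.append({"role": "user", "content": page_text})
    return messages
```

The returned list maps directly onto the token sources broken down in the next section: one system message, zero or more history turns, and one human message.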
Token Consumption Sources
System Message: task description, input/output format, etc.
Human Message: the page text to be checked.
External Knowledge Base: language‑specific terminology and rules.
Chat History: context from previous calls.
Optimization Measures
Trim System Prompt: keep only essential directives. Example:
System Message = "Detect language compliance for the following text. Output JSON."
External rules are moved to a knowledge base.
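A minimal sketch of this split, assuming a hypothetical knowledge-base lookup: the system prompt stays short, and only the rules relevant to the current language are attached at call time.

```python
# Trimmed system prompt; the full rule text lives in a knowledge base.
SYSTEM_PROMPT = "Detect language compliance for the following text. Output JSON."

# Illustrative knowledge-base entries (language -> rules); made up for this sketch.
KNOWLEDGE_BASE = {
    "de": "Formal 'Sie' is required; umlauts must not be transliterated.",
}

def build_prompt(language):
    """Append only this language's rules instead of inlining every rule."""
    rules = KNOWLEDGE_BASE.get(language)
    return SYSTEM_PROMPT + (f"\nRules: {rules}" if rules else "")
```

For languages with no stored rules, the prompt stays at its minimal length, so the per-call cost scales with what the check actually needs.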
Avoid Duplicate Calls: before invoking the LLM, check the bug status. If the bug is already Rejected or Completed, skip the call.
if bug.status in ["Rejected", "Completed"]: skip()
Text Aggregation & Deduplication: hash text + error_type to identify identical inputs. Call the LLM once per unique key and distribute the result.
key = hash(text + error_type)
if key not in seen:
    seen.add(key)
    call_llm()
Filter Meaningless Data: discard pure numbers, pure symbols, or strings that contain only digits and symbols using regex:
^\d+$ # pure numbers
^[^a-zA-Z0-9]+$ # pure symbols
^(?=.*\d)(?=.*[^a-zA-Z0-9])[^a-zA-Z\s]+$ # digits + symbols only
Placeholder & Mapping Table: replace high‑frequency proper nouns with placeholders (e.g., [COMPANY_NAME]) before prompting the LLM, then restore them after the response. This reduces prompt length.
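The last two measures can be combined into a short runnable sketch. The regex list mirrors the patterns above; the mapping-table entry is a made-up example:

```python
import re

# Meaningless-data filter: the three patterns listed above.
MEANINGLESS = [
    re.compile(r"^\d+$"),                                     # pure numbers
    re.compile(r"^[^a-zA-Z0-9]+$"),                           # pure symbols
    re.compile(r"^(?=.*\d)(?=.*[^a-zA-Z0-9])[^a-zA-Z\s]+$"),  # digits + symbols only
]

def is_meaningless(text):
    return any(p.match(text) for p in MEANINGLESS)

# Placeholder & mapping table: mask proper nouns before the LLM call,
# restore them afterwards. The entry below is illustrative.
MAPPING = {"ACME Corporation": "[COMPANY_NAME]"}

def mask(text):
    for name, placeholder in MAPPING.items():
        text = text.replace(name, placeholder)
    return text

def unmask(text):
    for name, placeholder in MAPPING.items():
        text = text.replace(placeholder, name)
    return text
```

With this in place, strings like "2024" or "#42" never reach the LLM at all, and "Welcome to ACME Corporation" is sent as the shorter "Welcome to [COMPANY_NAME]" and restored on the way back.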
Instrumentation
Each processing stage records token count before and after execution, enabling a dashboard that compares estimated vs. actual token usage per method.
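One way such instrumentation could look, using a crude whitespace split as a token proxy (a real setup would use the model's tokenizer); the stage and function names are illustrative:

```python
from collections import defaultdict

# Per-stage token counters feeding the dashboard.
stats = defaultdict(lambda: {"before": 0, "after": 0})

def rough_tokens(text):
    # Whitespace split as a cheap stand-in for the model tokenizer.
    return len(text.split())

def instrument(stage, fn, text):
    """Run one pipeline stage and record token counts before and after."""
    stats[stage]["before"] += rough_tokens(text)
    result = fn(text)
    stats[stage]["after"] += rough_tokens(result)
    return result
```

Wrapping each stage (filtering, deduplication, masking) with `instrument` yields per-method before/after totals that the dashboard can set against the estimated savings.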
Results
After applying the measures, monthly token consumption dropped from roughly 20 million tokens to roughly 1 million tokens, a reduction of more than 90%. The false‑positive rate remained stable at 10‑20%.
Conclusion and Outlook
The optimizations effectively cut LLM costs without harming detection quality. Future work includes assigning unique identifiers to knowledge‑base entries and linking them directly in detection reports to further streamline bug triage.