How I Cut Search API Latency from 150ms to 32ms in 10 Days
This article recounts a developer's step‑by‑step optimization of a search API—detailing four load‑test rounds, configuration tweaks, batch queries, caching, and code clean‑up—that ultimately reduced average response time from 150 ms to 32 ms, meeting a sub‑100 ms requirement.
Recently I optimized a search API, performed four rounds of load testing, and finally met the performance requirement, celebrating with a chicken leg.
Business Logic
The service retrieves data from OpenSearch, fills and assembles the response, and returns it to the client.
The logic looked simple and was initially estimated at five days. In practice, development, integration, bug fixing, and deployment took about ten days, because many factors affected the response structure: configuration, database, cache, OpenSearch, and code issues.
This article does not dwell on the business logic itself; it focuses on the optimization process.
The app-facing API must respond within 100 ms.
First Load Test
The average response time was a dismal 150 ms, and the test also revealed backend errors caused by OpenSearch's per‑second query limit.
I adjusted the OpenSearch configuration and switched the test environment to the internal network address.
I changed the code so that repeated cache queries were replaced with a single batch query.
I removed unused code after confirming with teammates that it was safe to delete.
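The switch from repeated cache queries to one batch query can be sketched as follows. This is a minimal illustration in plain Java, not the article's actual code: CountingCache is a hypothetical in-memory stand-in for a remote cache that only counts round-trips, and the key names are made up. In a Spring project the batched call would go through the Redis client instead (the article later mentions RedisTemplate.executePipelined).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a remote cache; it only counts round-trips. */
class CountingCache {
    private final Map<String, String> store = new HashMap<>();
    int roundTrips = 0;

    void put(String key, String value) {
        store.put(key, value);
    }

    /** Single-key lookup: one round-trip per call. */
    String get(String key) {
        roundTrips++;
        return store.get(key);
    }

    /** Batch lookup: one round-trip for all keys, results in key order (null on a miss). */
    List<String> multiGet(List<String> keys) {
        roundTrips++;
        List<String> values = new ArrayList<>(keys.size());
        for (String key : keys) {
            values.add(store.get(key));
        }
        return values;
    }
}

class BatchLookup {
    /** Before: a cache call inside the loop, so N keys cost N round-trips. */
    static List<String> fetchOneByOne(CountingCache cache, List<String> keys) {
        List<String> values = new ArrayList<>();
        for (String key : keys) {
            values.add(cache.get(key));
        }
        return values;
    }

    /** After: one batched call, so N keys cost a single round-trip. */
    static List<String> fetchInBatch(CountingCache cache, List<String> keys) {
        return cache.multiGet(keys);
    }

    public static void main(String[] args) {
        CountingCache cache = new CountingCache();
        List<String> keys = Arrays.asList("item:1", "item:2", "item:3");
        for (String key : keys) {
            cache.put(key, "detail-of-" + key);
        }
        fetchOneByOne(cache, keys);
        System.out.println("round-trips, one by one: " + cache.roundTrips);
        cache.roundTrips = 0;
        fetchInBatch(cache, keys);
        System.out.println("round-trips, batched: " + cache.roundTrips);
    }
}
```

Both methods return the same values; only the number of round-trips differs, which is where the latency saving comes from when the cache sits across a network.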
Second Load Test
Despite the code and configuration improvements, the results got worse and new problems appeared.
I confirmed that cache queries had been reduced to a minimum and that the thread-pool parameters were set to reasonably high values. The next step was to cache the result set itself, using the user ID and search keyword as the key and keeping each result for five minutes.
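A result-set cache keyed by user ID and keyword with a five-minute lifetime might look roughly like the sketch below. It is an in-process illustration under stated assumptions: the class name SearchResultCache, the key format, and the injectable clock are all hypothetical, and the article does not say whether the real cache lived in Redis or in memory.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical in-process cache for search results with a fixed time-to-live. */
class SearchResultCache {
    private static final class Entry {
        final List<String> results;
        final long expiresAtMillis;

        Entry(List<String> results, long expiresAtMillis) {
            this.results = results;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested without sleeping

    SearchResultCache(Duration ttl, LongSupplier clock) {
        this.ttlMillis = ttl.toMillis();
        this.clock = clock;
    }

    /** Builds the cache key from user ID and keyword (assumed format). */
    static String key(long userId, String keyword) {
        return userId + ":" + keyword;
    }

    /**
     * Returns the cached results if they are still fresh; otherwise calls the
     * loader (for example, the real search) and stores its result.
     * Two concurrent misses may both call the loader; this sketch accepts that.
     * Expired entries are only replaced on the next lookup, never evicted.
     */
    List<String> getOrLoad(long userId, String keyword, Supplier<List<String>> loader) {
        String key = key(userId, keyword);
        long now = clock.getAsLong();
        Entry cached = entries.get(key);
        if (cached != null && now < cached.expiresAtMillis) {
            return cached.results;
        }
        List<String> fresh = loader.get();
        entries.put(key, new Entry(fresh, now + ttlMillis));
        return fresh;
    }
}
```

A caller would construct it with `new SearchResultCache(Duration.ofMinutes(5), System::currentTimeMillis)` and wrap the existing search call in the loader. The trade-off is staleness: a user can see results up to five minutes old for the same keyword.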
Third Load Test
The requirement was finally met: with 60 concurrent users, the average response time dropped to 32 ms. The test also surfaced a new optimization opportunity.
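For readers who want to reproduce a measurement of this kind, the sketch below shows one way to compute an average latency under a fixed number of concurrent callers. It is a toy harness in plain Java, not the tool used in the article (which is not named); MiniLoadTest and its parameters are hypothetical, and the Runnable stands in for one call to the API under test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical minimal load-test harness: N concurrent callers, average latency. */
class MiniLoadTest {
    /**
     * Runs `request` from `users` threads, `callsPerUser` times each,
     * and returns the average time per call in milliseconds.
     */
    static double averageLatencyMillis(int users, int callsPerUser, Runnable request)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int u = 0; u < users; u++) {
                tasks.add(() -> {
                    long totalNanos = 0;
                    for (int i = 0; i < callsPerUser; i++) {
                        long start = System.nanoTime();
                        request.run(); // one simulated API call
                        totalNanos += System.nanoTime() - start;
                    }
                    return totalNanos;
                });
            }
            long totalNanos = 0;
            // invokeAll blocks until every simulated user has finished.
            for (Future<Long> result : pool.invokeAll(tasks)) {
                totalNanos += result.get();
            }
            return totalNanos / 1_000_000.0 / ((long) users * callsPerUser);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `MiniLoadTest.averageLatencyMillis(60, 100, () -> callSearchApi())` would mirror the 60-user setup, where callSearchApi is a placeholder for the real HTTP call. An average hides tail latency, so a serious test would also record percentiles.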
Further inspection revealed unnecessary database queries within the API, which were eliminated.
Growth
I learned to use RedisTemplate.executePipelined for batch Redis queries.
Key Takeaways from This Optimization
Never issue database or cache queries inside a loop; collect the keys first and query once.
API endpoints should generally read directly from cache, not the database.
Prefer batch queries over single‑row queries to reduce round‑trips.
When using cloud services (e.g., Alibaba Cloud), ensure proper configuration and use internal network addresses in production.
Pay attention to connection pool sizes for databases, Redis, and thread pools.
Run load tests on dedicated machines without other services to avoid interference; in production, isolate critical services per machine.
Remove or comment out unused code and dependencies promptly.
Maintain a healthy cluster configuration.
Use monitoring tools such as distributed tracing (Pinpoint) to locate bottlenecks.
If technical optimization space is exhausted, address performance from a business perspective using real traffic data.
Each code change can introduce new bugs; perform regression testing after modifications, using tools like Postman and Beyond Compare.
Increase logging in critical areas to simplify future troubleshooting.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"