Tencent CLS: High‑Performance Time‑Series Search Engine for Cloud Log Service
Tencent's Cloud Log Service augments Lucene with a dedicated time-series index that combines timestamp ordering, a secondary index, reverse binary search, and histogram optimization. The index cuts log query complexity, delivering roughly 40 to 50× faster responses, higher concurrency, and markedly better performance than traditional ELK-style and competing cloud log solutions.
The Tencent Cloud Log Service (CLS) team extended a traditional search engine with time-series awareness to create a time-series search engine. Their research paper, "TencentCLS: The Cloud Log Service with High Query Performances", was accepted at VLDB 2022 and will be presented in Sydney.
In massive log retrieval scenarios, the time-series search engine achieves a nearly 40× performance improvement over conventional search engines. CLS therefore gains a substantial advantage over mainstream log products such as ELK.
Business Background
CLS uses Apache Lucene for indexing and searching massive log data. Lucene is optimized for generic text search and does not efficiently exploit the characteristics of log data, especially high‑cardinality timestamps. Many companies have abandoned Lucene for custom log search engines, claiming up to a 2× performance boost.
To meet diverse business analysis needs, CLS introduced a dedicated time‑series search engine on top of Lucene. Compared with traditional search, the time‑series engine improves forward, reverse, and histogram queries by 38×, 24×, and 7.6× respectively.
Technical Background – Log Search in Lucene
A typical log entry contains a timestamp, text, and attributes (e.g., IP). Example log line:
[2021-09-28 10:10:39T1234] [ip=192.168.1.1] XXXXXXXXX
In Lucene, each indexed field maps every distinct term to a posting list (a list of docids). For timestamps, the posting list looks like:
timestamp->[docid1, docid2]
and an actual entry may be:
2021-09-28 10:10:39T1234->[1,5]
Typical query (SQL‑like) on logs:
SELECT * FROM xxxx_index WHERE ip = xxxx AND timestamp >= '2021-09-28' AND timestamp <= '2021-09-29';
Range queries over high-cardinality timestamps cause massive scanning: nearly every log carries its own timestamp term, so the inverted index must traverse millions or billions of terms inside the range, giving O(n) complexity in the number of matching terms.
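To make the cost concrete, here is a minimal sketch of a timestamp range query over a term-to-posting-list map. It is a toy model of the inverted index described above, not Lucene's actual data structures; the function names `build_timestamp_postings` and `range_query_by_terms` and the integer timestamps are assumptions made for illustration.

```python
from collections import defaultdict

def build_timestamp_postings(logs):
    """Map each distinct timestamp term to the list of docids that carry it."""
    postings = defaultdict(list)
    for docid, (timestamp, _text) in enumerate(logs):
        postings[timestamp].append(docid)
    return dict(postings)

def range_query_by_terms(postings, low, high):
    """Collect docids with low <= timestamp <= high by visiting each matching term."""
    docids = []
    terms_visited = 0
    for term in sorted(postings):
        if low <= term <= high:
            terms_visited += 1
            docids.extend(postings[term])
    return sorted(docids), terms_visited

# Hypothetical data: 10,000 logs, each with its own unique timestamp.
logs = [(1000 + i, f"log line {i}") for i in range(10_000)]
postings = build_timestamp_postings(logs)
docids, terms_visited = range_query_by_terms(postings, 3000, 7999)
print(len(docids), terms_visited)
```

Because every log has a unique timestamp in this example, the number of terms visited equals the number of matching logs, which is the linear behavior the text describes.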
Solution – Time‑Series Index
1. **Timestamp Ordering** – Logs are stored ordered by timestamp, reducing a range query to locating the two endpoints (O(log n)).
2. **Secondary Index** – Introduced to limit disk accesses during binary search (from dozens of seeks to ~3).
3. **Reverse Binary Search** – Enables efficient reverse iteration without scanning the entire dataset, reducing iteration from O(n) to O(log² n).
4. **Histogram Optimization** – Bucket boundaries are resolved via index look‑ups, eliminating per‑log back‑table look‑ups.
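The first two ideas can be sketched together. The code below is a simplified model under stated assumptions, not the CLS implementation: timestamps are held in a sorted Python list standing in for on-disk storage, docids are taken to be positions in that list, and `BLOCK_SIZE`, `build_sparse_index`, `locate`, and `range_query` are hypothetical names. A small in-memory sparse index picks one block, so each endpoint search touches only that block of the full timestamp column.

```python
from bisect import bisect_left, bisect_right

BLOCK_SIZE = 1024  # hypothetical number of timestamps per on-disk block

def build_sparse_index(timestamps, block_size=BLOCK_SIZE):
    """Keep only the first timestamp of each block; small enough to stay in memory."""
    return [timestamps[i] for i in range(0, len(timestamps), block_size)]

def locate(timestamps, sparse, target, find_upper, block_size=BLOCK_SIZE):
    """Return the insertion position of target while searching a single block."""
    search = bisect_right if find_upper else bisect_left
    # Step 1: binary search the sparse index to choose one candidate block.
    block = max(search(sparse, target) - 1, 0)
    start = block * block_size
    end = min(start + block_size, len(timestamps))
    # Step 2: binary search only inside that block (one block read in this model).
    return search(timestamps, target, start, end)

def range_query(timestamps, sparse, low, high):
    """Return the docid range for low <= timestamp <= high from its two endpoints."""
    first = locate(timestamps, sparse, low, find_upper=False)
    last = locate(timestamps, sparse, high, find_upper=True)
    return range(first, last)

# Hypothetical data: 100,000 logs already stored in timestamp order.
timestamps = list(range(0, 200_000, 2))
sparse = build_sparse_index(timestamps)
hits = range_query(timestamps, sparse, 5001, 9000)
print(hits.start, hits.stop, len(hits))
```

The result is a contiguous docid range, so the query cost depends on finding two endpoints rather than on how many logs fall between them.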
These techniques together lower algorithmic complexity and I/O cost for forward, reverse, and histogram queries.
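The histogram optimization can be illustrated the same way. This is a sketch assuming timestamps are already sorted and held in memory; the function name `histogram` and its parameters are hypothetical. Each bucket count comes from the difference between two boundary positions, so no individual log is read.

```python
from bisect import bisect_left

def histogram(timestamps, start, end, bucket_width):
    """Count logs per bucket over [start, end) using only boundary look-ups."""
    boundaries = list(range(start, end, bucket_width)) + [end]
    # One binary search per boundary; the work scales with buckets, not logs.
    positions = [bisect_left(timestamps, b) for b in boundaries]
    return [right - left for left, right in zip(positions, positions[1:])]

# Hypothetical sorted timestamps.
timestamps = [1, 2, 2, 5, 7, 8, 8, 8, 12, 15, 19]
counts = histogram(timestamps, 0, 20, 5)
print(counts)  # [3, 5, 1, 2]
```

With k buckets over n logs, this model performs k + 1 binary searches instead of touching every matching log.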
Testing & Comparison
Offline prototype tests with 8 million records under 100 concurrent queries showed:
50× faster response (1.059 s vs 56.9 s).
20× higher concurrency while keeping latency < 1 s (90 qps vs 4 qps).
Online tests with mixed read/write workloads confirmed > 10× speedup over native Lucene, even after accounting for I/O tail latency.
Comparisons with a competing cloud log service (which indexes only at minute granularity) demonstrated that CLS’s microsecond‑level time‑series index provides orders of magnitude more index entries (up to 86 billion per day) and superior query accuracy.
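The 86 billion figure is consistent with the number of distinct microsecond timestamps in one day. That reading is an assumption, since the text does not say how the number was derived. A quick check of the arithmetic:

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Distinct timestamp values per day at each granularity.
microsecond_slots = SECONDS_PER_DAY * 1_000_000
minute_slots = 24 * 60

print(microsecond_slots)                  # 86400000000
print(minute_slots)                       # 1440
print(microsecond_slots // minute_slots)  # 60000000
```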
Conclusion
The VLDB reviewers highlighted the paper’s clear presentation of a real‑world problem, the effectiveness of the time‑series index, and the substantial performance gains (over one order of magnitude) compared to Lucene. The work offers valuable insights for any system that requires high‑cardinality range queries on time‑series data.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.