
Tencent CLS: High‑Performance Time‑Series Search Engine for Cloud Log Service

Tencent’s Cloud Log Service augments Lucene with a dedicated time‑series index—using timestamp ordering, a secondary index, reverse binary search, and histogram optimization—to cut log query complexity, delivering up to 40‑50× faster responses, higher concurrency, and markedly better performance than traditional ELK‑style and competing cloud log solutions.

Tencent Cloud Developer

The Tencent Cloud Log Service (CLS) team extended a traditional search engine with a time‑series concept, creating a time‑series search engine. Their research paper "TencentCLS: The Cloud Log Service with High Query Performances" was accepted by the VLDB 2022 conference and will be presented in Sydney.

In large‑scale log retrieval scenarios, the time‑series search engine achieves nearly a 40× performance improvement over conventional search engines, giving CLS a substantial advantage over mainstream log products such as ELK.

Business Background

CLS uses Apache Lucene for indexing and searching massive log data. Lucene is optimized for generic text search and does not efficiently exploit the characteristics of log data, especially high‑cardinality timestamps. Many companies have abandoned Lucene for custom log search engines, claiming up to a 2× performance boost.

To meet diverse business analysis needs, CLS introduced a dedicated time‑series search engine on top of Lucene. Compared with traditional search, the time‑series engine improves forward, reverse, and histogram queries by 38×, 24×, and 7.6× respectively.

Technical Background – Log Search in Lucene

A typical log entry contains a timestamp, text, and attributes (e.g., IP). Example log line:

[2021-09-28 10:10:39T1234] [ip=192.168.1.1] XXXXXXXXX

In Lucene's inverted index, each distinct value (term) of a field maps to a posting list of document IDs. For the timestamp field, this looks like:

timestamp->[docid1, docid2]

and an actual entry may be:

2021-09-28 10:10:39T1234->[1,5]

Typical query (SQL‑like) on logs:

Select * from xxxx_index where ip = xxxx and timestamp >= 2021-09-28 and timestamp <= 2021-09-29;

High‑cardinality timestamps make range queries expensive: the inverted index must enumerate every distinct timestamp term in the range, potentially millions or billions of them, and merge their posting lists, giving O(n) complexity.
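To make that cost concrete, here is a toy model in plain Python, not Lucene's actual term dictionary or postings format: every distinct timestamp is a term in a sorted term dictionary, and a range query has to visit each term in the range and merge its posting list. The function names `build_inverted_index` and `naive_range_query` are hypothetical.

```python
import bisect
from collections import defaultdict


def build_inverted_index(timestamps):
    """Toy inverted index: each distinct timestamp is a term with a posting list."""
    postings = defaultdict(list)
    for doc_id, ts in enumerate(timestamps):
        postings[ts].append(doc_id)
    terms = sorted(postings)  # term dictionary, kept in sorted order
    return terms, dict(postings)


def naive_range_query(terms, postings, lo, hi):
    """Return (matching doc IDs, number of terms visited) for lo <= ts <= hi."""
    matched, terms_visited = [], 0
    i = bisect.bisect_left(terms, lo)  # seek to the first term >= lo
    while i < len(terms) and terms[i] <= hi:
        matched.extend(postings[terms[i]])  # merge this term's posting list
        terms_visited += 1
        i += 1
    return sorted(matched), terms_visited


# Six logs; timestamp 5 is shared by docs 0 and 2
terms, postings = build_inverted_index([5, 3, 5, 9, 1, 7])
print(naive_range_query(terms, postings, 3, 7))  # ([0, 1, 2, 5], 3)

# One log per timestamp: every log in the range costs one term visit
terms, postings = build_inverted_index(range(1_000))
print(naive_range_query(terms, postings, 100, 899)[1])  # 800
```

The seek to the first term is cheap; the loop is what grows linearly, because with microsecond timestamps nearly every log is its own term.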

Solution – Time‑Series Index

1. **Timestamp Ordering** – Logs are stored sorted by timestamp, so a range query reduces to locating its two endpoints with binary search (O(log n)).

2. **Secondary Index** – Introduced to limit disk accesses during binary search (from dozens of seeks to ~3).

3. **Reverse Binary Search** – Enables efficient reverse iteration without scanning the entire dataset, reducing iteration from O(n) to O(log² n).

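4. **Histogram Optimization** – Bucket boundaries are resolved through index lookups on the sorted timestamps, so counting logs per bucket no longer requires fetching each log's stored timestamp (a per‑log back‑table lookup).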
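The first, second, and fourth ideas can be sketched together. The Python below is a minimal in-memory model under stated assumptions, not CLS's implementation: a hypothetical `TimeSortedSegment` keeps timestamps in sorted order (so a log's position doubles as its document ID), holds a sparse secondary index with the first timestamp of every block of `block_size` entries, and resolves each endpoint by searching the small sparse index first and then only one block. Histogram buckets reuse the same endpoint lookup, so each bucket count is a difference of two positions rather than a scan over logs.

```python
import bisect


class TimeSortedSegment:
    """Hypothetical segment whose logs are stored sorted by timestamp.

    A log's position in `ts` doubles as its document ID, so a time range
    maps to one contiguous slice [start, end).
    """

    def __init__(self, timestamps, block_size=4):
        self.ts = sorted(timestamps)
        self.block_size = block_size
        # Sparse secondary index: first timestamp of each block. Assumed to be
        # small enough to stay in memory, while `ts` would live on disk.
        self.sparse = self.ts[::block_size]

    def _two_level(self, t, find):
        # Step 1: binary search the in-memory sparse index to pick a block.
        b = find(self.sparse, t)
        # Step 2: binary search only inside that block (one block "read").
        lo = max(0, (b - 1) * self.block_size)
        hi = min(len(self.ts), b * self.block_size)
        return find(self.ts, t, lo, hi)

    def lower_bound(self, t):
        """Position of the first timestamp >= t."""
        return self._two_level(t, bisect.bisect_left)

    def upper_bound(self, t):
        """Position of the first timestamp > t."""
        return self._two_level(t, bisect.bisect_right)

    def range_positions(self, t_from, t_to):
        """Slice [start, end) of logs with t_from <= timestamp <= t_to."""
        return self.lower_bound(t_from), self.upper_bound(t_to)

    def histogram(self, start, end, width):
        """Counts for half-open buckets of `width` from `start`, clipped at `end`.

        Each count is the difference of two boundary positions, so no
        individual log record has to be fetched.
        """
        counts = []
        left, edge = self.lower_bound(start), start
        while edge < end:
            nxt = min(edge + width, end)
            right = self.lower_bound(nxt)
            counts.append(right - left)
            left, edge = right, nxt
        return counts


seg = TimeSortedSegment([9, 1, 8, 2, 15, 3, 12, 5, 8, 2], block_size=4)
print(seg.range_positions(2, 8))  # (1, 7) -> timestamps 2, 2, 3, 5, 8, 8
print(seg.histogram(0, 16, 4))    # [4, 1, 3, 2]
```

In this model each endpoint costs one search over roughly `len(ts) / block_size` in-memory keys plus one search inside a single block; real block sizes, on-disk layout, and seek counts are not modeled here.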

These techniques together lower algorithmic complexity and I/O cost for forward, reverse, and histogram queries.
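The reverse case is the least obvious, and the description above does not spell out the algorithm, so what follows is only one plausible reading, offered as an assumption. Lucene's `DocIdSetIterator` moves forward only: `advance(target)` returns the first matching document at or after `target`. Because logs are stored in ascending timestamp order, "newest first" means walking document IDs downward. The largest match below a position `end` can be found by binary searching over `advance` targets; if each `advance` costs O(log n), each reverse step costs O(log² n), which is consistent with the bound quoted for reverse binary search. `make_advance`, `prev_match`, and `iterate_reverse` are hypothetical helpers, and `make_advance` models a re-seekable advance that a real Lucene iterator does not provide.

```python
import bisect


def make_advance(matching_doc_ids):
    """Hypothetical stateless stand-in for a forward `advance(target)`:
    returns the first matching doc ID >= target, or None if there is none."""
    docs = sorted(matching_doc_ids)

    def advance(target):
        i = bisect.bisect_left(docs, target)  # O(log n)
        return docs[i] if i < len(docs) else None

    return advance


def prev_match(advance, end):
    """Largest matching doc ID < end, found with O(log end) advance() calls."""
    lo, hi, best = 0, end - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        hit = advance(mid)
        if hit is not None and hit < end:
            best = hit      # some match lies in [mid, end); try larger targets
            lo = mid + 1
        else:
            hi = mid - 1    # no match in [mid, end); try smaller targets
    return best


def iterate_reverse(advance, end):
    """Yield matching doc IDs below `end`, newest (largest) first."""
    while (doc := prev_match(advance, end)) is not None:
        yield doc
        end = doc


advance = make_advance([1, 4, 6, 9])       # e.g. docs matching ip = xxxx
print(list(iterate_reverse(advance, 10)))  # [9, 6, 4, 1]
```

The point of the design is that the filter never has to be materialized and reversed in full: each step back costs a logarithmic number of forward probes.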

Testing & Comparison

Offline prototype tests with 8 million records under 100 concurrent queries showed:

50× faster response (1.059 s vs 56.9 s).

20× higher concurrency while keeping latency < 1 s (90 qps vs 4 qps).
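The rounded multipliers follow directly from the reported figures. A quick check in Python, with the numbers copied from the results above, shows they are if anything conservative:

```python
# Offline prototype figures quoted above (8 million records, 100 concurrent queries)
baseline_latency_s, tsindex_latency_s = 56.9, 1.059
baseline_qps, tsindex_qps = 4, 90

latency_speedup = baseline_latency_s / tsindex_latency_s
throughput_gain = tsindex_qps / baseline_qps

print(f"response-time speedup: {latency_speedup:.1f}x")  # 53.7x
print(f"throughput gain:       {throughput_gain:.1f}x")  # 22.5x
```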

Online tests with mixed read/write workloads confirmed a more than 10× speedup over native Lucene, even after accounting for I/O tail latency.

Comparisons with a competing cloud log service, which indexes time only at minute granularity, showed that CLS's microsecond‑level time‑series index supports orders of magnitude more distinct index entries (up to about 86.4 billion per day) and therefore delivers more accurate time‑range queries.
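The per-day figure follows from time resolution alone. This small calculation counts the maximum number of distinct index keys a single day can produce at minute and at microsecond granularity:

```python
SECONDS_PER_DAY = 24 * 60 * 60                  # 86,400

# Upper bound on distinct time keys per day at each granularity
minute_keys = SECONDS_PER_DAY // 60             # 1,440
microsecond_keys = SECONDS_PER_DAY * 1_000_000  # 86,400,000,000

print(f"minute granularity:      {minute_keys:,}")
print(f"microsecond granularity: {microsecond_keys:,}")
print(f"ratio: {microsecond_keys // minute_keys:,}x")  # 60,000,000x
```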

Conclusion

The VLDB reviewers highlighted the paper’s clear presentation of a real‑world problem, the effectiveness of the time‑series index, and the substantial performance gains (over one order of magnitude) compared to Lucene. The work offers valuable insights for any system that requires high‑cardinality range queries on time‑series data.
