
Why Enabling Line‑Start Regex Slows Logtail and How to Speed It Up

This article examines why Logtail’s performance drops when line‑start regular expressions are used for multi‑line logs, explains the underlying boost::regex_match behavior, and demonstrates how switching to a prefix‑only regex or boost::regex_search can boost collection speed by up to seven times.

Alibaba Cloud Observability

Background

In the log analysis field, Logtail is a widely used log collection tool, and any performance improvement can significantly increase overall efficiency. Recent performance tests revealed that enabling line‑start regular expression handling for multi‑line logs caused a noticeable slowdown.

How Logtail Processes Multi‑Line Logs

Logtail merges multi‑line logs based on a configured pattern. The workflow is:

User configures a line‑start regex.

Logtail applies this regex to the beginning of each log line.

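If a line does not match, Logtail treats it as a continuation of the current log entry and keeps reading until it reaches the next line that matches the line-start regex.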

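For example, with the regex cnt.*, Logtail matches the entire line: the trailing .* forces the matcher to scan every line to its end, even though only the cnt prefix decides whether the line starts a new log entry. On long lines this is costly.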
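The merging workflow above can be sketched as follows. This is a minimal illustration, not Logtail's implementation: MergeMultiline is a hypothetical helper, std::regex stands in for boost::regex so the snippet compiles without Boost, and keeping leading lines that precede the first match as their own entry is an assumption of the sketch.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: groups raw lines into log entries. A line that fully
// matches lineStart opens a new entry; any other line is appended to the
// entry that is currently open.
std::vector<std::string> MergeMultiline(const std::vector<std::string>& lines,
                                        const std::regex& lineStart) {
    std::vector<std::string> entries;
    for (const std::string& line : lines) {
        // Full-line match against the configured line-start regex.
        const bool startsEntry = std::regex_match(line, lineStart);
        if (startsEntry || entries.empty()) {
            // Assumption: a non-matching line with no open entry starts one.
            entries.push_back(line);
        } else {
            entries.back() += "\n" + line;
        }
    }
    return entries;
}
```

With the regex cnt.* and the lines "cnt:1 first", "  at foo()", "cnt:2 second", the helper yields two entries, the first of which carries the indented continuation line.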

Implementation Details

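Logtail uses boost::regex_match for full-line matching. This function succeeds only if the regex matches the entire buffer, so after the prefix has matched, the trailing .* still has to consume the rest of the line. The matching time therefore grows linearly with the length of the log content that follows the prefix.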

bool BoostRegexMatch(const char* buffer, size_t size, const boost::regex& reg, string& exception) {
    // ...
    if (boost::regex_match(buffer, buffer + size, reg)) {
        return true;
    }
    // ...
}

Benchmark code showed that as the length of log data unrelated to the regex increases, the execution time of boost::regex_match grows linearly.

static void BM_Regex_Match(int batchSize) {
    std::string buffer = "cnt:";
    std::string regStr = "cnt.*";
    boost::regex reg(regStr);
    // loop measuring duration...
}
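A self-contained timing loop in the same spirit might look like the sketch below. It is not the article's benchmark: TimeFullMatch, its parameters, and the filler character are hypothetical, and std::regex is used in place of boost::regex so the code builds without Boost. Absolute timings will therefore differ from Logtail's.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical benchmark helper: runs `iterations` full-line matches of
// "cnt.*" against a line made of "cnt:" plus `payloadSize` filler characters.
// Returns the elapsed time in microseconds and reports through `allMatched`
// whether every call matched.
long long TimeFullMatch(std::size_t payloadSize, int iterations, bool& allMatched) {
    const std::string buffer = "cnt:" + std::string(payloadSize, 'x');
    const std::regex reg("cnt.*");
    allMatched = true;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        if (!std::regex_match(buffer.data(), buffer.data() + buffer.size(), reg)) {
            allMatched = false;
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling it with increasing payloadSize values (for example 64, 512, 2048) and the same iteration count gives a rough view of how the cost scales with the part of the line that follows the prefix.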

Optimizing the Regex

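Instead of matching the whole line, only the prefix needs to be matched. boost::regex_search with the boost::match_continuous flag does exactly that: the flag requires the match to begin at the first character of the buffer, and the text after the match does not have to match anything.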

bool BoostRegexSearch(const char* buffer, size_t size, const boost::regex& reg, string& exception) {
    boost::cmatch what;  // receives the match result for this regex_search overload
    if (boost::regex_search(buffer, buffer + size, what, reg, boost::match_continuous)) {
        return true;
    }
    return false;
}
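The difference between the two matching modes can be seen in a small sketch. It uses std::regex rather than boost::regex so that it compiles without Boost; std::regex_match, std::regex_search, and std::regex_constants::match_continuous are the standard-library counterparts of the Boost calls. The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// Full-line semantics: true only if the regex consumes the entire line.
bool MatchesWholeLine(const std::string& line, const std::regex& reg) {
    return std::regex_match(line, reg);
}

// Prefix semantics: true if the regex matches a run that begins at the first
// character of the line; text after the match does not need to match.
bool MatchesLineStart(const std::string& line, const std::regex& reg) {
    return std::regex_search(line, reg, std::regex_constants::match_continuous);
}
```

With the pattern cnt, MatchesWholeLine rejects "cnt:12345 payload" because the rest of the line is not covered, while MatchesLineStart accepts it. A line such as "x cnt:12345" is rejected by both, since match_continuous does not let the search skip ahead.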

Logtail can automatically strip the trailing .* from user‑provided regexes and prepend ^ to improve efficiency.
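One way such a rewrite could work is sketched below. ToPrefixRegex is a hypothetical helper and not Logtail's actual code; it only handles the simple case of a single unescaped trailing .* and assumes the pattern contains no top-level alternation or grouping that would change what the suffix means.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: drops one unescaped trailing ".*" and anchors the
// pattern with "^" if it is not anchored already.
std::string ToPrefixRegex(std::string pattern) {
    const std::size_t n = pattern.size();
    if (n >= 2 && pattern.compare(n - 2, 2, ".*") == 0) {
        // Count the backslashes directly before the '.'. An odd count means
        // the dot is escaped ("\.*"), so the suffix is a literal and stays.
        std::size_t backslashes = 0;
        while (backslashes < n - 2 && pattern[n - 3 - backslashes] == '\\') {
            ++backslashes;
        }
        if (backslashes % 2 == 0) {
            pattern.erase(n - 2);
        }
    }
    if (pattern.empty() || pattern.front() != '^') {
        pattern.insert(pattern.begin(), '^');
    }
    return pattern;
}
```

For the article's example, ToPrefixRegex("cnt.*") returns "^cnt", while a pattern ending in an escaped dot such as cnt\.* keeps its suffix and only gains the anchor.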

Performance Test

After applying the optimization, a benchmark comparing Logtail 2.1.1 (optimized) with Logtail 1.8.7 (pre-optimization) showed the collection rate rising from 90 MB/s to 633 MB/s, roughly a seven-fold improvement.

| Metric | Original | Optimized |
| --- | --- | --- |
| iLogtail collection rate (MB/s) | 90 | 633 |

Conclusion

When log lines are long, a simple code change that replaces full‑line regex matching with a prefix‑only search can dramatically improve Logtail’s collection performance, especially as the non‑matching portion of logs grows.
