
How to Boost Logtail Multiline Log Collection Speed by Up to 7×

This article investigates why enabling line‑prefix regex for multiline logs slows Logtail down, explains the underlying regex matching mechanism, and demonstrates how switching from boost::regex_match to boost::regex_search with proper flags can dramatically improve collection throughput, achieving a seven‑fold speed increase.

Alibaba Cloud Developer

When Logtail processes multiline logs using a line‑prefix regular expression, performance can degrade noticeably. This article examines the cause and presents a solution that replaces the full‑match approach (boost::regex_match) with an anchored prefix‑match approach (boost::regex_search with boost::match_continuous), so that long log lines no longer have to be scanned to the end.

Background

Logtail is a widely used log collector; any performance gain directly improves overall efficiency. During testing, enabling a line‑prefix regex caused a drop in collection speed, prompting an investigation.

Analysis

Logtail merges multiline logs by applying a user‑defined line‑prefix regex to each line. The workflow is:

User configures a line‑prefix regex.

Logtail applies the regex to the start of each log line.

If a line does not match, it is treated as a continuation of the current log, and Logtail keeps appending lines until the next matching line starts a new log.
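The merging rule above can be sketched as follows. This is a hypothetical illustration, not Logtail's code: MergeMultiline is an invented name, it uses std::regex instead of boost::regex, it treats lines that arrive before any matching line as an entry of their own, and it flushes the last entry at the end of the input, whereas a collector would keep waiting for the next matching line.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical sketch of line-prefix multiline merging (not Logtail's code).
// A line that matches startReg at its first character starts a new entry;
// any other line is appended to the entry currently being built.
std::vector<std::string> MergeMultiline(const std::vector<std::string>& lines,
                                        const std::regex& startReg) {
    std::vector<std::string> entries;
    std::string current;
    bool hasCurrent = false;
    for (const std::string& line : lines) {
        // match_continuous: the match must begin at position 0 of the line.
        bool isStart = std::regex_search(line, startReg,
                                         std::regex_constants::match_continuous);
        if (isStart) {
            if (hasCurrent) {
                entries.push_back(current);  // previous entry is complete
            }
            current = line;
            hasCurrent = true;
        } else if (hasCurrent) {
            current += "\n" + line;  // continuation line
        } else {
            // Assumption: continuation lines seen before any start line
            // form an entry of their own.
            current = line;
            hasCurrent = true;
        }
    }
    if (hasCurrent) {
        entries.push_back(current);  // flush the last entry at end of input
    }
    return entries;
}
```

For example, with std::regex("cnt"), the lines "cnt:1 start", "  at frame1", "cnt:2 start" are grouped into two entries, the first containing the indented continuation line.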

Example: for a log whose entries begin with cnt and a line‑prefix regex of cnt.*, Logtail matches the entire first line (over a thousand characters) with boost::regex_match, which succeeds only if the regex matches the whole buffer.

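bool BoostRegexMatch(const char* buffer, size_t size, const boost::regex& reg, string& exception) {
    // ...
    // regex_match succeeds only if reg matches the entire range [buffer, buffer + size),
    // so a pattern such as cnt.* must consume every character of the line.
    if (boost::regex_match(buffer, buffer + size, reg)) {
        return true;
    }
    // ...
}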
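The difference between a full match and an anchored prefix match can be checked with the standard-library counterparts of the Boost calls, std::regex_match and std::regex_search with std::regex_constants::match_continuous, which follow the same matching rules. FullMatchLength and PrefixMatchLength are illustrative names, not Logtail functions; each returns how many characters the match consumed, or -1 on failure.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Characters consumed by a full match, or -1 if it fails.
// std::regex_match must cover the whole line, so "cnt.*" walks to the end.
long FullMatchLength(const std::string& line, const std::regex& reg) {
    std::smatch m;
    return std::regex_match(line, m, reg) ? static_cast<long>(m.length(0)) : -1;
}

// Characters consumed by an anchored prefix match, or -1 if it fails.
// match_continuous forces the match to start at position 0, but it may stop
// anywhere, so "cnt" consumes only 3 characters.
long PrefixMatchLength(const std::string& line, const std::regex& reg) {
    std::smatch m;
    return std::regex_search(line, m, reg, std::regex_constants::match_continuous)
               ? static_cast<long>(m.length(0))
               : -1;
}
```

On "cnt:" followed by 1,000 'a' characters, FullMatchLength with cnt.* returns 1004, while PrefixMatchLength with cnt returns 3. PrefixMatchLength with cnt.* still returns 1004, because the greedy .* consumes the rest of the line even in an anchored search; switching the call alone is therefore not enough, and the trailing .* has to be removed as well.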

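A benchmark that repeatedly matches cnt.* against a line starting with cnt:, lengthening the line by one character per round, shows that matching time grows linearly with the length of the non‑prefix part of the log.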

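#include <boost/regex.hpp>
#include <chrono>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>

// Measures boost::regex_match throughput on "cnt:" followed by a growing run of 'a'.
// Logtail's GetCurrentTimeInMicroSeconds/formatSize helpers and logger setup are
// replaced by std::chrono and a plain MB/s figure so the benchmark builds on its own.
static void BM_Regex_Match(int batchSize) {
    std::string buffer = "cnt:";
    boost::regex reg("cnt.*");
    std::ofstream outFile("BM_Regex_Match.txt", std::ios::trunc);
    for (int i = 0; i < 1000; i++) {
        buffer += "a";  // lengthen the non-prefix part by one character
        auto start = std::chrono::steady_clock::now();
        for (int j = 0; j < batchSize; j++) {
            if (!boost::regex_match(buffer, reg)) {
                std::cout << "error" << std::endl;
            }
        }
        // Time the whole batch: summing per-call intervals measured in whole
        // microseconds can round down to zero for short lines.
        uint64_t durationTime = std::chrono::duration_cast<std::chrono::microseconds>(
                                    std::chrono::steady_clock::now() - start).count();
        if (durationTime == 0) {
            durationTime = 1;  // avoid division by zero
        }
        uint64_t bytesPerSec = buffer.size() * (uint64_t)batchSize * 1000000 / durationTime;
        outFile << i << '\t' << "durationTime: " << durationTime << std::endl;
        outFile << i << '\t' << "process: " << bytesPerSec / (1024 * 1024) << " MB/s" << std::endl;
    }
}

int main() {
    std::cout << "BM_Regex_Match" << std::endl;
    BM_Regex_Match(10000);
    return 0;
}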

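Because a line‑prefix regex usually ends with .*, only its leading part (e.g., cnt) actually decides whether a line starts a new log. boost::regex_search with the boost::match_continuous flag requires the match to begin at the start of the buffer but not to extend to its end, so once the trailing .* is dropped, matching stops after the prefix instead of scanning the whole line.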

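bool BoostRegexSearch(const char* buffer, size_t size, const boost::regex& reg, string& exception) {
    // ...
    boost::cmatch what;  // match results for a const char* range
    // match_continuous: the match must start at buffer, but it need not reach buffer + size.
    if (boost::regex_search(buffer, buffer + size, what, reg, boost::match_continuous)) {
        return true;
    }
    // ...
}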
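A rough standalone comparison of the two strategies can be timed as follows. This sketch uses std::regex and std::chrono instead of Boost and Logtail's timing helpers; TimeBoth and Timing are hypothetical names, and absolute numbers will vary by regex library, compiler, and machine.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>

// Total wall-clock time (microseconds) spent by each strategy.
struct Timing {
    long long fullMatchUs;     // std::regex_match with "cnt.*"
    long long prefixSearchUs;  // std::regex_search with "cnt" + match_continuous
};

// Hypothetical helper: times `iterations` calls of each strategy on
// "cnt:" followed by `tailLen` copies of 'a'.
Timing TimeBoth(std::size_t tailLen, int iterations) {
    using Clock = std::chrono::steady_clock;
    const std::string line = "cnt:" + std::string(tailLen, 'a');
    const std::regex full("cnt.*");
    const std::regex prefix("cnt");
    int hits = 0;

    const auto t0 = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        if (std::regex_match(line, full)) ++hits;
    }
    const auto t1 = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        if (std::regex_search(line, prefix, std::regex_constants::match_continuous)) ++hits;
    }
    const auto t2 = Clock::now();

    assert(hits == 2 * iterations);  // both strategies accept every line
    auto toUs = [](Clock::duration d) {
        return static_cast<long long>(
            std::chrono::duration_cast<std::chrono::microseconds>(d).count());
    };
    return Timing{toUs(t1 - t0), toUs(t2 - t1)};
}
```

Calling TimeBoth for a few increasing tail lengths should show fullMatchUs growing with the line length while prefixSearchUs stays roughly flat, which is the effect the optimization relies on.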

Logtail's current implementation automatically strips the trailing .* from the user's regex and prepends ^, so existing line‑prefix configurations benefit from the prefix match without being edited.
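The rewrite described above might look roughly like this. ToPrefixRegex is a hypothetical helper, not Logtail's actual code; it keeps an escaped \.* (zero or more literal dots) intact, and it does not attempt to handle every regex construct, such as top‑level alternation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: drop a trailing unescaped ".*" and anchor with "^".
std::string ToPrefixRegex(std::string pattern) {
    const std::string suffix = ".*";
    if (pattern.size() >= suffix.size() &&
        pattern.compare(pattern.size() - suffix.size(), suffix.size(), suffix) == 0) {
        // Count backslashes before the '.'; an odd count means the dot is
        // escaped ("\.*" = any number of literal dots), so keep the suffix.
        std::size_t dotPos = pattern.size() - suffix.size();
        std::size_t backslashes = 0;
        while (backslashes < dotPos && pattern[dotPos - 1 - backslashes] == '\\') {
            ++backslashes;
        }
        if (backslashes % 2 == 0) {
            pattern.erase(dotPos);  // remove the trailing ".*"
        }
    }
    if (pattern.empty() || pattern[0] != '^') {
        pattern.insert(0, "^");  // anchor at the start of the line
    }
    return pattern;
}
```

With this helper, cnt.* becomes ^cnt, and a pattern that is already anchored, such as ^cnt.*, is not anchored twice. Combined with match_continuous the ^ is redundant but harmless.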

Performance Test

After applying the prefix‑match optimization, a benchmark comparing Logtail 1.8.7 (pre‑optimization) with Logtail 2.1.1 (post‑optimization) showed throughput rising from about 90 MB/s to about 633 MB/s, roughly a seven‑fold improvement.

Metric                              Before (Logtail 1.8.7)    After (Logtail 2.1.1)
iLogtail collection rate (MB/s)     90                        633
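The seven‑fold figure follows directly from the two measured rates:

```latex
\text{speedup} = \frac{633\ \text{MB/s}}{90\ \text{MB/s}} \approx 7.03
```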

Conclusion

When log lines are long, a small code change (replacing boost::regex_match with boost::regex_search plus boost::match_continuous, and stripping the trailing .* from the regex) can dramatically boost Logtail's multiline log collection performance. The benefit grows with the length of the non‑prefix portion of each line.

[Figure: Performance comparison chart]
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
