Why Enabling Line‑Start Regex Slows Logtail and How to Speed It Up
This article examines why Logtail’s performance drops when line‑start regular expressions are used for multi‑line logs, explains the underlying boost::regex_match behavior, and demonstrates how switching to a prefix‑only regex or boost::regex_search can boost collection speed by up to seven times.
Background
Logtail is a widely used log collection agent, so even small per-line savings add up across the volume of logs it handles. Recent performance tests revealed that enabling line-start regular expression handling for multi-line logs caused a noticeable slowdown.
How Logtail Processes Multi‑Line Logs
Logtail merges multi‑line logs based on a configured pattern. The workflow is:
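1. The user configures a line-start regex.
2. Logtail applies this regex to each log line to decide whether the line begins a new log entry.
3. If a line does not match, Logtail treats it as a continuation of the current entry and keeps reading until the next matching line start.
For example, with the regex cnt.*, only the cnt prefix decides whether a line starts a new entry, yet the trailing .* forces the matcher to consume the rest of the line, which becomes costly for long lines.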
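The workflow above can be sketched roughly as follows. This is a minimal illustration, not Logtail's actual code: `MergeMultiLine` is a hypothetical helper, `std::regex` stands in for Boost.Regex so the snippet needs only the standard library, and it assumes that non-matching lines are appended to the entry currently being built.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of Logtail): groups raw lines into
// multi-line log entries. A line that fully matches `lineStart` opens a
// new entry; any other line is appended to the entry being built.
std::vector<std::string> MergeMultiLine(const std::vector<std::string>& lines,
                                        const std::regex& lineStart) {
    std::vector<std::string> entries;
    for (const std::string& line : lines) {
        // The very first line always opens an entry, even if it does not match.
        // regex_match is a full-line match, as in the behavior described above.
        if (entries.empty() || std::regex_match(line, lineStart)) {
            entries.push_back(line);
        } else {
            entries.back() += '\n';
            entries.back() += line;
        }
    }
    return entries;
}
```

With the pattern cnt.*, a line such as "cnt:1 start" followed by two indented stack-trace lines would be merged into a single entry, and the next line beginning with cnt would open a new one.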
Implementation Details
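Logtail uses boost::regex_match for full-line matching. This function succeeds only if the regex matches the entire buffer, so a pattern such as cnt.* has to scan every character of the line. As a result, matching time grows linearly with the length of the content that follows the prefix, even though that content plays no part in identifying the line start.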
    bool BoostRegexMatch(const char* buffer, size_t size, const boost::regex& reg, std::string& exception) {
        // ...
        // regex_match succeeds only if the regex matches the whole buffer.
        if (boost::regex_match(buffer, buffer + size, reg)) {
            return true;
        }
        // ...
    }

Benchmark code showed that as the amount of log data unrelated to the line-start prefix increases, the execution time of boost::regex_match grows linearly.
    static void BM_Regex_Match(int batchSize) {
        std::string buffer = "cnt:";
        std::string regStr = "cnt.*";
        boost::regex reg(regStr);
        // loop measuring duration...
    }

Optimizing the Regex
Instead of matching the whole line, only the prefix needs to be matched. Using boost::regex_search with the boost::match_continuous flag allows matching just the start of the line.
    bool BoostRegexSearch(const char* buffer, size_t size, const boost::regex& reg, std::string& exception) {
        boost::cmatch what;  // match results for a const char* range
        // match_continuous requires the match to begin at the first character of the buffer.
        if (boost::regex_search(buffer, buffer + size, what, reg, boost::match_continuous)) {
            return true;
        }
        return false;
    }

Logtail can automatically strip the trailing .* from user-provided regexes and prepend ^ to improve efficiency.
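The rewrite described here might look like the sketch below. It is an assumption-laden illustration, not Logtail's implementation: `ToPrefixPattern` and `StartsNewEntry` are hypothetical names, `std::regex` and `std::regex_constants::match_continuous` stand in for their Boost.Regex counterparts, and the rewrite is only safe for patterns without top-level alternation (for example, a|b.* would change meaning).

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: drops one trailing ".*" and prepends "^" if absent.
// Conservative simplification: if the dot is preceded by a backslash
// (as in "\.*"), the pattern tail is left untouched.
std::string ToPrefixPattern(std::string pattern) {
    const std::size_t n = pattern.size();
    const bool endsWithDotStar = n >= 2 && pattern.compare(n - 2, 2, ".*") == 0;
    const bool dotIsEscaped = n >= 3 && pattern[n - 3] == '\\';
    if (endsWithDotStar && !dotIsEscaped) {
        pattern.erase(n - 2);
    }
    if (pattern.empty() || pattern.front() != '^') {
        pattern.insert(pattern.begin(), '^');
    }
    return pattern;
}

// Prefix-only check: match_continuous requires the match to begin at the
// first character, so the rest of the line is never scanned.
bool StartsNewEntry(const std::string& line, const std::regex& prefix) {
    return std::regex_search(line, prefix,
                             std::regex_constants::match_continuous);
}
```

Under these assumptions, the user-facing pattern cnt.* becomes ^cnt, and the check only inspects the first few characters of each line regardless of how long the line is.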
Performance Test
After applying the optimization, a benchmark comparing Logtail 2.1.1 (optimized) with Logtail 1.8.7 (pre-optimization) showed the collection rate rising from 90 MB/s to 633 MB/s, roughly a seven-fold improvement.
Metric                             Original    Optimized
iLogtail collection rate (MB/s)    90          633
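The effect can be explored at small scale with a micro-benchmark along the following lines. This is only a sketch: `Timing`, `AverageNanos`, and `CompareMatchers` are hypothetical names, `std::regex` is used in place of Boost.Regex, and the timings it produces are not comparable to the Logtail figures above.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>

struct Timing {
    double fullMatchNs;  // average ns per full-line std::regex_match call
    double prefixNs;     // average ns per prefix-only std::regex_search call
    bool fullMatched;    // result of the last full-line match
    bool prefixMatched;  // result of the last prefix-only match
};

// Runs `fn` `iterations` times (must be > 0) and returns the average
// duration of one call in nanoseconds.
template <typename Fn>
double AverageNanos(Fn fn, int iterations, bool& lastResult) {
    const auto begin = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        lastResult = fn();
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(end - begin).count() / iterations;
}

// Times both strategies on a line made of "cnt:" plus `payloadLength`
// filler characters that are irrelevant to the line-start decision.
Timing CompareMatchers(std::size_t payloadLength, int iterations) {
    const std::string line = "cnt:" + std::string(payloadLength, 'x');
    const std::regex full("cnt.*");
    const std::regex prefix("^cnt");
    Timing t{};
    t.fullMatchNs = AverageNanos(
        [&] { return std::regex_match(line, full); }, iterations, t.fullMatched);
    t.prefixNs = AverageNanos(
        [&] { return std::regex_search(line, prefix,
                                       std::regex_constants::match_continuous); },
        iterations, t.prefixMatched);
    return t;
}
```

Calling `CompareMatchers` with increasing payload lengths (for example 256, 1024, 4096) would be expected to show the full-match time rising with the payload while the prefix time stays roughly flat; absolute numbers depend on the regex library, compiler, and hardware. Some std::regex implementations recurse deeply on long inputs, so payloads are best kept modest.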
Conclusion
When log lines are long, a simple code change that replaces full‑line regex matching with a prefix‑only search can dramatically improve Logtail’s collection performance, especially as the non‑matching portion of logs grows.
Alibaba Cloud Observability
