
How to Collect and Analyze JuiceFS Access Logs with Volcengine TLS

This article explains how to gather JuiceFS access logs with the LogCollector agent, parse and structure them in TLS, design index fields, build analytical dashboards, run advanced SQL queries (write‑IO distribution, sequential‑read ratios, overwrite detection, file‑lifecycle analysis), and set up real‑time monitoring and alerting for performance anomalies.

Volcano Engine Developer Services

Business Background

JuiceFS is a high‑performance, cloud‑native distributed file system released under the Apache 2.0 license, offering full POSIX compatibility and the ability to mount object storage as local disks across multiple hosts.

Its access logs record every operation (type, UID, GID, inode, duration, etc.) and are useful for performance analysis, auditing, and troubleshooting, but the logs are scattered across many clients and servers.

Volcengine Log Service (TLS) Features

Unified collection and management of distributed JuiceFS logs.

Parsing to a uniform structure for easier analysis.

SQL‑based deep query capabilities.

Pre‑built analysis dashboards.

Real‑time monitoring and alerting.

JuiceFS Log Format

2021.01.15 08:26:11.003330 [uid:0,gid:0,pid:4403] write (17669,8666,4993160): OK <0.000010>

The fields represent timestamp, user/group/process IDs, operation type, parameters (inode, size, offset), result status, and execution time.
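To make the layout concrete, here is a minimal Python sketch (not the actual TLS processor configuration) that splits such a line into named fields with one regular expression; the field names mirror the parsing rules described below:

```python
import re

# Hypothetical regex mirroring the JuiceFS access-log layout:
# timestamp [uid:..,gid:..,pid:..] op (params): result <delay>
LOG_RE = re.compile(
    r"^(?P<time>\S+ \S+) "
    r"\[uid:(?P<uid>\d+),gid:(?P<gid>\d+),pid:(?P<pid>\d+)\] "
    r"(?P<op>\w+) \((?P<params>[^)]*)\): "
    r"(?P<result>\S+) <(?P<delay>[\d.]+)>$"
)

line = "2021.01.15 08:26:11.003330 [uid:0,gid:0,pid:4403] write (17669,8666,4993160): OK <0.000010>"
fields = LOG_RE.match(line).groupdict()
```

The operation-specific parameters stay in the raw `params` string here; splitting them per operation is the job of the conditional processors covered next.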

Log Collection with LogCollector

LogCollector is a high‑performance, low‑resource log collector that can be installed on each JuiceFS client. After installation, you configure collection rules and start parameters via the TLS console.

Log Parsing Rules

Using TLS conditional processor plugins, the raw log line (field __content__) is parsed with regular expressions to extract the following fields:

time: the log timestamp

uid, gid, pid: user, group, and process IDs

op: operation type (write, read, open, etc.)

result: operation result (OK or an error)

Operation‑specific parameters such as inode, length, offset, filename, mode, filehandle, status, delay, etc.

Examples of regular expressions for different operations are provided in the source.
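As a hedged illustration of the op-specific step, a small Python helper can split a write record's parameter string into typed fields; the (inode, length, offset) layout is taken from the sample line above, and the function name is ours, not part of TLS:

```python
import re

# Sketch only: the real per-operation regexes live in the TLS processor
# configuration. The write layout (inode, length, offset) comes from the
# sample log line shown earlier.
WRITE_PARAMS = re.compile(r"^(?P<inode>\d+),(?P<length>\d+),(?P<offset>\d+)$")

def parse_write_params(params: str) -> dict:
    """Split the raw parameter string of a write record into integer fields."""
    m = WRITE_PARAMS.match(params)
    return {k: int(v) for k, v in m.groupdict().items()}
```

Each other operation (open, read, unlink, ...) would get its own pattern in the same style.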

TLS Index Design

Key index fields include uid, gid, pid, op, inode, length, offset, filename, filehandle, status, mode, parent_inode, delay, and time. These enable fast filtering and aggregation.

Dashboard Examples

Pre‑built dashboard templates cover common scenarios such as write‑operation counts, IO size distribution, sequential‑read ratios, overwrite analysis, and file‑lifecycle statistics. Users can also create custom dashboards.

SQL Queries

Write‑Operation Statistics

op:"write" |
SELECT COUNT(*) AS "total_count",
       AVG(length) AS "avg_size",
       AVG(delay)*1000 AS "avg_time_ms",
       MAX(delay)*1000 AS "max_time_ms",
       MIN(delay)*1000 AS "min_time_ms"

Write IO Size Distribution

op: "write" |
SELECT length, COUNT(*) AS cnt
FROM (
  SELECT CASE
           WHEN length < 4*1024 THEN '0~4K'
           WHEN length < 8*1024 THEN '4K~8K'
           WHEN length < 16*1024 THEN '8K~16K'
           WHEN length < 32*1024 THEN '16K~32K'
           WHEN length < 64*1024 THEN '32K~64K'
           WHEN length < 128*1024 THEN '64K~128K'
           WHEN length < 256*1024 THEN '128K~256K'
           ELSE '>256K' END AS length
  FROM log
  WHERE op = 'write'
) t
GROUP BY length
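The CASE bucketing can be mirrored in a few lines of Python, which is handy for sanity-checking the boundaries (each bucket is right-open, so a write of exactly 128 KiB lands in 128K~256K):

```python
def size_bucket(length: int) -> str:
    """Map a write size in bytes to the same buckets as the SQL CASE above."""
    bounds = [(4, '0~4K'), (8, '4K~8K'), (16, '8K~16K'), (32, '16K~32K'),
              (64, '32K~64K'), (128, '64K~128K'), (256, '128K~256K')]
    for kib, label in bounds:
        if length < kib * 1024:
            return label
    return '>256K'
```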

Sequential‑Read Ratio

op: "read" |
WITH flagged AS (
  SELECT __path__, inode, length,
         CASE WHEN offset = LAG(offset) OVER (PARTITION BY __path__, inode ORDER BY time)
                   + LAG(length) OVER (PARTITION BY __path__, inode ORDER BY time)
              THEN length ELSE 0 END AS sequentialReadSize
  FROM log
  WHERE op = 'read'
)
SELECT CASE
         WHEN ratio < 0.2 THEN '0~20%'
         WHEN ratio < 0.4 THEN '20%~40%'
         WHEN ratio < 0.6 THEN '40%~60%'
         WHEN ratio < 0.8 THEN '60%~80%'
         ELSE '80%~100%'
       END AS "scope",
       COUNT(*) AS cnt
FROM (
  SELECT __path__, inode,
         SUM(sequentialReadSize) * 1.0 / SUM(length) AS ratio
  FROM flagged
  GROUP BY __path__, inode
) t
GROUP BY 1
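The LAG-based flagging amounts to: a read is sequential when it starts exactly where the previous read on the same file ended. A plain-Python sketch of the same computation for a single (__path__, inode) stream:

```python
def sequential_read_ratio(reads):
    """reads: list of (offset, length) tuples in time order.
    Returns the fraction of bytes read sequentially (previous offset +
    previous length == current offset), matching the SQL LAG logic; the
    first read of a stream counts as non-sequential, as in the query."""
    sequential = total = 0
    prev_end = None
    for offset, length in reads:
        total += length
        if prev_end is not None and offset == prev_end:
            sequential += length
        prev_end = offset + length
    return sequential / total if total else 0.0
```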

Overwrite Detection

op: "write" |
WITH flagged AS (
  SELECT __path__, inode, offset, length,
         CASE WHEN LAG(offset) OVER (PARTITION BY __path__, inode ORDER BY offset, time) IS NULL
              THEN -1 ELSE LAG(offset) OVER (PARTITION BY __path__, inode ORDER BY offset, time) END AS lastOffset,
         CASE WHEN LAG(length) OVER (PARTITION BY __path__, inode ORDER BY offset, time) IS NULL
              THEN -1 ELSE LAG(length) OVER (PARTITION BY __path__, inode ORDER BY offset, time) END AS lastLength
  FROM log
  WHERE op = 'write'
)
SELECT SUM( GREATEST(0, LEAST(offset + length, lastOffset + lastLength) - GREATEST(offset, lastOffset)) ) * 1.0 / (1024*1024*1024) AS duplicateSizeGB
FROM flagged
WHERE lastOffset != -1 AND lastLength != -1
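The innermost expression is plain interval overlap: the bytes shared by the current write [offset, offset+length) and the previous one. A one-function Python equivalent makes the arithmetic easy to verify:

```python
def overlap_bytes(offset, length, last_offset, last_length):
    """Overlapping byte count between the current write interval
    [offset, offset+length) and the previous write interval,
    i.e. max(0, min(ends) - max(starts))."""
    return max(0, min(offset + length, last_offset + last_length)
                  - max(offset, last_offset))
```

Summing this over consecutive writes to the same inode gives the total overwritten volume that the query reports.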

File Lifecycle Distribution

op:"unlink" OR op:"create" |
SELECT lifeTime, COUNT(*) AS cnt
FROM (
  SELECT CASE
           WHEN DATE_DIFF('minute', createTime, unlinkTime) < 10 THEN '0~10min'
           WHEN DATE_DIFF('minute', createTime, unlinkTime) < 30 THEN '10~30min'
           WHEN DATE_DIFF('minute', createTime, unlinkTime) < 60 THEN '30~60min'
           ELSE '>60min' END AS lifeTime
  FROM (
    SELECT __path__, inode, DATE_PARSE(time, '%Y.%m.%d %H:%i:%s.%f') AS unlinkTime
    FROM log WHERE op = 'unlink'
  ) AS u
  JOIN (
    SELECT __path__, inode, DATE_PARSE(time, '%Y.%m.%d %H:%i:%s.%f') AS createTime
    FROM log WHERE op = 'create'
  ) AS c ON u.inode = c.inode AND u.__path__ = c.__path__
) t
GROUP BY lifeTime
ORDER BY cnt
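The lifetime bucketing can be reproduced in Python for spot-checking; the timestamp format matches the JuiceFS log lines (`%f` keeps the microseconds):

```python
from datetime import datetime

def lifetime_bucket(create_time: str, unlink_time: str) -> str:
    """Bucket a file's lifetime the same way the SQL DATE_DIFF branches do.
    Timestamps use the JuiceFS log format, e.g. '2021.01.15 08:26:11.003330'."""
    fmt = "%Y.%m.%d %H:%M:%S.%f"
    minutes = (datetime.strptime(unlink_time, fmt)
               - datetime.strptime(create_time, fmt)).total_seconds() / 60
    if minutes < 10:
        return '0~10min'
    if minutes < 30:
        return '10~30min'
    if minutes < 60:
        return '30~60min'
    return '>60min'
```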

Real‑Time Monitoring & Alerting

TLS provides alarm rules that run periodic SQL queries. When a condition (e.g., write latency > 2 seconds) is met, an alert is sent via a notification group (e.g., Feishu webhook) using a custom content template.

op: "write" |
SELECT __path__ AS task, inode,
       SUM(length) AS writeSize,
       SUM(delay) AS consumedTime
FROM log
WHERE op = 'write'
GROUP BY __path__, inode
ORDER BY writeSize DESC
LIMIT 1000

The alarm condition checks consumedTime > 2 and triggers a warning‑level alert every 10 minutes.
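The alarm predicate itself is a simple filter over the query's result rows; here is a hypothetical sketch (function name is ours, field names follow the query's aliases):

```python
def rows_breaching(rows, threshold_seconds=2.0):
    """Given query result rows as dicts with 'task', 'inode' and
    'consumedTime', return the rows whose accumulated write latency
    exceeds the threshold -- the same predicate the TLS alarm rule
    applies (consumedTime > 2)."""
    return [r for r in rows if r["consumedTime"] > threshold_seconds]
```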

Conclusion

By leveraging Volcengine TLS for unified collection, parsing, indexing, dashboarding, SQL analysis, and alerting, JuiceFS users can turn scattered access logs into a powerful observability platform covering basic statistics, sequential‑read detection, overwrite analysis, lifecycle insights, and real‑time performance monitoring.

Tags: Monitoring, SQL, TLS, log analysis, JuiceFS, LogCollector
Written by

Volcano Engine Developer Services

The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.
