How LoongCollector’s OneTime File Collection Transforms Static Log Migration
LoongCollector’s OneTime file collection feature enables fast, reliable migration of historical logs, data back‑filling, and batch processing by scanning files once, using checkpoints for fault tolerance, configurable execution windows, and rate‑limiting to avoid impacting live data streams.
OneTime file collection mode
LoongCollector (Alibaba Cloud Log Service’s next‑generation data collector) provides a OneTime mode for scenarios such as historical log migration, data back‑filling, or temporary batch processing where a continuously running collector is unsuitable.
Pipeline types
Continuous: The pipeline stays resident and continuously discovers new data (e.g., input_file).
OneTime: The pipeline runs once, processes all files that match the configured paths at start‑up, then exits (e.g., input_static_file_onetime).
OneTime configuration basics
The essential fields are:
enable: true
global:
  ExcutionTimeout: 3600  # seconds, default 10 min, range 10 min–1 week
inputs:
  - Type: input_static_file_onetime
    FilePaths:
      - /var/log/history/*.log
flushers:
  - Type: flusher_stdout
    OnlyStdout: true
    Tags: true

When global.ExcutionTimeout is present, LoongCollector treats the pipeline as OneTime and calculates an expiration time (start + timeout).
Execution and expiration windows
Config delivery window: Only agents that have reported a heartbeat within a short period (default 5 minutes) after the configuration is created receive it.
Execution window: The pipeline runs for at most global.ExcutionTimeout (default 10 minutes, configurable up to 1 week).
Retention period: The server keeps the configuration for 7 days for troubleshooting or reuse.
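The two time windows above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in this article, not LoongCollector's or the server's implementation: the function names are hypothetical, and rejecting an out-of-range timeout is an assumption (the real collector may handle it differently).

```python
from datetime import datetime, timedelta, timezone

# Defaults and limits as stated in the article; the constant names are made up.
DELIVERY_WINDOW = timedelta(minutes=5)
MIN_TIMEOUT = timedelta(minutes=10)
MAX_TIMEOUT = timedelta(weeks=1)


def expire_time(start: datetime, timeout_s: int = 600) -> datetime:
    """Return start + timeout, rejecting timeouts outside 10 min to 1 week."""
    timeout = timedelta(seconds=timeout_s)
    if not MIN_TIMEOUT <= timeout <= MAX_TIMEOUT:
        # Assumption: this sketch raises; actual behavior is not documented here.
        raise ValueError(f"timeout {timeout_s}s is outside the allowed range")
    return start + timeout


def receives_config(created: datetime, heartbeat: datetime) -> bool:
    """True if the agent's heartbeat falls inside the delivery window."""
    return created <= heartbeat <= created + DELIVERY_WINDOW


created = datetime(2026, 1, 16, 8, 0, tzinfo=timezone.utc)
print(receives_config(created, created + timedelta(minutes=3)))  # True
print(receives_config(created, created + timedelta(minutes=6)))  # False
print(expire_time(created, 3600).isoformat())  # 2026-01-16T09:00:00+00:00
```

An agent that first reports six minutes after the configuration was created misses the delivery window in this model, which is why a OneTime config should be created while the target machines are online.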
Checkpoint mechanism
Two checkpoint files guarantee reliability across restarts:
Config‑level checkpoint (/etc/ilogtail/checkpoint/onetime_config_info.json) stores config_hash, expire_time, inputs_hash and excution_timeout. It is used to restore the expiration time and decide whether a configuration needs to be re‑run after an update.
File‑level checkpoint (/etc/ilogtail/checkpoint/input_static_file/{config_name}@0.json) records per‑file progress (device, inode, signature hash, size, status, timestamps). Example content:
{
  "config_name": "example",
  "expire_time": 1768550944,
  "file_count": 1,
  "files": [
    {
      "dev": 2051,
      "filepath": "/var/log/tmpfs.log",
      "finish_time": 1768550345,
      "inode": 2888304,
      "size": 1282,
      "start_time": 1768550345,
      "status": "finished"
    }
  ],
  "finish_time": 1768550345,
  "input_index": 0,
  "start_time": 1768550344,
  "status": "finished"
}

Resource usage and throughput control
The OneTime input plugin (input_static_file_onetime) runs in a single thread inside LoongCollector’s StaticFileServer, avoiding uncontrolled concurrency.
Because it is implemented in native C++, that single thread can still ingest plain‑text logs at up to 300 MB/s.
Sending rate can be limited with flusher_sls.MaxSendRate (bytes per second) to protect network bandwidth and SLS write quotas.
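To make the effect of a bytes-per-second cap concrete, here is a minimal pacing sketch in Python. It is a generic throttle written for illustration and is not how flusher_sls implements MaxSendRate; the class and method names are hypothetical. The clock is passed in explicitly so the behavior is easy to follow.

```python
class SendThrottle:
    """Pace sends so the average rate stays at or below max_send_rate bytes/s."""

    def __init__(self, max_send_rate: int):
        if max_send_rate <= 0:
            raise ValueError("max_send_rate must be positive")
        self.max_send_rate = max_send_rate
        self.next_allowed = 0.0  # earliest time the next send may start

    def delay_for(self, nbytes: int, now: float) -> float:
        """Return how many seconds to wait before sending nbytes at time now."""
        start = max(now, self.next_allowed)
        # Reserve the time slice this payload occupies at the configured rate.
        self.next_allowed = start + nbytes / self.max_send_rate
        return start - now


throttle = SendThrottle(max_send_rate=290_000)
print(throttle.delay_for(290_000, now=0.0))  # 0.0: the first send goes out at once
print(throttle.delay_for(290_000, now=0.5))  # 0.5: wait until the first 1 s slice ends
print(throttle.delay_for(145_000, now=3.0))  # 0.0: the sender was idle, so no wait
```

With a cap like this, a reader that can ingest hundreds of MB/s simply spends most of its time waiting, so the cap rather than the reader determines how long a backfill takes.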
Best‑practice scenarios
Large‑scale backfill: For 1,000 machines each needing to backfill ~10 GB, set MaxSendRate to ≈290,000 B/s (≈0.28 MB/s per machine) and increase ExcutionTimeout to 86,400 s (1 day) to avoid quota exhaustion and ensure completion.
Partial time‑range backfill: Combine the native timestamp filter processor (processor_timestamp_filter_native) with JSON parsing processors to keep only events within the desired window, preventing duplicate ingestion of already collected data.
Correcting a faulty configuration: If the initial OneTime config produces unexpected data, set ForceRerunWhenUpdate: true to force a re‑run after updating the configuration, then verify the new output. Erroneous data can be removed with Log Service’s soft‑delete feature.
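The figures in the large-scale backfill scenario can be sanity-checked with a short calculation. The sketch below assumes that "10 GB" means 10 × 1024³ bytes and that each machine sends at exactly the configured cap for the whole run; the function name is hypothetical.

```python
def backfill_seconds(total_bytes: int, max_send_rate: int) -> float:
    """Time needed to send total_bytes at a steady max_send_rate (bytes/s)."""
    return total_bytes / max_send_rate


per_machine_bytes = 10 * 1024**3  # assumption: 10 GB read as 10 GiB
max_send_rate = 290_000           # bytes per second, per machine
machines = 1_000
timeout_s = 86_400                # ExcutionTimeout of one day

seconds = backfill_seconds(per_machine_bytes, max_send_rate)
print(round(seconds / 3600, 1))          # 10.3 hours per machine
print(max_send_rate * machines / 1e6)    # 290.0 MB/s across the fleet
print(seconds <= timeout_s)              # True: the run fits in the one-day window
```

Under these assumptions each machine finishes in a little over 10 hours, so the one-day timeout leaves headroom for retries and for machines that start late.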
Alibaba Cloud Native