How LoongCollector’s One‑Time File Collection Simplifies Bulk Log Migration
LoongCollector introduces a One‑Time file collection mode that scans matching files once, records a snapshot, and exits. This makes historic log migration, data back‑fill, and temporary batch processing efficient, while fine‑grained checkpoints, execution windows, and throttling controls help avoid quota issues and ensure reliable completion.
Background
LoongCollector, a next‑generation data collector from Alibaba Cloud Log Service, supports unified collection of logs, metrics, traces, events and profiles. Traditional collectors only monitor incremental data, which makes one‑off tasks such as historic log migration, data back‑fill or batch processing cumbersome.
One‑Time vs Continuous Pipelines
LoongCollector defines two pipeline types:
Continuous – stays resident, continuously discovers new data (e.g., input_file).
OneTime – runs once, scans matching files at start, reads a fixed snapshot and exits (e.g., input_static_file_onetime).
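The difference between the two pipeline types can be illustrated with a small Python sketch of the one-time behaviour. This is not LoongCollector's code: take_snapshot and read_snapshot are hypothetical helpers, and bounding each read by the size recorded at scan time is an assumption about what "fixed snapshot" means.

```python
import glob
import os


def take_snapshot(pattern):
    """Resolve the glob once and freeze (path, size) for every regular file found."""
    return [
        (path, os.path.getsize(path))
        for path in sorted(glob.glob(pattern))
        if os.path.isfile(path)
    ]


def read_snapshot(snapshot):
    """Yield (path, data), reading each file only up to its recorded size.

    Files created after the snapshot, and bytes appended after it,
    are never seen (assumed behaviour, for illustration only).
    """
    for path, size in snapshot:
        with open(path, "rb") as f:
            yield path, f.read(size)
```

A continuous input would instead re-run the discovery step on a schedule and keep following each file as it grows.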
One‑Time Configuration
A minimal OneTime config includes global.ExcutionTimeout (default 10 min, range 10 min‑1 week) and an input plugin ending with _onetime. Example:
enable: true
global:
  # Execution window in seconds (here 1 hour)
  ExcutionTimeout: 3600
inputs:
  - Type: input_static_file_onetime
    FilePaths:
      - /var/log/history/*.log
flushers:
  - Type: flusher_stdout
    OnlyStdout: true
    Tags: true
Lifecycle and Expiration
Three time points are managed on the server side: configuration delivery window (5 min after creation), execution window (controlled by global.ExcutionTimeout), and retention period (7 days). On the client side the collector records start + ExcutionTimeout as the expiration time, cleans up files after expiry, and decides whether a configuration update requires a full rerun (controlled by global.ForceRerunWhenUpdate).
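The client-side expiration rule (start + ExcutionTimeout) can be sketched in Python as follows. The function names are hypothetical, and clamping an out-of-range value into the documented 10 min to 1 week range is an assumption; the real collector may reject such values instead.

```python
MIN_TIMEOUT_S = 10 * 60           # documented default and lower bound: 10 minutes
MAX_TIMEOUT_S = 7 * 24 * 3600     # documented upper bound: 1 week


def effective_timeout(configured=None):
    """Return the timeout in seconds, using the default when unset.

    Clamping into [10 min, 1 week] is an assumption made for this sketch.
    """
    if configured is None:
        return MIN_TIMEOUT_S
    return max(MIN_TIMEOUT_S, min(configured, MAX_TIMEOUT_S))


def expire_time(start_ts, configured=None):
    """Expiration time = start time + ExcutionTimeout (both in seconds)."""
    return start_ts + effective_timeout(configured)


def is_expired(now_ts, start_ts, configured=None):
    """True once the execution window has closed."""
    return now_ts >= expire_time(start_ts, configured)
```

With ExcutionTimeout: 3600 and a start at t = 1000 s, the sketch gives an expiration time of 4600 s.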
Checkpoint Mechanism
Two checkpoint files guarantee recoverability:
Config‑level checkpoint (/etc/ilogtail/checkpoint/onetime_config_info.json) stores config_hash, expire_time, inputs_hash and excution_timeout.
File‑level checkpoint (/etc/ilogtail/checkpoint/input_static_file/{config_name}@0.json) records per‑file status, size, start/finish timestamps and a fingerprint (dev, inode, sig_hash, sig_size) to support log rotation.
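To show why a (dev, inode, sig_hash, sig_size) fingerprint survives log rotation, here is a minimal Python sketch. It is an illustration under assumptions, not the collector's implementation: the signature is taken to be a hash over the first bytes of the file, and both SIG_BYTES and the choice of SHA-256 are hypothetical.

```python
import hashlib
import os
from typing import NamedTuple

SIG_BYTES = 1024  # hypothetical signature length, not taken from LoongCollector


class Fingerprint(NamedTuple):
    dev: int       # device ID of the filesystem holding the file
    inode: int     # inode number; unchanged by a rename on the same filesystem
    sig_hash: str  # hash of the leading bytes (assumed definition of the signature)
    sig_size: int  # number of bytes that were hashed


def fingerprint(path):
    """Build a fingerprint from file metadata plus a hash of the file head."""
    st = os.stat(path)
    with open(path, "rb") as f:
        head = f.read(SIG_BYTES)
    return Fingerprint(st.st_dev, st.st_ino, hashlib.sha256(head).hexdigest(), len(head))
```

Because the fingerprint does not include the path, a file renamed from app.log to app.log.1 still matches its checkpoint entry, while a fresh app.log created afterwards does not.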
Performance and Throttling
The native C++ input can reach ~300 MB/s in a single thread. Resource usage is controlled by:
Single‑thread scheduling of all input_static_file_onetime instances.
Optional send‑rate limiter flusher_sls.MaxSendRate (bytes per second).
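The effect of a bytes-per-second cap can be sketched with a simple pacing limiter in Python. This only illustrates the general idea behind a setting like MaxSendRate; the class and its method are hypothetical and say nothing about how flusher_sls enforces the limit internally.

```python
class SendRateLimiter:
    """Pace sends so that the long-run rate stays at or below max_bytes_per_s."""

    def __init__(self, max_bytes_per_s, now=0.0):
        self.rate = float(max_bytes_per_s)
        self.next_free = now  # earliest time at which the next send may start

    def delay_for(self, nbytes, now):
        """Reserve a slot for nbytes and return how long the caller should wait."""
        start = max(now, self.next_free)
        # Each payload occupies the link for nbytes / rate seconds.
        self.next_free = start + nbytes / self.rate
        return start - now
```

A caller would sleep for the returned delay before each send. Passing the clock in explicitly keeps the sketch deterministic and easy to test.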
Best‑Practice Scenarios
1. Large‑scale back‑fill
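When 1 000 machines each need to back‑fill ~10 GB, calculate a safe MaxSendRate (≈ 290 000 B/s, i.e. ≈ 290 KB/s per machine, or roughly 290 MB/s across the fleet) and extend ExcutionTimeout to 86 400 s (1 day). At that rate each machine needs about 9.6 hours to send 10 GB, so the run completes inside the window while staying clear of quota errors.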
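The sizing arithmetic for this scenario can be checked with a few lines of Python. The sketch assumes decimal units (1 GB = 10^9 bytes) and reads the ≈ 290 KB/s figure as a per-machine MaxSendRate; the helper names are hypothetical.

```python
import math

MACHINES = 1_000
BYTES_PER_MACHINE = 10 * 10**9   # ~10 GB per machine, decimal units assumed
MAX_SEND_RATE = 290_000          # bytes per second, per machine (assumed reading)
WINDOW_S = 86_400                # ExcutionTimeout of 1 day


def transfer_seconds(total_bytes, bytes_per_s):
    """Time needed to send total_bytes at a constant rate."""
    return total_bytes / bytes_per_s


def min_rate_for_window(total_bytes, window_s):
    """Smallest whole bytes-per-second rate that still finishes inside the window."""
    return math.ceil(total_bytes / window_s)


duration_s = transfer_seconds(BYTES_PER_MACHINE, MAX_SEND_RATE)
fleet_rate = MACHINES * MAX_SEND_RATE
floor_rate = min_rate_for_window(BYTES_PER_MACHINE, WINDOW_S)

print(f"per-machine duration: {duration_s / 3600:.1f} h")
print(f"aggregate rate: {fleet_rate / 10**6:.0f} MB/s")
print(f"minimum rate to finish in one day: {floor_rate} B/s")
```

Under these assumptions any per-machine rate between the one-day minimum and the quota-derived maximum works; the closer the rate is to the minimum, the less headroom remains for retries.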
2. Partial time‑range back‑fill
Combine a timestamp filter processor (processor_timestamp_filter_native) with the JSON and timestamp parsing processors (processor_parse_json_native, processor_parse_timestamp_native) to keep only events inside the missing interval, preventing duplicate data.
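The effect of this processor chain can be approximated in Python: parse each line as JSON, read its timestamp, and keep only events inside the gap. This is a conceptual sketch and not the processors' actual behaviour; the "time" field name, the Unix-seconds format, and the half-open interval are all assumptions.

```python
import json


def filter_window(lines, start_ts, end_ts, time_key="time"):
    """Yield parsed events whose timestamp falls in [start_ts, end_ts).

    Lines that are not JSON objects, or that lack a numeric timestamp
    under time_key (a hypothetical field name), are dropped.
    """
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        ts = event.get(time_key)
        if isinstance(ts, (int, float)) and start_ts <= ts < end_ts:
            yield event
```

Using a half-open interval means an event that sits exactly on the boundary between two back-fill runs is ingested by only one of them.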
3. Configuration correction
If the first OneTime run produces dirty data, set global.ForceRerunWhenUpdate: true to force a full rerun, then use Log Service soft‑delete to remove the unwanted records.
Summary
One‑time file collection is ideal for historic data migration, network‑outage recovery and temporary batch jobs. By aligning server‑side delivery/expiration windows with client‑side checkpoints and tuning ExcutionTimeout and MaxSendRate, users can reliably ingest static files without disturbing ongoing continuous collection.
Alibaba Cloud Observability
Driving continuous progress in observability technology!
