
Avoid These 6 Log Management Anti‑Patterns to Keep Your Cloud‑Native Systems Reliable

Effective log management is crucial for cloud‑native observability, yet common practices like copy‑truncate rotation, NAS storage, multi‑process writes, file‑hole creation, frequent overwrites, and vim edits can cause data loss or duplicate collection; adopting create‑mode rotation, local disks, append‑only writes, and proper tools mitigates these risks.

Alibaba Cloud Native

Jun 18, 2025

Background

Runtime observation and troubleshooting both rely heavily on logs, one of the longest‑standing observability mechanisms. A sound local log‑management strategy preserves complete history, minimizes performance overhead, and facilitates downstream collection and analysis. However, several common operational practices prevent mainstream collectors (LoongCollector, Filebeat, Fluent Bit, Vector, OpenTelemetry Collector) from collecting logs completely and without duplicates.

Anti‑patterns

1. Using copy‑truncate mode for log rotation

The copy step creates a new file with a new inode, so a collector may treat it as a different file and collect the same content a second time.

A time window between copy and truncate can cause log entries to be lost, as they are neither in the copied file nor in the truncated original.

Truncate may shrink the file or alter its header, leading collectors to treat it as a new file and re‑collect data.
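The loss window described above can be reproduced deterministically. The following is a minimal sketch, not the behavior of any particular rotation tool: the function name `copytruncate_with_late_write` and the file names are hypothetical, and the "application write" is forced to land between the copy and the truncate to make the effect visible.

```python
import os
import shutil
import tempfile


def copytruncate_with_late_write(log_path, rotated_path):
    """Simulate copy-truncate rotation with one write landing in the gap.

    Returns (lines in the rotated copy, lines left in the live file).
    """
    # Step 1: the rotation tool copies the live file to a new file (new inode).
    shutil.copyfile(log_path, rotated_path)

    # The application keeps logging between the copy and the truncate.
    with open(log_path, "a", encoding="utf-8") as app:
        app.write("written-in-the-gap\n")

    # Step 2: the rotation tool truncates the live file in place.
    os.truncate(log_path, 0)

    with open(rotated_path, encoding="utf-8") as f:
        rotated = f.read().splitlines()
    with open(log_path, encoding="utf-8") as f:
        live = f.read().splitlines()
    return rotated, live


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "app.log")
        with open(log, "w", encoding="utf-8") as f:
            f.write("line-1\nline-2\n")
        rotated, live = copytruncate_with_late_write(log, log + ".1")
        print("rotated copy:", rotated)
        print("live file:", live)
```

The entry written in the gap appears in neither file, and the rotated copy has a different inode from the live file, which is why a collector may also read its content again.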

2. Storing logs on NAS or OSS

Eventual‑consistency models can expose updated metadata (such as file size) before the corresponding content is readable, creating mismatches.

Readers may encounter file holes (null bytes) when metadata shows growth but data has not yet synced.

Write latency can delay visibility of new data, causing collection lag.

inotify does not report changes made by remote clients on network file systems, and directory listing on NAS is slow, so new files may be discovered late or missed entirely.
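To illustrate the file-hole symptom, here is a minimal sketch of one defensive approach a custom reader could take: hold back trailing NUL bytes and retry from the same offset later. The helper `read_settled` is hypothetical and is not how any collector named in this article is documented to behave. It also assumes that legitimate log data never ends in NUL bytes.

```python
def read_settled(path, offset):
    """Read from `offset`, holding back any trailing NUL bytes.

    Returns (data safe to process, offset to resume from next time).
    Trailing NULs are treated as content that has not synced yet, so the
    returned offset stops in front of them.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    data = chunk.rstrip(b"\x00")
    return data, offset + len(data)
```

A caller would poll with the returned offset; once the real bytes replace the NULs, the next call picks them up from the same position instead of having skipped past them.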

3. Multi‑process writing to the same log file

Concurrent writes interleave, producing garbled log entries.

Collectors may start reading a file while other processes continue writing, causing newly written data to be skipped.

File‑lock contention degrades write performance and reliability.

4. Creating file holes to free space

Tools like LoongCollector use file‑header content as a unique identifier; inserting holes changes the signature, making the collector treat the file as new.

Replacing original bytes with \0 can erase historical log data.

Frequent hole creation fragments the filesystem, harming read/write performance.
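The signature problem can be sketched as follows. This is an illustration under stated assumptions, not LoongCollector's actual scheme: the 1024-byte window, the SHA-256 hash, and the helper names `head_signature` and `zero_head` are all hypothetical. Overwriting the head with NUL bytes stands in for hole punching, since a reader sees zeros where a hole has been punched.

```python
import hashlib

SIGNATURE_BYTES = 1024  # assumed window size, for illustration only


def head_signature(path, size=SIGNATURE_BYTES):
    """Hash the first `size` bytes of the file as a stand-in file identity."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(size)).hexdigest()


def zero_head(path, length):
    """Overwrite the first `length` bytes with NULs.

    This mimics what a reader observes after a hole is punched at the
    start of the file: the size is unchanged, but the bytes read as zeros.
    """
    with open(path, "r+b") as f:
        f.write(b"\x00" * length)
```

Comparing `head_signature(path)` before and after `zero_head(path, 16)` yields two different values for a file whose size has not changed, so a collector that keys on the head content has no way to tell this file from a new one.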

5. Frequently overwriting the entire log file

Metadata (size) may be updated before the actual content, leading collectors to read incomplete or inconsistent data.

Overwrites during collection can cause data corruption or loss.

Historical logs cannot be retained, reducing traceability.

6. Editing logs with vim (replace‑on‑save)

Vim writes to a temporary file and then renames it, changing the inode and confusing collectors.

The new file’s header differs, breaking the collector’s signature check.

If the logging process does not switch to the new file promptly, log entries may be lost.
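The effect of a replace-on-save edit can be sketched in a few lines. The helper `save_by_rename` is hypothetical and only imitates the write-temporary-file-then-rename behavior described above; it assumes a POSIX file system, where a file can be replaced while another process still holds it open.

```python
import os


def save_by_rename(path, new_text):
    """Replace `path` by writing a temporary file and renaming it over the original.

    Returns (inode before, inode after).
    """
    before = os.stat(path).st_ino
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(new_text)
    # The rename swaps the directory entry to the temporary file's inode.
    os.replace(tmp_path, path)
    after = os.stat(path).st_ino
    return before, after


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "app.log")
        with open(log, "w", encoding="utf-8") as f:
            f.write("entry-1\n")
        writer = open(log, "a", encoding="utf-8")  # the logging process
        print(save_by_rename(log, "entry-1 (edited)\n"))
        writer.write("entry-2\n")  # goes to the old, now unlinked inode
        writer.close()
        with open(log, encoding="utf-8") as f:
            print(f.read())  # entry-2 is not in the file at this path
```

The writer's open handle still points at the old inode, so anything it logs after the save never reaches the file that now lives at the original path.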

Recommendations

Prefer **create**‑mode rotation: rename the old file, then create a new file at the original path. The rotated file keeps its inode and content, preserving continuity.
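Create-mode rotation can be sketched as below. This is a simplified illustration: the class `RotatingWriter` is hypothetical, and here the writer rotates its own file, whereas with an external rotation tool the application would instead need to reopen its log file after the rename.

```python
import os


class RotatingWriter:
    """Append-only writer that rotates by rename + create (no copy, no truncate)."""

    def __init__(self, path):
        self.path = path
        self._file = open(path, "a", encoding="utf-8")

    def write(self, line):
        self._file.write(line + "\n")
        self._file.flush()

    def rotate(self, rotated_path):
        self._file.close()
        # Rename keeps the inode and content: a collector that tracks the
        # file by inode sees the same file under a new name.
        os.rename(self.path, rotated_path)
        # Opening in append mode creates a fresh file (new inode) at the
        # original path for subsequent entries.
        self._file = open(self.path, "a", encoding="utf-8")

    def close(self):
        self._file.close()
```

Because no bytes are copied or removed, there is no window in which an entry can fall between two files, and the rotated file is never mistaken for new content on the basis of its inode.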

Use local disks (or EBS for cloud VMs) for log storage instead of NAS/OSS.

Write logs in **append** mode; assign each process its own log file to avoid interleaving.
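One way to give each process its own append-only file is to put the process ID in the file name. The following is a minimal sketch using Python's standard `logging` module; the helper `make_process_logger`, the `app.<pid>.log` naming scheme, and the log format are assumptions chosen for illustration.

```python
import logging
import os


def make_process_logger(log_dir, name="app"):
    """Return (logger, path) for a log file owned by the current process only."""
    pid = os.getpid()
    path = os.path.join(log_dir, f"{name}.{pid}.log")
    logger = logging.getLogger(f"{name}.{pid}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep entries out of the root logger's handlers
    if not logger.handlers:
        # mode="a" opens the file for appending, so existing entries are
        # never overwritten.
        handler = logging.FileHandler(path, mode="a", encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(process)d %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger, path
```

With this naming scheme, the collector's path pattern would need to match all per-process files, for example a glob along the lines of `app.*.log`.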

Adopt standard log‑rotation tools such as logrotate or similar mechanisms.

If an anti‑pattern cannot be avoided, configure collectors with precise path names, enable de‑duplication, and implement robust error handling on the consumer side.

For space reclamation, use fallocate rather than truncate or dd.

When only reading logs, use read‑only utilities like less or grep.

Summary

Logs are the “black box” of a system; the quality of their management directly affects fault‑diagnosis efficiency and overall reliability. By avoiding the anti‑patterns described above and following best practices (proper rotation, local storage, append‑only writes with one writer per file, and reliable tools), organizations can significantly reduce log‑collection risks and improve observability.
