Designing an Effective Log System for Startups: Levels, Collection, and ELK Architecture
This article explains how internet startups can build a robust logging system by defining log levels, essential log fields, and best‑practice logging principles, and by choosing between simple file logs and an ELK‑based collection pipeline for monitoring, troubleshooting, and analytics.
Log System Overview
Internet services generate massive amounts of logs—including system, transaction, exception, access, and audit logs. These logs feed offline and online analysis platforms, support operational insight, and serve as the primary source for developers to diagnose issues.
Three Core Components of a Log System
Log Generation: Client‑side and server‑side code emit structured entries that capture runtime state, aid troubleshooting, and provide data for downstream analysis.
Log Collection: Distributed logs must be aggregated from many machines into a central repository.
Log Analysis: Collected logs are processed for monitoring, alerting, business analytics, or direct developer investigation.
Log Levels
RFC 5424 defines eight levels (Emergency, Alert, Critical, Error, Warning, Notice, Informational, Debug). In practice many teams adopt a simplified four‑level scheme:
Error: Failures that prevent an operation from completing or bring the system down.
Warning: Issues that do not stop operation but require attention.
Info: Important business events such as successful logins.
Debug: Verbose diagnostic data for developers.
Typical defaults: production logs at Info or a more severe level, testing may enable Debug on demand, and development often logs everything at Debug.
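The per-environment defaults can be sketched with Python's standard logging module. This is a minimal illustration, not a prescribed setup: the DEFAULT_LEVELS mapping, the configure_logger helper, and the debug_on_demand flag are hypothetical names, and treating Info as the testing default is an assumption.

```python
import logging

# Hypothetical mapping from deployment environment to minimum log level.
# Treating Info as the default for testing is an assumption.
DEFAULT_LEVELS = {
    "production": logging.INFO,
    "testing": logging.INFO,
    "development": logging.DEBUG,
}


def configure_logger(name: str, environment: str,
                     debug_on_demand: bool = False) -> logging.Logger:
    """Return a logger whose threshold follows the environment defaults."""
    # Unknown environments fall back to Info.
    level = DEFAULT_LEVELS.get(environment, logging.INFO)
    # Testing can switch to Debug when someone asks for it.
    if environment == "testing" and debug_on_demand:
        level = logging.DEBUG
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger
```

With this sketch, configure_logger("orders", "production") drops Debug records, while configure_logger("orders", "testing", debug_on_demand=True) keeps them.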
Required Log Fields
Every log entry should contain at least:
Version
Timestamp (or NULL if unavailable)
Application name (identifies the service or device)
Message type (semantic categorization)
Message content (the actual log text)
Additional fields useful for troubleshooting:
Result (outcome of the operation)
Consumed time (duration of the step)
Source (e.g., IP address, caller)
Location/Path (file and line number, if applicable)
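One way to make the field list concrete is a small record type. The sketch below is an assumption about how the fields might be named and serialized. The LogEntry class and its attribute names are illustrative, and the choice to serialize a missing timestamp as JSON null is one reading of "NULL if unavailable".

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Field names are illustrative; only the field list comes from the text above.
OPTIONAL_FIELDS = ("result", "consumed_ms", "source", "location")


@dataclass
class LogEntry:
    # Required fields
    version: str
    timestamp: Optional[float]  # None is serialized as JSON null
    app_name: str
    message_type: str
    message: str
    # Additional troubleshooting fields
    result: Optional[str] = None
    consumed_ms: Optional[float] = None
    source: Optional[str] = None
    location: Optional[str] = None

    def to_json(self) -> str:
        """Serialize the entry, omitting optional fields that were not set."""
        data = asdict(self)
        compact = {
            key: value
            for key, value in data.items()
            if not (key in OPTIONAL_FIELDS and value is None)
        }
        return json.dumps(compact, ensure_ascii=False)
```

Because the required fields have no defaults, constructing a LogEntry without one of them raises a TypeError, which gives a basic form of enforcement.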
Best‑Practice Guidelines
Traceability (logId): Assign a globally unique logId to each request, e.g., businessId_userId_timestamp_random. Propagate this ID through all services so logs can be correlated across machines.
Structured Data: Emit logs as JSON or key‑value pairs that include IP, timestamps, service name, class/function, and line number. A logging library should enforce the structure.
Human‑Friendly Messages: Write clear messages, e.g., “Failed to save user profile for userId=1234” instead of a vague “Save error”.
Log Collection Strategies
1. Local File with Synchronization
Each instance writes plain‑text logs to disk. For small setups a simple rsync script can copy logs to a central node:
rsync -avz /var/log/app/*.log collector:/logs/app/
When the volume grows, an ELK stack is a common next step.
2. Dedicated Log Service (ELK Example)
A typical ELK pipeline consists of:
Logstash agent on each host to ship logs to a buffering queue.
Redis (or similar) as a peak‑shaving queue.
Logstash cluster that parses logs, converts them to JSON, and forwards them to Elasticsearch.
Elasticsearch for real‑time storage, search, and aggregation.
Kibana for visual exploration and reporting.
Optional extensions include custom SDKs for writers, relay agents, collectors with routing rules, long‑term storage in HDFS, or real‑time processing with Storm or Spark.
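The flow of ship, buffer, parse, and index can be imitated in memory to show what each stage contributes. This is a toy model, not Logstash or Elasticsearch code: queue.Queue stands in for Redis, a Python list stands in for the Elasticsearch index, and the line layout matched by LINE_PATTERN is an assumed format.

```python
import json
import queue
import re

# Assumed plain-text layout: "<timestamp> <LEVEL> <app> <message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<app>\S+) (?P<message>.*)$"
)


def ship(lines, buffer: queue.Queue) -> None:
    """Agent stage: push raw lines into the buffering queue."""
    for line in lines:
        buffer.put(line.rstrip("\n"))


def parse(line: str) -> dict:
    """Parsing stage: turn one plain-text line into a structured document."""
    match = LINE_PATTERN.match(line)
    if match is None:
        # Keep unparseable lines, flagged, instead of silently dropping them.
        return {"message": line, "parse_error": True}
    return match.groupdict()


def drain(buffer: queue.Queue, index: list) -> None:
    """Indexing stage: parse everything in the queue and store it as JSON."""
    while True:
        try:
            line = buffer.get_nowait()
        except queue.Empty:
            break
        index.append(json.dumps(parse(line)))
```

The queue between ship and drain is what lets the parsing side fall behind during a traffic peak without blocking the hosts that produce logs.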
Post‑Collection Uses
Real‑time monitoring and alerting (e.g., Zabbix or custom MQ‑driven alerts).
Issue‑tracking integration (push matched alerts to Jira).
Search and reporting via Elasticsearch + Kibana.
Batch analytics using Hadoop or Spark.
Streaming pipelines (Kafka → Storm) for low‑latency insights.
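As an illustration of the monitoring and alerting use, a sliding-window rule over the collected stream might look like the sketch below. The ErrorRateAlert class, its threshold, and its window are hypothetical. Events are assumed to arrive in timestamp order, and delivering the alert (Zabbix, a message queue, Jira) is left out.

```python
from collections import deque


class ErrorRateAlert:
    """Fire when at least `threshold` Error events fall inside the time window."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._error_times = deque()

    def observe(self, level: str, timestamp: float) -> bool:
        """Record one log event; return True when the alert condition holds."""
        if level == "Error":
            self._error_times.append(timestamp)
        # Discard errors that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while self._error_times and self._error_times[0] <= cutoff:
            self._error_times.popleft()
        return len(self._error_times) >= self.threshold
```

For example, ErrorRateAlert(threshold=3, window_seconds=60) returns True from observe once a third Error arrives within 60 seconds of the first.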
Recommendations for Early‑Stage Projects
For startups, a pragmatic approach is to start with local file logging combined with an ELK stack for collection. Implement a thin wrapper library that enforces the required fields and, when resources allow, adds a trace‑ID system. The trace‑ID can be composed of business‑id, user‑id, timestamp, and a random suffix that makes collisions unlikely.
Reference: http://diyitui.com/content-1432833903.30823746.html
