Choosing the Right Log Collection Strategy for Alibaba Cloud SAE: SLS, NAS, or Kafka
This guide compares three log‑collection methods—SLS, NAS, and Kafka—available in Alibaba Cloud Serverless App Engine, explaining their architectures, ideal use cases, configuration details, and how to avoid common pitfalls such as log loss and rotation issues.
Why Log Collection Matters in Cloud‑Native Environments
Logs are essential for troubleshooting, monitoring, and alerting in any application. In cloud‑native deployments, limited pod disk space, the need for long‑term retention, strict security policies, and multi‑line stack traces create challenges that differ from traditional servers.
SLS (Log Service) Collection
SAE runs a sidecar container with the Logtail collector alongside the business container. The two share a volume, so the sidecar can read the configured log file paths and forward them to Alibaba Cloud SLS over an internal network address, with no outbound internet access required. Resource limits (0.25 CPU cores, 100 MiB of memory) keep the collector lightweight.
SLS supports most scenarios, offers built‑in alerting and dashboards, and is the default choice when permissions allow.
NAS (Network Attached Storage) Collection
NAS provides a shared, highly available file system with high throughput and IOPS. Mounting NAS into the pod and writing log files to the mounted directory lets logs persist across pod restarts, guaranteeing no data loss for critical workloads.
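As a minimal sketch, assuming NAS is mounted at the hypothetical path /nas-logs inside the pod, the application only has to write its log files under the mount point; appended entries live on the shared file system and therefore survive pod recreation:
import java.io.IOException;
import java.nio.file.*;
import java.time.Instant;

public class NasLogWriter {
    // Hypothetical NAS mount point configured for the SAE application.
    private static final Path LOG_FILE = Paths.get("/nas-logs/app/test.log");

    public static void main(String[] args) throws IOException {
        Files.createDirectories(LOG_FILE.getParent());
        // Append so entries written before a pod restart remain on the shared file system.
        Files.writeString(LOG_FILE, Instant.now() + "|application started\n",
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
In practice you would point your logging framework's file appender at the mounted directory rather than writing files by hand.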
Kafka Collection via Vector
For cases where SLS access is restricted or additional processing is required, logs can be shipped to Kafka instead. SAE runs the Vector collector as a sidecar. Vector tails the configured files, can compress data and batch sends at a configured interval, and forwards logs to a Kafka broker. Downstream consumers can then ingest the logs into Elasticsearch or custom processors.
data_dir = "/etc/vector"

# Tail the application log file; merge stack-trace continuation lines into one event.
[sources.sae_logs_0]
type = "file"
include = ["/sae-stdlog/kafka-select/0.log"]
read_from = "end"
max_line_bytes = 1048576
max_read_bytes = 1048576
multiline.start_pattern = '^[^\s]'
multiline.mode = "continue_through"
multiline.condition_pattern = '(?m)^[\s|\W].*$|(?m)^(Caused|java|org|com|net).+$|(?m)^\}.*$'
multiline.timeout_ms = 1000

# Attach the target Kafka topic to each event.
[transforms.add_tags_0]
type = "remap"
inputs = ["sae_logs_0"]
source = '.topic = "test1"'

# Forward application logs to Kafka as JSON.
[sinks.sae_logs_to_kafka]
type = "kafka"
inputs = ["add_tags_0"]
bootstrap_servers = "kafka_endpoint"
topic = "{{ topic }}"
encoding.codec = "json"
encoding.except_fields = ["source_type", "timestamp"]

# Expose Vector's own runtime metrics.
[sources.internal_metrics]
type = "internal_metrics_ext"
scrape_interval_secs = 15

[sources.internal_metrics.tags]
host_key = "host"
pid_key = "pid"

# Keep only file-source, Kafka-sink, and core Vector metrics.
[transforms.internal_metrics_filter]
type = "filter"
inputs = ["internal_metrics"]
condition = '.tags.component_type == "file" || .tags.component_type == "kafka" || starts_with!(.name, "vector")'

# Push the filtered metrics to Prometheus via remote write.
[sinks.internal_metrics_to_prom]
type = "prometheus_remote_write"
inputs = ["internal_metrics_filter"]
endpoint = "prometheus_endpoint"
Key parameters: multiline.start_pattern defines the regex that marks the start of a new log entry. multiline.condition_pattern merges subsequent lines that match the pattern into the previous entry, which keeps multi-line stack traces together. sinks.internal_metrics_to_prom exports Vector's own metrics to Prometheus for dashboarding.
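On the downstream side, here is a minimal sketch of a Java consumer built on the standard kafka-clients library that reads the JSON events Vector produces; the broker address, topic name, and consumer group are hypothetical placeholders:
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LogConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka_endpoint:9092"); // placeholder broker address
        props.put("group.id", "sae-log-processor");            // hypothetical consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("test1")); // topic set by the add_tags_0 transform
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Each value is a JSON event from Vector; forward it to
                    // Elasticsearch or a custom processor here.
                    System.out.println(record.value());
                }
            }
        }
    }
}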
Log Rotation Practices
For Java applications, a typical Logback configuration caps rotation at 7 files of 100 MB each, preventing the pod disk from filling up:
<appender name="TEST" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>${user.home}/logs/test/test.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
<fileNamePattern>${user.home}/logs/test/test.%i.log</fileNamePattern>
<minIndex>1</minIndex>
<maxIndex>7</maxIndex>
</rollingPolicy>
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<maxFileSize>100MB</maxFileSize>
</triggeringPolicy>
<encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
<charset>UTF-8</charset>
<pattern>%d{yyyy-MM-dd HH:mm:ss}|%msg%n</pattern>
</encoder>
</appender>
Log4j can use a similar rolling strategy, in either create mode (rename the active file, then create a new one) or copytruncate mode (copy the active file, then truncate it in place).
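To make the copytruncate trade-off concrete, here is a minimal Java sketch of its mechanics (the file names are hypothetical, and this is not Log4j's actual implementation). The window between the copy and the truncate is where concurrently written lines can be lost; create mode avoids this by renaming the file instead:
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class CopyTruncateRotation {
    public static void main(String[] args) throws IOException {
        Path active = Paths.get("test.log");    // hypothetical active log file
        Path archive = Paths.get("test.1.log"); // rotated copy

        // copytruncate mode: copy the active file, then truncate it in place.
        // The inode never changes, so tail-style collectors keep their handle,
        // but lines written between the copy and the truncate are lost.
        Files.copy(active, archive, StandardCopyOption.REPLACE_EXISTING);
        try (FileChannel ch = FileChannel.open(active, StandardOpenOption.WRITE)) {
            ch.truncate(0);
        }
    }
}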
Common Issues and Mitigations
Pod crashes and recreation can cause log loss; mounting NAS preserves logs across restarts.
Excessively fast rotation (e.g., a new file every second) can outpace collectors, which may miss files that are rotated away before they are read.
Collector throughput lower than log generation leads to backlog and potential loss.
Address these by reducing unnecessary log verbosity, tuning rotation policies, and selecting NAS for strict durability requirements.
Conclusion
SAE offers three main log‑collection options:
SLS – broad compatibility, suitable for most workloads.
NAS – guarantees no loss, ideal for highly critical logs.
Kafka (via Vector) – complements SLS when further processing or permission constraints exist.
Choosing the appropriate method depends on data retention needs, security policies, and whether downstream log enrichment is required.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.