Design and Implementation of a Distributed Log Service: Tianyan vs ELK
This article examines the challenges of building a high‑performance log service for distributed systems and compares the traditional ELK stack with the Tianyan platform. It then details Tianyan's architecture: ingest, storage, and consumer components; the SDK and Minos collection methods; high‑throughput transmission with Disruptor and Bigpipe; log retrieval; resource isolation; dynamic cleaning; and best‑practice recommendations.
The article begins by outlining the major challenges of log services in distributed environments, such as massive log volume, diverse formats, and the need for scalable, reliable collection and storage.
It then reviews the common ELK solution and highlights how the Tianyan log platform differs from it: easier integration, customizable resources, and better scalability.
2. ELK Common Solution and Tianyan Architecture
Section 2.1 introduces the Elastic Stack components (Ingest, Shippers, Queues, Processors) and tools like Elastic Agent, Fleet, APM, Beats, Logstash, and Elasticsearch, illustrating their roles with diagrams.
Section 2.2 details the Store component, emphasizing Elasticsearch as the core storage engine.
3. Tianyan Log Service
Section 3.1 describes the overall system architecture, showing how logs are collected, transmitted, stored, and isolated per product line.
Section 3.2 focuses on log collection methods: the SDK (Java Appender) and Minos (Baidu's streaming log transport).
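import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.AppenderBase;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Logback appender that hands each event to the log client's asynchronous sender
// instead of writing it on the application thread.
public class LogClientAppender<E> extends AppenderBase<E> {
    private static final Logger LOGGER = LoggerFactory.getLogger(LogClientAppender.class);

    @Override
    protected void append(E eventObject) {
        // filter(...) is an SDK helper (not shown); a null result means the event is dropped
        ILoggingEvent event = filter(eventObject);
        if (event != null) {
            // Submit asynchronously, together with the DTO supplied by LogNodeFactory
            MessageLogSender.getExecutor().submit(new LogbackTask(event, LogNodeFactory.getLogNodeSyncDto()));
        }
    }
}
The SDK supports Log4j, Logback, and Log4j2, and forwards log events to a high‑performance Disruptor queue.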
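Handing events to a ring buffer lets the logging call return immediately while a separate thread does the sending. The LMAX Disruptor library is not reproduced here; as a simplified, JDK-only sketch of the same idea, assuming a single producer and a single consumer, the class below pre-allocates a power-of-two array and tracks publish and consume sequences. `RingBufferSketch`, `tryPublish`, and `poll` are hypothetical names, and returning false on a full buffer (rather than blocking) is an assumed policy.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical single-producer/single-consumer ring buffer in the spirit of the Disruptor. */
final class RingBufferSketch<T> {
    private final Object[] slots;
    private final int mask; // capacity - 1; valid because capacity is a power of two
    private final AtomicLong published = new AtomicLong(-1); // last sequence written by the producer
    private final AtomicLong consumed = new AtomicLong(-1);  // last sequence read by the consumer

    RingBufferSketch(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        this.slots = new Object[capacity];
        this.mask = capacity - 1;
    }

    /** Called on the logging thread; never blocks. Returns false when the buffer is full. */
    boolean tryPublish(T event) {
        long next = published.get() + 1;
        if (next - consumed.get() > slots.length) {
            return false; // assumed policy: drop instead of stalling the application
        }
        slots[(int) (next & mask)] = event;
        published.set(next); // volatile write publishes the slot to the consumer
        return true;
    }

    /** Called on the single sender thread; returns null when nothing is available. */
    @SuppressWarnings("unchecked")
    T poll() {
        long next = consumed.get() + 1;
        if (next > published.get()) {
            return null;
        }
        int index = (int) (next & mask);
        T event = (T) slots[index];
        slots[index] = null; // release the reference so the event can be collected
        consumed.set(next);  // volatile write frees the slot for the producer
        return event;
    }
}
```

In use, the appender would call `tryPublish` and a dedicated sender thread would loop on `poll`, forwarding events downstream. The real Disruptor adds multi-producer sequencing, configurable wait strategies, and batch consumption, none of which this sketch attempts.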
SQL calls are traced as well; after a statement executes, its trace is closed with:
TraceFactory.getSqltracer().end(returnObj, className, methodName, realParams, dbType, sqlType, sql, sqlUrl);
For MyBatis tracing, the interceptor is registered as:
sqlSessionFactory.getConfiguration().addInterceptor(new IlogMybatisPlugin());
Section 3.3 explains the high‑concurrency transmission pipeline: logs are first placed into a Disruptor ring buffer and then into a Bigpipe offline queue, with BigQueue as a fallback for the rare cases in which that step fails.
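The fallback step can be pictured as "try the primary queue, otherwise keep the message locally and replay it later". In this minimal sketch the `Sink` interface stands in for the Bigpipe client and an in-memory `ArrayDeque` stands in for BigQueue; `FallbackForwarder`, `forward`, and `replay` are hypothetical names, and a production fallback would need to be durable rather than in memory.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch: forward each message to a primary queue (standing in for Bigpipe)
 * and spill it to a fallback queue (standing in for BigQueue) when the send fails.
 */
final class FallbackForwarder {
    /** Primary transport; returns true when the message was accepted. */
    interface Sink {
        boolean send(String message);
    }

    private final Sink primary;
    // In-memory stand-in only; a real fallback must survive process restarts.
    private final Deque<String> fallback = new ArrayDeque<>();

    FallbackForwarder(Sink primary) {
        this.primary = primary;
    }

    /** Try the primary path first and keep the message locally if it is rejected. */
    void forward(String message) {
        if (!primary.send(message)) {
            fallback.addLast(message);
        }
    }

    /**
     * Re-send spilled messages oldest first, stopping at the first failure so the
     * remaining ones keep their relative order. Returns how many were re-sent.
     */
    int replay() {
        int resent = 0;
        while (!fallback.isEmpty() && primary.send(fallback.peekFirst())) {
            fallback.removeFirst();
            resent++;
        }
        return resent;
    }

    /** Number of messages still waiting in the fallback queue. */
    int pending() {
        return fallback.size();
    }
}
```

A background task would call `replay()` periodically once the primary queue is reachable again. Note that in this sketch new messages can overtake spilled ones while the fallback is non-empty.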
Section 3.4 describes log retrieval through Kibana and Elasticsearch. Several query types are supported (text, term, phrase, prefix, and logical), and the section gives DSL examples such as this multi_match query:
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "searchValue",
            "fields": ["message", "exception"],
            "type": "best_fields"
          }
        }
      ]
    }
  }
}
5. Resource Isolation
Tianyan isolates transmission and storage resources per product line to avoid contention. The section walks through a five‑step workflow from log generation to ES storage.
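Per-product-line isolation comes down to a routing decision made for every log event. As a minimal sketch, assuming each onboarded product line is assigned its own transport channel and its own daily Elasticsearch index, the lookup could look like the class below. `ProductLineRouter`, the channel names, and the `prefix-yyyy.MM.dd` index pattern are all hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-product-line routing table: each line gets its own channel and index prefix. */
final class ProductLineRouter {
    /** Where one product line's logs travel and where they land. */
    static final class Route {
        final String channel;     // dedicated transport queue/topic for this product line
        final String indexPrefix; // dedicated Elasticsearch index prefix

        Route(String channel, String indexPrefix) {
            this.channel = channel;
            this.indexPrefix = indexPrefix;
        }
    }

    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy.MM.dd");
    private final Map<String, Route> routes = new HashMap<>();

    /** Onboarding a product line registers its isolated resources. */
    void register(String productLine, String channel, String indexPrefix) {
        routes.put(productLine, new Route(channel, indexPrefix));
    }

    Route routeFor(String productLine) {
        Route route = routes.get(productLine);
        if (route == null) {
            throw new IllegalArgumentException("product line not onboarded: " + productLine);
        }
        return route;
    }

    /** Daily index name, e.g. prefix "mall-log" on 2024-05-01 gives "mall-log-2024.05.01". */
    String indexFor(String productLine, LocalDate day) {
        return routeFor(productLine).indexPrefix + "-" + DAY.format(day);
    }
}
```

Because no two product lines share a channel or an index, a traffic spike in one line only consumes that line's queue and storage.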
6. Dynamic Cleaning and Storage Downgrade
The platform monitors ES cluster usage, automatically deletes the oldest indices when thresholds are exceeded, and periodically snapshots data to low‑cost object storage (BOS) for long‑term retention.
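The cleanup rule can be expressed as a small selection function. The sketch below assumes two hypothetical watermarks, a high one that triggers cleanup and a low one to clean down to, and it only decides which indices to delete, oldest first. Issuing the delete and snapshot requests against the cluster, and the upload to BOS, are outside its scope; `IndexCleaner` and `IndexInfo` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical threshold-driven cleanup: pick the oldest indices until usage drops enough. */
final class IndexCleaner {
    static final class IndexInfo {
        final String name;
        final long createdAtMillis; // creation time, used to find the oldest index
        final long sizeBytes;

        IndexInfo(String name, long createdAtMillis, long sizeBytes) {
            this.name = name;
            this.createdAtMillis = createdAtMillis;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Returns the names of indices to delete, oldest first. Cleanup starts when
     * usage reaches highWatermark and stops once it is at or below lowWatermark.
     */
    static List<String> selectForDeletion(List<IndexInfo> indices, long usedBytes, long capacityBytes,
                                          double highWatermark, double lowWatermark) {
        if (capacityBytes <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        List<String> toDelete = new ArrayList<>();
        if ((double) usedBytes / capacityBytes < highWatermark) {
            return toDelete; // below the trigger threshold: nothing to do
        }
        List<IndexInfo> oldestFirst = new ArrayList<>(indices);
        oldestFirst.sort(Comparator.comparingLong((IndexInfo i) -> i.createdAtMillis));
        long remaining = usedBytes;
        for (IndexInfo index : oldestFirst) {
            if ((double) remaining / capacityBytes <= lowWatermark) {
                break; // enough space reclaimed
            }
            toDelete.add(index.name);
            remaining -= index.sizeBytes;
        }
        return toDelete;
    }
}
```

Using two watermarks keeps the cleaner from deleting one index at a time every time usage brushes against a single threshold.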
7. Best Practices
Practical guidance includes product‑line onboarding, log filtering rules (content, name, combined), and operational tips for maintaining high availability and performance.
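The three filter rule types can be modeled as composable predicates. In this minimal sketch a name rule is assumed to match on a logger-name prefix, a content rule on a substring of the message, a combined rule on both, and any matching event is assumed to be dropped. `LogFilterRules`, `Rule`, and `keep` are hypothetical names, not Tianyan's API.

```java
/** Hypothetical filter rules: by message content, by logger name, or both combined. */
final class LogFilterRules {
    /** A rule decides whether an event (logger name + message) matches and should be dropped. */
    interface Rule {
        boolean matches(String loggerName, String message);

        /** Combine two rules: both must match. */
        default Rule and(Rule other) {
            return (name, msg) -> matches(name, msg) && other.matches(name, msg);
        }
    }

    /** Content rule: matches when the message contains the keyword. */
    static Rule content(String keyword) {
        return (name, msg) -> msg != null && msg.contains(keyword);
    }

    /** Name rule: matches when the logger name starts with the prefix (e.g. a package). */
    static Rule name(String prefix) {
        return (name, msg) -> name != null && name.startsWith(prefix);
    }

    /** Combined rule: matches only when both the name and the content rule match. */
    static Rule combined(String prefix, String keyword) {
        return name(prefix).and(content(keyword));
    }

    /** An event is kept unless at least one rule matches it. */
    static boolean keep(String loggerName, String message, Rule... rules) {
        for (Rule rule : rules) {
            if (rule.matches(loggerName, message)) {
                return false;
            }
        }
        return true;
    }
}
```

A combined rule is the narrowest of the three: it silences one noisy message from one component without hiding the same text elsewhere.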
Overall, the article provides a comprehensive technical guide to building, operating, and optimizing a distributed log service, contrasting traditional ELK with the Tianyan solution.