Evolution of Tongcheng Log System Architecture
The article chronicles the development of Tongcheng's centralized log system from early file‑based logging through a MongoDB‑based solution to the current multi‑layer architecture using Flume, Elasticsearch, and Hadoop, highlighting design decisions, challenges, and future improvement plans.
With the rapid growth of company services, the number of applications increased dramatically, making centralized log collection, storage, and querying essential for quick issue diagnosis. A qualified log system must provide high availability, reliability, and scalability.
1. Background Introduction
As the business expanded, developers and operations engineers needed a unified way to collect and analyze the logs that applications generate at runtime. Duplicated logging effort across projects prompted the creation of a centralized logging platform.
2. Architecture Design
The architecture evolved through three major phases:
Phase 1 (pre‑2012 – "Stone Age"): Logs were stored locally as plain files; accessing them was time‑consuming and error‑prone.
Phase 2 (2012 – first unified log system): The team introduced a centralized solution using MongoDB as the backend store. Although MongoDB performed well at tens of millions of records, scaling to billions caused instability and data‑balancing issues, revealing the need for deeper expertise in the chosen technology.
Phase 3 (2014 – Tianwang Log Component V1): A completely redesigned four‑layer architecture was released:
Client layer – lightweight agents that monitor files without requiring application code changes.
Collection layer – Apache Flume, customized to write ORC files to Hadoop and optionally forward events to an internal MQ.
Storage layer – Elasticsearch for real‑time queries and Hadoop for long‑term, massive‑scale storage, with routing and hot‑cold index separation.
Query layer – web UI and REST API for interactive and programmatic access.
The client agents now operate entirely on the Linux side, listening to log files and parsing them according to flexible, user‑defined rules (see configuration screenshot).
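The article does not publish the agents' parsing implementation, but a user-defined rule can be pictured as a regex with named capture groups applied to each tailed line. The rule below and the sample line are illustrative assumptions, not Tongcheng's actual format:

```python
import re

# Hypothetical parsing rule in the spirit of the agents' user-defined rules:
# named groups extract structured fields from a raw log line.
SAMPLE_RULE = re.compile(r"(?P<ts>\S+ \S+) (?P<level>[A-Z]+) (?P<msg>.*)")

def parse_line(line, rule):
    """Apply one rule to a line; return the extracted fields, or None on mismatch."""
    m = rule.match(line)
    return m.groupdict() if m else None

event = parse_line("2016-07-01 12:00:01 ERROR connection refused", SAMPLE_RULE)
# event -> {'ts': '2016-07-01 12:00:01', 'level': 'ERROR', 'msg': 'connection refused'}
```

Because the rules live in configuration rather than code, applications can change log formats without redeploying the agent.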
Flume ensures reliable delivery via transactional semantics and supports failover and load‑balancing sink groups. Custom sinks enable direct ORC writes to Hadoop and optional MQ forwarding for offline analysis.
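A failover sink group of the kind described can be sketched with standard Flume agent properties. The agent, channel, and sink names below are placeholders, and the custom ORC sink class is an assumption standing in for Tongcheng's internal implementation:

```properties
# Hypothetical Flume agent: one channel fanned out to a failover sink group.
a1.channels = c1
a1.sinks = hdfs_sink backup_sink
a1.sinkgroups = g1

# Failover processor: events go to the highest-priority healthy sink.
a1.sinkgroups.g1.sinks = hdfs_sink backup_sink
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.hdfs_sink = 10
a1.sinkgroups.g1.processor.priority.backup_sink = 5
a1.sinkgroups.g1.processor.maxpenalty = 10000

# Custom sink class (assumed name) that writes ORC files to Hadoop.
a1.sinks.hdfs_sink.type = com.example.flume.OrcHdfsSink
a1.sinks.hdfs_sink.channel = c1
a1.sinks.backup_sink.channel = c1
```

Swapping `processor.type` to `load_balance` would give the load-balancing behavior mentioned above instead of failover.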
Elasticsearch provides fast search and alerting capabilities, with routing and index lifecycle management to handle growing data volumes.
Hadoop serves as the durable, scalable storage for full‑history logs, allowing linear expansion as needed.
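Tying the layers together, the query layer's REST API presumably translates user requests into Elasticsearch query DSL. The field names (`service`, `message`, `@timestamp`) below are illustrative assumptions about the index mapping:

```python
import json

def build_query(service, keyword, size=20):
    """Build an Elasticsearch _search body of the kind a query-layer
    REST API might issue; field names are assumptions."""
    return json.dumps({
        "size": size,
        "sort": [{"@timestamp": "desc"}],
        "query": {
            "bool": {
                "filter": [{"term": {"service": service}}],
                "must": [{"match": {"message": keyword}}],
            }
        },
    })

body = build_query("order-api", "timeout")
```

A web UI and programmatic clients can then share the same endpoint, differing only in how they render the hits.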
3. Future Plans
After several iterations, the Tianwang log system now reliably supports the company's peak loads, yet continuous improvement remains a priority. Planned enhancements for the second half of 2016 include cross‑platform file collection, data‑center awareness and disaster recovery, Docker‑based storage with auto‑scaling, and other features to further strengthen the platform.
Tongcheng Travel Technology Center