Design and Implementation of Logan Real-Time Log System at Meituan

The article details Meituan’s end‑to‑end design and implementation of Logan, a high‑performance real‑time logging service for mobile apps, web, mini‑programs and IoT, covering background challenges, architecture layers, technology choices such as Flink and Elasticsearch, stability measures, deployment practices, achieved results and future plans.

Meituan Technology Team

Nov 3, 2022

1 Background

1.1 Logan Overview

Logan is Meituan’s unified log service for terminals, supporting mobile apps, web, mini‑programs and IoT. It provides log collection, storage, upload, query and analysis, helping developers locate issues and troubleshoot more efficiently. Logan is also an early open‑source front‑end logging system, offering high write performance, strong security and protection against log loss.

1.2 Logan Workflow

When a terminal needs to report logs, it actively uploads them via an HTTPS interface to the Logan receiving service, which stores the raw log files in an object‑storage platform. Developers can download, decrypt and parse the logs, then deliver them to the log storage system. The log platform supports filtering by device, log type, tag, process, keyword and time, and can visualize specific log types.

1.3 Why Real‑Time Logs?

The original "local storage + active upload" model meets basic logging needs, but as business complexity grows, several problems become prominent:

In some scenarios (e.g., Web and mini‑programs) users leave quickly, making it hard to obtain logs after an issue occurs, so the best window for troubleshooting can be missed.

Lack of real‑time analysis and alerting: users have repeatedly asked for abnormal logs to be monitored and for alerts to be sent when they appear.

Absence of end‑to‑end tracing; logs are scattered across multiple systems, requiring manual correlation, and Meituan lacks a unified tracing solution.

To address these pain points, Logan real‑time logging was built to provide a unified, high‑performance real‑time log service.

1.4 What Is Logan Real‑Time Log?

Logan real‑time log is a solution for terminal scenarios (mobile apps, web, mini‑programs, IoT) that offers high scalability, high performance and high reliability. It includes capabilities for log collection, upload, processing, consumption, delivery, query and analysis.

2 Design and Implementation

2.1 Overall Architecture

The architecture is divided into five parts:

Collection side: collects, encrypts, compresses, aggregates and reports logs on the terminal.

Ingestion layer: provides log upload interfaces, receives data and forwards it to the processing layer.

Processing layer: decrypts, splits, processes and cleans log data.

Consumption layer: filters, formats and delivers log data.

Log platform: offers log query, analysis, business system configuration, statistics and alerting.

2.2 Collection Side

The collection SDK is designed for cross‑platform reuse. It has been deployed on WeChat, MMP, Web and MRN, with platform‑specific code isolated. Core modules include:

Configuration management: after initialization, the SDK fetches and periodically refreshes configuration such as upload rate limits, sampling rates and feature switches, and supports gray release of key configs.

Encryption: logs are encrypted with ECDH + AES. The Web version uses the browser’s native encryption API for high‑performance asynchronous encryption; other platforms use pure JavaScript implementations.

Storage management: online data shows that up to 1% of logs are lost due to page closure, so a disk cache was added. Failed uploads are persisted locally and retried on the next app start.

Queue management: logs are grouped and aggregated before sending. If the queue grows too large, excess requests are dropped to prevent memory bloat in weak‑network or high‑traffic scenarios.
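The ECDH + AES scheme described in the encryption module can be sketched as follows. This is a minimal illustration rather than Logan's actual implementation: it uses Node's built-in crypto module so that it runs outside a browser (the Web SDK is described as using the browser's native encryption API), and the curve (prime256v1), the SHA-256 key derivation, the AES-256-GCM mode and the names encryptLog, decryptLog and EncryptedLog are all assumptions made for the example.

```typescript
import { createECDH, createHash, createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumption: the curve, key derivation and AES mode are illustrative choices.
const CURVE = "prime256v1";

type EcdhKeyPair = ReturnType<typeof createECDH>;

// Hypothetical wire format for one encrypted log line.
interface EncryptedLog {
  clientPublicKey: Buffer; // shipped with the log so the server can derive the same key
  iv: Buffer;
  authTag: Buffer;
  ciphertext: Buffer;
}

// Hash the raw ECDH shared secret into a 32-byte AES-256 key.
function deriveKey(sharedSecret: Buffer): Buffer {
  return createHash("sha256").update(sharedSecret).digest();
}

// Client side: generate an ephemeral key pair, derive the key, encrypt one log line.
function encryptLog(plaintext: string, serverPublicKey: Buffer): EncryptedLog {
  const client = createECDH(CURVE);
  const clientPublicKey = client.generateKeys();
  const key = deriveKey(client.computeSecret(serverPublicKey));
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { clientPublicKey, iv, authTag: cipher.getAuthTag(), ciphertext };
}

// Server side: derive the same key from the client's public key and decrypt.
function decryptLog(log: EncryptedLog, server: EcdhKeyPair): string {
  const key = deriveKey(server.computeSecret(log.clientPublicKey));
  const decipher = createDecipheriv("aes-256-gcm", key, log.iv);
  decipher.setAuthTag(log.authTag);
  return Buffer.concat([decipher.update(log.ciphertext), decipher.final()]).toString("utf8");
}
```

Because only public keys travel over the network, the symmetric key never leaves either side; the server-side half of this sketch corresponds to the content decryption step performed later in the processing layer.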

Initialization creates Logger, Encryptor and Storage instances, fetches configuration, checks for persisted failed logs and resumes upload. When the write‑log API is called, the raw log is encrypted and added to the current upload group, which is flushed to the upload queue based on time, size or navigation triggers.
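The grouping and drop-on-overflow behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions: the class name UploadQueue, the option names maxGroupSize and maxQueueLength, and the choice to drop the newest group when the backlog is full are hypothetical, and the timer and navigation triggers are represented only by the public flush method.

```typescript
// Hypothetical options; the real SDK's configuration keys are not given in the article.
interface UploadQueueOptions {
  maxGroupSize: number;   // size trigger: flush once the current group holds this many logs
  maxQueueLength: number; // backlog limit: groups beyond this are dropped
}

class UploadQueue {
  private options: UploadQueueOptions;
  private currentGroup: string[] = [];
  private pending: string[][] = [];
  private dropped = 0;

  constructor(options: UploadQueueOptions) {
    this.options = options;
  }

  // Add one already-encrypted log to the current group; flush when the size trigger fires.
  add(log: string): void {
    this.currentGroup.push(log);
    if (this.currentGroup.length >= this.options.maxGroupSize) {
      this.flush();
    }
  }

  // Move the current group into the upload queue.
  // A timer or a page-navigation hook would also call this.
  flush(): void {
    if (this.currentGroup.length === 0) return;
    const group = this.currentGroup;
    this.currentGroup = [];
    if (this.pending.length >= this.options.maxQueueLength) {
      this.dropped += group.length; // queue is full: drop instead of growing memory
      return;
    }
    this.pending.push(group);
  }

  // Take the oldest group for upload, or undefined if nothing is waiting.
  next(): string[] | undefined {
    return this.pending.shift();
  }

  backlog(): number {
    return this.pending.length;
  }

  droppedCount(): number {
    return this.dropped;
  }
}
```

Bounding the backlog trades completeness for predictable memory use, which matches the stated goal of avoiding memory bloat on weak networks.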

2.3 Ingestion Layer

The ingestion layer must support public domain reporting, high concurrency, minute‑level latency and delivery to a Kafka topic. After comparison, Meituan’s unified log collection channel was chosen because it satisfies all of these requirements. The SDK reports logs to a dedicated public domain; the collection channel aggregates them and forwards them to the configured Kafka queue.

2.4 Processing Layer

Three solutions were evaluated: a traditional Java application, a Storm‑based architecture, and a Flink‑based architecture. The comparison (see Table 1) showed that while the Java approach offers maturity and flexibility, it falls short on scalability, fault tolerance and performance. Storm and Flink both provide good scalability and fault tolerance, but Flink delivers lower latency and higher throughput. Consequently, Flink was selected as the processing framework.

Flink is an industry‑leading stream processing engine with high throughput, low latency, high reliability and exactly‑once semantics. It provides strong support for event windows and is widely adopted for stream processing workloads.

After ingestion, logs are sent to a summary Kafka topic. Flink jobs then parse, decrypt and split the data, apply custom processing, and distribute the results downstream.

Metadata parsing: raw log data is parsed into JSON objects.

Content decryption: asymmetric key exchange derives a symmetric key for decryption.

Service‑dimension splitting: logs are routed to business‑specific topics based on the topic field.

Custom data processing: user‑defined templates transform data from service topics into custom topics.
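The parsing and splitting steps above can be illustrated with plain functions. This is not Flink code and not the authors' job definition; it is a small sketch of the logic only, and the record shape ParsedLog (with a content field) and the function names parseLog and splitByTopic are assumptions. Only the use of a topic field for routing comes from the text.

```typescript
// Hypothetical shape of a log record after metadata parsing.
interface ParsedLog {
  topic: string;   // business topic used for routing
  content: string; // log body (treated as already decrypted in this sketch)
}

// Metadata parsing: turn one raw line into a ParsedLog, or null if it is malformed.
function parseLog(raw: string): ParsedLog | null {
  try {
    const value = JSON.parse(raw);
    if (
      value !== null &&
      typeof value === "object" &&
      typeof value.topic === "string" &&
      typeof value.content === "string"
    ) {
      return { topic: value.topic, content: value.content };
    }
    return null;
  } catch {
    return null;
  }
}

// Service-dimension splitting: group parsed logs by their topic field.
function splitByTopic(rawLines: string[]): Map<string, ParsedLog[]> {
  const byTopic = new Map<string, ParsedLog[]>();
  for (const raw of rawLines) {
    const log = parseLog(raw);
    if (log === null) continue; // skip lines that fail parsing
    const bucket = byTopic.get(log.topic) ?? [];
    bucket.push(log);
    byTopic.set(log.topic, bucket);
  }
  return byTopic;
}
```

In a streaming job each bucket would correspond to a downstream business topic rather than an in-memory array.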

2.5 Consumption Layer

While most users find Logan’s collection, processing and retrieval sufficient, higher‑level demands such as metric monitoring, end‑to‑end tracing and offline analysis have emerged. Logan standardizes logs and delivers them to a Kafka stream processing platform, offering generic data transformation capabilities for integration with third‑party systems.

Typical use cases include:

Network full‑link tracing: combine front‑end and back‑end logs to reconstruct end‑to‑end request flows.

Metric aggregation & alerting: treat the log stream as a metric source and forward it to a monitoring platform for alerts.

Offline data analysis: export logs to Hive for long‑term storage and batch analysis.
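As an illustration of the metric aggregation and alerting use case, the sketch below counts error logs per fixed time window and reports the windows that reach a threshold. The event shape LogEvent, the function names and the threshold rule are assumptions for the example; the article does not describe how the monitoring platform computes its metrics.

```typescript
// Hypothetical minimal event shape for the example.
interface LogEvent {
  level: "info" | "error";
  timestampMs: number;
}

// Count error logs per tumbling window; keys are window start times in milliseconds.
function countErrorsPerWindow(events: LogEvent[], windowMs: number): Map<number, number> {
  const counts = new Map<number, number>();
  for (const event of events) {
    if (event.level !== "error") continue;
    const windowStart = Math.floor(event.timestampMs / windowMs) * windowMs;
    counts.set(windowStart, (counts.get(windowStart) ?? 0) + 1);
  }
  return counts;
}

// Return the start times of windows whose error count reaches the threshold, in ascending order.
function windowsToAlert(counts: Map<number, number>, threshold: number): number[] {
  const starts: number[] = [];
  counts.forEach((count, start) => {
    if (count >= threshold) starts.push(start);
  });
  return starts.sort((a, b) => a - b);
}
```

A real deployment would run this kind of aggregation continuously over the stream; the batch form here only shows the windowing arithmetic.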

2.6 Log Platform

The platform provides multi‑dimensional search (user ID, custom tags, keywords) and uses Elasticsearch for underlying storage due to its low cost, high scalability and near‑real‑time capabilities. A generic interface layer allows plugging in other storage engines if needed.

Elasticsearch is a distributed, open‑source search and analytics engine that is well suited to large‑scale full‑text retrieval such as log queries.
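The generic interface layer mentioned above might look roughly like the following. This is a hypothetical sketch: the names LogStore, LogEntry, LogQuery and InMemoryLogStore are invented for illustration, the in-memory class merely stands in for an Elasticsearch-backed implementation, and a real storage client would be asynchronous.

```typescript
// Hypothetical record and query shapes covering the search dimensions named in the text.
interface LogEntry {
  userId: string;
  tags: string[];
  message: string;
}

interface LogQuery {
  userId?: string;
  tag?: string;
  keyword?: string;
}

// Storage abstraction: any engine that implements this can be plugged in.
interface LogStore {
  index(entry: LogEntry): void;
  search(query: LogQuery): LogEntry[];
}

// Stand-in implementation used only to show the contract.
class InMemoryLogStore implements LogStore {
  private entries: LogEntry[] = [];

  index(entry: LogEntry): void {
    this.entries.push(entry);
  }

  // Every criterion present in the query must match; absent criteria are ignored.
  search(query: LogQuery): LogEntry[] {
    return this.entries.filter(
      (entry) =>
        (query.userId === undefined || entry.userId === query.userId) &&
        (query.tag === undefined || entry.tags.indexOf(query.tag) !== -1) &&
        (query.keyword === undefined || entry.message.indexOf(query.keyword) !== -1)
    );
  }
}
```

Keeping the platform code dependent only on the interface is what would allow another engine to replace Elasticsearch without changes to the query features.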

3 Stability Assurance

3.1 Core Monitoring

Key SLA metrics are defined to measure the availability of the real‑time log system.

In addition, a full‑process monitoring dashboard tracks upload success rate, domain availability, domain QPS, job throughput and average aggregation count, with fallback alerts for critical metrics.

3.2 Blue‑Green Deployment

Real‑time jobs must be updated without causing data loss or latency spikes. Traditional deployment can leave a gap where no job is running, leading to backlog if the new job fails. Blue‑Green deployment runs two identical jobs in parallel; the new job is verified before traffic is switched. After adopting this scheme, deployment failures caused by insufficient resources or mis‑configuration were eliminated, and the average switch time stayed under one minute, preventing log‑consumption delays.
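The switch logic of Blue-Green deployment can be sketched as follows. This is a minimal model under stated assumptions, not Meituan's deployment tooling: StreamJob, isHealthy and BlueGreenDeployer are hypothetical names, and the verification step is reduced to a single health check.

```typescript
// Hypothetical handle for a running stream job.
interface StreamJob {
  name: string;
  isHealthy: () => boolean; // stands in for whatever verification the team runs
}

class BlueGreenDeployer {
  private active: StreamJob;

  constructor(initial: StreamJob) {
    this.active = initial;
  }

  // The candidate runs alongside the active job; traffic switches only after it verifies.
  deploy(candidate: StreamJob): boolean {
    if (!candidate.isHealthy()) {
      return false; // the old job keeps consuming, so no gap or backlog appears
    }
    this.active = candidate;
    return true;
  }

  activeJobName(): string {
    return this.active.name;
  }
}
```

The key property is that a failed candidate never replaces the running job, which is what removes the window in which no job is consuming.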

4 Achievements

By Q3 2022, Logan real‑time log had been integrated into more than twenty business systems, including Meituan mini‑programs, selected merchants and catering SaaS. Representative scenarios:

Core‑link restoration: a consumer‑facing mini‑program uses Logan to record key and abnormal logs. After launch, the average time to locate the cause of a customer complaint dropped from 10 minutes to under 3 minutes.

Internal‑test debugging: a front‑end project added extra debug logs during internal testing. After launch, each issue investigation saved 10‑15 minutes of user‑log retrieval time, and logs were no longer lost to storage limits.

Log data analysis: a team analyzed request headers, parameters and responses across 300+ pages and 500+ APIs, discovering over 1,000 specification violations in the first month.

5 Future Plans

Logan real‑time log will continue to evolve:

Support more terminal types and add log cleaning, statistics, alerting and full‑link tracing features.

Target million‑level QPS and improve upload success rate to 99.9%.

Enhance stability with rate‑limiting, circuit‑break mechanisms and comprehensive incident‑response plans.

6 Author

Hong Kun, Xu Bo, Chen Cheng, Shao Xing and others, all from Meituan Basic Technology – Front‑end Technology Center.
