Tag

log-processing


Lobster Programming
Jan 16, 2025 · Big Data

How to Extract Top 100 Search Keywords from Billion‑Scale Logs Efficiently

This article explains a divide‑and‑conquer method that splits massive search‑log files, uses multithreaded hashing to count keyword frequencies, and applies a min‑heap to efficiently retrieve the top‑100 most frequent search terms for SEO and recommendation tasks.
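As a rough illustration of the approach this summary describes, the sketch below hash-partitions keywords, counts each partition in a worker thread, and keeps the k most frequent terms in a min-heap. It is a minimal in-memory sketch under assumptions, not the article's implementation: the names `partition` and `top_k`, the CRC32 partitioning hash, and the thread pool are illustrative choices, and a real billion-scale job would write the partitions to separate files instead of holding them in lists.

```python
import heapq
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Tuple


def partition(keywords: Iterable[str], num_parts: int) -> List[List[str]]:
    """Split keywords into buckets by hash (hypothetical helper).

    Every occurrence of a keyword lands in the same bucket, so the
    per-bucket counts are already the final global counts.
    """
    parts: List[List[str]] = [[] for _ in range(num_parts)]
    for kw in keywords:
        idx = zlib.crc32(kw.encode("utf-8")) % num_parts
        parts[idx].append(kw)
    return parts


def top_k(keywords: Iterable[str], k: int = 100, num_parts: int = 8) -> List[Tuple[int, str]]:
    """Return up to k (count, keyword) pairs, most frequent first."""
    parts = partition(keywords, num_parts)

    # Count each bucket in its own worker thread (assumed setup; in CPython
    # the GIL limits the speed-up for pure-Python counting).
    with ThreadPoolExecutor() as pool:
        counters = list(pool.map(Counter, parts))

    # Min-heap of size k: the root is the weakest of the current top k.
    heap: List[Tuple[int, str]] = []
    for counter in counters:
        for kw, count in counter.items():
            if len(heap) < k:
                heapq.heappush(heap, (count, kw))
            elif (count, kw) > heap[0]:
                heapq.heapreplace(heap, (count, kw))
    return sorted(heap, reverse=True)
```

Calling `top_k(lines, k=100)` on an iterable of search terms returns the pairs sorted by descending count. The heap never holds more than k entries, so the selection step uses O(k) memory regardless of how many distinct keywords there are.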

Big Data · Hashing · Multithreading
0 likes · 3 min read
Tencent Advertising Technology
Dec 27, 2022 · Big Data

Design and Optimization of Tencent Advertising Log Data Lake Using Iceberg, Spark, and Flink

The article details how Tencent Advertising re‑architected its massive log pipeline by consolidating heterogeneous real‑time and offline logs into an Iceberg‑based data lake, introducing multi‑level partitioning, Spark and Flink ingestion, and numerous performance and cost optimizations for scalable big‑data analytics.

Big Data · Flink · Iceberg
0 likes · 20 min read
DataFunSummit
May 21, 2022 · Big Data

Tencent News Massive Log Processing Architecture and Data Applications

The article presents Tencent News' comprehensive massive log processing solution, covering background, overall architecture, data collection, real-time and offline computation layers, data quality assurance, and practical examples such as Flink CDC for database synchronization, illustrating how large‑scale data is managed and applied.

Big Data · Flink · Tencent
0 likes · 10 min read
Architecture Digest
Jun 10, 2021 · Big Data

NetEase Game Streaming ETL Architecture and Practices Based on Flink

This article presents NetEase Game's streaming ETL solution built on Flink, covering business background, log characteristics, specialized and generic ETL services, architectural evolution, Python UDF integration, runtime optimizations, fault‑tolerance mechanisms, and future roadmap for unified real‑time and offline data warehouses.

Big Data · Flink · Python UDF
0 likes · 19 min read
IT Architects Alliance
Apr 20, 2021 · Big Data

Real-time Log Processing System Based on Flink and Drools

This article describes a real-time log processing platform that integrates Kafka, Flink, Drools rule engine, Redis, and Elasticsearch to unify heterogeneous log formats, extract business metrics, and provide configurable, dynamic data processing for large‑scale logging scenarios.

Elasticsearch · Flink · Kafka
0 likes · 6 min read
Ctrip Technology
Sep 10, 2020 · Big Data

Design and Implementation of a Unified Log Framework for Ctrip Payment Center

The article describes the design, architecture, and operation of a unified logging framework at Ctrip's payment center, covering log production via a Log4j2 extension, Kafka‑Camus collection, Hive/ORC storage, MapReduce parsing optimizations, and governance strategies for TB‑scale daily log volumes.

Big Data · Hadoop · MapReduce
0 likes · 15 min read
Ctrip Technology
Jan 22, 2020 · Databases

Migrating Log Processing from Elasticsearch to ClickHouse: Architecture, Deployment, Optimization, and Benefits

This article details Ctrip's migration of large‑scale log processing from Elasticsearch to ClickHouse, explaining why ClickHouse was chosen, the high‑availability deployment architecture, data ingestion strategies, dashboard integration, performance gains, operational practices, and overall cost and reliability improvements.

ClickHouse · Distributed Systems · Elasticsearch
0 likes · 12 min read
Architecture Digest
Sep 24, 2019 · Big Data

Implementation Principles and Architecture of DBus Data Sources (RDBMS and Log Types)

The article explains how DBus ingests data from relational databases and log sources by detailing its extractor, incremental conversion, and full‑pull modules, the use of Canal and Kafka, rule‑based log structuring, the unified UMS message format, and heartbeat monitoring for reliability.

Canal · DBus · Data ingestion
0 likes · 13 min read
Youzan Coder
Aug 14, 2019 · Big Data

Comprehensive Guide to Data Collection, Event Modeling, and Tracking in Big Data Platforms

The guide explains how comprehensive data collection in big‑data platforms relies on a standardized event model, passive and code‑based embedding, multi‑platform SDKs, a log‑middleware layer, precise location tracking, and an embedding management platform that supports workflow, testing, quality monitoring, and scalable infrastructure for future enhancements.

Big Data · SDK · analytics
0 likes · 19 min read
NetEase Game Operations Platform
Aug 4, 2019 · Big Data

Log Classification and Real-Time Aggregation Architecture Using Flink and Kafka

This article describes a real‑time log‑classification pipeline built on Flink and Kafka that pre‑filters, structures, classifies, and aggregates heterogeneous logs, enabling efficient frequency‑based alerts and statistical analysis without storing raw log data at scale.

Aggregation · Flink · Kafka
0 likes · 11 min read
Java Captain
Jun 29, 2018 · Backend Development

Introduction to Message Queue Middleware and Its Application Scenarios

This article introduces message queue middleware, explains its role in distributed systems for asynchronous processing, system decoupling, traffic shaping, log handling and message communication, and provides concrete e‑commerce and log‑collection examples illustrating how queues improve performance, scalability and reliability.

Backend Development · Message Queue · System Decoupling
0 likes · 8 min read
Efficient Ops
Dec 5, 2017 · Operations

How Alibaba’s Sunfire Achieves Second‑Level Monitoring at Trillion‑Transaction Scale

This article explains how Alibaba’s Sunfire monitoring platform processes terabytes of logs per minute, uses a pull‑based architecture with Brain‑Reduce‑Map roles, tackles scalability and reliability challenges, and outlines future directions such as MQL standardization and intelligent baselines.

Large Scale · Monitoring · Operations
0 likes · 17 min read
Tongcheng Travel Technology Center
Apr 10, 2017 · Operations

Sentinel Monitoring System: Real‑Time Business Log Monitoring and Incident Detection for an Airline Ticket Platform

The Sentinel system was built to provide real‑time, zero‑modification monitoring of airline ticket business services by consuming Tianwang logs through a Storm cluster, offering flexible rule configuration, addressing performance pitfalls, and planning future enhancements such as custom monitoring scripts and visual dashboards.

Kafka · Monitoring · Operations
0 likes · 6 min read
360 Quality & Efficiency
Mar 17, 2017 · Backend Development

Using jq for JSON Log Extraction and Real‑Time Monitoring on the Command Line

This tutorial introduces the jq command‑line tool, shows how to download and install it, and demonstrates practical commands for extracting specific JSON fields from log data, chaining jq pipelines, and monitoring click and PV metrics in real time.
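For readers without jq at hand, the same kind of field extraction can be sketched in Python. This is an assumption-laden illustration, not the tutorial's commands: it assumes one JSON object per log line, and the field name `event` and the values `click` and `pv` are hypothetical. With those assumptions, `extract_field` does roughly what a jq filter such as `.event` would do, and `count_events` tallies the results.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def extract_field(lines: Iterable[str], field: str) -> Iterator[object]:
    """Yield the value of `field` from each line that parses as a JSON object."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip lines that are not valid JSON instead of aborting the scan.
            continue
        if isinstance(record, dict) and field in record:
            yield record[field]


def count_events(lines: Iterable[str], field: str = "event") -> Counter:
    """Count occurrences of each value of `field` (assumes scalar values)."""
    return Counter(extract_field(lines, field))
```

Passing an open log file (or `sys.stdin`) as `lines` gives a one-shot tally. For continuous monitoring, the same functions can be fed from a stream that follows the log file.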

Command Line · JSON · jq
0 likes · 6 min read
Architecture Digest
Sep 21, 2016 · Big Data

Log Platform Architecture and Scaling Lessons from Vipshop's 419 Promotion

This article presents a detailed case study of Vipshop's log platform during the 419 sales event, analyzing the 2013 architecture, bottlenecks in RabbitMQ and Storm, and the subsequent redesign using Kafka, Impala, and HBase to achieve scalable, reliable big‑data processing.

Big Data · Impala · Kafka
0 likes · 16 min read
Architecture Digest
Sep 14, 2016 · Backend Development

Log Platform Architecture and Scaling Lessons from Vipshop’s 419 Flash Sale

The article analyzes Vipshop’s 419 flash‑sale log platform, detailing the 2013 architecture using Flume, RabbitMQ, Storm, Redis and MySQL, diagnosing bottlenecks in RabbitMQ and Storm during traffic spikes, and presenting practical scaling and monitoring solutions for high‑throughput backend systems.

RabbitMQ · Redis · backend
0 likes · 8 min read