Tagged articles
234 articles
IT Architects Alliance
Jun 5, 2022 · Big Data

Real-Time Data and User Profiling Practices at Zhihu: Architecture, Challenges, and Solutions

This article presents a comprehensive case study of Zhihu's data empowerment team, detailing the design of a real‑time data platform and user profiling system, the challenges faced in scalability, latency, and data quality, and the practical solutions and architectural choices implemented to drive business value.

Data Quality · Lambda architecture · data pipeline
0 likes · 22 min read
DataFunSummit
May 21, 2022 · Big Data

Tencent News Massive Log Processing Architecture and Data Applications

The article presents Tencent News' comprehensive massive log processing solution, covering background, overall architecture, data collection, real-time and offline computation layers, data quality assurance, and practical examples such as Flink CDC for database synchronization, illustrating how large‑scale data is managed and applied.

Flink · Log Processing · Tencent
0 likes · 10 min read
High Availability Architecture
Apr 11, 2022 · Big Data

Ensuring Data Accuracy and Reliability in Baidu Log Platform: Architecture, Challenges, and Solutions

This article introduces the current state of Baidu's log platform, explains its lifecycle from data collection to downstream applications, analyzes the challenges of achieving near‑zero duplication and loss, and presents architectural optimizations and best‑practice recommendations to improve data stability and accuracy across the system.

Big Data · Data Reliability · System Architecture
0 likes · 19 min read
DataFunSummit
Apr 4, 2022 · Big Data

User Portrait Scenarios and Technical Implementation Solutions

This article presents a comprehensive overview of user portrait applications across various industries, detailing common scenarios, product functionalities, and a step‑by‑step technical solution that includes data collection, tag management, ETL pipelines, and service architecture for real‑time and offline processing.

ETL · SCRM · Tag Management
0 likes · 18 min read
DataFunTalk
Mar 31, 2022 · Artificial Intelligence

Comprehensive Guide to TensorFlow: Modeling, Deployment, and Operations

This article provides an in‑depth overview of the TensorFlow ecosystem, covering Keras modeling productivity tools, classic model examples, AutoKeras and KerasTuner for automated search, data preprocessing pipelines, performance profiling, model optimization, and multiple deployment strategies for servers, browsers, and edge devices.

AutoML · Keras · Model Deployment
0 likes · 20 min read
DeWu Technology
Mar 21, 2022 · Big Data

Real-time Customer Service Dashboard: Architecture and Implementation with Flink and ClickHouse

The article describes a real‑time customer‑service dashboard built on Flink for streaming MySQL changes captured via Kafka, which cleans and aggregates ~60 operational metrics before writing them to ClickHouse’s MergeTree/ReplacingMergeTree tables, enabling sub‑second queries and exactly‑once guarantees while separating offline and live pipelines.

ClickHouse · Dashboard · Flink
0 likes · 18 min read
dbaplus Community
Feb 23, 2022 · Big Data

Inside OPPO’s Real‑Time Computing Platform: Architecture, Practices, and Future Roadmap

This article details OPPO’s real‑time computing platform, covering its business scope, big‑data architecture built on Flink, Spark and Trino, the end‑to‑end job development lifecycle, SQL IDE features, diagnostic and monitoring mechanisms, link latency tracking, SLA guarantees, practical use cases, and upcoming lakehouse and cloud‑native evolution.

Flink · Real‑Time Computing · big data platform
0 likes · 23 min read
IT Architects Alliance
Feb 8, 2022 · Backend Development

Designing a Daily Million-Transaction Payment Reconciliation System

This article explains how to architect a payment reconciliation system that can reliably process tens of millions of transactions per day, covering the underlying logic, scalability challenges, data collection methods, big‑data integration, and step‑by‑step processing flows to ensure accurate financial matching.

Backend Architecture · Big Data · Hive
0 likes · 32 min read
Code DAO
Dec 20, 2021 · Artificial Intelligence

Building Efficient Data Pipelines with TensorFlow’s tf.data API

This article explains how to use TensorFlow’s tf.data API to construct high‑performance, flexible data pipelines—from loading images or tensors, applying transformations and data augmentation, to batching, shuffling, caching, prefetching, and feeding the pipeline directly into model.fit for training.

Python · TensorFlow · data loading
0 likes · 9 min read
Code DAO
Dec 12, 2021 · Artificial Intelligence

Lightning Flash 0.3 Introduces New Tasks, Visualization Tools, Data Pipelines, and Registry API

Lightning Flash 0.3 expands the PyTorch Lightning ecosystem with eight new computer‑vision and NLP tasks, modular API design, integrated model hubs, visualisation callbacks, customizable data‑source hooks, and a central registry for model backbones, all illustrated with concrete code examples.

Computer Vision · Deep Learning · Lightning Flash
0 likes · 7 min read
DataFunSummit
Dec 6, 2021 · Big Data

Design and Performance Optimization of a Real‑Time Billion‑Scale Data Processing Pipeline

This article reviews the background, architecture, and a series of performance‑optimizing techniques—including consumption, batch, storage, and execution‑engine tweaks—applied to a real‑time pipeline that processes hundreds of billions of records daily, and presents the resulting resource savings and latency improvements.

Kafka · Performance Optimization · Real-time Processing
0 likes · 9 min read
Baidu Intelligent Testing
Oct 12, 2021 · Artificial Intelligence

Full‑Link Consistency Testing for Click‑Through Rate Models in Large‑Scale Machine Learning

The article describes a comprehensive full‑link consistency testing framework for click‑through‑rate models, defining consistency issues, outlining data and logic consistency goals, and presenting a multi‑stage technical solution—including online data capture, offline data stitching, q‑value comparison, and reporting—to ensure model stability and performance.

DNN · click-through rate · data pipeline
0 likes · 18 min read
dbaplus Community
Sep 6, 2021 · Frontend Development

Building a Scalable Frontend Performance Monitoring System at Hellobike (哈啰)

This article details Hellobike's front‑end performance monitoring architecture, covering the background of rapid growth, a three‑step optimization workflow, data collection, cleaning, aggregation, visualization, and practical techniques like pre‑rendering and offline packages to dramatically improve page load metrics.

Metrics · data pipeline · frontend
0 likes · 30 min read
Xianyu Technology
Aug 31, 2021 · Big Data

Xianyu SPU System Architecture and Data Pipeline Overview

Xianyu built a custom SPU system and data pipeline that cleans Alibaba’s raw SPU data, defines key, binding, sales and product attributes, stores enriched records in MySQL, syncs to OpenSearch, and supports diverse business scenarios such as inspection, search publishing, and worry‑free purchase.

OpenSearch · Product Modeling · SPU
0 likes · 8 min read
DataFunTalk
Jul 26, 2021 · Big Data

Accelerating Hive Daily Tables with Flink: A SmartNews Case Study

This article describes how SmartNews integrated Flink into its Airflow‑driven Hive batch pipeline to cut the actions table generation latency from three hours to about thirty‑four minutes, detailing the technical challenges, design decisions, and production results.

AWS · Big Data · Flink
0 likes · 12 min read
Didi Tech
Jul 1, 2021 · Big Data

Full-Chain Traffic Data Detection in DiDi's Omega Platform

DiDi’s Omega platform provides an end‑to‑end traffic‑data pipeline, from SDK collection through real‑time and offline ETL to storage and analysis, augmented by a detection service that measures loss, duplication and accuracy. Reported results include sub‑1% SDK loss, integrity tagging, and comprehensive monitoring dashboards; the article closes with a hiring call for senior data engineers.

Data Quality · Omega Platform · data pipeline
0 likes · 9 min read
Java High-Performance Architecture
Jun 14, 2021 · Big Data

How NetEase Games Built a Scalable Flink‑Based Streaming ETL Platform

This article explains how NetEase Games engineers designed and operated a Flink‑driven streaming ETL system, covering business background, log classification, dedicated and generic ETL services, architecture evolution, Python UDF integration, runtime optimizations, tuning practices, fault‑tolerance mechanisms, and future roadmap.

Flink · Game Analytics · data pipeline
0 likes · 22 min read
Architecture Digest
Jun 10, 2021 · Big Data

NetEase Game Streaming ETL Architecture and Practices Based on Flink

This article presents NetEase Game's streaming ETL solution built on Flink, covering business background, log characteristics, specialized and generic ETL services, architectural evolution, Python UDF integration, runtime optimizations, fault‑tolerance mechanisms, and future roadmap for unified real‑time and offline data warehouses.

Big Data · Flink · Log Processing
0 likes · 19 min read
IT Architects Alliance
Jun 8, 2021 · Industry Insights

Inside Toutiao’s 11B Daily‑Active‑User Architecture: Data, Recommendations & Scaling

This article dissects Toutiao’s rapid growth from a small startup to a platform with over 500 million registered users, detailing its data collection pipeline, user‑modeling techniques, recommendation engine, micro‑service architecture, PaaS infrastructure, storage strategies, and push‑notification system.

Recommendation Engine · Toutiao · data pipeline
0 likes · 9 min read
Xianyu Technology
Jun 8, 2021 · Big Data

Longgong Data Analysis Platform: Architecture and Solutions for Large‑Scale Structured Data

The Longgong Data Analysis Platform enables Xianyu to capture, store, and analyze billions of structured product attributes in real time across more than 8,000 categories, using TableStore, MySQL, ODPS, and a distributed scheduler to achieve over 50% query speedup, 80% category coverage, and rapid support for search and recommendation teams.

Alibaba · Big Data · Data Platform
0 likes · 9 min read
Architecture Digest
May 17, 2021 · Big Data

Technical Architecture Overview of Toutiao: Data Pipeline, User Modeling, Recommendation System, and Microservices

The article provides a comprehensive technical overview of Toutiao's rapid growth, detailing its massive user base, data collection and processing pipelines, user modeling, cold‑start strategies, recommendation engines, storage solutions, push notification mechanisms, and the underlying microservice and PaaS architecture.

Big Data · Hadoop · Kafka
0 likes · 8 min read
DataFunTalk
May 14, 2021 · Big Data

Real‑time Billion‑Scale Data Transmission and AI Pipeline Architecture at Bilibili

This article presents a technical deep‑dive into Bilibili’s evolution from offline to real‑time data processing, describing the challenges of timeliness, ETL, AI feature engineering, and the design of a Flink‑on‑YARN incremental pipeline that supports trillion‑scale message throughput and AI‑driven real‑time applications.

AI · Big Data · Flink
0 likes · 27 min read
HelloTech
May 14, 2021 · Big Data

User Behavior Analysis System: Architecture, ClickHouse Cluster Deployment, and Analytical Techniques

The article describes a real‑time user behavior analysis platform built on a ClickHouse cluster, detailing its architecture, Hive‑to‑ClickHouse data ingestion with user‑ID routing, table designs for behavior and group data, and five analytical methods—event, funnel, path, retention, and attribution—leveraging shard‑level parallelism and custom functions for high efficiency.

Analytics · Big Data · ClickHouse
0 likes · 20 min read
DeWu Technology
May 7, 2021 · Big Data

Unified Semantic Layer for Data Development: Addressing Pain Points and Optimizing Queries

A unified semantic layer for data development creates a consistent, multi‑view representation of metrics that buffers logical changes, lets downstream applications use metric names only, and enables analysts and developers to select optimal query objects, thereby reducing misunderstandings, cutting rework, and improving query performance and maintainability.

OLAP · data pipeline
0 likes · 5 min read
IT Architects Alliance
Apr 23, 2021 · Industry Insights

Inside Toutiao’s Massive Scale: How the News App Handles Billions of Requests

This article provides an in‑depth technical overview of Toutiao’s rapid growth, data collection pipelines, user modeling, cold‑start strategies, recommendation engine architecture, storage solutions, push notification system, microservice design, and its three‑layer PaaS platform, illustrating how the news app serves hundreds of millions of users daily.

Big Data · System Architecture · Toutiao
0 likes · 8 min read
ITFLY8 Architecture Home
Apr 22, 2021 · Big Data

Inside Toutiao’s Massive Big Data & Recommendation Architecture

This article examines Toutiao’s rapid growth from a small startup to a platform serving over 500 million users, detailing its data collection, user modeling, cold‑start handling, recommendation engines, storage solutions, messaging push system, micro‑service design, and virtualized PaaS infrastructure that enable high‑throughput, personalized news delivery.

Microservices · cloud computing · data pipeline
0 likes · 9 min read
Xianyu Technology
Apr 22, 2021 · Big Data

Real-time Performance Optimization of the Mahé Selection and Delivery System

By classifying data streams, aggregating large‑scale T+1 records in six‑hour windows, encoding attributes with multi‑value mappings, storing compressed rule‑hit backups, and synchronizing recall tables in real time, Mahé’s selection‑and‑delivery pipeline cut end‑to‑end latency from minutes to seconds, achieving robust second‑level responsiveness.

Big Data · Performance Optimization · Real-Time
0 likes · 12 min read
TAL Education Technology
Apr 15, 2021 · Big Data

Global Feature Pool Architecture and Workflow for Data‑Driven Growth

The article describes a unified global feature pool architecture that standardizes offline and real‑time feature production, management, and service layers using Hive, Spark, Flink, Kafka, MySQL, and Hologres to break data silos, improve algorithm development efficiency, and boost growth business performance.

Data Platform · data pipeline · feature engineering
0 likes · 7 min read
Top Architect
Apr 9, 2021 · Big Data

Technical Architecture and Data Processing of Toutiao News Feed System

This article provides a comprehensive overview of Toutiao's rapid growth, massive user base, data collection pipelines, user modeling, recommendation engine, storage solutions, message push strategies, micro‑service architecture, and virtualization PaaS platform, illustrating how big‑data technologies enable personalized news delivery at scale.

Big Data · Microservices · Toutiao
0 likes · 8 min read
Meituan Technology Team
Mar 4, 2021 · Artificial Intelligence

How Meituan Waimai Scaled Feature Engineering for Billions of Requests

This article details Meituan Waimai's evolution from a simple feature framework to a sophisticated, configurable platform that handles massive feature production, multi‑task scheduling, dynamic protobuf storage, and a model‑feature description language (MFDL) to enable efficient online retrieval, high‑performance computation, and consistent training‑sample generation for its recommendation, advertising, and search services.

MFDL · Machine Learning Platform · Meituan
0 likes · 31 min read
NetEase Smart Enterprise Tech+
Feb 4, 2021 · Operations

How NetEase Cloud Communication Builds a Real-Time Service Monitoring Platform

NetEase Cloud Communication’s service monitoring platform leverages data collection, preprocessing, alerting, and visualization pipelines—using HTTP APIs, Kafka, custom scripts, and NTSDB—to provide real-time insights, ensure stability, and support scalable, high‑throughput audio‑video services.

Operations · cloud communication · data pipeline
0 likes · 11 min read
Top Architect
Jan 17, 2021 · Big Data

Migrating LinkedIn’s Who Viewed Your Profile System from Lambda Architecture to a Lambda‑less Architecture

This article describes how LinkedIn’s Who Viewed Your Profile feature was originally built on a Lambda architecture, the operational challenges it caused, and the step‑by‑step migration to a streamlined, Samza‑driven, Lambda‑less design that improves performance, reduces maintenance overhead, and retains essential batch capabilities.

Lambda architecture · LinkedIn · Pinot
0 likes · 11 min read
Laiye Technology Team
Dec 18, 2020 · Big Data

Comprehensive Overview of Laiye Technology's Business Intelligence Ecosystem

This article provides a detailed, end‑to‑end description of Laiye Technology's BI ecosystem, covering its background, development stages, data acquisition, transmission, transformation, loading, modeling, storage layers, statistical analysis, real‑time metrics, visualization, and future challenges, illustrating how the company builds a scalable, cloud‑native data‑driven platform.

Analytics · BI · Big Data
0 likes · 22 min read
DataFunTalk
Nov 27, 2020 · Big Data

Evolution of Kafka‑Based Data Pipeline at Chehaoduo Group: Architecture, Scaling, and Best Practices

This article chronicles the four‑year evolution of Chehaoduo Group’s Kafka ecosystem, from its initial role as a simple data‑ingestion layer to becoming the core of the company’s large‑scale data pipeline, detailing cluster management, upgrade strategies, multi‑cluster deployment, Avro schema handling, SDK development, and operational lessons learned.

Avro · Cluster Management · Kafka
0 likes · 21 min read
58 Tech
Nov 25, 2020 · Databases

Design and Implementation of a Financial Fraud Detection Graph Network Using JanusGraph

This article presents a comprehensive overview of building a financial fraud detection graph network, covering background challenges, graph schema design, a four‑layer architecture with JanusGraph, data import pipelines, quality assurance, performance optimizations, and practical applications such as risk scoring, association analysis, and ID mapping.

JanusGraph · Risk analysis · data pipeline
0 likes · 22 min read
Xianyu Technology
Nov 17, 2020 · Big Data

Xianyu Premium Product Library: Architecture and Implementation

Xianyu’s premium‑product library combines interpretable, multi‑dimensional metric models built from structured product and user attributes with real‑time and offline pipelines to systematically tag high‑quality items, delivering services via HSF and a message bus, and has driven over 20% click‑through growth and nearly doubled conversion rates.

Real-time Processing · data pipeline · feature engineering
0 likes · 7 min read
Ctrip Technology
Nov 12, 2020 · Artificial Intelligence

Ctrip Machine Translation Platform: Architecture, Data Construction, Algorithm Design, and Performance Optimization

This article presents a comprehensive overview of Ctrip's multilingual machine translation platform, detailing demand analysis, system architecture, data pipeline, algorithmic innovations such as task‑space fusion and term‑translation interventions, as well as extensive performance optimizations for low‑resource languages.

AI · Ctrip · Model Optimization
0 likes · 20 min read
System Architect Go
Nov 1, 2020 · Big Data

Introduction to Logstash: Basics, Installation, Configuration, and Plugins

This article introduces Logstash as an open‑source data‑pipeline tool, explains why it simplifies data ingestion, filtering and output, walks through installation and a first‑pipeline example, and provides a comprehensive overview of its input, filter, and output plugins with configuration snippets.

Configuration · ELK · Logstash
0 likes · 10 min read
Tencent Cloud Developer
Oct 29, 2020 · Cloud Computing

Distributed Atmospheric Monitoring System – Cloud Architecture, Module Implementation, and Cost Analysis

The paper describes Tencent’s community‑driven distributed atmospheric monitoring platform, detailing its multi‑layer cloud architecture, data ingestion and aggregation modules built with API Gateway, Serverless Functions, MySQL, and Cloud Map, and compares Phase II and Phase III operational costs while outlining future enhancements.

Distributed Monitoring · IoT · Serverless
0 likes · 11 min read
Xianyu Technology
Oct 15, 2020 · Industry Insights

Cutting Data Dashboard Development Time from Days to Hours: Xianyu’s 3‑Layer Serverless Solution

Xianyu transformed its slow, manual data‑analysis workflow, plagued by BI bottlenecks, slow SQL, and cumbersome front‑end integration, into a three‑layer, serverless architecture that abstracts SQL into reusable atoms, automates data pipelines, and delivers smart, seconds‑level visual dashboards, slashing development effort from five days to half a day.

Data visualization · SQL abstraction · Serverless
0 likes · 12 min read
ITPUB
Sep 14, 2020 · Big Data

How Alibaba’s DChain Data Converger Auto‑Generates Real‑Time Wide Tables with SQL Pipelines

This article explains how the ADC (Alibaba DChain Data Converger) project automatically creates large real‑time tables by letting users configure metrics on the front‑end, then generating and publishing SQL through a pipeline that leverages design patterns, priority queues, and tree‑based data structures for efficient cross‑database processing.

Design Patterns · Flink · Real-time analytics
0 likes · 15 min read
Tencent Cloud Middleware
Aug 12, 2020 · Big Data

How Serverless Functions Can Replace Traditional Kafka Data Pipelines for Lower Cost and Easier Scaling

This article explains how Tencent Cloud CKafka works, describes the challenges of traditional open‑source data‑flow solutions, and demonstrates a Serverless Function approach—complete with architecture diagrams and code examples—to achieve low‑cost, auto‑scaling Kafka‑to‑Elasticsearch pipelines.

Big Data · CKafka · Elasticsearch
0 likes · 12 min read
Efficient Ops
Jul 28, 2020 · Operations

How to Turn Ops Data into Business Value: A Practical Guide

This article explores the evolution and monetization of operations data, outlines a four‑stage management process—from data discovery to modeling, ingestion, and monetization—highlights key scenarios such as intelligent monitoring and root‑cause analysis, and offers practical recommendations for building an effective ops data platform.

AI · Data Management · Data Monetization
0 likes · 15 min read
WecTeam
Jul 23, 2020 · Backend Development

How We Reduced WebMonitor Latency from Minutes to Seconds – Architecture & Performance Secrets

This article chronicles the evolution of the WebMonitor front‑end monitoring system, detailing its three‑tier stack, data pipeline upgrades from raw disk sampling to HDFS and Elasticsearch, extensive collector‑side optimizations, Jetty thread and timeout tuning, and the resulting performance gains that lowered response times from minutes to sub‑second levels.

Java · Jetty · data pipeline
0 likes · 15 min read
Ctrip Technology
Jul 16, 2020 · Big Data

Design and Architecture of the User Profiling System at Ctrip Business Travel

This article describes the concept, tag taxonomy, data flow architecture, and Lambda‑based query service design of Ctrip Business Travel's user profiling system, highlighting how batch and real‑time processing with Spark, Flink, Hive, MongoDB and Redis enable precise marketing, risk control and personalized services.

Big Data · Ctrip · data pipeline
0 likes · 12 min read
58 Tech
Jul 10, 2020 · Artificial Intelligence

Tag Mining for Used‑Car Business: NLP, Word2Vec, and Retrieval Pipeline

This article details the end‑to‑end process of extracting and leveraging tags for used‑car listings, covering data collection, segmentation, NLP‑based tokenization, word‑vector generation, tag‑library construction, and online retrieval flow to improve personalized recall and CTR.

NLP · Tagging · Word2Vec
0 likes · 19 min read
dbaplus Community
Jul 7, 2020 · Big Data

How Flink + ClickHouse Power Real‑Time Analytics at Scale

This article explains how QuTouTiao builds a high‑performance real‑time analytics pipeline using Flink, Hive, and ClickHouse, covering business scenarios, hour‑level and second‑level Flink‑to‑Hive architectures, streaming file sink mechanics, multi‑user permissions, ClickHouse performance tricks, and future roadmap for unified stream‑batch storage.

Big Data · ClickHouse · Flink
0 likes · 18 min read
Ctrip Technology
Jun 29, 2020 · Backend Development

Optimizing Ctrip’s Vacation Search Engine: From Search 1.0 to 5.5

This article details the evolution and optimization of Ctrip’s vacation search engine, covering business challenges, indexing redesign, data collection pipelines, write‑path improvements, compression techniques, query performance enhancements, deployment strategies, and the resulting gains in storage, latency, and stability.

Backend · Index Optimization · Scalability
0 likes · 14 min read
DataFunTalk
Jun 18, 2020 · Big Data

Real-time Data Processing at QuTouTiao: Flink + ClickHouse Architecture and Practices

QuTouTiao leverages Flink and ClickHouse to build a high‑performance real‑time analytics platform that supports hourly Hive pipelines and real‑time ClickHouse queries, answering 80% of requests in under a second through streaming ingestion, exactly‑once semantics, multi‑cluster coordination, and optimized ClickHouse storage and connector designs.

Big Data · ClickHouse · Flink
0 likes · 16 min read
DataFunTalk
Jun 14, 2020 · Big Data

Designing an Offline Big Data Processing Architecture Based on Object Storage

This article presents a comprehensive offline big‑data processing framework that leverages scalable object storage for PB‑level data, details storage and compute engine requirements, compares cost options, describes data pipeline design, and showcases an e‑commerce case study with Spark‑driven analytics.

Big Data · Cost Optimization · Spark
0 likes · 19 min read
21CTO
May 12, 2020 · Big Data

Inside Toutiao’s Massive Data Pipeline: Architecture, Recommendation & Scaling

This article details Toutiao’s rapid growth and its large‑scale data pipeline, covering article crawling, user modeling, recommendation engines, storage solutions, push notifications, micro‑service architecture, and the underlying virtualization PaaS platform that powers its personalized news service.

Microservices · Toutiao · data pipeline
0 likes · 8 min read
Tencent Advertising Technology
May 2, 2020 · Artificial Intelligence

How to Use TI-ONE Built‑in Operators for the 2020 Tencent Advertising Algorithm Competition

This tutorial walks you through creating a TI‑ONE project, ingesting competition data, configuring and training a decision‑tree model with built‑in operators, running the workflow, and downloading and uploading the result files for the 2020 Tencent Advertising Algorithm Competition.

Model Training · TI-ONE · data pipeline
0 likes · 7 min read
ITPUB
Apr 12, 2020 · Big Data

Inside Toutiao’s Massive Data Pipeline and Real‑Time Recommendation Engine

This article details how Toutiao processes billions of daily page views, builds user models with Hadoop and Storm, runs real‑time recommendation and cold‑start personalization, and scales its microservice‑based architecture using Kafka, MySQL, MongoDB, Redis and a high‑throughput push system.

data pipeline · recommendation system
0 likes · 10 min read
dbaplus Community
Mar 3, 2020 · Big Data

How MaFengWo Scaled Kafka for Real‑Time Big Data: Lessons and Best Practices

This article details MaFengWo's practical experience with Kafka in its big‑data platform, covering three core usage scenarios, a four‑stage evolution roadmap—including version upgrades, resource isolation, security and monitoring—and future plans such as transaction‑based deduplication and consumer throttling.

Big Data · Kafka · Resource Isolation
0 likes · 17 min read
Qunar Tech Salon
Feb 21, 2020 · Artificial Intelligence

Building an End‑to‑End Data‑Model Loop for Alibaba XiaoMi AI Services

The article describes how Alibaba's XiaoMi AI platform constructs a closed‑loop pipeline—from data collection and annotation to model training, evaluation, and real‑time deployment—using multi‑dimensional data processing, visualization, and Spark‑based engines to accelerate iterative improvements and address operational pain points.

AI · Big Data · Model Training
0 likes · 9 min read
DataFunTalk
Feb 17, 2020 · Artificial Intelligence

Building a Closed‑Loop AI System: From Data Collection to Model Deployment in Alibaba’s XiaoMi

This article explains how Alibaba’s XiaoMi team constructs a full‑cycle AI pipeline—covering real‑time and offline data processing, high‑dimensional visualization, model training, iterative feedback, and Spark‑based deployment—to accelerate intelligent product iteration while addressing common engineering pain points.

AI · Big Data · Real-time Processing
0 likes · 10 min read
dbaplus Community
Nov 21, 2019 · Databases

How to Build a Real‑Time MySQL Statistics Platform with ClickHouse

This article explains how a growing company designed, optimized, and deployed a comprehensive MySQL monitoring and analysis pipeline—moving from Flume‑HDFS‑Hive to ClickTail‑ClickHouse, enriching SQL parsing, and applying practical methods for state statistics, trend analysis, permission management, and data‑skew detection.

DBA · Database Monitoring · SQL Analytics
0 likes · 16 min read
DataFunTalk
Nov 7, 2019 · Big Data

Real-Time Computing Engine at Beike: Architecture, Practices, and Future Plans

This article details Beike's real‑time computing engine, covering its background, streaming platform built on Spark Streaming and Flink, data ingestion via Kafka, metadata handling, SQL‑based task development, monitoring, storage solutions, and future roadmap for resource management and AI‑enhanced monitoring.

Big Data · Flink · Kafka
0 likes · 14 min read
Big Data Technology & Architecture
Oct 30, 2019 · Big Data

Building a Real‑Time Data Processing Pipeline with Apache Kafka, Spark Streaming, and Cassandra

This tutorial explains how to create a highly scalable, fault‑tolerant real‑time data processing platform by configuring a Kafka topic, a Cassandra keyspace, adding Spark and connector dependencies, developing a Java‑based Spark Streaming pipeline, enabling checkpoints, and deploying the application with spark‑submit.

Big Data · Java · Kafka
0 likes · 8 min read
Alibaba Cloud Developer
Oct 30, 2019 · Big Data

How Real-Time Big Data Pipelines Detect E‑Commerce Ad Misplacements

This article explains how a large‑scale e‑commerce search advertising system uses real‑time big‑data pipelines, log synchronization, NoSQL storage, and proactive verification to automatically discover and correct ad placement errors across the entire data processing chain, protecting both advertisers and the platform.

Big Data · ad verification · data pipeline
0 likes · 13 min read
58 Tech
Sep 6, 2019 · Big Data

Architecture and Technical Implementation of the WMDA Data Analytics Platform

The article details WMDA's end‑to‑end data analytics architecture, covering zero‑event data collection, real‑time and offline processing pipelines built on Spark Streaming, Druid, Hadoop, Kettle, and TaskServer, and explains how these components collaborate to deliver comprehensive user behavior analysis.

Big Data · Druid · ETL
0 likes · 11 min read
Xueersi Online School Tech Team
Sep 6, 2019 · Big Data

Real-Time Data Architecture, Evolution, and Applications at an Online School

The article details the six‑layer big‑data architecture of an online school, chronicles its migration from Storm to Spark Streaming and finally to Flink, and showcases concrete real‑time applications such as gateway monitoring, user‑profile tagging, renewal reporting, and advertising analysis, while outlining future development directions.

Analytics · Big Data Architecture · Flink
0 likes · 14 min read
Ctrip Technology
Sep 4, 2019 · Artificial Intelligence

Design and Implementation of Ctrip's User Precise Marketing System

This article details the design goals, architecture, core functionalities, and optimization strategies of Ctrip's precision user‑marketing system, which uses RESTful integration, flexible rule‑based and machine‑learning models, real‑time monitoring, and A/B testing to improve traffic utilization and conversion rates.

AB testing · Ctrip · Marketing
0 likes · 11 min read
Xianyu Technology
Aug 28, 2019 · Big Data

Unified Search System Architecture and Automation for Multiple Business Scenarios

To avoid building a separate search service for each Xianyu business, the team created a unified, generic search architecture based on Alibaba’s HA3 engine. A control layer automates data dumping, indexing, query translation, and result ranking across five subsystems, so new services can be onboarded in minutes instead of weeks.

Automation · Big Data · data pipeline
0 likes · 18 min read
Youzan Coder
Aug 14, 2019 · Big Data

Comprehensive Guide to Data Collection, Event Modeling, and Tracking in Big Data Platforms

The guide explains how comprehensive data collection in big‑data platforms relies on a standardized event model, passive and code‑based event tracking, multi‑platform SDKs, a log‑middleware layer, precise location tracking, and a tracking management platform that supports workflow, testing, quality monitoring, and scalable infrastructure for future enhancements.

Analytics · Big Data · Log Processing
0 likes · 19 min read
ITPUB
Jul 2, 2019 · Databases

How ClickHouse Powers Ctrip’s Hotel Data Platform for Billions of Daily Updates

This article explains how Ctrip’s hotel data intelligence platform handles over ten billion daily data updates and nearly a million queries by adopting ClickHouse, detailing the system's background, the reasons for choosing ClickHouse over other solutions, the data ingestion pipelines, monitoring strategies, operational practices, and performance outcomes.

Big Data · ClickHouse · Real-time analytics
0 likes · 13 min read
Ctrip Technology
Jun 26, 2019 · Databases

Applying ClickHouse for a High‑Performance Hotel Data Intelligence Platform

This article describes how Ctrip Hotel's data intelligence platform leverages ClickHouse to achieve real‑time analytics on billions of daily updates and millions of queries, detailing the system architecture, data ingestion pipelines, monitoring, and operational lessons learned for large‑scale, high‑availability data services.

Data Warehouse · Real-time analytics · data pipeline
0 likes · 12 min read
58 Tech
May 31, 2019 · Artificial Intelligence

Summary of 58 Group Technical Salon: Recommendation System Architecture and Search Ranking Algorithm Practices

The article summarizes the 58 Group technical salon where experts presented the microservice‑based recommendation system architecture, data and strategy layers, and the internally built search ranking platform covering sampling, feature engineering, and model training, highlighting practical implementations and lessons learned.

AI · Microservices · data pipeline
0 likes · 7 min read
NetEase Media Technology Team
May 16, 2019 · Backend Development

Design and Implementation of a Configurable, Extensible Content Processing System (Apollo)

Apollo is a configurable, extensible content‑processing platform that models each step as a node defined in a configuration file and supports multiple implementations for A/B testing. It decouples producers and consumers via Kafka, ensures fault‑tolerant retries and replay, and captures fine‑grained metrics through Canal‑to‑TiDB pipelines. Developing a new content type now takes roughly ten percent of the original effort, and downstream teams receive high‑quality data.

Backend Architecture · Kafka · TiDB
0 likes · 9 min read
HomeTech
Jan 18, 2019 · Big Data

Data Mill: A Real‑Time Spark Streaming Framework for DSP Business Support

Data Mill is a Spark‑Streaming‑based real‑time computation framework that abstracts tasks as DataFrames, enables SQL‑driven development, and supports DSP business requirements by reducing latency to 15‑30 minutes while providing a scalable architecture, caching strategy, and automated fault handling.

Cache · DSP · Real‑Time Computing
0 likes · 10 min read
DataFunTalk
Dec 20, 2018 · Artificial Intelligence

How to Build World-Class Visual AI Technology

This presentation outlines the fundamentals of computer vision, discusses key factors such as algorithm research, large‑scale training platforms, intelligent data processing, and hardware optimization, and shares practical experiences from DeepGlint on building a world‑class visual AI system and its real‑world applications.

Computer Vision · Hardware Optimization · data pipeline
0 likes · 23 min read
JD Tech
Oct 10, 2018 · Backend Development

Design and Architecture of JD's Virtual Order Center (Hamal)

The article explains the architecture and core mechanisms of JD's Virtual Order Center, describing how the Hamal service leverages MySQL binlog listening, Zookeeper coordination, fast TCP‑based consumption, read‑write separation, and multi‑level search to reliably process billions of virtual orders.

Backend · Binlog · data pipeline
0 likes · 7 min read
21CTO
Sep 28, 2018 · Artificial Intelligence

Inside E‑Commerce Recommendation Systems: From Data Collection to Real‑Time Personalization

This article explains how e‑commerce recommendation systems work, covering regular and personalized recommendation types, the challenges of user profiling and data handling, the three‑stage recommendation pipeline, and the overall system architecture that powers real‑time, AI‑driven product suggestions.

AI · data pipeline · e‑commerce
0 likes · 17 min read
Big Data and Microservices
Sep 3, 2018 · Big Data

From Raw Data to Business Impact: A Complete Data Analyst Skill Guide

The article outlines a comprehensive data‑analyst competency framework, covering data collection, storage, extraction, mining, analysis, visualization, and practical application, and provides concrete questions, techniques, and tool recommendations to help analysts turn raw data into actionable business insights.

Business Intelligence · Data visualization · data analysis
0 likes · 9 min read