Tagged articles
3675 articles
DataFunTalk
May 22, 2018 · Information Security

Designing a Credit-Based Content Management System: Strategies, Risk Assessment, and AI Techniques

The article outlines how to build a credit‑based content management platform by describing the evolution of security practices, defining user‑generated, professional‑generated, and occupational content models, proposing a credit‑audit workflow with risk assessment, and presenting AI‑driven text classification and anti‑cheat methods to balance traffic, quality, and trust.

Artificial Intelligence · Big Data · Information Security
0 likes · 12 min read
Architects' Tech Alliance
May 14, 2018 · Big Data

Understanding Hadoop MapReduce Architecture and YARN: Components, Workflow, and Optimization

This article explains Hadoop's distributed storage and processing framework, details the MapReduce programming model, describes the classic JobTracker/TaskTracker architecture, outlines the shuffle and combine phases, and introduces YARN as a scalable replacement with its ResourceManager, ApplicationMaster, and NodeManager components.

Big Data · Hadoop · MapReduce
0 likes · 13 min read
Alibaba Cloud Developer
May 3, 2018 · Artificial Intelligence

How Alibaba’s City Brain Uses AI to Transform Urban Management

At a recent Yunqi Conference, Hua Xiansheng, deputy director of Alibaba’s DAMO Academy Machine Intelligence Lab, presented the City Brain initiative, unveiling three AI-powered products (Tianyao, Tianying, and Tianji) that leverage massive video and sensor data to achieve real‑time perception, decision‑making, prediction, and intervention for smarter urban governance.

AI · Big Data · Smart City
0 likes · 10 min read
Beike Product & Technology
Apr 26, 2018 · Big Data

Lianjia (Chain Home) OLAP Platform and Kylin Usage

This article details the OLAP platform architecture at Lianjia (Chain Home) and its use of Kylin, covering the evolution from early ROLAP to a MOLAP multi-dimensional engine, Kylin's basic principles, platform structure, application scenarios, usage specifications, capability extensions, and middleware development.

Apache Kylin · Big Data · Chain Home
0 likes · 11 min read
dbaplus Community
Apr 24, 2018 · Databases

Scaling Baidu’s TSDB to Trillions of Points: Elastic, High‑Performance Architecture

Baidu’s TSDB processes over 20 million data points per second per node and tens of thousands of queries per second cluster‑wide by employing a stateless read/write‑separated elastic architecture, multi‑layer storage across Redis, HBase and Hadoop, minute‑level geo‑redundant self‑healing, and a modified Gorilla compression that cuts storage by 80% with minimal CPU overhead.

Big Data · TSDB · Time Series Database
0 likes · 8 min read
dbaplus Community
Apr 23, 2018 · Operations

Insights and Highlights from the 2018 Gdevops Global Agile Ops Summit

The 2018 Gdevops Global Agile Operations Summit in Chengdu gathered industry experts who shared practical insights on AIOps implementation, sharding database ecosystems, DevOps adoption in traditional enterprises, large‑scale data management, Elasticsearch clustering, AWS blue‑green deployments, cloud database operations, Alibaba's Double 11 ops platform, 58 delivery mini‑program architecture, and scalable game service design.

Big Data · DevOps · aiops
0 likes · 13 min read
ITFLY8 Architecture Home
Apr 19, 2018 · Information Security

How Suning Built a Comprehensive Information Security Architecture

This article outlines Suning's evolution from a basic network operations unit to a sophisticated, multi‑layered security architecture that integrates organizational structure, protection platforms, risk management, big‑data threat perception, and continuous improvement to safeguard e‑commerce operations.

Big Data · Information Security · Security Architecture
0 likes · 10 min read
Architecture Digest
Apr 19, 2018 · Cloud Computing

Understanding the Relationship Between Cloud Computing, Big Data, and Artificial Intelligence

This article explains how cloud computing, big data, and artificial intelligence are interrelated, describing the evolution from physical resource management to virtualized, elastic services, the roles of IaaS, PaaS, and SaaS, and how each technology benefits the others in modern applications.

Artificial Intelligence · Big Data · Cloud Computing
0 likes · 36 min read
UCloud Tech
Apr 18, 2018 · Big Data

How Elasticsearch Powers Billion‑Record Log Analysis and Full‑Text Search

This article explains how Elasticsearch and the ELK stack address challenges of storing, securing, retrieving, and analyzing massive data volumes by providing distributed real‑time search, log collection, visualization, and even serving as a NoSQL alternative for large‑scale applications.

Big Data · ELK · Elasticsearch
0 likes · 7 min read
Architecture Digest
Apr 18, 2018 · Databases

Understanding Distributed Architecture and Its Applications in MySQL and Large‑Scale Systems

The article explains the concept of distributed architecture, its key characteristics such as cohesion and transparency, showcases how MySQL and middleware like Mycat are used in e‑commerce platforms, and outlines the evolution, practical implementations, and challenges of building scalable distributed database systems.

Big Data · Database Architecture · Distributed Systems
0 likes · 15 min read
Qunar Tech Salon
Apr 10, 2018 · Big Data

Design and Implementation of Meituan's Traffic Compass Data Warehouse for Hotel‑Travel Business

The article presents Meituan's Traffic Compass—a data‑warehouse‑driven traffic analysis platform for the hotel‑travel business—detailing its background, challenges, architectural layers, dimensional modeling, Kylin‑based query engine, configuration mechanisms, performance metrics, and future optimization plans.

Analytics · Big Data · Kylin
0 likes · 14 min read
Qunar Tech Salon
Apr 9, 2018 · Big Data

Analysis of Apache Spark 2.2.1 Memory Management Model

This article examines Spark's unified memory manager in version 2.2.1. It details the on‑heap and off‑heap memory regions, the four on‑heap memory pools, dynamic execution‑storage memory sharing, and task memory accounting, and it provides concrete calculation examples to explain UI discrepancies and runtime memory limits.

Big Data · Executor · Memory Management
0 likes · 13 min read
dbaplus Community
Apr 3, 2018 · Big Data

How Meituan Built DataMan: A Scalable Data Quality Monitoring Platform for Big Data

This article details Meituan's DataMan platform, describing the background of data quality challenges, the eight-step PDCA-driven solution, architectural design, technical stack, monitoring standards, and the resulting improvements in data governance and operational efficiency across their massive data warehouse ecosystem.

Big Data · Data Governance · Data Quality
0 likes · 20 min read
ITPUB
Mar 29, 2018 · Big Data

Demystifying Hadoop: MapReduce, Shuffle, and YARN Architecture

This article explains Hadoop’s core components, the MapReduce programming model, the detailed shuffle and merge processes, and how YARN replaces the classic JobTracker/TaskTracker design to improve scalability and resource utilization in large‑scale data processing clusters.

Big Data · Hadoop · MapReduce
0 likes · 15 min read
Architecture Digest
Mar 26, 2018 · Operations

Alipay’s Double 11 Architecture: Logical Data Centers, Distributed Transactions, and High‑Availability Strategies

The article details Alipay’s comprehensive architecture for the Double 11 shopping festival, covering its three‑layer IaaS/PaaS/SaaS model, logical data‑center design, multi‑active disaster‑recovery, blue‑green deployment, distributed data sharding, transaction processing, and the Ant Credit Pay service’s performance and risk‑control mechanisms.

Alipay · Architecture · Big Data
0 likes · 16 min read
MaGe Linux Operations
Mar 23, 2018 · Cloud Computing

Why Cloud Computing, Big Data, and AI Are Inseparable: A Beginner’s Guide

This article explains the origins and goals of cloud computing, how virtualization adds flexibility in time and space, the evolution from physical servers to public and private clouds, the role of IaaS, PaaS, and SaaS, and how big data and artificial intelligence intertwine with cloud services to enable modern intelligent applications.

Artificial Intelligence · Big Data · Cloud Computing
0 likes · 37 min read
Meituan Technology Team
Mar 22, 2018 · Big Data

High-Performance User Behavior Analysis Solution for Massive Data

The paper describes a high‑performance user‑behavior analysis system that processes hundreds of billions of daily logs for Meituan‑Dianping, using an inverted‑index structure with bitmap UUID sets and timestamp sequences, combined with Spark, Spring and Alluxio optimizations to cut query times from hours to under five seconds.

Big Data · OLAP analysis · distributed computing
0 likes · 14 min read
dbaplus Community
Mar 18, 2018 · Databases

What’s New in the Database World? March 2018 Release Roundup

The March 2018 DBAplus Newsletter compiles the latest releases across RDBMS, NoSQL, NewSQL, time‑series and big‑data ecosystems, highlighting new features, performance improvements, compatibility updates and key technical links for Oracle, MySQL, MariaDB, SQL Server, DB2, PostgreSQL, TiDB, CockroachDB, InfluxDB, Hadoop and several Chinese‑made databases.

Big Data · Database Releases · HTAP
0 likes · 21 min read
MaGe Linux Operations
Mar 17, 2018 · Operations

From Manual Ops to Automated Cloud: A 7‑Year Journey of a Game Ops Team

This article chronicles a game company's operations team evolution over seven years, detailing how it grew from a tiny manual crew to a large, automated, cloud‑native organization that built its own CDN, monitoring, and platform solutions while tackling scaling, reliability, and service‑orientation challenges.

Big Data · CDN · Cloud Computing
0 likes · 21 min read
Architecture Digest
Mar 14, 2018 · Big Data

Attributes Matrix and Data Flow Models of Apache Streaming Platforms

This article presents a comprehensive attributes matrix and data‑flow model overview for major Apache streaming platforms, comparing versions, sponsors, event handling, fault tolerance, processing order, latency, resource management, APIs, and supported connectors to aid practical technology selection.

Apache · Big Data · attributes matrix
0 likes · 16 min read
Beike Product & Technology
Mar 9, 2018 · Big Data

How Lianjia Built a Low‑Latency Real‑Time Data Platform with Spark Streaming

This article details Lianjia's journey of designing and implementing a low‑latency, stable real‑time computing platform using Spark Streaming on YARN, covering technical selection, architecture components, version compatibility challenges, exactly‑once semantics, graceful shutdown, Kafka tuning, and future enhancements.

Big Data · Exactly-Once · Kafka
0 likes · 11 min read
Qunar Tech Salon
Mar 9, 2018 · Big Data

New Features in Apache Spark 2.3: Continuous Streaming, Kubernetes Scheduler, Pandas UDFs, and MLlib Enhancements

Apache Spark 2.3 introduces major upgrades such as millisecond‑latency continuous streaming, stream‑to‑stream joins, a native Kubernetes scheduler backend, accelerated Pandas UDFs, and several MLlib improvements, all aimed at making big‑data processing faster, easier, and smarter.

Apache Spark · Big Data · Continuous Processing
0 likes · 7 min read
Ctrip Technology
Mar 8, 2018 · Big Data

Ctrip Wireless APM Platform: Architecture, Metrics, and Technical Details

The article describes the evolution of Ctrip's wireless APM platform from the early UBT-based monitoring to a globally‑oriented, metric‑rich system that processes over 100 billion data points daily using Storm and Elasticsearch, detailing its design, key performance dimensions, data‑volume trade‑offs, and implementation choices.

APM · Big Data · Ctrip
0 likes · 12 min read
dbaplus Community
Mar 7, 2018 · Big Data

Taming Massive HDFS Data Growth: Monitoring, Capacity Planning & Hive Optimization

The article outlines a systematic approach for large‑scale Hadoop clusters to monitor daily data growth, identify abnormal paths, manage rapid expansion, clean unused cold data, and implement capacity forecasts, while providing concrete daily and quarterly actions, Hive‑specific strategies, and practical examples to keep storage under control.

Big Data · Data Growth · HDFS
0 likes · 17 min read
StarRing Big Data Open Lab
Mar 2, 2018 · Cloud Computing

How Cloud, Big Data, and AI Converge to Transform Enterprise Data Strategies

The article explores how the integration of cloud computing, big data, and artificial intelligence is reshaping enterprise data platforms, outlining a multi‑stage evolution from data unification to ecosystem building and forecasting the strategic importance of data in future business transformation.

Artificial Intelligence · Big Data · Enterprise Data
0 likes · 9 min read
Hulu Beijing
Feb 28, 2018 · Big Data

How Hulu’s Nesto Engine Delivers Near‑Real‑Time OLAP on TB‑Scale Data

This article introduces Hulu's in‑house OLAP engine Nesto, detailing its near‑real‑time data ingestion, nested data model, TB‑level storage using HBase and Parquet, MPP query execution, custom predicate library, and the overall architecture that enables sub‑second ad‑hoc queries for user analytics.

Big Data · Columnar Storage · Distributed Systems
0 likes · 22 min read
JD Tech
Feb 28, 2018 · Operations

CallGraph: JD.com's Distributed Tracing and Service Governance Platform

CallGraph is JD.com's internally developed distributed tracing and service governance platform that addresses the challenges of monitoring complex microservice architectures by providing low‑intrusion, low‑latency tracing, real‑time analytics, configurable sampling, and integration with JMQ, Storm, Spark, HBase, and JimDB for both operational insight and performance optimization.

Big Data · Distributed Tracing · Microservices
0 likes · 12 min read
21CTO
Feb 20, 2018 · Big Data

Why Real-Time Streaming Is the Next Big Data Revolution for Developers

This article explains how real-time streaming has evolved from batch Hadoop systems through Lambda architecture to modern Kappa-style pipelines, highlighting its growing importance for developers, enterprises, and the integration of streaming with microservices, AI, and cloud-native technologies.

AI integration · Big Data · Kappa architecture
0 likes · 8 min read
Architecture Digest
Feb 11, 2018 · Artificial Intelligence

Recent Advances in Bayesian Machine Learning: Foundations, Non‑Parametric Methods, and Large‑Scale Applications

This article reviews recent progress in Bayesian machine learning, covering foundational theory, non‑parametric approaches such as Dirichlet and Indian buffet processes, regularized Bayesian inference, and scalable techniques for big‑data environments including stochastic variational methods, distributed algorithms, and hardware acceleration.

Big Data · Monte Carlo · Variational Inference
0 likes · 23 min read
Java Backend Technology
Feb 6, 2018 · Artificial Intelligence

How JD Built a Scalable AI-Powered Recommendation Engine for E‑Commerce

This article details JD's evolution from rule‑based recommendations to a multi‑screen, AI‑driven personalization platform, describing its system architecture, data pipelines, feature services, and key technologies that enable real‑time, user‑centric product suggestions across the e‑commerce ecosystem.

Artificial Intelligence · Big Data · e‑commerce
0 likes · 20 min read
Architecture Digest
Feb 1, 2018 · Fundamentals

How Search Engines Work: Building Inverted Indexes

This article explains the core of search engine technology by describing what an inverted index is, how it is built using single‑pass memory and multi‑way merge methods, how indexes can be partitioned and incrementally updated, and how Hadoop can be used for large‑scale indexing.

Big Data · Hadoop · indexing
0 likes · 10 min read
iQIYI Technical Product Team
Jan 31, 2018 · Big Data

Evolution of iQIYI Real-Time Big Data Collection System

iQIYI’s big‑data collection system has progressed from simple HTTP log uploads to a Flume‑Kafka pipeline and finally to a custom Venus‑Agent architecture with centralized configuration, persistent offsets, dual‑Kafka streams and Flink processing, now handling tens of millions of queries per second and over three hundred billion records daily to power its AI‑driven services.

Big Data · Flink · Flume
0 likes · 15 min read
Meituan Technology Team
Jan 26, 2018 · Big Data

Design and Implementation of a Real-Time Data Processing System at Meituan

Meituan designed a Storm‑based real‑time data processing platform that guarantees at‑least‑once delivery and high availability, employs a custom spout, regression‑driven traffic smoothing, and a low‑latency KV store with atomic operations, persisting results in Kafka, MySQL and Cellar to power merchant dashboards and heat‑tag analytics, while planning broader real‑time analytics expansion.

Big Data · Distributed Systems · Storm
0 likes · 10 min read
Huawei Cloud Developer Alliance
Jan 18, 2018 · Big Data

Smart Flood Control: Donghua Software’s IoT, Big Data & Cloud Solution

The case study details Donghua Software’s smart flood‑control and drainage solution, which integrates IoT sensors, NB‑IoT/eLTE networks, Huawei’s FusionSphere cloud platform, big‑data analytics, and GIS to provide real‑time monitoring, predictive warnings, automated gate control, and efficient emergency dispatch for urban water management.

Big Data · Cloud Computing · Flood Management
0 likes · 12 min read
Efficient Ops
Jan 16, 2018 · Operations

How Tencent Secures Game Operations: Real Cases, Challenges, and Data‑Driven Solutions

This article shares a comprehensive overview of game operation security at Tencent, covering the speaker's background, real‑world incident cases, the inherent challenges of large‑scale game services, past monitoring efforts, and a new data‑driven alerting framework that dramatically reduces false alarms while protecting game economies.

Alerting · Big Data · Game Security
0 likes · 25 min read
ITFLY8 Architecture Home
Jan 15, 2018 · Backend Development

Inside the Architecture of the World’s Biggest Websites: Wikipedia, Facebook, YouTube, and More

This article surveys the technical architectures of major web platforms—including Wikipedia, Facebook, Yahoo! Mail, Twitter, Google App Engine, Amazon, and Youku—highlighting their design patterns, scaling techniques, storage solutions, and caching strategies to reveal how massive online services are built and operated.

Architecture · Backend · Big Data
0 likes · 10 min read
StarRing Big Data Open Lab
Jan 5, 2018 · Big Data

What Drove Big Data’s 2017 Surge and What’s Next? Insights & Predictions

Analyzing 2017’s big data boom, the article explores how the 4V characteristics—volume, variety, velocity, and value—spurred innovations like distributed storage, NoSQL, real‑time stream processing, and AI integration, and predicts future hotspots such as SQL resurgence, cloud‑based platforms, and AI‑driven analytics.

Artificial Intelligence · Big Data · Real-time Processing
0 likes · 11 min read
AntTech
Jan 4, 2018 · Databases

Report on VLDB 2017 Conference: Insights and Highlights from Database Research

Written after attending VLDB 2017 in Munich, this report summarizes the conference’s broad coverage of database research, from new hardware‑accelerated prototypes and Spark‑based big‑data processing to Oracle and SAP HANA case studies, keynotes, notable papers, and reflections on industry trends and Chinese contributions.

Big Data · Hardware acceleration · VLDB
0 likes · 22 min read
dbaplus Community
Jan 1, 2018 · Big Data

How Vipshop Leverages Data Processing, Analytics, and Mining for Smarter Ops

This article summarizes Wu Xiaoguang's talk at Gdevops 2017, detailing how Vipshop integrates data processing, analysis, and mining technologies—such as Flume, Kafka, Spark, and custom scheduling—to improve operational decision‑making, performance monitoring, root‑cause analysis, and predictive modeling across its e‑commerce platform.

Big Data · Data Analytics · Operations
0 likes · 23 min read
Tencent Architect
Dec 30, 2017 · Databases

An Overview of Time Series Databases and Tencent CTSDB

This article introduces the concept, characteristics, and use cases of time series databases, explains the data model and challenges of traditional solutions, and provides a detailed overview of Tencent's Cloud Time Series Database (CTSDB) along with performance comparisons against InfluxDB.

Big Data · CTSDB · Time Series Database
0 likes · 12 min read
Architects' Tech Alliance
Dec 28, 2017 · Operations

Intelligent Operations: Machine‑Learning‑Based AIOps – Lecture Summary by Prof. Pei Dan

In this lecture, Prof. Pei Dan of Tsinghua University outlines the evolution of intelligent operations from rule‑based automation to machine‑learning‑driven AIOps, discusses data, feedback loops, and practical challenges, and calls for stronger collaboration between industry and academia to accelerate research and deployment.

Big Data · Cloud Computing · aiops
0 likes · 10 min read
Meituan Technology Team
Dec 28, 2017 · Big Data

Design and Implementation of a Scalable Scenario Query System for Meituan

Meituan built a scalable scenario‑query platform that unifies traffic, activity and investment data by layering RPC services, a Storm‑driven pre‑computation tree stored in Redis/Tair, and a middle‑platform API with circuit‑breaker logic, cutting response times from seconds to under one second while dramatically reducing code coupling and simplifying future feature development.

Apache Storm · Big Data · NoSQL
0 likes · 12 min read
Architecture Digest
Dec 27, 2017 · Backend Development

Handling Transactions, Failover, and Exactly‑Once Semantics in Distributed Systems

This article explores how distributed systems determine node liveness, manage failover and recovery, and implement at‑most‑once, at‑least‑once, and exactly‑once processing guarantees—including opaque transactions and two‑phase commit—using examples from Kafka, Zookeeper, and big‑data pipelines.

Big Data · Distributed Systems · Exactly-Once
0 likes · 15 min read
dbaplus Community
Dec 26, 2017 · Big Data

Turning Raw Logs into Structured Data with DBus Visual Rule Operators

This article explains how the open‑source DBus platform, combined with the Wormhole streaming engine, captures raw application logs, lets users configure visual rule operators, and transforms the unstructured message part into schema‑driven, Kafka‑ready data for downstream analytics.

Big Data · DBus · Log Processing
0 likes · 15 min read
Architecture Digest
Dec 22, 2017 · Big Data

Redesign and Optimization of the WeChat Pay Transaction Record System

This article presents a comprehensive case study of how WeChat Pay rebuilt its transaction record storage system to handle massive data volumes, improve performance, ensure data completeness, support flexible queries, and strengthen security through distributed key‑value storage, data partitioning, and operational safeguards.

Big Data · Data Partitioning · WeChat Pay
0 likes · 11 min read
Qunar Tech Salon
Dec 21, 2017 · Big Data

Experience and Optimization Strategies for Apache Kylin in Real-Time OLAP

This article shares a data engineer's three‑year experience using Apache Kylin for real‑time OLAP on petabyte‑scale data, describing the business background, challenges of pre‑computation, cube modeling, dimension reduction, and various optimization techniques such as hierarchy, mandatory, and joint dimensions, as well as precise count‑distinct handling.

Apache Kylin · Big Data · Cube Optimization
0 likes · 13 min read
Meitu Technology
Dec 19, 2017 · Industry Insights

Inside Meitu’s In‑House Log Collection System Arachnia: Design, Challenges, and Core Mechanisms

This article introduces Meitu’s self‑developed log collection system Arachnia, explaining why a custom solution was needed for massive server‑side user‑behavior logs, the key requirements such as reliability and real‑time throughput, and the core architectural mechanisms that address those challenges.

Arachnia · Big Data · Meitu
0 likes · 2 min read
Meitu Technology
Dec 19, 2017 · Big Data

Meitu Internet Technology Salon Session 7: Practices in Recommendation Algorithms, Big Data, and Personalized Recommendation

At Meitu’s seventh Internet Technology Salon in Xiamen, over a hundred experts discussed recommendation algorithms and big‑data solutions, with talks on the Arachnia log‑collection system, the Naix distributed bitmap service, Meitu’s personalized recommendation pipeline challenges, and novel data‑missing‑theory models for improved performance.

Big Data · data collection · distributed bitmap
0 likes · 8 min read
Architecture Digest
Dec 16, 2017 · Big Data

Performance Comparison of Apache Flink and Apache Storm for Real‑Time Stream Processing

This report presents a systematic performance evaluation of Apache Flink and Apache Storm across multiple real‑time processing scenarios, measuring throughput, latency, message‑delivery semantics, and state‑backend effects, and provides recommendations for selecting the most suitable engine based on the observed results.

Big Data · Flink · Real-time analytics
0 likes · 21 min read
58 Tech
Dec 15, 2017 · Big Data

Design and Architecture of WMDA: A Comprehensive User Behavior Analysis Platform

The article details WMDA, a no‑code and manual‑code data collection platform for PC, mobile and app that supports real‑time and offline user behavior analysis, describing its functional model, behavior taxonomy, five‑layer architecture, tracking techniques, circle‑selection, data services, streaming and batch processing pipelines, and related technologies such as Storm, Spark, Druid and Roaring Bitmap.

Big Data · Druid · Real-time Streaming
0 likes · 18 min read
Alibaba Cloud Infrastructure
Dec 15, 2017 · Operations

Automated Fault Recovery Architecture for Alibaba's Network during Double Eleven

The article describes Alibaba's end‑to‑end automated fault recovery system for its massive network, covering extensive data collection, Spark‑based event processing, flexible alerting with Siddhi, alert convergence using PageRank, and scripted recovery actions to achieve high availability during the Double Eleven traffic surge.

Big Data · Network Monitoring · Operations
0 likes · 9 min read
dbaplus Community
Dec 14, 2017 · Big Data

Scaling Vipshop’s Big Data Platform: Monitoring, Multi‑HDFS, YARN Optimization & Capping

In 2017, Vipshop’s senior big‑data architect shared how the company grew its Hadoop‑based platform from zero to a thousand‑node cluster, detailing cluster health monitoring, multi‑HDFS deployment via Hive, YARN container allocation improvements, and a hook‑driven Capping resource‑control system to boost stability and efficiency.

Big Data · HDFS · capping
0 likes · 15 min read
Qunar Tech Salon
Dec 14, 2017 · Databases

TiDB Architecture, Deployment, and Monitoring Practices at Qunar

This article explains Qunar's transition from MySQL, Redis, and HBase to TiDB, detailing the background of distributed databases, TiDB's architecture, hardware selection, deployment automation, monitoring setup, and real‑world usage scenarios to address scalability and high‑availability challenges.

Big Data · Database Architecture · Deployment
0 likes · 14 min read
Huawei Cloud Developer Alliance
Dec 11, 2017 · Artificial Intelligence

How AI and Big Data Are Transforming Urban Traffic Management

The 2017 12th China Intelligent Transportation Conference highlighted system thinking, AI, and innovation as key drivers for smarter city traffic, outlining a three‑step top‑level design, AI‑powered applications, and intersection innovations that together promise safer, more efficient, and fully automated urban mobility.

AI · Big Data · Intelligent Transportation
0 likes · 8 min read
AntTech
Dec 11, 2017 · Artificial Intelligence

How AI and Big Data Transform the Insurance Industry: Differentiated Pricing, Smart Claims, Risk Control, and Operations

The article examines how emerging AI and big‑data technologies are reshaping insurance by enabling differentiated pricing, automating claims and customer service, strengthening fraud detection, and improving personalized product recommendation and operational efficiency across the sector.

Artificial Intelligence · Big Data · Blockchain
0 likes · 13 min read
Efficient Ops
Dec 7, 2017 · Operations

How Multi-Dimensional Root Cause Analysis Boosts Monitoring Efficiency with AI

This article introduces the challenges of multi-dimensional monitoring, explains the limitations of traditional alerting, and presents the MDRCA algorithm—combining K‑means clustering, Explanatory Power, and Surprise metrics—to pinpoint root causes efficiently, while sharing practical AI integration experiences for large‑scale monitoring platforms.

AI · Big Data · KMeans
0 likes · 15 min read
Meituan Technology Team
Dec 1, 2017 · Big Data

Metric Logic Tree: Automated Anomaly Analysis for Business Metrics

The Metric Logic Tree automates business metric anomaly analysis by integrating heterogeneous data sources (Kylin, MySQL, Elasticsearch, Druid) with a three‑layer architecture—metric calculation, algorithmic analysis (waterfall and Gini‑coefficient methods), and a master‑worker computation service—that parallelizes queries, delivers immediate conclusions, and shortens decision cycles, as demonstrated in Meituan‑Dianping’s hotel‑travel operations.

Big Data · algorithm · anomaly detection
0 likes · 7 min read
AntTech
Dec 1, 2017 · Big Data

Insights and Paper Summaries from KDD 2017 Conference

The article provides a comprehensive overview of KDD 2017, including acceptance statistics, best paper awards, Ant Group's contributions, detailed discussions on AB testing, graph mining, and selected research papers across data mining, machine learning, and anomaly detection, offering valuable insights for practitioners and researchers.

AB testing · Big Data · KDD
0 likes · 30 min read
Efficient Ops
Nov 27, 2017 · Operations

How Facebook Scales to Billions: Disaggregated Networks, Storage, and Warm Spark

Facebook’s journey from early startup ops to supporting over 2 billion monthly users reveals how disaggregated network, storage, and warm‑storage‑enabled Spark architectures overcome scalability bottlenecks, illustrating the operational strategies and design principles that power massive, reliable data‑center services.

Big Data · Distributed Systems · Operations
0 likes · 12 min read
iQIYI Technical Product Team
Nov 24, 2017 · Information Security

Risk Control System for Live Streaming: Real‑time Interception (Pluto) and Big Data Analysis (Mars)

iQIYI’s live‑stream risk‑control platform combines the real‑time interception engine Pluto with the big‑data analytics system Mars to curb black‑market registration fraud and red‑packet abuse, processing over a billion daily requests through adaptive filters, Kafka‑Spark pipelines, and clustering algorithms that now limit fake popularity to 10‑30 % and red‑packet capture to under 3 %.

Big Data · Mars · Pluto
0 likes · 11 min read
Suning Technology
Nov 20, 2017 · Big Data

How ZEUS Turns Monitoring Data into Automated Decisions for Enterprise Systems

ZEUS, Suning’s decision analysis platform, integrates monitoring data from tools like Baymax and HIRO, applies CEP aggregation and Drools rule evaluation, and leverages big‑data storage and machine‑learning models to automatically identify root causes, provide real‑time alerts, and enable self‑healing in large‑scale distributed systems.

Big Data · decision analysis · rule engine
0 likes · 14 min read
Architects' Tech Alliance
Nov 16, 2017 · Operations

Understanding AIOps: How AI‑Driven Operations Transform IT Management

The article explains how AIOps—an AI‑powered IT operations platform that combines big‑data analytics, machine learning, and automation—revolutionizes traditional IT Ops by enabling rapid, accurate incident detection, root‑cause analysis, and self‑healing, thereby freeing CIOs to focus on strategic business value.

Big Data · Digital Transformation · aiops
0 likes · 8 min read
Efficient Ops
Nov 15, 2017 · Big Data

How Tencent Built a 10 TB‑Per‑Day Full‑Link Log Monitoring Platform

This article explains how Tencent's ZhiYun full‑link log monitoring platform handles massive daily logs, overcomes challenges of diverse log formats, high throughput, fault‑tolerant design, and provides scalable storage, query, and alerting capabilities for distributed micro‑service environments.

Big Data · Distributed Systems · Log Monitoring
0 likes · 10 min read
Suning Technology
Nov 13, 2017 · Backend Development

How Suning Scaled Its Membership System for Double‑11: From Legacy POS to Multi‑Active Architecture

This article examines Suning's evolution of its membership platform—from an early offline POS system to a vertically split, cloud‑native architecture—detailing capacity planning, performance testing, data migration with Spark, multi‑active deployment, and future plans for cross‑region high availability.

Big Data · Cloud Native · Data Migration
0 likes · 15 min read
21CTO
Nov 11, 2017 · Big Data

How We Built a Scalable Seller Log System with Kafka, Storm, ES & HBase

This article explains the design and implementation of a unified seller‑operation logging platform that uses Kafka for ingestion, Storm for real‑time processing, Elasticsearch for hot‑data search, and HBase for cold‑data storage, detailing the challenges faced and the optimizations applied.

Big Data · Elasticsearch · HBase
0 likes · 12 min read
ITFLY8 Architecture Home
Nov 8, 2017 · Operations

Inside Ctrip’s Evolving Architecture: Ops, Frameworks, and Big Data Insights

This article explores Ctrip’s continuously evolving architecture, detailing its three-layer composition of operations, frameworks, and applications, and examines real-world case studies of its release system, configuration management, SOA, and a massive User Profile big‑data project, highlighting key innovations and lessons learned.

Big Data · Ctrip · Deployment
0 likes · 11 min read
Tencent Cloud Developer
Nov 3, 2017 · Industry Insights

How Tencent Cloud’s Big Data Platform Ranked in China’s Fifth Evaluation

China’s Data Center Alliance released its fifth big‑data product evaluation, testing 17 solutions from 16 vendors across SQL, NoSQL, and machine‑learning workloads, with Tencent Cloud’s platform achieving top rankings in NoSQL tests and highlighting the nation’s push toward standardized, high‑performance big‑data infrastructure.

Big Data · Data Platforms · Industry Benchmark
0 likes · 5 min read
Alibaba Cloud Developer
Nov 3, 2017 · Big Data

How Alibaba Built an EB-Scale, Real-Time Big Data Platform

Alibaba’s senior data expert Yao Bin Hui explains how the company constructed a standardized, end-to-end big-data ecosystem—from low-level data collection and AI algorithms to data services and product platforms—enabling petabyte-scale integration and second-level response times that power both internal operations and millions of external users.

Alibaba · Big Data · Data Architecture
0 likes · 10 min read
dbaplus Community
Oct 30, 2017 · Big Data

How to Build a Real‑Time Spam Monitoring System with Apache Storm

This article walks through the design, deployment, and code implementation of a real‑time spam detection pipeline using Apache Storm, comparing it with Hadoop, detailing cluster setup, topology components, data flow, and how to package and run the solution on a distributed Storm cluster.

Apache Storm · Big Data · Hibernate
0 likes · 13 min read
21CTO
Oct 26, 2017 · Backend Development

From Data Platform Battles to AI Dreams: A Senior Engineer’s 3‑Year Journey at Alibaba

A senior Alibaba engineer reflects on three years of building a large‑scale data platform, tackling distributed rate‑limiting challenges, leading cross‑regional projects, and pursuing AI research, while sharing personal insights on career growth, technical problem‑solving, and the value of continuous learning.

AI learning · Big Data · Distributed Systems
0 likes · 11 min read
Liulishuo Tech Team
Oct 22, 2017 · Big Data

Data-CI: A SQL-Based Data Unit Testing Framework for ETL

The article introduces data-ci, a SQL‑driven unit testing framework that lets engineers write, organize, and automate data validation tests for ETL pipelines, providing assertions, failure callbacks, coverage reporting, and CI integration to improve data quality and reliability.

Big Data · Data Quality · ETL
0 likes · 9 min read
Full-Stack DevOps & Kubernetes
Oct 21, 2017 · Big Data

Deploy Hadoop CDH5.4 on CentOS 6: Install HDFS, YARN, and WebHDFS

This guide walks through preparing three CentOS 6.9 nodes, configuring hostnames, time sync, password‑less SSH, disabling IPv6, installing JDK, downloading CDH 5.4, setting up core‑site and hdfs‑site XML files, formatting the NameNode, starting HDFS services, configuring YARN and MapReduce, and verifying the installations via the Web UI.

Big Data · CDH · CentOS
0 likes · 18 min read
Efficient Ops
Oct 18, 2017 · Operations

How Bilibili Scaled Its Log System to 10TB Daily with Elastic Stack

This article details Bilibili's Billions log platform—from its fragmented origins and design goals to the elastic‑stack‑based architecture, shard management, log sampling, custom Go splitters, and monitoring enhancements—highlighting the challenges faced and the roadmap for future improvements.

Big Data · Elastic Stack · Log Management
0 likes · 17 min read