Tagged articles
dbaplus Community
Oct 15, 2017 · Big Data

How JD Built a Scalable Seller Log Platform with Kafka, Storm, ES & HBase

This article details JD's end‑to‑end seller log system architecture, explaining why Kafka, Storm, Elasticsearch and HBase were chosen, the challenges faced during scaling, and the practical solutions implemented to achieve a unified, high‑throughput logging platform for merchants and operations.

Big Data · Elasticsearch · HBase
0 likes · 13 min read
Alibaba Cloud Developer
Oct 15, 2017 · Information Security

How Alibaba’s Data Security Maturity Model (DSMM) Is Shaping China’s Data Protection Landscape

The article explains Alibaba's Data Security Maturity Model (DSMM), its partnership program, the involvement of 17 leading security firms, and how the model aims to improve data security capabilities across industries by establishing standardized assessment criteria and fostering ecosystem collaboration.

Alibaba · Big Data · DSMM
0 likes · 10 min read
ITFLY8 Architecture Home
Oct 12, 2017 · Backend Development

How Taobao Scaled Its Backend Architecture Over Time

This article states its learning objectives, traces the evolution of Taobao's backend architecture from V1.0 to V3.0, highlights the technical challenges faced at each stage, and explains the architectural decisions (modularization, service‑oriented frameworks, distributed storage, and large‑scale monitoring) that enabled massive gains in scalability, reliability, and performance.

Architecture · Backend · Big Data
0 likes · 6 min read
Baidu Intelligent Testing
Oct 9, 2017 · Big Data

User Behavior Analysis: From Data Acquisition to Funnel Insights

The article explains how to move beyond macro app metrics by collecting offline and real‑time user data, storing it in HDFS, processing it with Spark, visualizing behavior paths as state‑machine trees, and performing branch‑funnel analysis to uncover conversion bottlenecks and improve product quality.

Analytics · Big Data · Funnel Analysis
0 likes · 5 min read
ITPUB
Sep 30, 2017 · Big Data

Designing Scalable Open‑Source ETL Systems: Lessons from Baidu Waimai

This talk details Baidu Waimai's end‑to‑end ETL design, covering demand sources, data flow patterns, multi‑stage system evolution, storage choices, scheduling architecture, configuration‑driven processing, quality monitoring, and how data lineage enables transparent, self‑service data delivery.

Big Data · Data Quality · ETL
0 likes · 25 min read
Tongcheng Travel Technology Center
Sep 29, 2017 · Big Data

Evolution of Monitoring Architecture and Traffic Alert Algorithms at Tongcheng Travel

This article describes how Tongcheng Travel’s monitoring system evolved from a monolithic design to a distributed and big‑data‑based architecture, introducing real‑time processing with Storm, machine‑learning‑enhanced alerts, and a multivariate linear regression model that dramatically improves traffic anomaly detection accuracy.

Big Data · Real-time Processing · architecture evolution
0 likes · 10 min read
ITPUB
Sep 29, 2017 · Big Data

Designing an Open ETL System: Baidu Waimai’s Scalable Data Pipeline Practices

In this talk, a Baidu Waimai engineer explains the motivations, requirements, and architectural choices behind their open‑source ETL platform, covering data flow patterns, logical mappings, storage options, scheduling, metadata management, and quality monitoring to achieve scalable, transparent, and explainable data delivery.

Big Data · ETL · Scheduling
0 likes · 26 min read
21CTO
Sep 25, 2017 · Big Data

How Meitu Scaled Its Billion-User Data Analytics: Architecture Evolution and Lessons

This article explains how Meitu built and evolved a large‑scale data statistics platform to handle billions of users, detailing the challenges of growing data volume, the architectural shifts from simple scripts to Hadoop, and the design of modular components for job management, scheduling, execution, and future expansion.

Big Data · Data Platform · Hadoop
0 likes · 16 min read
Qunar Tech Salon
Sep 25, 2017 · Big Data

Comprehensive Guide to Spark Ecosystem: Data Warehouse, Machine Learning, Streaming, and Enterprise Use Cases

This article provides an extensive overview of Apache Spark’s ecosystem—including its data‑warehouse capabilities, ML/MLlib libraries, streaming with Spark Streaming, external frameworks, and real‑world enterprise case studies—while also noting a promotional announcement for a React Native conference.

Big Data · Kafka · Spark
0 likes · 21 min read
ITPUB
Sep 22, 2017 · Big Data

How Baidu Waimai Scaled Traffic Analysis with Apache Kylin: A Deep Dive

This article presents a detailed case study of Baidu Waimai's traffic analysis platform, outlining the data challenges of high dimensionality and volume, the evaluation of OLAP engines, the adoption of Apache Kylin for pre‑computation, the end‑to‑end data modeling, cube construction, incremental builds, and integration with Saiku‑Mondrian reporting, while sharing practical lessons and performance gains.

Apache Kylin · Big Data · OLAP
0 likes · 29 min read
Meituan Technology Team
Sep 21, 2017 · Big Data

Feature Production Scheduling: Architecture Evolution and Core Technologies

Using Meituan‑Dianping’s hospitality online feature system as a case study, the article describes how feature production scheduling evolved from offline batch ETL to automated, metadata‑driven pipelines and sub‑second streaming, detailing the underlying architecture, incremental updates, storage abstraction, write‑shaving, atomicity, and recovery mechanisms.

Big Data · Real-time Processing · System Architecture
0 likes · 23 min read
Ctrip Technology
Sep 20, 2017 · Big Data

Building a Real‑Time Computing Platform with Spark Streaming at Ctrip: Design, Implementation, and Lessons Learned

This article describes how Ctrip migrated its large‑scale real‑time platform from JStorm to Spark Streaming, detailing the architectural design, the Muise Spark Core encapsulation, operational metrics, encountered pitfalls, and future plans to adopt Flink and Beam for streaming workloads.

Big Data · Exactly-Once · Spark Streaming
0 likes · 22 min read
Alibaba Cloud Developer
Sep 19, 2017 · Artificial Intelligence

Inside Alibaba’s 2017 Tech Forum: AI, Big Data, and Cloud Innovations Unveiled

At the inaugural 2017 Alibaba Technology Forum held at Hong Kong University of Science and Technology, senior executives highlighted Alibaba’s cutting‑edge AI, machine learning, big‑data, and cloud breakthroughs, showcasing how data‑driven technologies power billions of users across e‑commerce, finance, logistics, healthcare, and entertainment.

Big Data · Cloud Computing
0 likes · 6 min read
MaGe Linux Operations
Sep 11, 2017 · Big Data

How Big Data Can Revolutionize Operations Monitoring

This article explores applying big‑data thinking and platforms—such as Flume, Spark Streaming, and HBase—to operations monitoring, detailing data sources, metric categories, architecture design, implementation steps, and the benefits of a scalable, low‑code monitoring platform.

Architecture · Big Data · Operations
0 likes · 10 min read
21CTO
Sep 5, 2017 · Big Data

Build a PHP Word Count with Hadoop MapReduce: Step-by-Step Guide

This article explains what MapReduce is, when to use it, and how to implement a PHP word‑count and a gold‑price average calculation on an Apache Hadoop cluster, covering installation hints, mapper and reducer scripts, testing commands, and visualizing results with gnuplot.

Big Data · Gnuplot · Hadoop
0 likes · 10 min read
MaGe Linux Operations
Sep 4, 2017 · Fundamentals

The Ultimate Technical Knowledge Map: 50+ Skill Charts for Architects & Developers

This article presents a comprehensive collection of technical knowledge maps compiled over years, covering architecture, Java, microservices, consistency, big data, cloud computing, mobile development, front‑end, back‑end, DevOps, and more, aiming to help engineers and architects master essential skills and best practices.

Architecture · Big Data · Cloud Computing
0 likes · 6 min read
Tencent IMWeb Frontend Team
Sep 3, 2017 · Frontend Development

What’s Hot This Week in Web Tech? Apple Event, KSQL, Polymer 3, and More

This week’s IMWeb Frontend Community roundup highlights the Apple September event details, introduces KSQL for Apache Kafka, previews Polymer 3.0’s shift to ES6 modules, discusses the Ayo.js Node.js fork, ASP.NET Core 2 Razor pages, VS 2017 preview, container adoption trends, and Oracle’s cloud database innovations.

Big Data · Frontend · Technology News
0 likes · 6 min read
Architecture Digest
Sep 2, 2017 · Big Data

Designing a High‑Availability, High‑Efficiency Distributed Scheduling Platform for Big Data

This article examines the principles, features, and implementation details of distributed scheduling for big‑data ETL pipelines, covering decentralised schedulers, host selection strategies, fault‑tolerance, operator abstraction, elasticity, trigger mechanisms, visual monitoring, alarm handling, data fan‑in/fan‑out, parameter consistency, real‑time quality checks, lineage tracking, and field‑level traceability.

Big Data · Data Lineage · Distributed Scheduling
0 likes · 23 min read
21CTO
Aug 27, 2017 · Big Data

Uncovering Ghost Bikes: How to Crawl and Analyze Mobike Data in Chengdu

This article details the process of capturing Mobike's public API data, building a high‑performance Python crawler with proxy rotation, storing the results in databases, and performing large‑scale analysis to reveal stationary bikes, travel distances, usage frequency, and urban development patterns in Chengdu.

Big Data · Bike Sharing · Mobike
0 likes · 13 min read
Meituan Technology Team
Aug 25, 2017 · Big Data

Data Platform Integration and Multi‑Data‑Center Architecture at Meituan‑Dianping

After Meituan merged with Dianping, engineers unified two massive Hadoop ecosystems across Beijing and Shanghai by breaking the project into four phases—unify, copy, switch, fuse—standardizing versions, implementing zone‑aware transfers, cross‑realm Kerberos, and federated metadata to achieve a single, reliable multi‑data‑center platform.

Big Data · Cluster Fusion · Data Platform
0 likes · 32 min read
21CTO
Aug 21, 2017 · Big Data

Rethinking Hadoop: When to Use It and How Cloud Computing Changes the Game

This article reviews when Hadoop is appropriate, outlines its core features and limitations, explains cloud computing concepts and service models, and highlights the benefits of pre‑built Hadoop images for accelerating big‑data projects.

Big Data · Hadoop · Pre-built Images
0 likes · 13 min read
Architecture Digest
Aug 15, 2017 · Artificial Intelligence

Why AI Engineers Must Understand Basic Infrastructure: From Big Data to Deep Learning

The article explains why AI engineers need foundational infrastructure knowledge—covering big‑data processing, cloud services, containerization, MapReduce, and deep‑learning platforms—to effectively solve real‑world problems, collaborate with teams, and build scalable, maintainable AI solutions.

AI Infrastructure · Big Data · Cloud Computing
0 likes · 14 min read
21CTO
Aug 14, 2017 · Big Data

Unveiling Flink’s Multi‑Layer Execution Graph: From StreamGraph to Physical Deployment

This article explains Flink’s architecture, detailing the roles of Client, JobManager and TaskManager, walks through a SocketTextStreamWordCount example, and clarifies the four‑layer graph model—StreamGraph, JobGraph, ExecutionGraph, and the physical execution graph—highlighting why each layer exists.

Big Data · Execution Graph · Flink
0 likes · 9 min read
Alibaba Cloud Developer
Aug 10, 2017 · Big Data

Alibaba’s HBase Innovations: Powering Big Data at Scale – HBaseCon 2017 Asia Insights

At HBaseCon 2017 Asia, Alibaba showcased a series of groundbreaking HBase enhancements—including strong synchronous replication, SQL-on-HBase capabilities, cross‑cluster range data copy, and read/write path optimizations—that dramatically improve performance, reliability, and usability for large‑scale big‑data storage.

Big Data · HBase · Performance
0 likes · 10 min read
High Availability Architecture
Aug 8, 2017 · Big Data

Practical Big Data Architecture Evolution and Lessons Learned

The article reviews the evolution of big‑data architectures from a simple RDB‑centric pipeline to a SaaS‑based solution, highlighting common bottlenecks such as scaling, integration, cost, and operational complexity, and shares practical experiences and best‑practice recommendations for building efficient, maintainable data platforms.

Architecture · Big Data · SaaS
0 likes · 12 min read
ITFLY8 Architecture Home
Jul 26, 2017 · Big Data

Inside Taobao’s Massive Data Architecture: From Hadoop “Cloud Ladder” to Real‑Time “Galaxy”

This article details Taobao’s multi‑layer massive data platform, covering its five‑tier architecture, the 1500‑node Hadoop “Cloud Ladder” for batch processing, the low‑latency “Galaxy” stream engine, MySQL‑based MyFOX, HBase‑based Prom storage, the glider middle‑layer, and sophisticated caching strategies that together support petabytes of data and millions of daily queries.

Big Data · Distributed Systems · HBase
0 likes · 16 min read
21CTO
Jul 22, 2017 · Big Data

Why Every Company Needs a Chief Data Officer to Unlock Data Value

The article explains the strategic importance of the Chief Data Officer role, outlining how CDOs drive data‑driven innovation through a four‑stage data supply chain—data supply, logistics, science, and execution—to create competitive advantage and business growth.

Big Data · Chief Data Officer · Data Governance
0 likes · 14 min read
Architecture Digest
Jul 22, 2017 · Big Data

Popular Big Data Tools and Their Descriptions

This article provides an extensive overview of more than ninety open‑source and commercial big‑data tools—including ETL platforms, resource managers, storage systems, messaging queues, processing engines, and visualization libraries—detailing their core functions, typical use cases, and notable adopters.

Analytics · Big Data · Data Integration
0 likes · 26 min read
High Availability Architecture
Jul 19, 2017 · Artificial Intelligence

Weiflow: A Scalable Machine Learning Workflow Framework for Sina Weibo

The article introduces Weiflow, a dual‑layer DAG‑based machine‑learning workflow framework designed for Sina Weibo, and explains how its modular XML configuration, Scala implementation, and integration with Spark, TensorFlow, Hive, Storm, and Flink improve development efficiency, scalability, and execution performance across the entire ML pipeline.

Big Data · DAG · Scala
0 likes · 16 min read
ITFLY8 Architecture Home
Jul 17, 2017 · Big Data

Mastering Data Sync, Real‑Time Processing, and Scalable Storage for Modern Systems

This article explores practical techniques for synchronizing heterogeneous data sources, performing batch and incremental analytics with Hadoop and Spark, designing low‑latency real‑time computation pipelines, implementing push notifications, and choosing appropriate storage solutions—from in‑memory caches to distributed databases—while addressing performance, reliability, and scalability challenges.

Big Data · Distributed Systems · Real-time Processing
0 likes · 25 min read
Alibaba Cloud Developer
Jul 17, 2017 · Artificial Intelligence

How Alibaba Turns Big Data into ‘Data New Energy’ with Automated Tagging and Distributed Knowledge Graphs

Alibaba's senior algorithm expert Yang Hongxia explains how the company fuses massive, heterogeneous data sources into a unified platform, builds automated tag‑production pipelines and large‑scale distributed knowledge graphs, and applies these technologies to drive smarter business decisions and AI‑enabled services.

Alibaba · Big Data · Data Platform
0 likes · 14 min read
Efficient Ops
Jul 16, 2017 · Cloud Computing

Why PB‑Level Object Storage Is Essential and How to Choose the Right Solution

With data volumes soaring to petabyte scales, the article explains why object storage is the only viable solution for massive storage needs, outlines procurement considerations, design principles, and operational challenges, and offers practical guidance for building, evaluating, and scaling PB‑level storage systems.

Big Data · Cloud Computing · Storage Architecture
0 likes · 38 min read
Architecture Digest
Jul 13, 2017 · Operations

Comprehensive Architecture and DevOps Tool Knowledge Map

This article compiles an extensive collection of architecture knowledge maps and a detailed overview of DevOps tools, categorizing them by development, deployment, and maintenance functions while also presenting related big‑data and cloud‑computing skill maps for engineers seeking a holistic view of modern software infrastructure.

Architecture · Big Data · Cloud Computing
0 likes · 9 min read
High Availability Architecture
Jul 12, 2017 · Artificial Intelligence

Machine Learning Platform and Risk‑Control Applications at DianRong Net

The article presents a comprehensive overview of DianRong Net's in‑house machine‑learning platform built on Spark, its workflow, pain points it addresses, risk‑control case studies using graph mining, and practical tips for improving model performance through data, algorithms, hyper‑parameter tuning and ensemble methods.

Big Data · Model Optimization · Spark
0 likes · 14 min read
dbaplus Community
Jul 10, 2017 · Big Data

Master Apache Storm: Real‑Time Stream Processing from Basics to Word‑Count and Call‑Log Examples

This tutorial explains Apache Storm’s core principles, architecture, and development workflow, covering its relationship with Hadoop, key concepts such as spouts, bolts, tuples, and topologies, and provides step‑by‑step code examples for a word‑count program and a call‑log analysis application.

Apache Storm · Big Data · Real-time Processing
0 likes · 14 min read
21CTO
Jul 7, 2017 · Big Data

How to Kickstart Your Big Data Career: A Complete Learning Roadmap

This guide walks beginners through the vast big data landscape, helping them choose the right role, understand essential terminology, plan a learning path, and access curated resources for becoming a data engineer or analyst, all illustrated with clear diagrams.

Big Data · Learning Path · big data technologies
0 likes · 16 min read
Meituan Technology Team
Jul 6, 2017 · Backend Development

Online Feature System: Architecture, Storage, and High‑Concurrency Techniques

Using Meituan’s hotel‑travel platform as a case study, the article details a scalable online feature system architecture that combines layered storage, efficient compression, and robust synchronization to meet extreme concurrency, throughput, terabyte‑scale data, and sub‑10 ms latency demands for AI‑driven strategy services.

Big Data · data compression · distributed storage
0 likes · 23 min read
Alibaba Cloud Developer
Jul 5, 2017 · Artificial Intelligence

Is This the New Golden Age of Visual AI? Insights from Alibaba Cloud

The article reviews the three historic AI booms, explains why today’s cloud‑based visual intelligence represents a distinct era, outlines five key factors for successful visual AI, and showcases real‑world Alibaba Cloud applications such as product search, city‑wide monitoring, medical diagnosis, and visual advertising.

AI applications · Alibaba Cloud · Big Data
0 likes · 18 min read
Tencent Advertising Technology
Jul 3, 2017 · Artificial Intelligence

Tencent Social Advertising College Algorithm Contest

Tencent's social advertising team hosts an algorithm contest for college students, leveraging big data and machine learning to develop innovative solutions for social advertising scenarios, inviting participants to submit algorithmic approaches to real-world advertising challenges.

Academic Competition · Algorithm Contest · Big Data
0 likes · 2 min read
21CTO
Jul 3, 2017 · Big Data

Inside the World’s Best Data Architectures: Netflix, Facebook, Airbnb, Pinterest

This article explores the cutting‑edge data pipelines of Netflix, Facebook, Airbnb and Pinterest, detailing the massive event volumes they handle, the core technologies such as Kafka, Spark, Presto and Hadoop, and how these giants design scalable, real‑time analytics infrastructures.

Airbnb · Big Data · Data Architecture
0 likes · 6 min read
21CTO
Jul 1, 2017 · Operations

How Ctrip Scales Its Architecture: Ops, Release, and Big Data Insights

This article outlines Ctrip’s evolving architecture—covering its operational backbone, framework components, release system, configuration management, SOA evolution, and the massive UserProfile big‑data platform—offering practical insights from a senior developer on how the company achieves high availability and scalability.

Architecture · Big Data · Operations
0 likes · 12 min read
Java High-Performance Architecture
Jun 29, 2017 · Big Data

Master Apache Storm: Core Concepts, Real‑Time Word Count & Call Log Analytics

This tutorial introduces Apache Storm’s fundamental principles and development workflow, providing a PDF guide and source code for two practical examples—real‑time word‑count and call‑record aggregation—while covering its definition, use cases, relationship with Hadoop, core concepts, cluster architecture, and step‑by‑step usage.

Apache Storm · Big Data · Real-time Processing
0 likes · 1 min read
Efficient Ops
Jun 27, 2017 · Big Data

How a Leading Bank Evolved Its Big Data Platform Architecture

This talk outlines how China’s Guangfa Bank built, refined, and scaled its big‑data platform since 2014, covering data positioning, system architecture optimization, delivery model improvements, team restructuring, and real‑world use cases that demonstrate the platform’s impact on risk control, marketing and operational efficiency.

Banking · Big Data · Microservices
0 likes · 14 min read
21CTO
Jun 20, 2017 · Artificial Intelligence

How Toutiao’s AI Powers Personalized News Recommendations

This article examines Toutiao’s rapid rise as a personalized news platform, detailing its AI‑driven recommendation pipeline, web‑crawling infrastructure, similarity‑matrix algorithms, A/B testing, and the role of human moderation in delivering highly targeted content to billions of users.

A/B testing · AI · Big Data
0 likes · 16 min read
Alibaba Cloud Developer
Jun 19, 2017 · Cloud Computing

How Alibaba Built a Cloud‑Native HR System That Cut Costs 100× and Boosted Speed 6×

This article details Alibaba's migration from Oracle PeopleSoft HCM to a self‑developed, cloud‑native eHR platform, describing the technical challenges, phased development using Groovy and MaxCompute, and the resulting six‑fold speed increase, hundred‑fold cost reduction, and enhanced employee experience.

Big Data · Cloud Computing · Groovy
0 likes · 11 min read
ITFLY8 Architecture Home
Jun 18, 2017 · Cloud Computing

Inside Alibaba’s Middleware: Career Paths, Tech Stack, and Architecture Challenges

This article explores why Alibaba's middleware is dubbed the architect's cradle, outlines career development routes within the team, details the extensive technology stack, and examines the major technical challenges such as massive data processing, real‑time analytics, and large‑scale deployment during peak events.

Big Data · Career Development · Cloud Computing
0 likes · 25 min read
StarRing Big Data Open Lab
Jun 16, 2017 · Big Data

How TDH Dominated the TPCx‑HS 10TB Benchmark: Strategies and Results

The article details how Transwarp and Cisco’s joint TPCx‑HS 10TB benchmark placed the TDH platform at the top of the performance ranking, explains the test setup, describes the pre‑ and post‑optimization strategies for TeraGen and TeraSort, and outlines the hardware configuration and key tuning parameters.

Big Data · Hadoop · Performance Optimization
0 likes · 10 min read
Ctrip Technology
Jun 13, 2017 · Operations

Evolution and Architecture of Ctrip's System: Operations, Frameworks, and Big Data

This article presents a comprehensive overview of Ctrip's evolving system architecture, detailing its operational strategies, framework components such as SOA and release systems, and the large‑scale UserProfile big‑data platform, illustrating how each iteration addressed prior challenges while introducing new capabilities.

Big Data · Ctrip · Operations
0 likes · 13 min read
21CTO
Jun 9, 2017 · Big Data

From Hadoop to Spark: A Complete Roadmap to Becoming a Big Data Architect

This guide walks beginners through the essential big‑data ecosystem—from understanding Hadoop’s core components and mastering MapReduce, to using Hive, SparkSQL, Kafka, and real‑time frameworks like Storm, while also covering data ingestion, export, scheduling, and introductory machine‑learning techniques.

Big Data · Spark · data engineering
0 likes · 20 min read
Suning Technology
Jun 9, 2017 · Big Data

How Suning’s AI‑Powered Smart Replenishment Turns Retail from B2C to C2B

Suning’s smart replenishment system showcased at CES Asia 2017 leverages big‑data analytics and machine‑learning models—linear regression, random forest, and XGBoost—to predict sales, optimize inventory across multiple warehouses, and shift retail from traditional B2C to a data‑driven C2B approach.

Big Data · inventory optimization · machine learning
0 likes · 5 min read
StarRing Big Data Open Lab
May 27, 2017 · Big Data

Simplify Big Data Governance with Data Lineage & Impact Analysis

Enterprise big‑data platforms face massive scale and complex metadata relationships, but using Transwarp Governor’s data lineage and impact analysis graphs enables precise tracing of data origins, rapid error localization, and prediction of downstream effects, dramatically improving data quality and governance efficiency.

Big Data · Data Governance · Data Lineage
0 likes · 8 min read
MaGe Linux Operations
May 26, 2017 · Big Data

How Big Data Transforms Everyday Life: From Finance to Healthcare

This article explains what big data is, outlines its 5V characteristics, and showcases numerous real‑world applications such as personal finance monitoring, tax fraud detection, healthcare prediction, public opinion tracking, precise marketing, product development, traffic planning, strategic decision‑making, and credit scoring.

Applications · Big Data · Data Analytics
0 likes · 4 min read
Architecture Digest
May 25, 2017 · Big Data

Designing Data Warehouse Layers: Principles, Models, and Practical Practices

This article explains why data warehouses should be layered, describes the classic ODS‑DW‑APP model, details each layer’s purpose and implementation techniques, presents an improved layering scheme with dimension and temporary tables, and answers common questions about parallel DWS and DWD processing.

Big Data · Data Architecture · ETL
0 likes · 17 min read
Alibaba Cloud Developer
May 25, 2017 · Big Data

How Alibaba’s Blink Engine Redefines Real‑Time Big Data Processing

This article explains how Alibaba’s Blink, built on Apache Flink, transforms batch‑oriented big‑data platforms into a unified, high‑performance real‑time computing engine, detailing its architecture, state management, checkpointing, and successful deployment in e‑commerce, search, recommendation, and online machine‑learning scenarios.

Alibaba · Big Data · Flink
0 likes · 17 min read
Alibaba Cloud Developer
May 20, 2017 · Artificial Intelligence

How Alibaba’s AI‑Driven Information Retrieval Is Shaping E‑Commerce Futures

The second “Frontiers and Future of Information Retrieval” forum, co‑hosted by the Chinese Computer Society, Alibaba and academic committees, showcased how massive, structured e‑commerce data and AI algorithms are revolutionizing search, customer service, and research collaborations across the industry.

Alibaba · Big Data · e‑commerce
0 likes · 4 min read
Alibaba Cloud Developer
May 17, 2017 · Databases

How Alibaba Tackles the Massive Challenges of Time‑Series Data Storage

This article details Alibaba's middleware team's exploration of time‑series data characteristics, real‑world monitoring scenarios, the limitations of traditional databases, and the evolution of their custom HiTSDB solution that combines inverted indexing, high‑compression algorithms, and distributed aggregation to meet massive write and query demands.

Alibaba · Big Data · HiTSDB
0 likes · 25 min read
MaGe Linux Operations
May 17, 2017 · Big Data

How Big Data Turns Raw Information into Resource Optimization

The article explains that the ultimate value of big data lies in optimizing resource allocation by first crowdsourcing massive data, then fully mining it to uncover truth, and finally using those insights across industries such as transportation, advertising, finance, and more.

Big Data · Resource Optimization · crowdsourcing
0 likes · 7 min read
Baidu Waimai Technology Team
May 16, 2017 · Big Data

Analysis of OLTP/OLAP Integrated Solutions: Apache Phoenix, Apache Trafodion, and Splice Machine

This article examines the convergence of OLTP and OLAP by introducing Apache Phoenix, Apache Trafodion, and Splice Machine, compares their technical features, and describes how Baidu Waimai adopted a Phoenix‑based solution to address scalability and performance challenges in its operational data store.

Apache Phoenix · Apache Trafodion · Big Data
0 likes · 12 min read
Qunar Tech Salon
May 16, 2017 · Artificial Intelligence

Personalized Recommendation Systems: Applications, User Profiling, Algorithms, and Optimization

This article presents a comprehensive overview of personalized recommendation systems, covering their application scenarios and value, user profiling, core algorithms such as content‑based and collaborative filtering, system architecture, performance and effect optimization techniques, and practical Q&A insights.

AI · Big Data · collaborative filtering
0 likes · 18 min read
MaGe Linux Operations
May 15, 2017 · Databases

Top 10 Must‑Know Data Storage Tools for Java Developers

Facing ever‑growing complexity, Java developers can streamline their projects by mastering a curated list of essential data storage and processing tools—including MongoDB, Elasticsearch, Cassandra, Redis, Hazelcast, EHCache, Hadoop, Solr, Spark, and Memcached—each offering distinct strengths for modern big‑data applications.

Big Data · NoSQL · data-processing
0 likes · 8 min read
Architecture Digest
May 14, 2017 · Big Data

Handling Transactions, Failover, and Exactly‑Once Semantics in Distributed Systems

This article explores practical techniques for handling node liveness, failover, recovery, and exactly‑once transaction semantics in distributed systems, illustrating implementations with Zookeeper, Kafka, Storm, and database sharding while addressing big‑data reach calculations and performance trade‑offs.

Big Data · Distributed Systems · Exactly-Once
0 likes · 15 min read
ITPUB
May 8, 2017 · Big Data

Master Spark Performance: Practical Tuning Tips and Real‑World Examples

This article explains essential Spark concepts, illustrates common performance bottlenecks, and provides concrete tuning strategies for memory, CPU, serialization, data locality, file I/O, and shuffle reduction, backed by real‑world examples and visual metrics.

Big Data · CPU optimization · Memory Management
0 likes · 19 min read
MaGe Linux Operations
May 7, 2017 · Artificial Intelligence

Big Data & Machine Learning: Core Definitions and Essential Algorithms

This article explains what big data and machine learning are, their interrelationship, various big‑data analysis approaches, core machine‑learning concepts, and details ten fundamental algorithms—including regression, neural networks, SVM, clustering, dimensionality reduction, and recommendation—while highlighting their roles in modern data‑driven applications.

Big Data · Neural Networks · clustering
0 likes · 24 min read
MaGe Linux Operations
May 4, 2017 · Big Data

How to Process 100GB Logs and Massive Datasets with Hash Partitioning and Bloom Filters

This article explains the definition and 4V characteristics of big data and presents practical algorithms—including hash partitioning, min‑heap top‑K selection, bitmap extensions, and Bloom filter techniques—to efficiently handle ultra‑large log files, integer sets, and keyword searches within strict memory limits.
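
As a rough illustration of the hash partitioning and min-heap top-K techniques named in this summary, here is a minimal Python sketch. It is not taken from the article: the function name `top_k_keys`, the use of `zlib.crc32` as the partitioning hash, and the IP-address sample data are all assumptions, and in-memory lists stand in for the per-partition files a real 100GB job would write to disk.

```python
import heapq
import zlib
from collections import Counter


def top_k_keys(records, k, num_partitions=16):
    """Return the k most frequent keys as (count, key) pairs, largest first."""
    # Step 1: hash partitioning. Identical keys always land in the same
    # partition, so each partition can be counted independently.
    # (In-memory lists are a stand-in for on-disk partition files.)
    partitions = [[] for _ in range(num_partitions)]
    for key in records:
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[index].append(key)

    # Step 2: count one partition at a time and keep only the k best
    # candidates in a min-heap; heap[0] is always the weakest candidate.
    heap = []
    for part in partitions:
        for key, count in Counter(part).items():
            if len(heap) < k:
                heapq.heappush(heap, (count, key))
            elif (count, key) > heap[0]:
                heapq.heapreplace(heap, (count, key))
    return sorted(heap, reverse=True)


# Hypothetical sample: client IPs extracted from log lines.
sample = ["10.0.0.1"] * 5 + ["10.0.0.2"] * 3 + ["10.0.0.3"]
top_two = top_k_keys(sample, k=2)
```

Because only one partition's counts and a heap of size k are held at a time, peak memory depends on the largest partition rather than on the whole input.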

Big Data · Bitmap · Hash Partitioning
0 likes · 12 min read
Efficient Ops
May 3, 2017 · Operations

How Tencent Scales NBA Live Streams to Millions: Behind the Tech and Operations

This article details Tencent's large‑scale live streaming architecture for NBA games, covering the rapid growth of live video, key technical features, network transmission challenges, multi‑angle production, CDN deployment, monitoring, big‑data processing, and strategies for ensuring low latency and high reliability for millions of concurrent viewers.

Big Data · CDN · Operations
0 likes · 25 min read
Baidu Waimai Technology Team
Apr 28, 2017 · Big Data

Recap of Baidu Waimai Tech Team’s “Code Talk” Session on Data Platform Architecture and Big Data Practices

The article summarizes Baidu Waimai’s recent “Code Talk” event, highlighting the speaker’s overview of the company’s big‑data platform evolution, its technical architecture, practical challenges such as data security and accuracy, and a lively Q&A covering Storm, high availability, and metric management.

Baidu Waimai · Big Data · Data Platform
0 likes · 6 min read
Architects' Tech Alliance
Apr 27, 2017 · Big Data

Curated List of Big Data Learning Resources from w3cschool

This article presents a comprehensive, Chinese‑language collection of big‑data resources—including relational databases, distributed file systems, key‑value stores, distributed programming tools, file data models, and key‑map frameworks—compiled by w3cschool to help programmers deepen their understanding of big data technologies.

Big Data · Distributed Systems · Resources
0 likes · 6 min read
Architecture Digest
Apr 24, 2017 · Big Data

Understanding and Solving Data Skew in Hadoop and Spark

This article explains what data skew is, why it occurs in large‑scale Hadoop and Spark jobs, illustrates typical symptoms, and presents practical strategies—including business‑level adjustments, code tweaks, and platform‑specific tuning—to mitigate and resolve skew in big‑data processing.

Big Data · Data Skew · Hadoop
0 likes · 11 min read
21CTO
Apr 21, 2017 · R&D Management

How to Turn Technical Experience into Personal Value: Lessons from Outsourcing to Big Data

The author shares a candid journey from low‑paid outsourcing coding to senior roles in design, analysis, and big‑data architecture, revealing how understanding value networks, leveraging cloud and data trends, and expanding beyond pure coding can dramatically increase a technologist’s personal and market value.

Big Data · Career Development · Cloud Computing
0 likes · 34 min read
Alibaba Cloud Developer
Apr 21, 2017 · Big Data

How Alibaba Tackles Real-Time Stream and Graph Computing at Scale

In his ASPLOS keynote, Alibaba’s Vice President Zhou Jingren detailed the company’s large‑scale stream and graph computing platforms, highlighting fault‑tolerance innovations, real‑time data challenges, and upcoming advances in graph analytics and massive machine‑learning workloads.

AI · Alibaba · Big Data
0 likes · 7 min read
Baidu Waimai Technology Team
Apr 20, 2017 · Databases

Greenplum (GPDB) Architecture, Features, and Operational Tools Overview

This article explains Greenplum's MPP architecture, master‑segment design, high‑availability, interconnect network, rich management tools, parallel query planning, data loading techniques, and additional capabilities such as LDAP authentication and resource queues, demonstrating why it is a strong next‑generation big‑data query engine.

Big Data · Greenplum · MPP
0 likes · 15 min read
Baidu Waimai Technology Team
Apr 18, 2017 · Industry Insights

Baidu Waimai’s Cloud Migration, AI Logistics, and Architecture – QCon 2017

At QCon Beijing 2017, three senior Baidu Waimai engineers detailed the company’s year‑long migration from IDC to cloud using custom operation platforms, described the AI‑driven, data‑rich logistics scheduling system that outperforms manual dispatch, and shared architectural evolutions that enabled rapid, zero‑downtime scaling of the fast‑growing delivery business.

AI logistics · Big Data · Operations
0 likes · 5 min read
Meituan Technology Team
Apr 14, 2017 · Big Data

Practical Experience of HDFS Federation at Meituan: Challenges, Improvements, and Automation

Meituan‑Dianping migrated its 2,000‑node HDFS cluster to Federation by fixing ViewFs compatibility, simplifying mount points, leveraging FastCopy for massive data moves, improving token handling, and automating split‑workflow steps, thereby overcoming single‑NameNode bottlenecks and providing a practical blueprint for large‑scale Hadoop deployments.

Big Data · FastCopy · Federation
0 likes · 22 min read
MaGe Linux Operations
Apr 13, 2017 · Big Data

How to Choose the Right Language for Your Big Data Project

This article compares R, Python, Scala, and Java for big‑data projects, outlining each language’s strengths and weaknesses, and offers guidance on selecting the most suitable language based on project requirements, team expertise, and production needs.

Big Data · Python · R
0 likes · 8 min read
ITFLY8 Architecture Home
Apr 9, 2017 · Fundamentals

Understanding Bloom Filters: Fast, Space-Efficient Membership Tests

Bloom filters are highly space-efficient probabilistic data structures that quickly test set membership using multiple hash functions, guaranteeing no false negatives while allowing a small false positive rate, making them ideal for large-scale applications such as email blacklists and massive URL deduplication.
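
The behaviour described in this summary can be sketched in a few lines of Python. This is an illustrative sketch rather than the article's own code: the class name `BloomFilter`, the method `might_contain`, the sizing parameters, and the choice of deriving the hash functions from one SHA-256 digest by double hashing are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter backed by a bytearray used as a bit array."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions from two 64-bit halves of a
        # single SHA-256 digest (double hashing; an assumed design choice).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit selected by the hash functions.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical email blacklist.
blacklist = BloomFilter(num_bits=8192, num_hashes=5)
for address in ("spam@example.com", "junk@example.com"):
    blacklist.add(address)
```

An added item always reports as possibly present because its bits are never cleared, which is why there are no false negatives. An item that was never added can still report as present if other items happened to set all of its bits, and that is the source of the false positive rate.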

Big Data · bloom-filter · membership testing
0 likes · 5 min read
21CTO
Apr 4, 2017 · Artificial Intelligence

How Vipshop Evolved Its Real-Time Personalized Recommendation Engine

This article recounts Wu Guanlin’s presentation on the evolution of Vipshop’s personalized recommendation system, detailing the technical challenges of real‑time predictions, the three generations of architecture, the four‑stage recommendation engine, and the VRE platform’s design for scalability and low latency.

Big Data · System Architecture · machine learning
0 likes · 10 min read