Tagged articles
3672 articles
ITPUB
Mar 22, 2017 · Big Data

Why Spark Beats MapReduce: The RDD Story and Spark SQL Evolution

This article walks through Spark’s origins, its core RDD concept, how it improves on Hadoop’s MapReduce, the role of in‑memory processing, functional programming support, and the emergence of Spark SQL with DataFrames and the Catalyst optimizer.

Big Data · MapReduce · RDD
0 likes · 25 min read
StarRing Big Data Open Lab
Mar 21, 2017 · Big Data

How Real-Time Data Streaming Is Transforming Industries Today

This article explains how real‑time data streaming turns massive, continuously growing datasets into actionable insights across finance, energy, and e‑commerce, showcasing early adopters like ConocoPhillips and DHL while urging businesses to rethink models for the next wave of data management.

Big Data · Data Streaming · Real-time analytics
0 likes · 7 min read
Qunar Tech Salon
Mar 12, 2017 · Big Data

Essential Skills and Career Paths for Data Professionals: From Big Data Platforms to AI

The article outlines the key competencies, responsibilities, and career development advice for data professionals across the entire data stack—from building big‑data platforms and data warehouses to visualization, analysis, algorithm engineering, and deep‑learning applications—emphasizing the importance of creating business value with data.

Big Data · Data Analyst · Data Warehouse
0 likes · 15 min read
21CTO
Mar 10, 2017 · Big Data

Inside Tencent Analytics: How TA Handles TB‑Scale Real‑Time Web Data

Tencent Analytics (TA) is a free web analytics platform that processes terabytes of daily data in real time, using a custom architecture featuring JavaScript collection, event streaming, in‑memory computation, and NoSQL storage with Redis and LevelDB, offering site owners instant insights and high availability.

Big Data · LevelDB · Real-time Processing
0 likes · 12 min read
Efficient Ops
Mar 7, 2017 · Big Data

How Tencent Scaled Its TDW to 8,800 Nodes and Mastered Cross-City Data Migration

A senior Tencent engineer explains how TDW (Tencent Distributed Data Warehouse) grew from a few hundred to thousands of nodes, the challenges of cross‑city migration, and the modeling, relationship‑chain, dual‑write‑table, and platform strategies the team built to migrate data and tasks seamlessly and with low impact.

Big Data · Data Migration · TDW
0 likes · 26 min read
Alibaba Cloud Developer
Mar 7, 2017 · Big Data

Unified Data Platforms: How UMENG+ Redefines Big Data Strategy

The article explores the evolution of big‑data applications in China, from Oracle’s trend report and the concept of "omni‑domain data" to UMENG+’s technical architecture, unified tech stack, AI integration, and future directions for delivering real customer value.

Big Data · Data Analytics · Data Integration
0 likes · 12 min read
Efficient Ops
Mar 6, 2017 · Operations

Tencent Game Ops: Turning Service Delivery into Smart, Automated Microservices

This article details how Tencent's game operations team redefined operational services, introduced micro‑service architecture, applied big‑data driven recommendations, and built intelligent, automated pipelines for server opening, merging, version releases, and download services, achieving significant efficiency and cost gains.

Automation · Big Data · Cloud Native
0 likes · 26 min read
Meituan Technology Team
Mar 2, 2017 · Big Data

Meituan Waimai Feature Archive Platform: Architecture, Tag System, and Data Processing

Meituan Waimai’s Feature Archive platform processes billions of daily orders by managing about 200 user tags and 400 merchant tags through a three‑layer architecture built on Hive, Elasticsearch, HBase, and MySQL. It offers visual tag selection, instant self‑service queries, full data extraction, and a predicate‑logic query language, and is designed for future extensibility.

Big Data · Elasticsearch · HBase
0 likes · 14 min read
AntTech
Feb 28, 2017 · Artificial Intelligence

Key Computing Capabilities Driving the Evolution of Digital Financial Services

The talk describes nine essential computing capabilities (transaction processing, system robustness, connectivity, decision-making, data insight, intelligent services, biometric authentication, blockchain trust, and immersive integration) that have transformed Ant Financial over the past decade, and outlines the challenges and strategies for the next ten years.

Big Data · Blockchain · FinTech
0 likes · 16 min read
Architecture Digest
Feb 28, 2017 · Big Data

Architecture and Real‑Time Processing Design of Tencent Analytics (TA)

This article explains the architecture, real‑time computation framework, and storage solutions of Tencent Analytics, detailing how massive TB‑level web‑traffic data are collected via JavaScript, processed in memory‑centric streaming components, and stored using Redis and LevelDB to achieve second‑level updates.

Big Data · LevelDB · NoSQL
0 likes · 13 min read
Nightwalker Tech
Feb 27, 2017 · Big Data

Community Discussion on Learning Paths, Tools, and Applications in Big Data

A diverse group of practitioners share recommendations for books, technologies, real‑world use cases, and practical challenges when learning and applying big‑data processing, covering Hadoop, Spark, data visualization, ETL, and the relationship between data, algorithms, and business value.

Big Data · Hadoop · data analysis
0 likes · 16 min read
Qunar Tech Salon
Feb 26, 2017 · Big Data

Comparative Analysis of Big Data Storage and Query Solutions

This article reviews major big‑data storage and query architectures—including HBase, Dremel/Parquet, pre‑aggregation systems, Lucene, and the custom Tindex solution—evaluating their strengths, weaknesses, and suitability for real‑time, high‑volume analytical workloads.

Big Data · HBase · Parquet
0 likes · 20 min read
Efficient Ops
Feb 26, 2017 · Operations

How Alibaba Scales Massive Data Platforms: Lessons in Automated Operations

This article explores the challenges of operating Alibaba's large‑scale data platforms, describes the automation platform built to address them, and shares data‑driven, fine‑grained operational practices that enable stable, efficient, and cost‑effective service delivery.

Automation · Big Data · Operations
0 likes · 22 min read
Qunar Tech Salon
Feb 22, 2017 · Big Data

Understanding Ctrip Flight Ticket Tracking System (UBT) and Its Key Metrics

This article explains Ctrip's flight ticket tracking framework (UBT), detailing client‑side and server‑side event collection methods, the purpose and trade‑offs of each tracking type, metric definitions, data association challenges, common pitfalls, and best practices for reliable data‑driven analysis.

Analytics · Big Data · Ctrip
0 likes · 20 min read
Alibaba Cloud Developer
Feb 22, 2017 · Artificial Intelligence

How Alibaba’s AI Powers Real‑Time Customer Segmentation and Personalized Shopping

This article explains how Alibaba leverages AI, big‑data analytics, and advanced recommendation algorithms to enable real‑time visitor clustering, personalized storefronts, and tailored content across its Customer Operation Platform, Double 11 promotion pages, QianNiu headlines, and service market, delivering significant conversion and engagement gains.

AI · Big Data · Recommendation Systems
0 likes · 18 min read
Nightwalker Tech
Feb 20, 2017 · Backend Development

Career Development and Technology Trends for PHP Engineers

The discussion explores how PHP engineers can advance their careers by embracing new technologies such as Go, Python, big data, AI, and cloud computing, while also emphasizing soft‑skill growth, project management, and strategic decision‑making based on business trends and personal goals.

Backend · Big Data · Career Development
0 likes · 9 min read
Meituan Technology Team
Feb 17, 2017 · Big Data

User Profiling and Machine Learning Practices for Food Delivery O2O Platforms

Meituan Delivery’s rapid expansion across multiple categories relies on detailed user profiling and machine‑learning models—such as high‑potential customer prediction, churn risk regression and Cox survival analysis—to personalize acquisition, retention, and scenario‑based cross‑selling, while addressing sparse behavior, unstructured data, and geographic context challenges.

Big Data · O2O · churn prediction
0 likes · 13 min read
21CTO
Feb 15, 2017 · Fundamentals

How Twitter Evolved Its Search Engine: From MySQL to Earlybird and Beyond

This article explains the fundamentals of search engine architecture, covering text collection, indexing, ranking and evaluation, and then traces Twitter's internal search evolution from MySQL full‑text search to the Earlybird index server, Blender aggregation, and smart memory‑SSD strategies.

Big Data · Twitter · indexing
0 likes · 8 min read
Architecture Digest
Feb 11, 2017 · Big Data

LeKe Sports Big Data Platform Evolution: From Early ETL Reporting to 2.0 Streaming Architecture

The article describes how LeKe Sports built and continuously upgraded its Hadoop‑based big data platform—from a manual ETL‑to‑Elasticsearch reporting system to a 2.0 architecture featuring Spark Streaming, SQL‑based query layers, Elasticsearch indexing, and cloud‑native storage and backup solutions—to meet rapidly growing PB‑scale data demands.

Big Data · Data Platform · ETL
0 likes · 5 min read
Huawei Cloud Developer Alliance
Feb 7, 2017 · Big Data

What’s New in Apache CarbonData 1.0.0? 80+ Features Boost Big Data Performance

Apache CarbonData 1.0.0, now an Apache incubating project, adds over 80 new features and bug fixes—including a new data loading solution, Spark 2.1 integration, update/delete SQL support, adaptive compression for numeric types, B‑Tree LRU cache, V2 format for faster first‑query performance, vectorized reader, bucket‑table joins, off‑heap memory, single‑pass loading, and pre‑generated dictionaries—aimed at delivering faster, more flexible, and efficient columnar storage for big‑data workloads.

Apache CarbonData · Big Data · Columnar Storage
0 likes · 8 min read
Huawei Cloud Developer Alliance
Jan 24, 2017 · Big Data

Why Hadoop Remains the Backbone of Big Data: Core Modules, Tools, and Trends

This article provides a comprehensive overview of Hadoop as the leading open‑source platform for big‑data processing, detailing its core components HDFS and MapReduce, the evolution to Hadoop 2.0/YARN, and the extensive ecosystem of tools and commercial solutions that enable scalable storage, analysis, and machine‑learning on massive data sets.

Big Data · HDFS · Hadoop
0 likes · 18 min read
21CTO
Jan 18, 2017 · Big Data

Build a Lightweight, High‑Availability Real‑Time Stream Processing System

Learn how to construct a simple, high‑availability real‑time stream processing platform using lightweight components such as Kafka, Zookeeper, Thrift/Avro, and optional storage like MongoDB or Elasticsearch, offering a practical alternative to heavyweight frameworks like Storm and Spark Streaming for small‑to‑medium enterprises.

Big Data · Kafka · Real-Time
0 likes · 5 min read
dbaplus Community
Jan 16, 2017 · Backend Development

Scaling a FinTech Platform to $100B Transactions with Four Overhauls

Over three years, a small fintech company transformed its platform from a single‑server PHP/Java stack to a micro‑service‑based Spring Cloud architecture, undergoing four major upgrades that introduced distributed systems, SOA governance, big‑data pipelines, MongoDB replication, Redis caching, and open‑source tools, enabling transaction volumes exceeding one hundred billion.

Big Data · FinTech · Microservices
1 likes · 15 min read
Alibaba Cloud Developer
Jan 11, 2017 · R&D Management

How Taobao’s Beehive Platform Powers Content‑Driven Shopping During Double 11

The article explains how Taobao’s content‑centric strategy, embodied in the Beehive platform, builds an end‑to‑end content chain—from creator tools and health scoring to personalized distribution and commerce mechanisms—enabling massive, efficient content production and monetization during the Double 11 shopping festival.

Big Data · Taobao · content platform
0 likes · 17 min read
Alibaba Cloud Developer
Jan 9, 2017 · Big Data

How Alibaba Scaled Real‑Time Data Processing for Double 11: Architecture & Lessons

This article details Alibaba's real‑time computing architecture for the 2016 Double 11 event, covering background, core components such as DRC, TT, Galaxy, OTS, XTool and OneService, and explains optimization techniques, fault‑tolerance strategies, stress‑testing practices, and future upgrade plans to handle massive streaming data workloads.

Big Data · Performance Optimization · Real‑Time Computing
0 likes · 14 min read
dbaplus Community
Dec 26, 2016 · Big Data

Why Data Lakes Are Redefining Enterprise Data Architecture

This article explains the origins, core features, logical architecture, and advantages of data lakes, contrasts them with traditional data warehouses, outlines a modern data architecture that combines lakes and warehouses, and introduces the DCE intelligent data lake platform with practical Q&A.

Big Data · Data Lake · cloud computing
0 likes · 14 min read
Tencent Cloud Developer
Dec 23, 2016 · Databases

Analysis of HBase Write-Ahead Log (WAL) Mechanism and Source Code Call Chain

The article explains HBase’s write‑ahead‑log architecture, detailing how client put/delete requests travel through RPC to the RegionServer, are processed by MultiRowMutationService, written to the WAL via FSHLog.append and sync, and finally stored in MemStore, while describing durability options and the underlying source‑code call chain.

Big Data · HBase · Java
0 likes · 10 min read
Hulu Beijing
Dec 20, 2016 · Big Data

How Hulu Supercharges OLAP Queries with CarbonData: Real‑World Optimizations

This article describes Hulu’s real‑world OLAP query optimization, covering the fundamentals of OLAP, comparisons of row‑ and column‑based storage formats, detailed indexing mechanisms of Parquet, ORC and CarbonData, and the specific schema, shuffle, block size, speculation and GC tuning techniques that enabled CarbonData to dramatically accelerate wide‑table queries on SparkSQL.

Big Data · CarbonData · Columnar Storage
0 likes · 17 min read
Meituan Technology Team
Dec 9, 2016 · Big Data

Memory Usage Analysis of HDFS NameNode Core Data Structures

The article quantitatively breaks down HDFS NameNode memory consumption, showing that the Namespace tree and BlocksMap together dominate heap usage (≈53 GB in large clusters), provides detailed per‑object size estimates for NetworkTopology, INode and block structures, and proposes a simple formula to predict total heap requirements and tuning recommendations.

Big Data · HDFS · Memory Management
0 likes · 13 min read
Alibaba Cloud Developer
Dec 8, 2016 · Artificial Intelligence

How AI Powers Data‑Driven Merchant Success

In this Alibaba Tech Forum talk, senior expert Wei Hu explains how machine learning and big‑data technologies are used to empower merchants with personalized storefronts, intelligent posters, and AI‑driven headlines, boosting their efficiency and sales performance.

AI · Alibaba · Big Data
0 likes · 2 min read
Ctrip Technology
Dec 2, 2016 · Big Data

Design and Architecture of Ctrip's Aegis Risk Control System

This article presents a comprehensive overview of Ctrip's Aegis risk control system, detailing its modular architecture, rule engine, data service layer, Chloro analytics platform, and future directions, while highlighting the use of streaming, big‑data processing, and machine‑learning models for real‑time fraud detection.

Big Data · Real-time Processing · machine learning
0 likes · 13 min read
Meitu Technology
Dec 1, 2016 · Big Data

Multi-dimensional Analysis Platform Based on User Portrait Data

Tencent's Glacier multi‑dimensional analysis platform combines massive user‑portrait tags with routine analytical reports, delivering fast, accurate real‑time queries across countless dimensional combinations, enabling analysts and operators to perform targeted operations and insights as product data continuously evolves.

Big Data · Data Platform · Glacier
0 likes · 1 min read
Meitu Technology
Dec 1, 2016 · Big Data

Meitu Internet Technology Salon: Big Data Architecture Evolution and Practice, and Tencent Multi‑Dimensional Analysis Platform

At Meitu’s third Internet Technology Salon in Xiamen on November 26, 2016, over 150 senior engineers heard Meitu’s Lu Rongbin detail the company’s progression from simple rsync scripts to a scalable mobile data and open statistical platform, while Tencent’s Zhao Shiyuan showcased the Glacier multi‑dimensional analysis system for fast, tag‑driven queries, underscoring collaborative technical exchange in South China.

Analytics · Big Data · Data Platform
0 likes · 6 min read
Alibaba Cloud Developer
Nov 30, 2016 · Big Data

How Alibaba’s Double 11 Turned Big Data into a Global E‑Commerce Game‑Changer

MIT Technology Review reports that Alibaba’s 2016 Double 11 shopping festival set new e‑commerce records while showcasing the company’s advanced big‑data, AI, and cloud‑computing technologies, highlighting massive transaction volumes, high‑quality data processing, robust security measures, and the strategic push toward global digital infrastructure.

Big Data · cloud computing · data security
0 likes · 11 min read
Architects' Tech Alliance
Nov 28, 2016 · Big Data

User Profiling: Concepts, Stages, and Data Modeling Methods

This article explains the concept of user profiling, outlines its four-stage construction process, discusses the significance of tagging users, and details practical data modeling techniques—including static and dynamic data sources, weight calculations, and real‑world examples—aimed at improving precision marketing and recommendation systems.

Big Data · Tagging · behavior analysis
0 likes · 44 min read
Alibaba Cloud Infrastructure
Nov 20, 2016 · Big Data

Alibaba’s Big Data Applications in Urban Governance and Social Risk Prevention

The article describes how Alibaba leverages big data, cloud computing and AI through its “City Brain” project and security platforms to improve urban traffic management, public safety, anti‑fraud measures and e‑commerce risk control, illustrating the transformative impact of data‑driven technologies on modern social governance.

AI · Big Data · Smart City
0 likes · 11 min read
Architecture Digest
Nov 11, 2016 · Backend Development

High‑Availability Architecture Sessions at the China Software Developers Conference (Nov 18‑20)

The conference featured a series of high‑availability architecture talks covering performance‑driven design, RPC framework resilience, big‑data platform evolution, MySQL cluster consistency, and cloud infrastructure best practices, presented by experts from 58.com, Alibaba, Tencent, Baidu, and others.

Backend Architecture · Big Data · RPC
0 likes · 10 min read
StarRing Big Data Open Lab
Nov 11, 2016 · Big Data

Why SQL Still Rules Big Data—and How NoSQL & NewSQL Fit In

The article explores the evolution of data processing from Hadoop and Spark to modern SQL, NoSQL, and NewSQL solutions, comparing their architectures, performance trade‑offs, and use‑cases, while illustrating concepts with examples like MapReduce, Hive, Impala, and streaming platforms such as Storm.

Big Data · Hadoop · NewSQL
0 likes · 14 min read
Architects' Tech Alliance
Nov 8, 2016 · Cloud Computing

12 Notable Data Storage Startups to Watch in 2016

Amid rising data‑storage complexity, twelve innovative startups emerged in 2015‑2016, leveraging flash, disk, and cloud technologies to improve data mobility and management across hierarchical storage tiers, offering solutions ranging from cloud‑native storage networks to SAN arrays and virtualization platforms.

Big Data · SAN · Virtualization
0 likes · 15 min read
MaGe Linux Operations
Nov 7, 2016 · Big Data

How HDFS Achieves Low Cost, High Reliability, and Fault Tolerance

This article explains how HDFS, inspired by Google’s GFS, provides a low‑cost, highly reliable, fault‑tolerant, and high‑performance distributed file system for big‑data workloads by using replication, standby NameNodes, block storage, rack awareness, and strategies that move computation close to the data.

Big Data · Distributed File System · HDFS
0 likes · 7 min read
Architecture Digest
Nov 6, 2016 · Big Data

Evolution of Taobao’s Big Data Platform: From RAC to MaxCompute

The article chronicles Taobao’s 13‑year evolution of its big data platform, detailing three phases—from a single‑node Oracle setup and the Tianwang scheduler, through a Hadoop‑based “Cloud Ladder 1” architecture with real‑time analytics, to the current MaxCompute/ODPS era with cross‑region projects and advanced data services.

Big Data · Data Platform · Data Warehouse
0 likes · 11 min read
Architects' Tech Alliance
Nov 4, 2016 · Big Data

The Seven Camps of the Global Big Data Ecosystem

The article outlines how mobile Internet merges the data‑driven society with the physical world to create a new big‑data architecture and describes the seven distinct camps—Infrastructure, Analytics, Applications, Cross‑Domain Architecture, Open‑Source, Data Sources & APIs, and Incubator & Training—that together form a comprehensive end‑to‑end big‑data solution ecosystem.

API · Analytics · Applications
0 likes · 3 min read
Meituan Technology Team
Nov 4, 2016 · Big Data

Design and Implementation of a Low-Latency App Exception Monitoring Platform Using Spark Streaming, Kafka, and Elasticsearch

The paper presents a production‑grade, low‑cost mobile‑app exception monitoring platform built on Spark Streaming, Kafka, and Elasticsearch that achieves high availability through exactly‑once processing and checkpointing, minute‑level latency by decoupling raw and symbolized logs, high throughput via reservoir sampling, and dynamic scalability without code changes.

Big Data · Elasticsearch · Exception Monitoring
0 likes · 11 min read
Architects' Tech Alliance
Nov 3, 2016 · Industry Insights

Scaling Billion‑Level Ads: Architecture Lessons from Sogou’s Senior Engineer

In this interview, Sogou architect Liu Jian shares how his team built a highly available, scalable commercial advertising platform, discusses the evolution of its infrastructure, offers practical advice for engineers aspiring to become architects, and reflects on emerging technologies and time‑management strategies.

Big Data · Distributed Systems · Software Engineering
0 likes · 10 min read
StarRing Big Data Open Lab
Nov 1, 2016 · Big Data

Will SQL on Hadoop Replace Hybrid Architectures? Key Big Data Trends Unveiled

The article analyzes four major big‑data evolution trends—SQL on Hadoop overtaking hybrid architectures, SSDs becoming cache in Hadoop clusters, the rise of real‑time analytics, and the convergence of cloud computing with big data—while presenting supporting data, predictions, and architectural diagrams.

Big Data · Real-time analytics · SQL on Hadoop
0 likes · 15 min read
ITFLY8 Architecture Home
Oct 31, 2016 · Cloud Computing

How Taobao Scaled from LAMP to Cloud: Lessons in Cloud Migration Architecture

This article examines the evolution of Taobao's technical architecture—from a LAMP stack through Oracle‑based mainframes to a cloud‑native platform—highlighting the performance, scalability, and cost challenges of traditional IT and offering best‑practice strategies for migrating enterprise systems to the cloud.

Big Data · Operations · architecture migration
0 likes · 15 min read
ITFLY8 Architecture Home
Oct 27, 2016 · Big Data

Inside Taobao’s Massive Data Architecture: How 1.5 PB Daily Is Processed and Served

The article explains Taobao’s five‑layer data product architecture—covering data sources, compute, storage, query, and product layers—and describes how massive volumes of data are ingested, processed in batch and streaming, stored in MySQL and HBase clusters, and served efficiently through a unified middle‑layer and sophisticated caching mechanisms.

Big Data · Distributed Systems · HBase
0 likes · 15 min read
21CTO
Oct 21, 2016 · Artificial Intelligence

How Toutiao Dominated Chinese News with AI‑Powered Personalization

This article examines Toutiao’s evolution from a simple news aggregator to an AI‑driven recommendation platform valued at 600 billion RMB, detailing its market growth, data‑driven personalization, product features, business model, talent philosophy, and future outlook.

AI · Big Data · Recommendation Engine
0 likes · 10 min read
Efficient Ops
Oct 20, 2016 · Operations

Transforming Business Operations with Cloud, Big Data, and Integrated IT Management

The article explains how modern business operation management integrates cloud computing, big data analytics, and proactive IT monitoring to shift from traditional infrastructure‑centric maintenance to a user‑experience‑driven, data‑powered approach that boosts performance, accelerates growth, and supports digital transformation.

Big Data · Digital Transformation · IT monitoring
0 likes · 8 min read
Alibaba Cloud Infrastructure
Oct 17, 2016 · Artificial Intelligence

Wang Jian’s Keynote at the 2016 Hangzhou Yunqi Conference: Data Brain, AI, and Cloud Computing

In his 2016 Yunqi Conference keynote, Wang Jian highlighted how Alibaba’s cloud and AI technologies transform city traffic by linking surveillance cameras to traffic lights, discussed the evolution from Deep Blue to AlphaGo, and reflected on the broader impact of data-driven innovation on society.

AI · Big Data · Smart City
0 likes · 13 min read
Qunar Tech Salon
Oct 17, 2016 · Information Security

Design and Implementation of a Cloud‑Based Web Application Firewall at Ctrip

This article describes Ctrip's challenges with web security, evaluates hardware and commercial cloud WAF shortcomings, and presents a low‑cost, low‑risk cloud‑based WAF solution that leverages DNS redirection, closed‑loop rule management, Lua/Tengine deployment, supervised machine‑learning log analysis, and big‑data streaming for real‑time threat detection and mitigation.

Big Data · WAF · Web Security
0 likes · 9 min read
ITFLY8 Architecture Home
Oct 16, 2016 · Big Data

Mastering Data Sync, Real-Time Analytics, and Scalable Storage for Modern Systems

This article explains how to design and implement heterogeneous data synchronization, leverage batch and stream processing frameworks like Hadoop and Storm for large‑scale analysis, and choose appropriate storage solutions—from in‑memory databases to distributed column‑family stores—while addressing performance, reliability, and monitoring in complex distributed environments.

Big Data · Distributed Systems · data synchronization
0 likes · 26 min read
Alibaba Cloud Infrastructure
Oct 15, 2016 · Artificial Intelligence

Computing Power as the Engine of Digitalization and Intelligent Manufacturing – Insights from Alibaba Cloud Conference

In his keynote at the Alibaba Cloud Conference, CTO Zhang Jianfeng explains how advances in computing power, AI, big data, and IoT are driving the digital transformation of retail, manufacturing, and services, enabling smarter products, personalized experiences, and a fully connected intelligent world.

Big Data · Digitalization · IoT
0 likes · 19 min read
Alibaba Cloud Developer
Oct 14, 2016 · Artificial Intelligence

How Alibaba’s CTO Envisions AI‑Driven Smart Manufacturing and the Future of Digitalized Worlds

In his Yunqi Conference keynote, Alibaba CTO Zhang Jianfeng explains how soaring computing power, digitalization, AI, IoT and immersive technologies will transform retail, manufacturing and services into intelligent, personalized ecosystems, illustrating the vision with examples like a smart golf club and a city‑wide data brain.

AI · Big Data · IoT
0 likes · 21 min read
StarRing Big Data Open Lab
Oct 8, 2016 · Big Data

Evolving Data Warehouses with Hadoop & Spark: Core Technologies

Data warehouses centralize and transform enterprise data for multidimensional analysis, and modern demands have spawned four types—traditional, real‑time, associative discovery, and data marts—each with distinct technical requirements, while Hadoop‑based solutions like Transwarp Data Hub address challenges of scale, variety, latency, and security.

Big Data · Hadoop · Real-time analytics
0 likes · 21 min read
Java High-Performance Architecture
Sep 27, 2016 · Big Data

Build a Hadoop Cluster with Docker: Step‑by‑Step Guide

Learn how to quickly set up a multi‑node Hadoop cluster on a single machine using Docker containers, covering image preparation, SSH configuration, fixed IP assignment with pipework, and building custom Hadoop images, enabling a lightweight, cost‑effective big‑data environment for development and testing.

Big Data · CentOS · Cluster
0 likes · 9 min read
StarRing Big Data Open Lab
Sep 26, 2016 · Operations

Automate Cluster Health Checks with Koalas: Cutting Big Data Downtime

The article introduces Koalas, an automated distributed diagnostic tool for TDH clusters that identifies and resolves computing environment issues—such as network, platform, and system problems—through one‑click checks, detailed reporting, and both preventive and diagnostic use cases.

Big Data · Cluster Monitoring · Performance Optimization
0 likes · 8 min read
dbaplus Community
Sep 12, 2016 · Big Data

Apache Flume Quickstart: Log Collection and Kafka Integration

This article introduces Apache Flume, explains its design goals of reliability, scalability, manageability and extensibility, outlines core concepts and architecture, provides step‑by‑step configuration using the first mode, demonstrates integration with Zookeeper, Kafka and a shell script, and shows how to launch and verify the agent.

Apache Flume · Big Data · Kafka Integration
0 likes · 7 min read
Ctrip Technology
Sep 10, 2016 · Artificial Intelligence

Deep Learning Anti‑Scam Guide: An Informal Introduction to Neural Networks, Training, and Practical Applications

This article provides a light‑hearted yet thorough overview of deep learning, covering neural network fundamentals, layer construction, back‑propagation, ResNet shortcuts, encoder‑decoder structures, PU‑learning for unlabeled data, GPU acceleration, and practical advice on data size, frameworks, and deployment in financial scenarios.

Backpropagation · Big Data · GPU
0 likes · 27 min read
Meituan Technology Team
Aug 26, 2016 · Big Data

Memory Architecture and Analysis of Hadoop HDFS NameNode

The article dissects Hadoop 2.4.1’s HDFS NameNode memory architecture, detailing how the Namespace, BlockManager, NetworkTopology, and LeaseManager consume the heap, exposing scaling problems when metadata reaches hundreds of millions of inodes and blocks, and recommending file merging, block‑size tuning, federation, or external KV stores to mitigate heap pressure.

Big Data · HDFS · Memory Management
0 likes · 17 min read
Ctrip Technology
Aug 26, 2016 · Information Security

Ctrip Information Security Salon Summary – Cloud WAF, Big Data Analysis, ELK Monitoring, and Recruitment Highlights

The Ctrip Information Security Salon held on August 20 in Shanghai featured expert talks on cloud‑based WAF, big‑data security analytics, ELK‑driven monitoring, product security practices, and concluded with a recruitment drive for security engineers, showcasing practical implementations and industry challenges.

Big Data · Cloud WAF · Ctrip
0 likes · 8 min read
MaGe Linux Operations
Aug 23, 2016 · Big Data

Step-by-Step Guide to Building a Hadoop Cluster on CentOS 6.5

This article provides a comprehensive, hands‑on tutorial for setting up a Hadoop 2.6.4 cluster on a CentOS 6.5 development server, covering SSH password‑less login, user/group creation, DNS configuration, JDK installation, environment variables, Hadoop installation, HDFS and YARN configuration, and troubleshooting native library warnings.

Big Data · CentOS · Cluster Setup
0 likes · 12 min read
Ctrip Technology
Aug 19, 2016 · Big Data

Ctrip's Big Data Architecture and Personalized Recommendation System

This article describes how Ctrip transformed its traditional application architecture into a high‑concurrency, big‑data‑driven platform, detailing storage, compute, and business‑layer redesigns that enable massive data ingestion, real‑time user‑intent services, and a scalable personalized recommendation system.

Big Data · Ctrip · Hadoop
0 likes · 14 min read
Architects' Tech Alliance
Aug 18, 2016 · Cloud Computing

Understanding the Evolution and Competition Between Traditional and Emerging Storage Technologies

The article analyzes how cloud-native, distributed, and software‑defined storage solutions are reshaping the enterprise storage market, compares them with traditional high‑reliability systems, and offers guidance on selecting, integrating, and migrating storage technologies based on business scenarios, cost, and performance considerations.

Big Data · Enterprise · SDS
0 likes · 10 min read
Ctrip Technology
Aug 12, 2016 · Big Data

Ctrip's Real-Time Data Platform: Architecture, Practices, and Lessons Learned

This article details Ctrip's journey building a unified real-time data platform—covering business motivations, architectural requirements, technology choices like Kafka and Storm, implementation of Avro schemas, monitoring, alerting, operational lessons, and future explorations such as Streaming CQL and JStorm.

Alerting · Big Data · Kafka
0 likes · 15 min read
Meituan Technology Team
Aug 5, 2016 · Big Data

Meituan Delivery Big Data: Full‑Chain Application Insights

The talk details Meituan‑Dianping’s end‑to‑end delivery big‑data system, highlighting mobile‑ and local‑centric usage, a two‑stage forecasting pipeline that combines autoregressive baselines with a boosting‑based multiplier model, layered log‑collect‑process‑serve architecture, sophisticated feature engineering, real‑time inference, and strategies for logistics constraints and cold‑start merchants.

Big Data · Meituan · delivery
0 likes · 8 min read
Meituan Technology Team
Aug 5, 2016 · Big Data

Meituan-Dianping Tech Salon: Full‑Chain Application of Food‑Delivery Big Data – User Profiling, Marketing Strategies, and Predictive Modeling

The Meituan‑Dianping tech salon detailed how food‑delivery big data drives full‑chain marketing, using RFM‑based user segmentation, rich demographic and behavior profiles, churn‑prediction and survival models, and scenario‑driven expansion tactics to acquire, retain, and grow customers across the order lifecycle.

Big Data · Meituan · Predictive Modeling
0 likes · 9 min read