Tagged articles
123 articles
Ctrip Technology
Aug 26, 2016 · Big Data

Exploring OLAP Engine with Apache Kylin: Architecture, Theory, and Practical Applications in Flight Ticket Big Data

This article summarizes the Qdata session on OLAP engines. It covers the limitations of traditional MySQL‑based solutions, the requirements of large‑scale analytics, Apache Kylin's architecture and theoretical foundations (cube construction, storage in HBase, and query rewriting), its application to flight‑ticket big data, and the challenges encountered together with the optimizations used to address them.

Apache Kylin · Cube · Data Warehouse
0 likes · 7 min read
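A Kylin cube pre‑computes aggregates for combinations of dimensions (cuboids). This turns a query into a lookup instead of a scan. The sketch below is a brute‑force illustration only, not Kylin's own layered MapReduce cubing. It materializes all 2^n cuboids for a SUM measure over a few hypothetical flight‑ticket rows. The field names `route`, `airline`, `month` and `price` and their values are invented for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-aggregate SUM(measure) for every subset of `dimensions`.

    Returns {cuboid: {dimension values: total}}, where a cuboid is a tuple of
    dimension names. n dimensions give 2**n cuboids, including the empty
    cuboid (the grand total).
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for cuboid in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in cuboid)
                totals[key] += row[measure]
            cube[cuboid] = dict(totals)
    return cube


# Hypothetical flight-ticket records (illustrative values only).
rows = [
    {"route": "SHA-PEK", "airline": "MU", "month": "2016-07", "price": 800.0},
    {"route": "SHA-PEK", "airline": "CA", "month": "2016-07", "price": 900.0},
    {"route": "SHA-CAN", "airline": "MU", "month": "2016-08", "price": 700.0},
]
cube = build_cube(rows, ["route", "airline", "month"], "price")

# "Total revenue per route" is now a dictionary lookup, not a table scan.
print(cube[("route",)])  # {('SHA-PEK',): 1700.0, ('SHA-CAN',): 700.0}
```

The number of cuboids grows exponentially with the number of dimensions. Kylin therefore lets cube designers prune which combinations are built, for example with aggregation groups. This brute‑force version ignores that.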
Architecture Digest
Jul 5, 2016 · Big Data

Why Map‑Reduce Is Not the Solution to Your Big Data Problem – A Critical Look at Hadoop

The article traces Hadoop’s origins to Google’s pioneering papers and explains its architecture and ecosystem. It weighs strengths such as scalability and benchmark results, and discusses current limitations such as single points of failure and a complex programming model. It closes with upcoming improvements, including HDFS Federation and next‑generation MapReduce.

Big Data · Future · HDFS
0 likes · 14 min read
Hulu Beijing
May 31, 2016 · Big Data

What’s New in Hadoop 3.0? Key Features and Improvements Explained

Hadoop 3.0 requires JDK 1.8. It adds erasure‑coded HDFS, support for multiple NameNodes, native optimizations for MapReduce tasks, cgroup‑based memory and disk isolation in YARN, and container resizing. An alpha was slated for summer 2016, with a GA release expected in November or December.

Big Data · HDFS · Hadoop
0 likes · 5 min read
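The main draw of erasure‑coded HDFS is storage efficiency. Below is a back‑of‑the‑envelope comparison. It assumes a Reed‑Solomon layout with 6 data units and 3 parity units per stripe, the shape of an RS‑6‑3 policy such as the one Hadoop 3 ships with. It ignores partial‑stripe padding and metadata. The helper `stored_bytes` is hypothetical and written only for this illustration.

```python
def stored_bytes(raw_bytes, scheme):
    """Total bytes written to disk for `raw_bytes` of user data.

    scheme is ("replication", n) for n-way replication, or ("rs", k, m) for
    Reed-Solomon with k data units and m parity units per stripe.
    Padding of partial stripes and metadata are ignored (simplification).
    """
    if scheme[0] == "replication":
        return raw_bytes * scheme[1]
    if scheme[0] == "rs":
        _, k, m = scheme
        return raw_bytes * (k + m) / k
    raise ValueError(f"unknown scheme: {scheme!r}")


GIB = 2**30
raw = 6 * GIB
for scheme in [("replication", 3), ("rs", 6, 3)]:
    total = stored_bytes(raw, scheme)
    overhead = (total - raw) / raw * 100
    print(scheme, f"{total / GIB:.0f} GiB stored, {overhead:.0f}% overhead")
```

Three‑way replication costs 200% extra space and survives the loss of two copies. RS(6,3) costs 50% extra and survives the loss of any three units in a stripe. The price is CPU and network traffic to reconstruct missing units.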
dbaplus Community
May 25, 2016 · Databases

How Parallel Execution Supercharges SQL Server Queries—and the Pitfalls to Avoid

This article explains the theory behind SQL Server's parallel execution, illustrates its performance gains with Amdahl's Law, lists operators that block parallelism, discusses configuration settings, warns of deadlocks and thread starvation, and presents practical MapReduce‑style optimizations for real‑world workloads.

Amdahl's Law · MapReduce · Parallel Execution
0 likes · 16 min read
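Amdahl's Law bounds the speedup of work in which a fraction p can run in parallel across n threads: S(n) = 1 / ((1 − p) + p / n). The small calculator below treats n as the query's degree of parallelism. That mapping onto SQL Server settings is an assumption for illustration, not the article's own code.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup by Amdahl's Law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


# Speedup of a plan whose work is 90% parallelizable, at several thread counts.
for dop in (1, 2, 4, 8, 16, 64):
    print(dop, round(amdahl_speedup(0.9, dop), 2))
```

With p = 0.9 the speedup can never exceed 1 / 0.1 = 10, however many threads are added. This is why the serial parts of a plan, such as operators that block parallelism, dominate as the thread count grows.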
Architect
May 11, 2016 · Big Data

Comprehensive Guide to Hadoop MapReduce Job Execution, Scheduling, and Optimization

This article provides an in‑depth explanation of Hadoop MapReduce architecture, covering the roles of JobClient, JobTracker, TaskTracker and HDFS, the complete job lifecycle from submission to completion, scheduling strategies, shuffle and sort mechanisms, fault tolerance, and performance tuning techniques.

Big Data · Hadoop · JobTracker
0 likes · 20 min read
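The map, partition, shuffle/sort and reduce stages of a MapReduce job can be imitated in a single process. This sketch is an in‑memory simplification with no HDFS, spill files or merges. Routing keys with `hash(key) % num_reducers` mirrors the idea behind Hadoop's default hash partitioner. The function names are hypothetical, and the word‑count job is the usual toy example.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer, num_reducers=2):
    """Run a toy MapReduce job in memory and return {key: reduced value}."""
    # One bucket per reducer; each bucket groups values by key ("shuffle").
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for record in records:
        for key, value in mapper(record):
            # Hash partitioning: every value for a key lands on the same reducer.
            partitions[hash(key) % num_reducers][key].append(value)

    output = {}
    for bucket in partitions:
        # Each reducer sees its keys in sorted order, as after the sort phase.
        for key in sorted(bucket):
            output[key] = reducer(key, bucket[key])
    return output


def word_count_map(line):
    for word in line.lower().split():
        yield word, 1


def word_count_reduce(word, counts):
    return sum(counts)


lines = ["the quick fox", "the lazy dog", "The end"]
print(run_mapreduce(lines, word_count_map, word_count_reduce))
```

Partitioning by key is what lets reducers work independently: no two reducers ever hold values for the same key.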
21CTO
Apr 20, 2016 · Fundamentals

Why Algorithms Matter More Than Learning Every New Programming Language

The article argues that, despite the hype around ever‑changing programming languages, mastering core algorithms and computer‑science theory remains essential. They are what make solutions efficient and scalable in fields ranging from search engines and parallel computing to scientific research, because algorithms are the enduring foundation of technology.

Data Structures · MapReduce · computer science fundamentals
0 likes · 11 min read
21CTO
Mar 31, 2016 · Big Data

Why Hadoop Isn’t the Silver Bullet for Big Data: Insights from Facebook

The article examines common misconceptions about Hadoop, compares it with relational databases, and shares Facebook's data‑analysis practices, highlighting when Hadoop is appropriate and the broader considerations of using open‑source big‑data frameworks.

Hadoop · MapReduce · Relational Databases
0 likes · 8 min read
ITPUB
Feb 20, 2016 · Big Data

Doug Cutting’s Journey: How Hadoop Shaped the Big Data Era

The article chronicles Doug Cutting’s path from his Stanford studies and early work at Xerox through the creation of Lucene, Nutch, and Hadoop. It shows how open‑source innovation and Google’s technologies propelled Hadoop to become a cornerstone of modern big‑data processing, and closes with an outlook on its future.

Big Data · Doug Cutting · Hadoop
0 likes · 15 min read
21CTO
Feb 14, 2016 · Big Data

How PageRank Works: From Random Surfer Theory to MapReduce Implementation

This article explains the fundamentals of Google's PageRank algorithm, which models the web as a directed graph traversed by a random surfer. It covers the matrix formulation and convergence problems such as dangling nodes and spider traps. It then walks through a MapReduce implementation in Python for large‑scale rank computation.

Big Data · MapReduce · PageRank
0 likes · 15 min read
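One PageRank iteration maps naturally onto MapReduce. Each page emits an equal share of its rank to every out‑link (map), and each page sums the shares it receives (reduce). Below is a compact single‑machine sketch, not the article's own code. It assumes a damping factor of 0.85, spreads the rank of dangling nodes evenly over all pages, and uses a tiny hypothetical four‑page graph.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    graph maps every page to the list of pages it links to; every link
    target must also appear as a key.
    """
    nodes = list(graph)
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Map step: each page splits its current rank across its out-links.
        received = {node: 0.0 for node in nodes}
        dangling_mass = 0.0
        for node, links in graph.items():
            if links:
                share = ranks[node] / len(links)
                for target in links:
                    received[target] += share
            else:
                # Dangling page (no out-links): its rank is spread over all pages.
                dangling_mass += ranks[node]
        # Reduce step: combine received shares with the random-jump term,
        # which also stops the surfer from being trapped in link cycles.
        ranks = {
            node: (1 - damping) / n + damping * (received[node] + dangling_mass / n)
            for node in nodes
        }
    return ranks


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page, rank in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(rank, 3))
```

The ranks stay a probability distribution that sums to 1. The dangling‑node and random‑jump terms put back exactly the mass that would otherwise leak out of the graph.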
21CTO
Jan 9, 2016 · Backend Development

What Jeff Dean Really Built: From MapReduce to Spanner

This article debunks humorous "facts" about Jeff Dean while highlighting his real contributions to Google’s infrastructure—such as MapReduce, the Google File System, BigTable, and Spanner—and explains how his work shaped modern backend development and big data processing.

Bigtable · Distributed Systems · Jeff Dean
0 likes · 13 min read
21CTO
Dec 30, 2015 · Big Data

Mastering Massive Data: MapReduce, Hadoop, and Taobao’s Architecture

This article introduces the fundamental MapReduce model and Hadoop framework, explains their roles in large‑scale data processing, and then examines Taobao’s massive‑data product architecture—including its data source, compute, storage, query, and product layers, as well as the MyFOX, Prom, and Glider components and caching strategies.

Data Architecture · Distributed Systems · Hadoop
0 likes · 16 min read
21CTO
Dec 9, 2015 · Big Data

Mastering Hadoop: From MapReduce Basics to Taobao’s Massive Data Architecture

This article introduces the fundamental MapReduce model and Hadoop framework, explains their components such as HDFS, MapReduce, and HBase, and then examines Taobao’s large‑scale data product architecture—including storage, computation, query, and caching layers—to illustrate practical big‑data processing techniques.

Data Architecture · Distributed Systems · Hadoop
0 likes · 17 min read
21CTO
Nov 26, 2015 · Big Data

Understanding Big Data: 4V Traits, Google’s Distributed Computing, and Hadoop Ecosystem

This article explores the 4V characteristics of big data, real‑world data growth examples, historical analogies, Google’s GFS‑MapReduce‑BigTable model, Hadoop’s architecture and HDFS processes, HBase components, NoSQL alternatives, and practical big‑data applications at Tencent and beyond.

Data Architecture · Hadoop · MapReduce
0 likes · 7 min read
21CTO
Sep 19, 2015 · Artificial Intelligence

Why Distributed Machine Learning Needs More Data Than Speed

The article explains how distributed machine learning evolved from parallel computing to handle massive, long‑tail data sets, discusses the importance of scalability, fault recovery, and data‑parallel algorithms, and reviews frameworks such as MPI, MapReduce, and Pregel for building large‑scale AI systems.

Big Data · Data Parallelism · LDA
0 likes · 24 min read
MaGe Linux Operations
Aug 20, 2015 · Big Data

15 Must‑Try Resources to Master Hadoop Quickly

This article explains what Hadoop is, outlines its key features, and presents a curated list of 15 high‑quality tutorials, video courses, and books to help beginners and professionals efficiently learn Hadoop and its MapReduce ecosystem.

Hadoop · Learning Resources · MapReduce
0 likes · 12 min read
21CTO
Aug 11, 2015 · Big Data

Understanding MapReduce Through a Pizza Sauce Analogy

The author recounts giving a MapReduce talk and then uses the story of making pizza sauce as an analogy: the map step chops the ingredients and the reduce step blends them. The analogy explains distributed data processing to a non‑technical audience.

Analogy · MapReduce · data-processing
0 likes · 7 min read
Efficient Ops
Jun 25, 2015 · Big Data

Inside Baidu’s 8‑Year Evolution of Hadoop and Distributed Computing

This article chronicles Baidu’s eight‑year journey from early Hadoop adoption to MPI‑based computing, DAG engines, and real‑time streaming platforms. It details the architectural milestones, the performance optimizations, and the practical lessons for large‑scale offline and online data processing.

Baidu · DAG · Hadoop
0 likes · 21 min read