Tagged articles

Spark

623 articles · Page 7 of 7
21CTO
21CTO
Apr 12, 2016 · Artificial Intelligence

Designing System and Personalized Recommendation Engines with Mahout and Spark

This article explains the architecture of both system-wide and personalized recommendation modules, compares three recommendation strategies, details the use of Apache Mahout for collaborative filtering with Java code examples, and discusses cold‑start solutions within a Spark‑Hadoop stack.

MahoutSparkcold-start
0 likes · 15 min read
Designing System and Personalized Recommendation Engines with Mahout and Spark
Architecture Digest
Architecture Digest
Apr 9, 2016 · Big Data

Practical Experience of Using Spark at Meituan: Platformization, ETL Templates, Feature Platform, Data Mining, and Real‑World Applications

This article describes how Meituan migrated from Hive‑SQL and MapReduce to Spark on YARN, built an interactive Zeppelin‑based development platform, created reusable ETL templates, constructed a Spark‑driven feature and data‑mining platform, and applied Spark to interactive user‑behavior analysis and large‑scale SEM services, highlighting performance gains and operational benefits.

Big DataData PlatformDistributed Computing
0 likes · 19 min read
Practical Experience of Using Spark at Meituan: Platformization, ETL Templates, Feature Platform, Data Mining, and Real‑World Applications
Architecture Digest
Architecture Digest
Mar 28, 2016 · Big Data

Overview of the Hadoop Ecosystem and Modern Big Data Technologies

This article provides a comprehensive overview of Hadoop and its surrounding ecosystem, detailing core components, storage principles, key algorithms, and a wide range of modern big‑data technologies such as Spark, Flink, Kafka, NoSQL databases, and cloud‑based processing platforms.

Big DataHadoopNoSQL
0 likes · 11 min read
Overview of the Hadoop Ecosystem and Modern Big Data Technologies
Architect
Architect
Mar 6, 2016 · Big Data

Clustering Geolocated User Events with DBSCAN and Spark

This article explains how to apply the DBSCAN clustering algorithm to geolocated user event data and leverage Apache Spark’s distributed processing with PairRDDs to efficiently identify frequent user regions, detect outliers, and build location‑based services such as personalized recommendations and security alerts.

Big DataClusteringDBSCAN
0 likes · 8 min read
Clustering Geolocated User Events with DBSCAN and Spark
Architect
Architect
Feb 29, 2016 · Big Data

Design Principles of Real-Time Distributed Streaming Systems: A Comparison of Spark and Storm

This article examines the design considerations of real-time distributed streaming systems, outlines their background and characteristics, compares the architectures of Spark Streaming and Storm, discusses primitives, message passing, high availability, storage models, and integration with production environments, providing practical insights for architects.

High AvailabilityReal-time ProcessingSpark
0 likes · 20 min read
Design Principles of Real-Time Distributed Streaming Systems: A Comparison of Spark and Storm
ITPUB
ITPUB
Jan 20, 2016 · Big Data

How Meizu Built an Agile Big Data Platform for Millions of Users

The Meizu Tech Open Day showcased the company's rapid evolution to a data‑driven mobile internet firm, detailing its DW1.0 and DW2.0 data‑warehouse architectures, recommendation pipelines, Spark adoption, and ELK‑based log analytics, while sharing practical lessons and future challenges.

Big DataData ArchitectureData Warehouse
0 likes · 11 min read
How Meizu Built an Agile Big Data Platform for Millions of Users
Architect
Architect
Dec 31, 2015 · Big Data

Using Spark for Machine Learning, New Word Discovery, and Intelligent Q&A

The article explains how to leverage Apache Spark for machine‑learning tasks, large‑scale new‑word discovery, and simple intelligent question‑answering by using Spark‑Shell, Scala code, and word2vec‑based similarity, while sharing practical tips and performance considerations.

Big DataIntelligent QANew Word Discovery
0 likes · 15 min read
Using Spark for Machine Learning, New Word Discovery, and Intelligent Q&A
Architect
Architect
Dec 2, 2015 · Big Data

Designing an Agile Data Warehouse Architecture for Internet Companies

The article outlines a practical, end‑to‑end data platform architecture for internet businesses, covering data collection, storage and analysis, sharing, real‑time processing, task scheduling, and the importance of simplicity and agility in building an agile data warehouse.

Big DataData ArchitectureData Warehouse
0 likes · 10 min read
Designing an Agile Data Warehouse Architecture for Internet Companies
dbaplus Community
dbaplus Community
Nov 27, 2015 · Big Data

Why Spark Is the Next Big Thing in Big Data: Core Concepts Explained

This article provides a comprehensive overview of Apache Spark, covering its origins, core concepts such as RDDs, transformations, actions, dependencies, execution modes, and key components like Spark SQL, Streaming, MLlib, and GraphX, while also offering practical code examples and visual illustrations.

DataFramesGraphXMLlib
0 likes · 18 min read
Why Spark Is the Next Big Thing in Big Data: Core Concepts Explained
21CTO
21CTO
Nov 19, 2015 · Big Data

Beyond Hadoop: Modern Big Data Platforms and Technologies Explained

This article surveys the evolution of Hadoop and its ecosystem, explains core storage and processing concepts, and introduces contemporary big‑data technologies such as Spark, Flink, Kafka, Lambda architecture, NoSQL databases, and cloud‑native solutions, highlighting their roles and trade‑offs.

Big DataFlinkHadoop
0 likes · 17 min read
Beyond Hadoop: Modern Big Data Platforms and Technologies Explained

TalkingData’s Journey to Building a Mobile Big Data Platform with Spark and YARN

This article recounts how TalkingData progressively introduced Spark into its Hadoop‑YARN based mobile big‑data platform, detailing early architectures, migration challenges, performance gains, the fully Spark‑centric redesign with Kafka and Spark Streaming, encountered pitfalls, and future plans for further optimization.

Data PlatformHadoopSpark
0 likes · 16 min read
TalkingData’s Journey to Building a Mobile Big Data Platform with Spark and YARN
Architect
Architect
Oct 17, 2015 · Big Data

Designing an Agile Data Warehouse and Data Platform for Internet Companies

The article outlines the purposes, architecture, data ingestion, storage, analysis, sharing, application, real‑time processing, scheduling, monitoring, and best‑practice recommendations for building a fast, flexible, and reliable big‑data platform in the fast‑changing internet industry.

Big DataData WarehouseHadoop
0 likes · 12 min read
Designing an Agile Data Warehouse and Data Platform for Internet Companies
Efficient Ops
Efficient Ops
Oct 14, 2015 · Big Data

Spark vs Hadoop, Flink, HBase/Cassandra, Kafka & Tachyon: Expert Q&A

During a lively “Sit and Discuss” session, experts compared Spark and Hadoop, evaluated Flink against Spark, contrasted HBase with Cassandra, explained why Kafka (and sometimes Flink) is preferred for distributed messaging, and shared insights on Tachyon’s role in modern big‑data ecosystems.

CassandraFlinkHBase
0 likes · 10 min read
Spark vs Hadoop, Flink, HBase/Cassandra, Kafka & Tachyon: Expert Q&A
Qunar Tech Salon
Qunar Tech Salon
Aug 18, 2015 · Big Data

Overview of Spark Big Data Analytics Framework Components

Spark’s big‑data analytics ecosystem comprises core components such as the in‑memory RDD data structure, Streaming for real‑time processing, GraphX for graph analytics, MLlib for machine‑learning, Spark SQL for querying, the Tachyon file system, and SparkR, each enabling scalable, distributed computation.

Big DataGraphXMLlib
0 likes · 5 min read
Overview of Spark Big Data Analytics Framework Components
Baidu Tech Salon
Baidu Tech Salon
Jan 13, 2015 · Big Data

Inside Spark 1.2: New APIs, In‑Memory Columnar Storage, and Baidu’s High‑Performance Shuffle

This article reviews Spark 1.2’s major enhancements—including the External Data Source API, column pruning, predicate pushdown, and in‑memory columnar storage—while also detailing Baidu’s large‑scale Spark deployments, its custom high‑performance Shuffle service, and the integration of Spark with the Tachyon memory file system.

BaiduBig DataExternal Data Source API
0 likes · 16 min read
Inside Spark 1.2: New APIs, In‑Memory Columnar Storage, and Baidu’s High‑Performance Shuffle