Introduction to Apache Spark and Its Core Components
Apache Spark is an open‑source, general‑purpose parallel computing framework from UC Berkeley's AMP Lab, designed as a fast, unified engine for large‑scale data processing.
Spark is currently the most popular unified batch‑and‑stream big‑data processing platform. Since the release of version 1.2 in 2014, it has become an indispensable part of the big‑data field, and it continues to develop rapidly with an active community. Spark's ecosystem includes Spark SQL for batch and interactive queries, Spark Streaming for stream processing, and GraphX and MLlib for graph computation and machine learning, respectively.
At the time of writing, the latest released version of Spark is 2.4.3.
This article is based on a recent internal Spark sharing session, covering a detailed introduction to Spark RDDs and explanations of core modules such as DAGScheduler, TaskScheduler, and BlockManager. The topics are:
Spark overview and overall workflow
Implementation of Spark core modules
Spark application libraries
Differences and connections between Spark and Hadoop
Spark applications
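To give a flavor of the RDD model that the talk covers in detail: transformations such as map and filter are lazy and only record lineage (the dependency chain the DAGScheduler later turns into stages), while an action such as collect triggers actual computation. The toy class below is an illustrative sketch in plain Python, not Spark's real implementation; the class and its internals are hypothetical names chosen for clarity.

```python
# Toy sketch (NOT Spark's actual API): transformations are lazy and only
# record lineage; an action walks the lineage back to the source and replays it.

class ToyRDD:
    """A minimal stand-in for an RDD: a source dataset, or a parent plus a transformation."""

    def __init__(self, data=None, parent=None, transform=None):
        self._data = data            # only set for the source RDD
        self._parent = parent        # lineage pointer to the parent RDD
        self._transform = transform  # function applied to the parent's output

    def map(self, f):
        # Lazy: returns a new ToyRDD recording the map; nothing is computed yet.
        return ToyRDD(parent=self, transform=lambda items: [f(x) for x in items])

    def filter(self, pred):
        # Lazy: records the filter in the lineage chain.
        return ToyRDD(parent=self, transform=lambda items: [x for x in items if pred(x)])

    def collect(self):
        # Action: recurse to the source dataset, then replay each recorded transformation.
        if self._parent is None:
            return list(self._data)
        return self._transform(self._parent.collect())


rdd = ToyRDD(data=range(1, 6))
result = rdd.map(lambda x: x * 2).filter(lambda x: x > 4).collect()
print(result)  # [6, 8, 10]
```

In real Spark, the lineage additionally records partitioning and dependency types, which is what lets the DAGScheduler split a job into stages and recompute lost partitions after a failure.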
Follow the WeChat public account and reply 0705 to obtain the full PPT.
Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies