Tagged articles
12 articles
Ray's Galactic Tech
Nov 18, 2025 · Big Data

From Zero to Mastery: A Complete Roadmap to Learn Apache Spark

This guide outlines a step‑by‑step learning path for Apache Spark, covering core concepts, environment setup, hands‑on WordCount code, API mastery, ecosystem extensions like Structured Streaming and MLlib, deployment options, performance tuning, and practical project advice.

Apache Spark · PySpark · Streaming
7 min read

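The WordCount exercise in this roadmap follows the classic split, pair, and sum dataflow. As a rough sketch only (not the article's code), the three steps can be simulated in plain Python; the function name `word_count` is hypothetical, and splitting on whitespace is an assumption.

```python
from functools import reduce
from itertools import chain


def word_count(lines: list[str]) -> dict[str, int]:
    """Hypothetical single-process stand-in for Spark's flatMap -> map -> reduceByKey."""
    # flatMap: split every line on whitespace (an assumption) and flatten into one stream
    words = chain.from_iterable(line.split() for line in lines)
    # map: pair each word with an initial count of 1
    pairs = ((word, 1) for word in words)

    # reduceByKey: add up the counts that share the same word
    def merge(acc: dict[str, int], pair: tuple[str, int]) -> dict[str, int]:
        word, n = pair
        acc[word] = acc.get(word, 0) + n
        return acc

    return reduce(merge, pairs, {})


if __name__ == "__main__":
    print(word_count(["to be or", "not to be"]))
```

In PySpark the equivalent pipeline is usually chained on an RDD with `flatMap`, `map`, and `reduceByKey`, which Spark runs in parallel across partitions instead of folding everything into a single in-memory dict.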
StarRocks
Sep 5, 2024 · Big Data

Accelerate Lakehouse Queries: A Hands‑On Guide to StarRocks + Apache Iceberg

This tutorial walks you through the fundamentals of Apache Iceberg, its architecture and key features, explains why it’s advantageous for lakehouse workloads, and provides a step‑by‑step Docker Compose setup to integrate Iceberg with StarRocks for fast, ACID‑compliant analytics on real‑world taxi data.

Apache Iceberg · Docker · Lakehouse
15 min read

DataFunSummit
Nov 9, 2023 · Big Data

Spark 3.4 New Features Overview: Community Updates, SQL Enhancements, PySpark, Streaming, and AI Ecosystem

This article presents a comprehensive overview of Spark 3.4, covering community growth statistics, major SQL improvements such as default column values and timestamp handling, new PySpark and streaming capabilities, and the emerging AI ecosystem that integrates natural‑language interfaces and Spark AI services.

Databricks · PySpark · Streaming
14 min read

Bitu Technology
Aug 28, 2020 · Artificial Intelligence

KPI Forecasting and Anomaly Detection at Tubi Using Prophet

This article describes how Tubi’s data science team built a robust KPI forecasting system with Facebook’s Prophet, covering visualization dashboards, anomaly detection, feature engineering, PySpark deployment, and evaluation using Brier scores to improve business decision‑making.

Brier score · KPI · Prophet
13 min read

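The Brier score mentioned in this summary is the mean squared error of probabilistic forecasts against binary outcomes, BS = (1/N) Σ (p_i − o_i)², where 0 is a perfect score and lower is better. The summary does not say which events Tubi scored, so the sketch below (hypothetical function `brier_score`) only illustrates the standard formula, assuming each outcome is coded as 0 or 1.

```python
def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Standard binary Brier score; assumes outcomes are 0/1 (hypothetical helper)."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and the same length")
    # Average the squared gap between each forecast probability and what happened
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


if __name__ == "__main__":
    print(brier_score([0.9, 0.2, 0.6], [1, 0, 0]))
```

For example, forecasts of 0.9, 0.2, and 0.6 for outcomes 1, 0, and 0 score (0.01 + 0.04 + 0.36) / 3 ≈ 0.137, with most of the penalty coming from the confident miss on the third event.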
Python Programming Learning Circle
Apr 16, 2020 · Big Data

Getting Started with PySpark: Creating SparkContext, Parallelizing Data, and Basic DataFrame Operations

This tutorial demonstrates how to initialize a SparkContext in PySpark, perform simple parallel computations such as temperature conversion and reduction, create a SparkSession to read CSV data, and apply common DataFrame operations like selecting columns, adding new columns, filtering, grouping, and aggregating.

Big Data · PySpark · Spark
5 min read

DataFunTalk
Dec 24, 2019 · Big Data

Deep Dive into PySpark Implementation: Multi‑Process Architecture, Java Integration, RDD/SQL Interfaces, Executor Communication, and Pandas UDF

This article explains PySpark's multi‑process architecture, how the Python driver uses Py4J to call Java/Scala APIs, the implementation of RDD and DataFrame interfaces, executor‑side process communication and serialization with Arrow, and the design of Pandas UDFs, while also discussing current limitations and future directions.

Arrow · Big Data · PySpark
13 min read