Tag

PySpark

Python Programming Learning Circle
May 22, 2025 · Big Data

Introduction to PySpark: Features, Core Components, Sample Code, and Use Cases

This article introduces PySpark, the Python API for Apache Spark. It explains Spark's core concepts and advantages, describes PySpark's main components, walks through a simple code example, compares PySpark with Pandas, and outlines typical big-data scenarios and directions for further learning.

Apache Spark, Big Data, DataFrames
5 min read
Python Programming Learning Circle
May 13, 2025 · Fundamentals

Top 10 Essential Python Libraries for Data Analysis with Code Examples

This article introduces ten practical Python libraries for data analysis: Pandas and NumPy for data manipulation; Matplotlib, Seaborn, Plotly, and Bokeh for visualization; and Scikit-learn, Prophet, Dask, and PySpark for machine learning and big-data processing. Each is illustrated with a concise code snippet.

Dask, Matplotlib, NumPy
6 min read
Bitu Technology
Aug 28, 2020 · Artificial Intelligence

KPI Forecasting and Anomaly Detection at Tubi Using Prophet

This article describes how Tubi’s data science team built a robust KPI forecasting system with Facebook’s Prophet, covering visualization dashboards, anomaly detection, feature engineering, PySpark deployment, and evaluation using Brier scores to improve business decision‑making.

Anomaly Detection, Brier score, KPI
13 min read
Python Programming Learning Circle
Apr 28, 2020 · Big Data

Multiple Ways to Create New Columns in PySpark DataFrames

This tutorial explains several techniques for adding new columns to PySpark DataFrames: native Spark functions, user-defined functions, RDD transformations, Pandas UDFs, and SQL queries. It also demonstrates data loading and schema handling, with code examples for each method.

Big Data, Column Creation, PySpark
9 min read
Python Programming Learning Circle
Apr 16, 2020 · Big Data

Getting Started with PySpark: Creating SparkContext, Parallelizing Data, and Basic DataFrame Operations

This tutorial demonstrates how to initialize a SparkContext in PySpark, perform simple parallel computations such as temperature conversion and reduction, create a SparkSession to read CSV data, and apply common DataFrame operations like selecting columns, adding new columns, filtering, grouping, and aggregating.

Big Data, Parallel Computing, PySpark
5 min read
DataFunTalk
Dec 24, 2019 · Big Data

Deep Dive into PySpark Implementation: Multi‑Process Architecture, Java Integration, RDD/SQL Interfaces, Executor Communication, and Pandas UDF

This article explains PySpark's multi‑process architecture, how the Python driver uses Py4J to call Java/Scala APIs, the implementation of RDD and DataFrame interfaces, executor‑side process communication and serialization with Arrow, and the design of Pandas UDFs, while also discussing current limitations and future directions.

ARROW, Big Data, PySpark
13 min read