Tag

DataFrames


Python Programming Learning Circle
May 22, 2025 · Big Data

Introduction to PySpark: Features, Core Components, Sample Code, and Use Cases

This article introduces PySpark as the Python API for Apache Spark, explains Spark's core concepts and advantages, describes PySpark's main components, walks through a simple code example, compares PySpark with Pandas, and outlines typical big‑data scenarios and further learning directions.

Apache Spark, Big Data, DataFrames
0 likes · 5 min read
Test Development Learning Exchange
Nov 20, 2024 · Fundamentals

Data Reshaping with Pandas: melt and pivot Methods

This article teaches how to use Pandas for data reshaping, covering the melt method for converting wide-format data to long-format and the pivot method for the reverse transformation, with practical code examples.

CSV, Data Transformation, DataFrames
0 likes · 7 min read
Python Programming Learning Circle
Nov 15, 2024 · Fundamentals

Understanding pandas loc/iloc vs at/iat: Performance Comparison and Best Practices

This article explains the differences between pandas loc/iloc and at/iat, demonstrates why using loc/iloc inside loops can be extremely slow, and shows how replacing them with at/iat can speed up DataFrame updates by up to 60 times.

DataFrames, at, loc
0 likes · 6 min read
Python Programming Learning Circle
Aug 9, 2024 · Big Data

Introduction to cuDF: GPU‑Accelerated DataFrames and Dask Integration

This article introduces cuDF, a Python GPU DataFrame library with a pandas‑like API, compares it to pandas, explains when to use cuDF versus Dask‑cuDF for single‑GPU or multi‑GPU workloads, and provides practical code examples for common data operations.

Big Data, Dask, DataFrames
0 likes · 7 min read
Python Programming Learning Circle
Mar 4, 2022 · Big Data

Introduction to NumPy and Pandas: Fundamentals, Operations, and Data Handling in Python

This article provides a comprehensive overview of NumPy and pandas, covering ndarray basics, multi‑dimensional array creation, core array attributes, broadcasting, random number generation, reshaping, as well as pandas Series and DataFrame structures, data import/export, grouping, merging, and advanced data manipulation techniques for scientific and data‑analysis tasks.

Array Operations, DataFrames, NumPy
0 likes · 17 min read
Big Data Technology Architecture
Aug 5, 2020 · Big Data

Understanding Join Execution in Spark SQL

This article explains how Spark SQL processes joins, including inner, outer, semi, and anti joins, by describing the overall query planning flow, the three physical join strategies (sort‑merge join, broadcast hash join, and shuffled hash join), and the specific implementation details for each join type.

Big Data, DataFrames, SQL Optimization
0 likes · 10 min read
Python Programming Learning Circle
Sep 13, 2019 · Fundamentals

Replace Excel with Python: A Step‑by‑Step Pandas Tutorial

This tutorial walks you through using Python and Pandas to import Excel files, explore DataFrames, apply filters, perform statistical calculations, create pivot tables, and emulate Excel functions like VLOOKUP, providing code examples and visual guides for each step.

DataFrames, Excel, Python
0 likes · 9 min read
High Availability Architecture
May 19, 2016 · Big Data

Comprehensive Overview of Apache Spark: Architecture, RDD Principles, Execution Modes, and Spark 2.0 Features

This article provides an in‑depth technical overview of Apache Spark, covering core concepts such as RDDs, transformations and actions, and execution models; Spark 2.0 enhancements such as unified DataFrames/Datasets, whole‑stage code generation, and Structured Streaming; and practical performance‑tuning guidance.

Big Data, DataFrames, RDD
0 likes · 20 min read
High Availability Architecture
Jan 6, 2016 · Big Data

Spark Latest Features, Tungsten Project, and Hulu’s Production Practices

This article reviews Spark's evolution from version 1.2 to 1.6, explains the DataFrame and Tungsten projects, shares Hulu’s real‑world Spark deployments, and discusses performance‑related challenges such as stack overflow, streaming receiver latency, and class‑loader deadlocks.

Big Data, DataFrames, Dataset API
0 likes · 17 min read