
Master Pandas: From Installation to Data Analysis in Python

This tutorial walks you through installing Pandas, importing data from CSV and Excel, creating DataFrames from dictionaries, describing datasets, indexing with loc/iloc, and cleaning columns using apply, all illustrated with clear code examples and visual outputs.

Model Perspective

Jun 3, 2022

Pandas (the name comes from "panel data", an econometrics term for structured, multidimensional datasets) is a powerful Python library for data processing, statistical analysis, visualization, and export.

Installation and Import

You can install Pandas from the command line:

pip install pandas

If you use the Anaconda distribution, Pandas is already installed. Import it with the common alias:

import pandas as pd
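A quick way to confirm that the installation worked is to import the library and print its version. This is only a minimal sketch; the version string you see depends on your environment.

```python
import pandas as pd

# pd.__version__ is a plain string; its exact value depends on the installed release.
version = pd.__version__
print(version)
```

If the import raises ModuleNotFoundError, Pandas is not installed in the interpreter that is running the code.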

Data Import

Load external CSV or Excel files using pd.read_csv or pd.read_excel. Example:

car = pd.read_csv('data/car-sales.csv')

A relative file path is resolved against the current working directory, which is not necessarily the folder that contains the script. The function returns the data as a DataFrame.
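To try read_csv without a file on disk, the sketch below feeds it an in-memory text buffer. The CSV content is a hypothetical two-row sample made up for illustration, not the actual contents of data/car-sales.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
# The prices are quoted because they contain commas.
csv_text = """Make,Colour,Odometer (KM),Doors,Price
Toyota,White,150043,4,"$4,000.00"
Honda,Red,87899,4,"$5,000.00"
"""

# read_csv accepts a path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(type(sample).__name__)  # DataFrame
print(sample.shape)           # (2, 5)
print(sample.dtypes)          # column types inferred from the text
```

The Price column is read as text because of the dollar sign and thousands separator, which is why it needs cleaning before any arithmetic.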

Generating a DataFrame

Create a DataFrame from a dictionary of lists:

make = ['Toyota', 'Honda', 'Toyota', 'BMW', 'Nissan', 'Toyota', 'Honda', 'Honda', 'Toyota', 'Nissan']
color = ['White', 'Red', 'Blue', 'Black', 'White', 'Green', 'Blue', 'Blue', 'White', 'White']
odometer = [150043, 87899, 32549, 11179, 213095, 99213, 45698, 54738, 60000, 31600]
doors = [4, 4, 3, 5, 4, 4, 4, 4, 4, 4]
price = ['$4,000.00', '$5,000.00', '$7,000.00', '$22,000.00', '$3,500.00', '$4,500.00', '$7,500.00', '$7,000.00', '$6,250.00', '$9,700.00']
car = pd.DataFrame({'Make': make, 'Colour': color, 'Odometer (KM)': odometer, 'Doors': doors, 'Price': price})

In a Jupyter notebook, evaluating car in a cell displays the resulting table.
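Outside a notebook, a few attributes and methods give a quick look at a DataFrame. The sketch below rebuilds a three-row subset of the data above so that it runs on its own; shape, head, and dtypes are standard Pandas members.

```python
import pandas as pd

# First three rows of the example data, repeated here so the block is self-contained.
car = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota'],
    'Colour': ['White', 'Red', 'Blue'],
    'Odometer (KM)': [150043, 87899, 32549],
    'Doors': [4, 4, 3],
    'Price': ['$4,000.00', '$5,000.00', '$7,000.00'],
})

print(car.shape)    # (3, 5): rows, columns
print(car.head(2))  # first two rows
print(car.dtypes)   # Price is text, not a number, because of "$" and ","
```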

Data Description

Use the describe method for quick statistical summaries:

car.describe()

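By default, describe summarizes only numeric columns. To include other columns, set the include parameter, e.g., car.describe(include=['object', 'float', 'int']), or pass include='all' to cover every column. Statistics that do not apply to a column (such as the mean of a text column) appear as NaN.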

Data Indexing

Extracting Columns

Use bracket notation similar to dictionaries:

car['Price']  # single column
car[['Make', 'Colour']]  # multiple columns
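One detail worth checking is the type each form returns: single brackets give a Series, while a list of names gives a DataFrame, even when the list has one entry. A minimal sketch on a three-row subset of the example data:

```python
import pandas as pd

# Three-row subset of the example data, repeated so the block is self-contained.
car = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota'],
    'Colour': ['White', 'Red', 'Blue'],
    'Price': ['$4,000.00', '$5,000.00', '$7,000.00'],
})

single = car['Price']              # one column as a Series
one_col_frame = car[['Price']]     # one column, still a DataFrame
subset = car[['Make', 'Colour']]   # two columns as a DataFrame

print(type(single).__name__)         # Series
print(type(one_col_frame).__name__)  # DataFrame
print(subset.shape)                  # (3, 2)
```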

Using loc and iloc

loc indexes by label, while iloc indexes by integer position. Example: retrieve the value at row index 1 and column "Odometer (KM)":

car.loc[1, 'Odometer (KM)']

Or using integer positions:

car.iloc[1, 2]

Use : to select all rows or columns, e.g., all values in the "Odometer (KM)" column:

car.loc[:, 'Odometer (KM)']
car.iloc[:, 2]
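With the default index, labels and positions happen to coincide, which can hide the difference between the two accessors. The sketch below assigns hypothetical row labels (10, 20, 30) so that the distinction is visible, and also shows that label slices include the end point while position slices do not.

```python
import pandas as pd

car = pd.DataFrame(
    {'Make': ['Toyota', 'Honda', 'BMW'],
     'Odometer (KM)': [150043, 87899, 11179]},
    index=[10, 20, 30],  # hypothetical labels, chosen to differ from positions
)

print(car.loc[20, 'Odometer (KM)'])  # row labelled 20 -> 87899
print(car.iloc[1, 1])                # second row, second column -> 87899

by_label = car.loc[10:20]    # label slice: both ends included, 2 rows
by_position = car.iloc[0:1]  # position slice: end excluded, 1 row
print(len(by_label), len(by_position))  # 2 1
```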

Using apply

Apply a function to transform a column. To clean the Price column by removing the dollar sign and commas and converting to float:

def to_num(x):
    x_new = x.replace('$', '')
    x_new = x_new.replace(',', '')
    return float(x_new)

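car['Price'] = car['Price'].apply(to_num)  # assign back to keep the result

apply returns a new Series and leaves the DataFrame unchanged, so the result is assigned back to the Price column. The price strings are then numeric values ready for further analysis.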
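The same cleaning can also be written with Pandas string methods instead of a Python function. This is an alternative sketch, not the approach used above; it relies on Series.str.replace with regex=False so that "$" is treated as a literal character.

```python
import pandas as pd

# A few of the example prices, repeated so the block is self-contained.
price = pd.Series(['$4,000.00', '$5,000.00', '$22,000.00'])

cleaned = (
    price.str.replace('$', '', regex=False)  # drop the currency symbol
         .str.replace(',', '', regex=False)  # drop the thousands separator
         .astype(float)                      # convert the remaining text to float
)

print(cleaned.tolist())  # [4000.0, 5000.0, 22000.0]
```

Both versions give the same numbers; the string-method form avoids defining a helper function and operates on the whole column at once.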


Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
