Master Pandas: From Installation to Data Analysis in Python

This tutorial walks you through installing Pandas, importing data from CSV and Excel, creating DataFrames from dictionaries, describing datasets, indexing with loc/iloc, and cleaning columns using apply, all illustrated with clear code examples and visual outputs.

Model Perspective

Pandas (the name derives from "panel data", an econometrics term for multidimensional structured datasets) is a powerful Python library for data processing, statistical analysis, visualization, and export.

Installation and Import

You can install Pandas from the command line:

pip install pandas

If you use Anaconda, Pandas is already installed. Import it with the common alias:

import pandas as pd

Data Import

Load external CSV or Excel files using pd.read_csv or pd.read_excel. Example:

car = pd.read_csv('data/car-sales.csv')

A relative file path is resolved against the current working directory (usually the folder the script or notebook is run from), and the data is returned as a DataFrame.
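
As a minimal sketch of what read_csv returns, the example below parses CSV text held in memory instead of a file on disk. The two rows are a hypothetical stand-in for data/car-sales.csv, whose real contents are not shown in this article.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file such as data/car-sales.csv.
csv_text = """Make,Colour,Odometer (KM),Doors,Price
Toyota,White,150043,4,"$4,000.00"
Honda,Red,87899,4,"$5,000.00"
"""

# read_csv accepts a path or a file-like object; StringIO keeps the sketch self-contained.
sample = pd.read_csv(io.StringIO(csv_text))

print(type(sample).__name__)  # DataFrame
print(sample.shape)           # (2, 5)
print(sample.dtypes)          # Price is read as text because of the "$" and ","
```

The quoted Price values keep their embedded commas, which is why the column arrives as text and needs cleaning before any arithmetic.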

Generating a DataFrame

Create a DataFrame from a dictionary of lists:

make = ['Toyota', 'Honda', 'Toyota', 'BMW', 'Nissan', 'Toyota', 'Honda', 'Honda', 'Toyota', 'Nissan']
color = ['White', 'Red', 'Blue', 'Black', 'White', 'Green', 'Blue', 'Blue', 'White', 'White']
odometer = [150043, 87899, 32549, 11179, 213095, 99213, 45698, 54738, 60000, 31600]
doors = [4, 4, 3, 5, 4, 4, 4, 4, 4, 4]
price = ['$4,000.00', '$5,000.00', '$7,000.00', '$22,000.00', '$3,500.00', '$4,500.00', '$7,500.00', '$7,000.00', '$6,250.00', '$9,700.00']
car = pd.DataFrame({'Make': make, 'Colour': color, 'Odometer (KM)': odometer, 'Doors': doors, 'Price': price})

The resulting table is shown in the Jupyter notebook:

[Figure: Jupyter notebook output]
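
Outside a notebook, a few attributes give a quick look at a newly built DataFrame. The sketch below uses a shortened copy of the data (first three rows only) under the hypothetical name cars, so it runs on its own.

```python
import pandas as pd

# A shortened copy of the article's data (first three rows only).
cars = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota'],
    'Colour': ['White', 'Red', 'Blue'],
    'Odometer (KM)': [150043, 87899, 32549],
    'Doors': [4, 4, 3],
    'Price': ['$4,000.00', '$5,000.00', '$7,000.00'],
})

print(cars.shape)    # (3, 5): three rows, five columns
print(cars.dtypes)   # Price is stored as text, not as a number
print(cars.head(2))  # first two rows
```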

Data Description

Use the describe method for quick statistical summaries:

car.describe()

[Figure: describe output]

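By default, describe summarizes only numeric columns. To include other column types, set the include parameter, e.g., car.describe(include=['object', 'float', 'int']) or car.describe(include='all'). Statistics that do not apply to a column (such as the mean of a text column) appear as NaN.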
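
The difference between the default summary and one that covers every column can be sketched on a tiny two-column frame. The frame and the variable names here are illustrative only.

```python
import pandas as pd

cars = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota'],
    'Doors': [4, 4, 3],
})

numeric_summary = cars.describe()            # numeric columns only
full_summary = cars.describe(include='all')  # every column

print(list(numeric_summary.columns))     # ['Doors']
print(full_summary.loc['top', 'Make'])   # Toyota (the most frequent value)
print(full_summary.loc['mean', 'Make'])  # NaN: a mean is undefined for text
```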

Data Indexing

Extracting Columns

Use bracket notation similar to dictionaries:

car['Price']  # single column
car[['Make', 'Colour']]  # multiple columns
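
One detail worth checking when extracting columns is the type of the result: a single label returns a Series, while a list of labels returns a DataFrame, even when the list holds only one name. A small sketch on illustrative data:

```python
import pandas as pd

cars = pd.DataFrame({
    'Make': ['Toyota', 'Honda'],
    'Colour': ['White', 'Red'],
    'Doors': [4, 4],
})

one_column = cars['Make']               # single label -> Series
one_column_frame = cars[['Make']]       # list with one label -> DataFrame
two_columns = cars[['Make', 'Colour']]  # list of labels -> DataFrame

print(type(one_column).__name__)        # Series
print(type(one_column_frame).__name__)  # DataFrame
print(two_columns.shape)                # (2, 2)
```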

Using loc and iloc

loc indexes by label, while iloc indexes by integer position. Example: retrieve the value at row index 1 and column "Odometer (KM)":

car.loc[1, 'Odometer (KM)']

Or using integer positions:

car.iloc[1, 2]

Use : to select all rows or columns, e.g., all values in the "Odometer (KM)" column:

car.loc[:, 'Odometer (KM)']
car.iloc[:, 2]
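
With the default 0, 1, 2, ... index, row labels and row positions coincide, which can hide the difference between loc and iloc. The sketch below assigns hypothetical row labels (10, 20, 30) so the two become distinguishable.

```python
import pandas as pd

cars = pd.DataFrame(
    {'Make': ['Toyota', 'Honda', 'BMW'], 'Odometer (KM)': [150043, 87899, 11179]},
    index=[10, 20, 30],  # hypothetical labels chosen to differ from positions
)

by_label = cars.loc[20, 'Odometer (KM)']  # row labelled 20
by_position = cars.iloc[1, 1]             # second row, second column

print(by_label, by_position)  # 87899 87899

# Slices differ too: loc includes the end label, iloc excludes the end position.
print(len(cars.loc[10:20]))  # 2
print(len(cars.iloc[0:1]))   # 1
```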

Using apply

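Use apply to run a function on every value in a column. For example, to clean the Price column, strip the dollar sign and commas and convert the result to float:

def to_num(x):
    # Strip the currency symbol and thousands separators, then parse as float.
    x_new = x.replace('$', '')
    x_new = x_new.replace(',', '')
    return float(x_new)

car['Price'] = car['Price'].apply(to_num)

apply returns a new Series rather than modifying the column in place, so the result is assigned back to car['Price']. The column then holds numeric values for further analysis.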
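
The same cleanup can also be written with pandas' vectorized string methods instead of apply. This is an alternative technique, not the approach used above; the sketch assumes every value follows the "$1,234.00" pattern.

```python
import pandas as pd

prices = pd.Series(['$4,000.00', '$5,000.00', '$22,000.00'])

# regex=False treats '$' and ',' as literal characters rather than patterns.
cleaned = (
    prices.str.replace('$', '', regex=False)
          .str.replace(',', '', regex=False)
          .astype(float)
)

print(cleaned.tolist())  # [4000.0, 5000.0, 22000.0]
```

On large columns the string methods are usually more concise than a hand-written function, though apply remains the more flexible choice when the cleaning logic does not reduce to simple replacements.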

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
