Master Pandas: Install, Import Data, Create DataFrames, and Analyze with Python

This tutorial walks through installing Pandas, importing CSV and Excel files, building DataFrames from dictionaries, describing data, indexing with loc/iloc, and applying custom functions to transform columns, providing clear code examples and visual outputs.

Model Perspective

Pandas Installation and Import

Install Pandas with pip, or use the copy that comes pre-installed with Anaconda. By convention it is imported under the alias pd:

<code>pip install pandas</code>
<code>import pandas as pd</code>
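A quick way to confirm the installation worked is to import the package and print its version string, which pandas exposes as pd.__version__:

<code>import pandas as pd

# Prints the installed version string; an ImportError on the line above
# means pandas is not installed in the active environment
print(pd.__version__)
</code>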

Data Import

Read external CSV or Excel files with pd.read_csv and pd.read_excel, respectively. For example:

<code>car = pd.read_csv('data/car-sales.csv')</code>

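A relative path like this one is resolved against the current working directory (in Jupyter, normally the folder containing the notebook), not necessarily the folder of the running script. The data is loaded as a DataFrame.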
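Since data/car-sales.csv may not be on hand, the sketch below passes pd.read_csv an in-memory string through io.StringIO instead of a file path. The CSV text is hypothetical, made up from a few rows of the car data used in the next section. pd.read_csv accepts any file-like object, so the call is otherwise the same as reading from disk. (For pd.read_excel, .xlsx files additionally need the openpyxl package installed.)

<code>import io

import pandas as pd

# Hypothetical CSV content standing in for data/car-sales.csv;
# prices are quoted because they contain commas
csv_text = """Make,Colour,Odometer (KM),Doors,Price
Toyota,White,150043,4,"$4,000.00"
Honda,Red,87899,4,"$5,000.00"
BMW,Black,11179,5,"$22,000.00"
"""

# read_csv accepts a path or any file-like object such as StringIO
car = pd.read_csv(io.StringIO(csv_text))

print(car.shape)   # (rows, columns)
print(car.dtypes)  # type inferred for each column
</code>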

Creating a DataFrame

Construct a DataFrame from a Python dictionary: each key becomes a column name and each list becomes that column's values:

<code>make = ['Toyota','Honda','Toyota','BMW','Nissan','Toyota','Honda','Honda','Toyota','Nissan']
color = ['White','Red','Blue','Black','White','Green','Blue','Blue','White','White']
odometer = [150043,87899,32549,11179,213095,99213,45698,54738,60000,31600]
doors = [4,4,3,5,4,4,4,4,4,4]
price = ['$4,000.00','$5,000.00','$7,000.00','$22,000.00','$3,500.00','$4,500.00','$7,500.00','$7,000.00','$6,250.00','$9,700.00']
car = pd.DataFrame({'Make':make,'Colour':color,'Odometer (KM)':odometer,'Doors':doors,'Price':price})
</code>

Evaluating car in a Jupyter Notebook cell displays the resulting table.

DataFrame example
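Before analyzing the table it helps to check its size and column types. This sketch rebuilds the same DataFrame and shows that Price is stored as text rather than numbers, because of the $ and , characters; that is why it is converted with apply later on. Older pandas versions report text columns as object, while newer releases may show a dedicated string dtype instead.

<code>import pandas as pd

car = pd.DataFrame({
    'Make': ['Toyota','Honda','Toyota','BMW','Nissan','Toyota','Honda','Honda','Toyota','Nissan'],
    'Colour': ['White','Red','Blue','Black','White','Green','Blue','Blue','White','White'],
    'Odometer (KM)': [150043,87899,32549,11179,213095,99213,45698,54738,60000,31600],
    'Doors': [4,4,3,5,4,4,4,4,4,4],
    'Price': ['$4,000.00','$5,000.00','$7,000.00','$22,000.00','$3,500.00',
              '$4,500.00','$7,500.00','$7,000.00','$6,250.00','$9,700.00'],
})

print(car.shape)    # 10 rows, 5 columns
print(car.dtypes)   # Odometer (KM) and Doors are integers; Make, Colour and Price are text
print(car.head(3))  # first three rows
</code>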

Data Description

Use describe to obtain summary statistics:

<code>car.describe()</code>

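By default, the output covers only numeric columns: count, mean, standard deviation, minimum, quartiles and maximum. To include text (categorical) columns, set the include parameter, e.g. car.describe(include=['object','float','int']); text columns are summarized with count, unique, top and freq instead. Statistics that do not apply to a column's type are shown as NaN.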

Describe output
NaN illustration
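A sketch of the include parameter on the same car data: passing include='all' summarizes every column in one table. Text columns get count, unique, top (most frequent value) and freq, numeric columns get mean, std and the quartiles, and each cell that does not apply to a column's type is NaN.

<code>import pandas as pd

car = pd.DataFrame({
    'Make': ['Toyota','Honda','Toyota','BMW','Nissan','Toyota','Honda','Honda','Toyota','Nissan'],
    'Colour': ['White','Red','Blue','Black','White','Green','Blue','Blue','White','White'],
    'Odometer (KM)': [150043,87899,32549,11179,213095,99213,45698,54738,60000,31600],
    'Doors': [4,4,3,5,4,4,4,4,4,4],
    'Price': ['$4,000.00','$5,000.00','$7,000.00','$22,000.00','$3,500.00',
              '$4,500.00','$7,500.00','$7,000.00','$6,250.00','$9,700.00'],
})

summary = car.describe(include='all')
print(summary)

# Numeric statistics exist only for numeric columns...
print(summary.loc['mean', 'Doors'])  # 4.0
# ...and are NaN for a text column such as Make
print(summary.loc['mean', 'Make'])
# Count-based statistics for a text column
print(summary.loc[['unique', 'top', 'freq'], 'Make'])
</code>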

Data Indexing

Selecting Single or Multiple Columns

Use bracket notation similar to dictionary indexing:

<code>car['Price']                     # single column
car[['Make','Colour']]            # multiple columns
</code>
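The two forms return different types: a single column name gives a one-dimensional Series, while a list of names (even a one-item list) gives a DataFrame. A short check on a cut-down version of the car data:

<code>import pandas as pd

# A few rows of the car data, enough to show the return types
car = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'BMW'],
    'Colour': ['White', 'Red', 'Black'],
    'Price': ['$4,000.00', '$5,000.00', '$22,000.00'],
})

price = car['Price']              # single name: Series
subset = car[['Make', 'Colour']]  # list of names: DataFrame
one_col = car[['Price']]          # one-item list: still a DataFrame

print(type(price).__name__)    # Series
print(type(subset).__name__)   # DataFrame
print(type(one_col).__name__)  # DataFrame
</code>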

Using loc and iloc

loc indexes by label, while iloc indexes by integer position. The following retrieves the value in row 1 of the 'Odometer (KM)' column, first by label with loc and then by position with iloc (the column sits at position 2, counting from 0):

<code>car.loc[1,'Odometer (KM)']</code>
<code>car.iloc[1,2]</code>

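With the default index (0, 1, 2, ...) row labels and row positions coincide, so the difference between loc and iloc is easy to miss. In this sketch, sorting the table keeps each row's original label, and the two indexers then return different rows. Only the first three columns are kept so that 'Odometer (KM)' stays at position 2.

<code>import pandas as pd

car = pd.DataFrame({
    'Make': ['Toyota','Honda','Toyota','BMW','Nissan','Toyota','Honda','Honda','Toyota','Nissan'],
    'Colour': ['White','Red','Blue','Black','White','Green','Blue','Blue','White','White'],
    'Odometer (KM)': [150043,87899,32549,11179,213095,99213,45698,54738,60000,31600],
})

# Default index: label 1 and position 1 refer to the same row
print(car.loc[1, 'Odometer (KM)'], car.iloc[1, 2])  # 87899 87899

# Sorting by mileage reorders the rows but keeps their original labels
by_km = car.sort_values('Odometer (KM)')

print(by_km.loc[1, 'Odometer (KM)'])  # row labelled 1 (the red Honda): 87899
print(by_km.iloc[1, 2])               # second row after sorting (a Nissan): 31600
</code>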
Use : to select all rows or columns, e.g. the entire 'Odometer (KM)' column:

<code>car.loc[:, 'Odometer (KM)']</code>
<code>car.iloc[:, 2]</code>
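The same : syntax also accepts start:stop ranges, but the two indexers treat the endpoint differently: a loc slice includes its end label, while an iloc slice stops before its end position, as Python lists do. A sketch using the first five rows of the car data:

<code>import pandas as pd

car = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota', 'BMW', 'Nissan'],
    'Colour': ['White', 'Red', 'Blue', 'Black', 'White'],
    'Odometer (KM)': [150043, 87899, 32549, 11179, 213095],
})

# Label-based slice: rows labelled 0, 1 and 2 (end label included)
print(car.loc[0:2, 'Odometer (KM)'])

# Position-based slice: rows at positions 0 and 1 (end position excluded)
print(car.iloc[0:2, 2])

# Both indexers accept slices on the column axis too
print(car.loc[:, 'Make':'Colour'])  # columns Make through Colour, inclusive
print(car.iloc[:, 0:2])             # columns at positions 0 and 1
</code>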

Using apply

Apply a custom function to transform a column. The following function takes a single price string, removes the dollar sign and commas, and converts the result to a float:

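<code>def to_num(x):
    # Strip the dollar sign and thousands separators, e.g. '$4,000.00' becomes '4000.00'
    x_new = x.replace('$','')
    x_new = x_new.replace(',','')
    # Convert the cleaned string to a number
    return float(x_new)
</code>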

Apply it to the Price column:

<code>car['Price'].apply(to_num)</code>
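apply returns a new Series and leaves car unchanged, so the result has to be assigned back to update the column. The sketch below does that and, for comparison, shows a vectorized alternative built on Series.str.replace and astype, which avoids calling a Python function once per row; both produce the same floats.

<code>import pandas as pd

car = pd.DataFrame({
    'Price': ['$4,000.00','$5,000.00','$7,000.00','$22,000.00','$3,500.00',
              '$4,500.00','$7,500.00','$7,000.00','$6,250.00','$9,700.00'],
})

def to_num(x):
    # Strip '$' and ',' from a price string and convert it to float
    return float(x.replace('$', '').replace(',', ''))

# Row by row: call to_num on every element
prices_apply = car['Price'].apply(to_num)

# Vectorized: literal (non-regex) replacements on the whole column, then a type cast
prices_vec = (car['Price']
              .str.replace('$', '', regex=False)
              .str.replace(',', '', regex=False)
              .astype(float))

print(prices_apply.equals(prices_vec))  # True

# Overwrite the text column with the numeric version
car['Price'] = prices_apply
print(car['Price'].mean())  # 7645.0
</code>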
Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
