Master Pandas: Install, Import Data, and Perform Powerful Data Analysis

This tutorial introduces the Pandas library, covering installation, data import from CSV and Excel, DataFrame creation, descriptive statistics, indexing with loc/iloc, and applying custom functions to clean and transform column values, all illustrated with code snippets and images.

Model Perspective

Pandas is a powerful data processing and analysis library that supports importing data, statistical analysis, basic plotting, and export. Its name derives from "panel data", an econometrics term for multidimensional structured datasets, and its core data structures are the Series and the DataFrame.

Installing and Importing Pandas

You can install Pandas from the command line with:

pip install pandas

If you use Anaconda, Pandas is already installed. It is commonly imported as pd:

import pandas as pd

Pandas is often used together with Matplotlib for plotting.
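To confirm that the installation worked, a quick check like the following can help. It is a minimal sketch that assumes Pandas is importable in the current environment:

```python
import pandas as pd

# Printing the version is a simple way to verify that the import succeeded
print(pd.__version__)
```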

Data Import

External CSV and Excel files can be loaded with pd.read_csv and pd.read_excel. For example:

car = pd.read_csv('data/car-sales.csv')

A relative file path is resolved against the current working directory (often, but not always, the folder containing the script or notebook), and the data is loaded into a DataFrame.
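To try read_csv without a file on disk, the sketch below feeds it an in-memory string. The CSV text is made up for illustration and only mimics what a file such as car-sales.csv might contain:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file; not taken from the source data
csv_text = """Make,Colour,Odometer (KM),Doors,Price
Toyota,White,150043,4,"$4,000.00"
Honda,Red,87899,4,"$5,000.00"
Toyota,Blue,32549,3,"$7,000.00"
"""

# read_csv accepts a path or any file-like object, so StringIO works here
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (number of rows, number of columns)
print(sample.dtypes)  # Price is read as text because of the "$" and "," characters
```

Passing a real path instead of the StringIO object works the same way.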

Creating a DataFrame

You can create a DataFrame from a dictionary that maps column names to lists of values:

make = ['Toyota','Honda','Toyota','BMW','Nissan','Toyota','Honda','Honda','Toyota','Nissan']
color = ['White','Red','Blue','Black','White','Green','Blue','Blue','White','White']
odometer = [150043,87899,32549,11179,213095,99213,45698,54738,60000,31600]
doors = [4,4,3,5,4,4,4,4,4,4]
price = ['$4,000.00','$5,000.00','$7,000.00','$22,000.00','$3,500.00','$4,500.00','$7,500.00','$7,000.00','$6,250.00','$9,700.00']
car = pd.DataFrame({'Make':make,'Colour':color,'Odometer (KM)':odometer,'Doors':doors,'Price':price})

The resulting DataFrame is displayed in Jupyter Notebook.
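Outside a notebook, a few attributes and methods give a quick look at a newly built DataFrame. This sketch uses a shortened, hypothetical table named cars_small rather than the full one above:

```python
import pandas as pd

# A shortened version of the car table, for illustration only
cars_small = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'BMW'],
    'Odometer (KM)': [150043, 87899, 11179],
    'Price': ['$4,000.00', '$5,000.00', '$22,000.00'],
})

print(cars_small.shape)        # number of rows and columns
print(cars_small.head(2))      # first two rows
print(cars_small.dtypes)       # one dtype per column
print(list(cars_small.index))  # default integer row labels, starting at 0
```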

Data Description

Pandas provides a describe method for quick statistical summaries:

car.describe()

By default it summarizes only numeric columns; to include text (object) columns as well, pass include=['object','float','int'] or include='all'. Statistics that do not apply to a column's type appear as NaN.
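The difference between the default summary and one that covers every column can be seen in a small sketch. The two-column table here is hypothetical:

```python
import pandas as pd

# Small illustrative table with one text column and one numeric column
cars_small = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota'],
    'Doors': [4, 4, 3],
})

numeric_summary = cars_small.describe()            # numeric columns only
full_summary = cars_small.describe(include='all')  # every column

print(numeric_summary)
print(full_summary)  # rows such as 'mean' are NaN for Make; 'top' and 'freq' are NaN for Doors
```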

Data Indexing

Selecting Single or Multiple Columns

Column selection works like dictionary indexing:

car['Price']  # single column
car[['Make','Colour']]  # multiple columns
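One detail worth knowing is that the two forms return different types: a single label gives a Series, while a list of labels gives a DataFrame, even when the list has one entry. A minimal sketch with a hypothetical table:

```python
import pandas as pd

cars_small = pd.DataFrame({
    'Make': ['Toyota', 'Honda'],
    'Price': ['$4,000.00', '$5,000.00'],
})

one_col = cars_small['Price']     # single label: returns a Series
as_frame = cars_small[['Price']]  # list of labels: returns a DataFrame

print(type(one_col).__name__)
print(type(as_frame).__name__)
```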

Using loc and iloc

loc indexes by label, while iloc indexes by integer position. For example, to retrieve the value at row label 1 and column 'Odometer (KM)':

car.loc[1, 'Odometer (KM)']

Or, using integer positions:

car.iloc[1, 2]

A bare : slice selects all rows or all columns, e.g., all values in the 'Odometer (KM)' column:

car.loc[:, 'Odometer (KM)']
car.iloc[:, 2]  # the same column, selected by integer position
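Slices behave differently in the two indexers: loc includes the end label, while iloc excludes the end position, as in ordinary Python slicing. The sketch below shows this on a hypothetical four-row table:

```python
import pandas as pd

cars_small = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota', 'BMW'],
    'Colour': ['White', 'Red', 'Blue', 'Black'],
    'Odometer (KM)': [150043, 87899, 32549, 11179],
})

# Label-based slice: both ends are included (rows 0, 1, 2; columns Make and Colour)
by_label = cars_small.loc[0:2, 'Make':'Colour']

# Position-based slice: the end is excluded (rows 0, 1; columns 0, 1)
by_position = cars_small.iloc[0:2, 0:2]

print(by_label)
print(by_position)
```

With the default integer index the row labels and row positions happen to match, which is why the inclusive end of loc is easy to overlook.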

Using apply

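Calling apply on a column (a Series) runs a function on each of its values. To clean the Price column by removing dollar signs and commas and converting to float:

def to_num(x):
    # Strip the currency symbol and thousands separators, e.g. '$4,000.00' -> '4000.00'
    x_new = x.replace('$', '')
    x_new = x_new.replace(',', '')
    return float(x_new)

# apply returns a new Series; assign it back to keep the cleaned values
car['Price'] = car['Price'].apply(to_num)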
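As an alternative to apply, the same cleanup can be written with Pandas' vectorized string methods. This is a sketch of a different technique, not the approach used above, and the sample prices are a hypothetical subset:

```python
import pandas as pd

prices = pd.Series(['$4,000.00', '$5,000.00', '$22,000.00'])

# regex=False treats '$' as a literal character rather than a regex anchor
cleaned = (
    prices.str.replace('$', '', regex=False)
          .str.replace(',', '', regex=False)
          .astype(float)
)

print(cleaned.tolist())  # [4000.0, 5000.0, 22000.0]
```

The string methods operate on the whole column at once, which tends to be more concise than a custom function for simple text cleanup.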

Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
