Master Pandas: From Installation to Data Analysis in Python
This tutorial walks you through installing Pandas, importing data from CSV and Excel, creating DataFrames from dictionaries, describing datasets, indexing with loc/iloc, and cleaning columns using apply, all illustrated with clear code examples and visual outputs.
Pandas (the name comes from "panel data", an econometrics term for multidimensional structured datasets, and doubles as a play on "Python data analysis") is a powerful Python library for data processing, statistical analysis, visualization, and export.
Installation and Import
You can install Pandas from the command line:
pip install pandas
If you use Anaconda, Pandas is already installed. Import it with the common alias:
import pandas as pd
Data Import
Load external CSV or Excel files using pd.read_csv or pd.read_excel. Example:
car = pd.read_csv('data/car-sales.csv')
A relative file path is resolved against the current working directory (often, but not always, the folder containing the script or notebook), and the data is returned as a DataFrame.
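To try read_csv without a file on disk, the following minimal sketch feeds it an in-memory buffer. The CSV text is a hypothetical stand-in for the contents of data/car-sales.csv (the real file is not shown in this article), and the name car_sample is chosen here only for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of data/car-sales.csv.
# Prices are quoted because they contain commas.
csv_text = """Make,Colour,Odometer (KM),Doors,Price
Toyota,White,150043,4,"$4,000.00"
Honda,Red,87899,4,"$5,000.00"
Toyota,Blue,32549,3,"$7,000.00"
"""

# read_csv accepts a file path or any file-like object.
car_sample = pd.read_csv(io.StringIO(csv_text))

print(type(car_sample).__name__)    # DataFrame
print(car_sample.shape)             # (3, 5)
print(car_sample.columns.tolist())
```

The same call works with a path string in place of the buffer; only the source of the text changes.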
Generating a DataFrame
Create a DataFrame from a dictionary of lists:
make = ['Toyota', 'Honda', 'Toyota', 'BMW', 'Nissan', 'Toyota', 'Honda', 'Honda', 'Toyota', 'Nissan']
color = ['White', 'Red', 'Blue', 'Black', 'White', 'Green', 'Blue', 'Blue', 'White', 'White']
odometer = [150043, 87899, 32549, 11179, 213095, 99213, 45698, 54738, 60000, 31600]
doors = [4, 4, 3, 5, 4, 4, 4, 4, 4, 4]
price = ['$4,000.00', '$5,000.00', '$7,000.00', '$22,000.00', '$3,500.00', '$4,500.00', '$7,500.00', '$7,000.00', '$6,250.00', '$9,700.00']
car = pd.DataFrame({'Make': make, 'Colour': color, 'Odometer (KM)': odometer, 'Doors': doors, 'Price': price})
Displaying car in a Jupyter notebook renders the resulting table.
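Outside a notebook, a few quick checks confirm what was built. This is a sketch that rebuilds the same DataFrame from the lists above and inspects its size, its first rows, and whether each column is numeric; the is_numeric_dtype check is an addition for illustration, not part of the original walkthrough.

```python
import pandas as pd

make = ['Toyota', 'Honda', 'Toyota', 'BMW', 'Nissan', 'Toyota', 'Honda', 'Honda', 'Toyota', 'Nissan']
color = ['White', 'Red', 'Blue', 'Black', 'White', 'Green', 'Blue', 'Blue', 'White', 'White']
odometer = [150043, 87899, 32549, 11179, 213095, 99213, 45698, 54738, 60000, 31600]
doors = [4, 4, 3, 5, 4, 4, 4, 4, 4, 4]
price = ['$4,000.00', '$5,000.00', '$7,000.00', '$22,000.00', '$3,500.00',
         '$4,500.00', '$7,500.00', '$7,000.00', '$6,250.00', '$9,700.00']

# Each dictionary key becomes a column name; each list becomes that column's values.
car = pd.DataFrame({'Make': make, 'Colour': color, 'Odometer (KM)': odometer,
                    'Doors': doors, 'Price': price})

print(car.shape)     # (10, 5): ten rows, five columns
print(car.head(3))   # first three rows

# Price holds text such as '$4,000.00', so it is not numeric yet.
for column in car.columns:
    print(column, pd.api.types.is_numeric_dtype(car[column]))
```

Only Odometer (KM) and Doors report as numeric here, which is why Price needs cleaning before any arithmetic.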
Data Description
Use the describe method for quick statistical summaries:
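car.describe()
By default it describes only numeric columns; to include other columns, set the include parameter, e.g., car.describe(include=['object', 'float', 'int']), or pass include='all' to cover every column. Statistics that do not apply to a column (for example, the mean of a text column) appear as NaN.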
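The difference between the default summary and one that includes every column can be seen in a small sketch. It uses only the first four rows of the article's data to stay short, and the names car_small, numeric_summary, and full_summary are illustrative.

```python
import pandas as pd

# First four rows of the article's data, to keep the sketch short.
car_small = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota', 'BMW'],
    'Odometer (KM)': [150043, 87899, 32549, 11179],
    'Doors': [4, 4, 3, 5],
})

numeric_summary = car_small.describe()             # numeric columns only
full_summary = car_small.describe(include='all')   # every column

print(numeric_summary.columns.tolist())   # ['Odometer (KM)', 'Doors']
print(full_summary.loc['top', 'Make'])    # Toyota (the most frequent value)
print(full_summary.loc['mean', 'Make'])   # nan (a mean is undefined for text)
print(full_summary.loc['mean', 'Doors'])  # 4.0
```

With include='all', text columns gain rows such as unique, top, and freq, while the numeric rows for those columns are filled with NaN.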
Data Indexing
Extracting Columns
Use bracket notation similar to dictionaries:
car['Price'] # single column
car[['Make', 'Colour']] # multiple columns
Using loc and iloc
loc indexes by label, while iloc indexes by integer position. Example: retrieve the value at row index 1 and column "Odometer (KM)":
car.loc[1, 'Odometer (KM)']
Or using integer positions:
car.iloc[1, 2]
Use : to select all rows or columns, e.g., all values in the "Odometer (KM)" column:
car.loc[:, 'Odometer (KM)']
car.iloc[:, 2]
Using apply
Apply a function to transform a column. To clean the Price column by removing the dollar sign and commas and converting to float:
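def to_num(x):
    x_new = x.replace('$', '')
    x_new = x_new.replace(',', '')
    return float(x_new)
car['Price'].apply(to_num)
This returns a new Series with the price strings converted to numeric values; car itself is unchanged. To keep the result for further analysis, assign it back: car['Price'] = car['Price'].apply(to_num).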
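As a minimal sketch of keeping the cleaned values, the example below stores the result of apply in a new column and compares it with a vectorised alternative built on the .str accessor. The column name 'Price (num)' and the three-row car_prices frame are hypothetical, chosen only to keep the example self-contained; the vectorised form is an alternative technique, not the approach used in the article.

```python
import pandas as pd

car_prices = pd.DataFrame({'Price': ['$4,000.00', '$5,000.00', '$22,000.00']})


def to_num(x):
    """Strip the dollar sign and thousands separators, then convert to float."""
    return float(x.replace('$', '').replace(',', ''))


# apply returns a new Series; assigning it stores the result in the DataFrame.
car_prices['Price (num)'] = car_prices['Price'].apply(to_num)

# Vectorised alternative: literal (non-regex) replacements, then a dtype cast.
vectorised = (
    car_prices['Price']
    .str.replace('$', '', regex=False)
    .str.replace(',', '', regex=False)
    .astype(float)
)

print(car_prices['Price (num)'].tolist())   # [4000.0, 5000.0, 22000.0]
print(vectorised.tolist() == car_prices['Price (num)'].tolist())   # True
print(car_prices['Price (num)'].mean())
```

Passing regex=False matters for the dollar sign, which would otherwise be read as a regular-expression anchor rather than a literal character.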
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
