Master Pandas: Install, Import Data, and Perform Powerful Data Analysis
This tutorial introduces the Pandas library, covering installation, data import from CSV and Excel, DataFrame creation, descriptive statistics, indexing with loc/iloc, and applying custom functions to clean and transform column values, all illustrated with code snippets and images.
The Pandas library (the name derives from "panel data"; its core data structures are Series and DataFrame) is a powerful data processing and analysis library that supports importing data, statistical analysis, visualization, and export.
Installing and Importing Pandas
You can install Pandas from the command line with:
pip install pandas
If you use Anaconda, Pandas is already installed. It is conventionally imported as pd:
import pandas as pd
Pandas is often used together with Matplotlib for plotting.
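A quick way to confirm that the installation worked is to import the library and print its version. This is only a minimal sketch; the version string you see depends on your environment.

```python
import pandas as pd

# If the import succeeds, Pandas is installed; the version string varies by environment.
version = pd.__version__
print("pandas version:", version)
```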
Data Import
External CSV and Excel files can be loaded with pd.read_csv and pd.read_excel. For example:
car = pd.read_csv('data/car-sales.csv')
A relative file path is resolved against the current working directory (which is not necessarily the folder containing the script), and the data is loaded into a DataFrame.
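To try pd.read_csv without a file on disk, the sketch below reads CSV text from an in-memory buffer. The three rows are copied from this tutorial's car data; the names csv_text and car_sample are assumptions made for illustration, and the real car-sales.csv may be laid out differently.

```python
import io

import pandas as pd

# CSV text standing in for a file; prices are quoted because they contain commas.
csv_text = """Make,Colour,Odometer (KM),Doors,Price
Toyota,White,150043,4,"$4,000.00"
Honda,Red,87899,4,"$5,000.00"
Toyota,Blue,32549,3,"$7,000.00"
"""

# read_csv accepts a path or any file-like object, such as StringIO.
car_sample = pd.read_csv(io.StringIO(csv_text))

print(car_sample.shape)
print(car_sample.head())
```

Passing a path string such as 'data/car-sales.csv' instead of the buffer works the same way.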
Creating a DataFrame
You can create a DataFrame from a dictionary whose keys are the column names and whose values are lists of equal length:
make = ['Toyota','Honda','Toyota','BMW','Nissan','Toyota','Honda','Honda','Toyota','Nissan']
color = ['White','Red','Blue','Black','White','Green','Blue','Blue','White','White']
odometer = [150043,87899,32549,11179,213095,99213,45698,54738,60000,31600]
doors = [4,4,3,5,4,4,4,4,4,4]
price = ['$4,000.00','$5,000.00','$7,000.00','$22,000.00','$3,500.00','$4,500.00','$7,500.00','$7,000.00','$6,250.00','$9,700.00']
car = pd.DataFrame({'Make':make,'Colour':color,'Odometer (KM)':odometer,'Doors':doors,'Price':price})
The resulting DataFrame is displayed as a table in Jupyter Notebook.
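Outside a notebook, or simply to check the result, you can inspect the new DataFrame's shape, first rows, and column types. The sketch below rebuilds the same car data so that it runs on its own.

```python
import pandas as pd

# Same data as above, written inline so the example is self-contained.
car = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota', 'BMW', 'Nissan',
             'Toyota', 'Honda', 'Honda', 'Toyota', 'Nissan'],
    'Colour': ['White', 'Red', 'Blue', 'Black', 'White',
               'Green', 'Blue', 'Blue', 'White', 'White'],
    'Odometer (KM)': [150043, 87899, 32549, 11179, 213095,
                      99213, 45698, 54738, 60000, 31600],
    'Doors': [4, 4, 3, 5, 4, 4, 4, 4, 4, 4],
    'Price': ['$4,000.00', '$5,000.00', '$7,000.00', '$22,000.00', '$3,500.00',
              '$4,500.00', '$7,500.00', '$7,000.00', '$6,250.00', '$9,700.00'],
})

print(car.shape)    # (rows, columns) -> (10, 5)
print(car.head(3))  # first three rows
print(car.dtypes)   # one data type per column
```

Note that Price is stored as text rather than as numbers, so it is left out of numeric summaries until it is cleaned.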
Data Description
Pandas provides a describe method for quick statistical summaries:
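car.describe()
By default it describes only numeric columns; to include categorical data use include=['object','float','int'], or simply include='all'. Statistics that do not apply to a column's type (for example, the mean of a text column) appear as NaN.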
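The difference between the default summary and one that includes text columns can be seen in a short sketch. It uses three columns of the tutorial's car data; the names numeric_summary and full_summary are assumptions made for illustration.

```python
import pandas as pd

car = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota', 'BMW', 'Nissan',
             'Toyota', 'Honda', 'Honda', 'Toyota', 'Nissan'],
    'Odometer (KM)': [150043, 87899, 32549, 11179, 213095,
                      99213, 45698, 54738, 60000, 31600],
    'Doors': [4, 4, 3, 5, 4, 4, 4, 4, 4, 4],
})

# Default: only the numeric columns are summarized.
numeric_summary = car.describe()

# include='all' adds text columns, with rows such as unique, top, and freq.
full_summary = car.describe(include='all')

print(numeric_summary)
print(full_summary)
```

In full_summary, the mean of Make is NaN because that statistic is not defined for text, while unique, top, and freq are NaN for the numeric columns.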
Data Indexing
Selecting Single or Multiple Columns
Column selection works like dictionary indexing:
car['Price'] # single column
car[['Make','Colour']] # multiple columns
Using loc and iloc
loc indexes by label, while iloc indexes by integer position. For example, to retrieve the value at row label 1 and column 'Odometer (KM)':
car.loc[1,'Odometer (KM)']
Or, using integer positions (row 1 and column 2, both counted from zero):
car.iloc[1,2]
A bare : slice selects all rows or all columns, e.g., all values in the 'Odometer (KM)' column:
car.loc[:, 'Odometer (KM)']
car.iloc[:, 2] # same column, selected by integer position
Using apply
The apply method calls a function on each value of a column. To clean the Price column by removing dollar signs and commas and converting to float:
def to_num(x):
    x_new = x.replace('$','')
    x_new = x_new.replace(',','')
    return float(x_new)

car['Price'].apply(to_num) # returns a new Series; assign it back to keep the result
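As a runnable sketch of this cleaning step, the example below stores the converted values in a new column and compares them with a vectorized alternative built on the .str string methods. It uses the first four rows of the car data; the column name 'Price (num)' and the variable vectorized are assumptions made for illustration, not part of the original tutorial.

```python
import pandas as pd

car = pd.DataFrame({
    'Make': ['Toyota', 'Honda', 'Toyota', 'BMW'],
    'Price': ['$4,000.00', '$5,000.00', '$7,000.00', '$22,000.00'],
})

def to_num(x):
    """Strip the dollar sign and thousands separators, then convert to float."""
    x_new = x.replace('$', '')
    x_new = x_new.replace(',', '')
    return float(x_new)

# apply returns a new Series; store it in a (hypothetical) new column.
car['Price (num)'] = car['Price'].apply(to_num)

# Vectorized alternative: regex=False treats '$' as a literal character.
vectorized = (
    car['Price']
    .str.replace('$', '', regex=False)
    .str.replace(',', '', regex=False)
    .astype(float)
)

print(car)
print(car['Price (num)'].mean())
```

Both approaches give the same numbers here. apply is the more flexible choice when the cleaning logic grows beyond simple string replacement.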
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
