Comprehensive Pandas Tutorial: Installation, Data Types, Indexing, Selection, Grouping, and Visualization
This tutorial provides a step‑by‑step guide to using Pandas in Python, covering installation, the core Series and DataFrame structures, data creation, indexing with loc and iloc, assignment, arithmetic operations, observation, statistical functions, grouping, pivot tables, time‑series handling, plotting, and data I/O, all illustrated with complete code examples.
1. Introduction
Pandas was originally developed for financial data analysis and offers strong support for time‑series data. Its name derives from “panel data”, an econometrics term, and is also a play on “Python data analysis”.
2. Installation and Import
Install via pip:
pip install pandas
Import the library:
import pandas as pd
3. Core Data Types
Series – a one‑dimensional labeled array. Example:
import numpy as np
s = pd.Series([1, 2, 5, np.nan, 6, 8])
print(s)
Output:
0    1.0
1    2.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
DataFrame – a two‑dimensional table with labeled axes. Example of creating a DataFrame with a date index:
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
print(df)
Another example using a dictionary:
df2 = pd.DataFrame({
    'A': 1.,
    'B': pd.Timestamp('20130102'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3]*4, dtype='int32'),
    'E': pd.Categorical(["test","train","test","train"]),
    'F': 'foo'
})
print(df2)
4. Data Input
Load CSV files:
df = pd.read_csv("Average_Daily_Traffic_Counts.csv", header=0)
print(df.head())
Data sources include government open datasets and Kaggle.
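The loading step can be tried without downloading anything. The following is a minimal sketch that assumes a small inline CSV in place of a real file; the column names and values are hypothetical and are not taken from the traffic dataset above.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a file on disk.
csv_text = """street,date,vehicles
Main St,2020-01-01,1200
Oak Ave,2020-01-02,850
Main St,2020-01-03,
"""

# header=0 takes the column names from the first line;
# parse_dates converts the 'date' column to datetimes.
traffic = pd.read_csv(io.StringIO(csv_text), header=0, parse_dates=["date"])

print(traffic.head())
print(traffic.dtypes)                     # 'vehicles' is float64 because of the blank cell
print(traffic["vehicles"].isna().sum())   # number of missing counts
```

Checking dtypes and missing values right after loading tends to catch most parsing surprises early.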
5. Selection / Slicing
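# Assumes df has a default integer index and columns such as 'name' and 'age'
# Column selection
df['name']    # returns a Series
df[['name']]  # returns a DataFrame
# Row slicing (by position, end excluded)
df[0:]   # all rows
df[:2]   # first two rows
df[1:3]  # rows at positions 1 and 2
# Label-based indexing (end label included)
df.loc[0, 'name']
df.loc[0:2, ['name','age']]  # rows labeled 0, 1 and 2
# Position-based indexing (end position excluded)
df.iloc[0,0]
df.iloc[1:3, [1,2]]  # rows at positions 1 and 2
6. Assignment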
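Assignment through loc follows the same rules as the label-based selection in section 5: a label slice includes its end label, while an iloc slice excludes its end position. The sketch below uses a hypothetical DataFrame whose index labels (10 to 40) deliberately differ from the row positions, so the two behaviours are easy to tell apart.

```python
import pandas as pd

# Hypothetical data; the labels 10..40 are chosen to differ from positions 0..3.
people = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cid", "Dee"], "age": [31, 25, 47, 38]},
    index=[10, 20, 30, 40],
)

by_label = people.loc[10:30, "name"]   # labels 10, 20 and 30: end label included
by_position = people.iloc[0:2, 0]      # positions 0 and 1: end position excluded

# Assignment through loc uses the same inclusive label slice.
people.loc[10:20, "age"] = 0

print(by_label.tolist())
print(by_position.tolist())
print(people["age"].tolist())
```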
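# Here df is the date-indexed DataFrame from section 3
# Add a new column from a Series; values are aligned on the index, so dates missing from s1 get NaN
s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20130102', periods=6))
df['F'] = s1
# Reindex to the first four dates and add an empty column 'E' (all NaN)
df1 = df.reindex(index=dates[0:4], columns=list(df.columns)+['E'])
df1.loc[dates[0]:dates[1], 'E'] = 1
# Drop rows with any NaN (returns a new DataFrame; df1 is unchanged)
df1.dropna(how='any')
# Fill NaN with a constant (also returns a new DataFrame)
df1.fillna(value=5)
7. Arithmetic Operations
Arithmetic between two DataFrames (+, -, *, /) aligns on both row and column labels; a label present in only one operand produces NaN in the result.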
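Label alignment is easiest to see with two small frames that only partly overlap. This is a minimal sketch with hypothetical data: the + operator leaves NaN wherever a label is missing on either side, and the add method with fill_value is one way to treat a one-sided gap as zero instead.

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]}, index=["x", "y", "z"])
right = pd.DataFrame({"B": [1, 1], "C": [5, 5]}, index=["y", "z"])

# The result covers the union of row and column labels;
# cells without a partner on the other side become NaN.
total = left + right

# fill_value=0 substitutes 0 when only one side is missing;
# a cell missing on both sides stays NaN.
filled = left.add(right, fill_value=0)

print(total)
print(filled)
```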
8. Observation
df.head()
df.tail(3)
df.index
df.columns
df.values  # underlying NumPy array; df.to_numpy() is the recommended equivalent
df.describe()
9. Statistics
Series and DataFrame provide statistical methods such as count, describe, min, max, mean, std, var, skew, kurt, cumsum, diff, and pct_change.
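A few of these methods applied to a short Series show the difference between a reduction (one value out) and a transformation (one value per element). The numbers are hypothetical and serve only as a sketch.

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0, 120.0])  # hypothetical values

print(prices.mean())                  # reduction: a single number
print(prices.cumsum().tolist())       # running total, same length as the input
print(prices.diff().tolist())         # change from the previous element; the first is NaN
print(prices.pct_change().tolist())   # relative change from the previous element
```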
10. Grouping
Group data by one or more columns and apply aggregation functions:
df.groupby('A').sum()
df.groupby(['A','B']).sum()
Custom aggregation functions can also be passed through agg or apply.
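As a sketch of a custom aggregation, the example below passes a user-defined function to agg alongside two built-in ones. The data and the value_range helper are hypothetical and only illustrate the mechanism.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "units": [3, 7, 2, 8, 5],
})

def value_range(values):
    """Hypothetical custom aggregation: largest value minus smallest value."""
    return values.max() - values.min()

# Built-in aggregations are named by string; the custom one is passed as a function.
summary = sales.groupby("region")["units"].agg(["sum", "mean", value_range])
print(summary)
```

The resulting columns take their names from the aggregations, so the custom column is labeled value_range.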
11. Pivot Tables
# aggfunc defaults to 'mean', so each cell holds the mean of D for that A/B/C combination
pd.pivot_table(df, values='D', index=['A','B'], columns=['C'])
12. Time Series
Pandas supports resampling, time zone localization, and conversion between periods and timestamps:
# 100 timestamps at one-second frequency
rng = pd.date_range('1/1/2012', periods=100, freq='s')
ts = pd.Series(np.random.randint(0,500,len(rng)), index=rng)
# Downsample into 5-minute bins and sum each bin
ts.resample('5min').sum()
13. Plotting
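# Plotting requires matplotlib to be installed
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()  # cumulative sum turns the noise into a random walk
ts.plot()
14. Data I/O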
# Write/read CSV
df.to_csv('foo.csv')
pd.read_csv('foo.csv')
# Write/read Excel (requires an Excel engine such as openpyxl)
df.to_excel('foo.xlsx', sheet_name='Sheet1')
pd.read_excel('foo.xlsx', sheet_name='Sheet1')
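A CSV round trip loses the index type unless the reader is told how to rebuild it. The following sketch assumes a hypothetical frame with a date index named date and writes to a temporary directory, so nothing is left on disk.

```python
import os
import tempfile

import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {"A": np.arange(3), "B": [0.5, 1.5, 2.5]},
    index=pd.date_range("2013-01-01", periods=3, name="date"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "foo.csv")
    frame.to_csv(path)  # the index is written as the first column, headed 'date'
    # index_col restores the index; parse_dates=True converts it back to datetimes.
    restored = pd.read_csv(path, index_col="date", parse_dates=True)

print(restored)
```

Without index_col, the dates would come back as an ordinary string column next to a fresh integer index.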