
Comprehensive Pandas Tutorial: Installation, Data Types, Indexing, Selection, Grouping, and Visualization

This tutorial provides a step‑by‑step guide to using Pandas in Python, covering installation, the core Series and DataFrame structures, data creation, indexing with loc and iloc, assignment, arithmetic operations, observation, statistical functions, grouping, pivot tables, time‑series handling, plotting, and data I/O, all illustrated with complete code examples.

Python Programming Learning Circle

1. Introduction

Pandas was originally developed for financial data analysis and offers strong support for time‑series data; its name comes from “panel data” and Python data analysis.

2. Installation and Import

Install via pip:

pip install pandas

Import the library:

import pandas as pd

3. Core Data Types

Series – a one‑dimensional labeled array. Example:

import numpy as np
s = pd.Series([1, 2, 5, np.nan, 6, 8])
print(s)

Output:

0    1.0
1    2.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

DataFrame – a two‑dimensional table with labeled axes. Example of creating a DataFrame with a date index:

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
print(df)

Another example using a dictionary:

df2 = pd.DataFrame({
    'A': 1.,
    'B': pd.Timestamp('20130102'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3]*4, dtype='int32'),
    'E': pd.Categorical(["test","train","test","train"]),
    'F': 'foo'
})
print(df2)

4. Data Input

Load CSV files:

df = pd.read_csv("Average_Daily_Traffic_Counts.csv", header=0)
print(df.head())

Data sources can include government datasets or Kaggle.
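If no CSV file is at hand, the same call can be tried on in-memory text. The sketch below is only an illustration: the column names (street, date, vehicles) and the values are made up and are not taken from the traffic dataset above.

```python
import io

import pandas as pd

# Hypothetical CSV content; the columns and values are invented for this sketch
csv_text = (
    "street,date,vehicles\n"
    "Main St,2013-01-01,1200\n"
    "Oak Ave,2013-01-02,850\n"
)

# read_csv accepts any file-like object, so StringIO stands in for a file path
traffic = pd.read_csv(io.StringIO(csv_text), header=0, parse_dates=["date"])

print(traffic.head())
print(traffic.dtypes)  # 'date' is parsed as a datetime column
```

Passing parse_dates is optional; without it the date column is read as plain text.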

5. Selection / Slicing

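# These snippets assume df has 'name' and 'age' columns and the default integer index
# Column selection
df['name']               # returns a Series
df[['name']]            # returns a DataFrame
# Row slicing (by position, end excluded)
df[0:]                  # all rows from position 0
df[:2]                  # the first two rows
df[1:3]                 # rows at positions 1 and 2
# Label-based indexing (slice end is included)
df.loc[0, 'name']
df.loc[0:2, ['name','age']]
# Position-based indexing (slice end is excluded)
df.iloc[0,0]
df.iloc[1:3, [1,2]]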
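The difference between label-based and position-based slicing is easy to miss, so here is a minimal runnable sketch. The DataFrame people and its values are hypothetical, chosen only to match the 'name' and 'age' columns used above.

```python
import pandas as pd

# Hypothetical sample data with the default integer index 0..3
people = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cid", "Dee"], "age": [31, 25, 47, 19]}
)

# loc slices by label and includes the end label: labels 0, 1 and 2
by_label = people.loc[0:2, ["name", "age"]]

# iloc slices by position and excludes the end: positions 0 and 1
by_position = people.iloc[0:2, [0, 1]]

print(len(by_label))     # 3
print(len(by_position))  # 2
```

With a non-integer index (for example dates), loc takes the index labels while iloc still takes integer positions.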

6. Assignment

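# df here is the date-indexed DataFrame from section 3, not the CSV loaded in section 4
# Add a new column from a Series; values are aligned on the index
s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20130102', periods=6))
df['F'] = s1
# Reindex to the first four dates and add an empty column 'E'
df1 = df.reindex(index=dates[0:4], columns=list(df.columns)+['E'])
df1.loc[dates[0]:dates[1], 'E'] = 1
# Drop rows with any NaN (returns a new DataFrame; df1 is unchanged)
df1.dropna(how='any')
# Fill NaN with a constant (also returns a new DataFrame)
df1.fillna(value=5)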
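Two details of the snippet above are worth seeing in isolation: assigning a Series aligns it on the index, and dropna and fillna return new objects rather than modifying the original. The following sketch uses a small zero-filled DataFrame as a stand-in for the random one from section 3.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.zeros((6, 2)), index=dates, columns=["A", "B"])

# s1 starts one day later than df, so 2013-01-01 has no matching label
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6))
df["F"] = s1  # aligned on the index; the unmatched first row becomes NaN

cleaned = df.dropna(how="any")  # new DataFrame without the NaN row
filled = df.fillna(value=5)     # new DataFrame with NaN replaced by 5

print(df["F"].isna().sum())       # 1 (df itself is unchanged)
print(len(cleaned))               # 5
print(filled.loc[dates[0], "F"])  # 5.0
```

To keep a result, assign it back to a name, for example df = df.dropna(how="any").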

7. Arithmetic Operations

Adding, subtracting, multiplying, or dividing two DataFrames aligns on both row and column labels.
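A small sketch of label alignment, using two hypothetical frames (left and right) that overlap in only one row label and one column label:

```python
import pandas as pd

# Hypothetical frames: they share only the row 'y' and the column 'B'
left = pd.DataFrame({"A": [1, 2], "B": [10, 20]}, index=["x", "y"])
right = pd.DataFrame({"B": [100, 200], "C": [5, 6]}, index=["y", "z"])

# The result covers the union of labels; cells missing on either side are NaN
total = left + right
print(total)

# add() with fill_value treats a value missing on one side as 0;
# cells missing on both sides stay NaN
filled = left.add(right, fill_value=0)
print(filled)
```

Only the cell at row 'y', column 'B' is present in both inputs, so it is the only non-missing value in total (20 + 100 = 120).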

8. Observation

df.head()
df.tail(3)
df.index
df.columns
df.values
df.describe()

9. Statistics

Functions such as count, describe, min, max, mean, std, var, skew, kurt, cumsum, diff, and pct_change are available.
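A few of these functions applied to a short Series, as a quick illustration. The Series prices and its values are made up for this sketch.

```python
import pandas as pd

# Hypothetical values chosen so the results are easy to check by hand
prices = pd.Series([10.0, 20.0, 30.0, 60.0])

print(prices.count())                # 4
print(prices.mean())                 # 30.0
print(prices.cumsum().tolist())      # [10.0, 30.0, 60.0, 120.0]
print(prices.diff().tolist())        # [nan, 10.0, 10.0, 30.0]
print(prices.pct_change().tolist())  # [nan, 1.0, 0.5, 1.0]

# std and var use the sample formula (ddof=1) by default
print(prices.std())
```

diff and pct_change compare each value with the previous one, which is why the first element is NaN.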

10. Grouping

Group data by one or more columns and apply aggregation functions:

df.groupby('A').sum()
df.groupby(['A','B']).sum()

Custom aggregation functions can also be used.
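One way a custom aggregation might look is sketched below. The DataFrame sales, the helper value_range, and the output column names total and spread are all hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical data: column 'A' is the grouping key, 'D' holds the values
sales = pd.DataFrame({
    "A": ["east", "east", "west", "west", "west"],
    "D": [10, 30, 5, 15, 40],
})

def value_range(values):
    """Custom aggregation: spread between the largest and smallest value."""
    return values.max() - values.min()

# Named aggregation: each keyword becomes an output column
summary = sales.groupby("A")["D"].agg(total="sum", spread=value_range)
print(summary)
```

Any function that takes a group's values and returns a single value can be passed this way, alongside built-in names such as "sum" or "mean".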

11. Pivot Tables

pd.pivot_table(df, values='D', index=['A','B'], columns=['C'])
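The call above can be tried on a small table. In the sketch below, records and its values are hypothetical; only the column names A, B, C, and D follow the snippet.

```python
import pandas as pd

# Hypothetical data; ('one', 'x', 'foo') appears twice to show aggregation
records = pd.DataFrame({
    "A": ["one", "one", "one", "two", "two"],
    "B": ["x", "x", "y", "x", "y"],
    "C": ["foo", "foo", "bar", "foo", "bar"],
    "D": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Rows are keyed by (A, B), columns by the values of C; the default
# aggregation is the mean, so the duplicated combination averages to 2.0
table = pd.pivot_table(records, values="D", index=["A", "B"], columns=["C"])
print(table)
```

Combinations that never occur in the data, such as ('one', 'x') with 'bar', appear as NaN. A different aggregation can be chosen with the aggfunc parameter.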

12. Time Series

Pandas supports resampling, timezone localization, and conversion between periods and timestamps:

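# 100 one-second samples; lowercase 's' and 'min' avoid the deprecated 'S'/'Min' aliases
rng = pd.date_range('1/1/2012', periods=100, freq='s')
ts = pd.Series(np.random.randint(0,500,len(rng)), index=rng)
# Downsample to 5-minute bins; all 100 seconds fall into a single bin
ts.resample('5min').sum()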
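Timezone localization and period conversion could be sketched as follows. The dates, values, and the target timezone America/New_York are arbitrary choices for illustration.

```python
import pandas as pd

# Three daily timestamps without timezone information
daily = pd.Series([1, 2, 3], index=pd.date_range("2012-03-06", periods=3, freq="D"))

# Attach a timezone, then convert to another one
daily_utc = daily.tz_localize("UTC")
daily_ny = daily_utc.tz_convert("America/New_York")
print(daily_ny.index[0])  # midnight UTC shown as the previous evening in New York

# Convert timestamps to monthly periods and back to timestamps
as_periods = daily.to_period("M")
as_timestamps = as_periods.to_timestamp()
print(as_periods.index[0])     # 2012-03
print(as_timestamps.index[0])  # 2012-03-01 00:00:00
```

tz_localize only attaches a timezone to naive timestamps, while tz_convert moves already localized timestamps to a different zone.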

13. Plotting

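import matplotlib.pyplot as plt  # pandas plotting relies on matplotlib

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()  # cumulative sum turns the random noise into a random walk
ts.plot()
plt.show()        # needed when running as a script rather than in a notebook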

14. Data I/O

# Read/write CSV
pd.read_csv('foo.csv')
df.to_csv('foo.csv')
# Read/write Excel
pd.read_excel('foo.xlsx', sheet_name='Sheet1')
df.to_excel('foo.xlsx', sheet_name='Sheet1')
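A common surprise with the CSV round trip is that to_csv writes the index as the first column by default. The sketch below uses an in-memory buffer instead of foo.csv so it runs without touching the disk; the frame and its values are hypothetical.

```python
import io

import pandas as pd

frame = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]}, index=["r1", "r2"])

# Write to an in-memory buffer; the index goes out as the first column
buffer = io.StringIO()
frame.to_csv(buffer)

# Reading back without index_col leaves the old index as an unnamed column
buffer.seek(0)
naive = pd.read_csv(buffer)
print(naive.columns.tolist())  # ['Unnamed: 0', 'A', 'B']

# index_col=0 restores the first column as the index
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
print(restored.index.tolist())  # ['r1', 'r2']
```

Alternatively, passing index=False to to_csv leaves the index out of the file. Reading and writing Excel files additionally needs an Excel engine such as openpyxl to be installed.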

