Master Pandas: A Step‑by‑Step Guide to Data Analysis with Python

This tutorial introduces Pandas, the Python library for data manipulation and analysis. It covers installation, data import, inspection, cleaning, indexing, selection, sorting, grouping, transformation, statistical functions, visualization, and exporting, each illustrated with short code examples.

Python Crawling & Data Mining

Jul 24, 2021

Pandas Overview

Pandas is a third‑party Python library designed for flexible data processing and analysis, especially for numeric and time‑series data, but also capable of handling textual data. It was created by Wes McKinney in 2008 and its name derives from the econometrics term “panel data”.

Python Introduction

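Python is a powerful, easy-to-learn interpreted language with rich data structures, cross-platform support, and extensive use in data science, machine learning, and AI. Beginners should use Python 3. Python 3.6 has reached end of life and current pandas releases require a newer version, so install a currently supported Python 3 release.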

Installation and Import

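# openpyxl is the engine pandas uses to read and write .xlsx files
pip install pandas matplotlib openpyxl

On slow connections, install from a mirror closer to you, for example the Tsinghua University PyPI mirror in China:

pip install pandas matplotlib openpyxl -i https://pypi.tuna.tsinghua.edu.cn/simple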

Import the library in a Jupyter Notebook:

import pandas as pd

Dataset Preparation

The tutorial uses a sample Excel file, team.xlsx, containing student quarterly scores. The file can be downloaded from https://www.gairuo.com/file/data/dataset/team.xlsx. The key columns are name, team, and Q1 to Q4.

Reading Data

df = pd.read_excel('https://www.gairuo.com/file/data/dataset/team.xlsx')
# or df = pd.read_excel('team.xlsx')
# For CSV files use pd.read_csv()

The DataFrame df now holds the data.
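The same loading pattern can be tried without downloading anything. The sketch below is only an illustration: the CSV text is a small hand-written stand-in with the same column names as team.xlsx, and the variable names `csv_text` and `sample` are assumptions, not part of the tutorial's dataset.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = """name,team,Q1,Q2,Q3,Q4
Liver,E,89,21,24,64
Arry,C,36,37,37,57
Ack,A,57,60,18,84
"""

# read_csv accepts a path, a URL, or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (rows, columns)
print(list(sample.columns))  # column names taken from the header row
```

Passing a file-like object keeps the example self-contained; with a real file, the path or URL goes in the same position.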

Viewing Data

df.head()      # first 5 rows
df.tail()      # last 5 rows
df.sample(5)   # random 5 rows

Data Verification

df.shape        # (rows, columns)
df.info()       # index, dtypes, memory usage
df.describe()   # statistical summary
df.dtypes       # column types
df.columns      # column names
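As a rough sketch of how these checks might be combined into a quick sanity pass, the example below builds a small illustrative frame (hand-written rows, not the full dataset) and inspects its size, numeric columns, and missing values. The `isna()` check is an addition not used in the tutorial itself.

```python
import pandas as pd

# Illustrative stand-in for the tutorial's data.
sample = pd.DataFrame({
    "name": ["Liver", "Arry", "Ack", "Eorge", "Oah"],
    "team": ["E", "C", "A", "C", "D"],
    "Q1": [89, 36, 57, 93, 65],
    "Q2": [21, 37, 60, 96, 49],
    "Q3": [24, 37, 18, 71, 61],
    "Q4": [64, 57, 84, 78, 86],
})

n_rows, n_cols = sample.shape

# Columns that describe() will summarise by default.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# Number of missing values per column.
missing = sample.isna().sum()

print(n_rows, n_cols)
print(numeric_cols)
print(missing)
```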

Setting Index

df.set_index('name', inplace=True)
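A minimal sketch of what setting the index changes, using a small illustrative frame. Without `inplace=True`, `set_index` returns a new DataFrame and leaves the original untouched, and `reset_index` (not used in the tutorial) reverses the step.

```python
import pandas as pd

# Illustrative stand-in for the tutorial's data.
sample = pd.DataFrame({
    "name": ["Liver", "Arry", "Ack"],
    "team": ["E", "C", "A"],
    "Q1": [89, 36, 57],
})

# Returns a new DataFrame; `sample` keeps its default 0..n-1 index.
indexed = sample.set_index("name")

# Rows can now be looked up by name.
arry_q1 = indexed.loc["Arry", "Q1"]

# reset_index turns the index back into an ordinary column.
restored = indexed.reset_index()

print(indexed)
print(arry_q1)
```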

Data Selection

Column selection

df['Q1']          # single column
df[['team','Q1']] # multiple columns
df.loc[:, ['team','Q1']]

Row selection

df[df.index == 'Liver']   # by index value
df[0:3]                   # first three rows
df.iloc[:10, :]           # first ten rows

Label‑based selection

df.loc['Ben', 'Q1':'Q4']
df.loc['Eorge':'Alexander', 'team':'Q4']
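One detail worth seeing in isolation: label slices with `loc` include both endpoints, while position slices with `iloc` exclude the end. The sketch below uses a small illustrative frame indexed by name.

```python
import pandas as pd

# Illustrative stand-in for the tutorial's data, indexed by name.
sample = pd.DataFrame({
    "name": ["Liver", "Arry", "Ack", "Eorge", "Oah"],
    "team": ["E", "C", "A", "C", "D"],
    "Q1": [89, 36, 57, 93, 65],
    "Q2": [21, 37, 60, 96, 49],
}).set_index("name")

# Label slice: 'Arry' through 'Eorge' and 'Q1' through 'Q2', ends included.
by_label = sample.loc["Arry":"Eorge", "Q1":"Q2"]

# Position slice: rows 1 and 2, columns 1 and 2; the end position is excluded.
by_position = sample.iloc[1:3, 1:3]

print(by_label)
print(by_position)
```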

Conditional filtering

df[df.Q1 > 90]
df[df.team == 'C']
df[(df['Q1'] > 90) & (df['team'] == 'C')]
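The filters above are boolean masks, so they can be combined with `&` (and), `|` (or), and `~` (not), with each condition wrapped in parentheses. The sketch below shows a few variations on a small illustrative frame. `isin` and `query` are additional pandas methods not used in the tutorial.

```python
import pandas as pd

# Illustrative stand-in for the tutorial's data, indexed by name.
sample = pd.DataFrame({
    "name": ["Liver", "Arry", "Ack", "Eorge", "Oah"],
    "team": ["E", "C", "A", "C", "D"],
    "Q1": [89, 36, 57, 93, 65],
}).set_index("name")

# Both conditions must hold.
high_c = sample[(sample["Q1"] > 90) & (sample["team"] == "C")]

# Membership test against several values.
a_or_d = sample[sample["team"].isin(["A", "D"])]

# Negation of a mask.
not_c = sample[~(sample["team"] == "C")]

# The first filter again, written as a query string.
same = sample.query("Q1 > 90 and team == 'C'")

print(high_c)
print(a_or_d)
print(not_c)
```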

Sorting

df.sort_values(by='Q1')
df.sort_values(by='Q1', ascending=False)
df.sort_values(['team','Q1'], ascending=[True,False])
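To make the multi-key sort concrete, here is a small sketch on illustrative data: rows are ordered by team ascending, and within each team by Q1 descending. `sort_values` returns a new DataFrame, so the original row order is unchanged.

```python
import pandas as pd

# Illustrative stand-in for the tutorial's data, indexed by name.
sample = pd.DataFrame({
    "name": ["Liver", "Arry", "Ack", "Eorge", "Oah"],
    "team": ["E", "C", "A", "C", "D"],
    "Q1": [89, 36, 57, 93, 65],
}).set_index("name")

# team A..E first, then highest Q1 first within each team.
ranked = sample.sort_values(["team", "Q1"], ascending=[True, False])

print(ranked)
```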

Group‑by Aggregation

df.groupby('team').sum()
df.groupby('team').mean()
df.groupby('team').agg({'Q1':'sum','Q2':'count','Q3':'mean','Q4':'max'})
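The dictionary form of `agg` can also be written with named aggregation, which lets each output column get its own name. This is a sketch on illustrative data, and the output names (`q1_total` and so on) are made up for the example.

```python
import pandas as pd

# Illustrative stand-in for the tutorial's data.
sample = pd.DataFrame({
    "team": ["E", "C", "A", "C", "D"],
    "Q1": [89, 36, 57, 93, 65],
    "Q2": [21, 37, 60, 96, 49],
    "Q3": [24, 37, 18, 71, 61],
    "Q4": [64, 57, 84, 78, 86],
})

# One row per team; each keyword is output_name=(source_column, function).
summary = sample.groupby("team").agg(
    q1_total=("Q1", "sum"),
    q2_count=("Q2", "count"),
    q3_mean=("Q3", "mean"),
    q4_max=("Q4", "max"),
)

print(summary)
```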

Data Transformation

df.groupby('team').sum().T
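The `.T` attribute transposes the result, swapping rows and columns. A small sketch on illustrative data shows the effect: teams start as rows and end up as columns, with one row per quarter.

```python
import pandas as pd

# Illustrative stand-in for the tutorial's data.
sample = pd.DataFrame({
    "team": ["E", "C", "A", "C", "D"],
    "Q1": [89, 36, 57, 93, 65],
    "Q2": [21, 37, 60, 96, 49],
})

# Rows are teams, columns are quarters.
totals = sample.groupby("team")[["Q1", "Q2"]].sum()

# After transposing, rows are quarters and columns are teams.
by_quarter = totals.T

print(totals)
print(by_quarter)
```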

Adding Columns

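df['one'] = 1                                             # constant column
df['total'] = df['Q1'] + df['Q2'] + df['Q3'] + df['Q4']   # explicit sum
# equivalent alternatives for the same total:
df['total'] = df.loc[:, 'Q1':'Q4'].apply(lambda x: sum(x), axis=1)
# limit the sum to Q1..Q4: a bare df.sum(axis=1) would also add 'one' and an
# existing 'total', and recent pandas raises on the text column 'team'
df['total'] = df.loc[:, 'Q1':'Q4'].sum(axis=1)
df['avg'] = df['total'] / 4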
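As an alternative sketch, `DataFrame.assign` (not used in the tutorial) adds derived columns without modifying the original frame. The example uses illustrative rows and restricts the arithmetic to the four quarter columns.

```python
import pandas as pd

# Illustrative stand-in for the tutorial's data, indexed by name.
sample = pd.DataFrame({
    "name": ["Liver", "Arry"],
    "team": ["E", "C"],
    "Q1": [89, 36],
    "Q2": [21, 37],
    "Q3": [24, 37],
    "Q4": [64, 57],
}).set_index("name")

quarters = ["Q1", "Q2", "Q3", "Q4"]

# assign returns a copy with the new columns; `sample` is left as it was.
scored = sample.assign(
    total=sample[quarters].sum(axis=1),
    avg=sample[quarters].mean(axis=1),
)

print(scored)
```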

Statistical Functions

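# numeric_only=True skips the text column 'team'; recent pandas versions
# raise an error instead of dropping non-numeric columns automatically
df.mean(numeric_only=True)           # mean of each column
df.mean(axis=1, numeric_only=True)   # mean of each row
df.corr(numeric_only=True)           # pairwise correlation between columns
df.count()                           # non-null values per column
df.max()
df.min()
df.median(numeric_only=True)
df.std(numeric_only=True)
df.var(numeric_only=True)
df.mode()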
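When a frame mixes text and numeric columns, one way to keep these statistics well defined is to select the numeric columns first. The sketch below does that with `select_dtypes` on illustrative data. It is one possible approach, not the only one.

```python
import pandas as pd

# Illustrative stand-in for the tutorial's data, indexed by name.
sample = pd.DataFrame({
    "name": ["Liver", "Arry", "Ack", "Eorge", "Oah"],
    "team": ["E", "C", "A", "C", "D"],
    "Q1": [89, 36, 57, 93, 65],
    "Q2": [21, 37, 60, 96, 49],
    "Q3": [24, 37, 18, 71, 61],
    "Q4": [64, 57, 84, 78, 86],
}).set_index("name")

# Keep only the numeric columns (drops 'team').
numeric = sample.select_dtypes(include="number")

col_means = numeric.mean()        # one value per quarter
row_means = numeric.mean(axis=1)  # one value per student
corr = numeric.corr()             # quarter-by-quarter correlation matrix

print(col_means)
print(row_means)
print(corr)
```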

Visualization

# line plot of Q1
df['Q1'].plot()
# line plot for a specific student
df.loc['Ben','Q1':'Q4'].plot()
# bar and horizontal bar
df.loc['Ben','Q1':'Q4'].plot.bar()
df.loc['Ben','Q1':'Q4'].plot.barh()
# multiple lines for each team
df.groupby('team').sum().T.plot()
# pie chart of team sizes
df.groupby('team').count().Q1.plot.pie()
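Outside a notebook, the same plotting calls return a Matplotlib Axes object that can be labelled and saved. The sketch below assumes matplotlib is installed and uses the non-interactive "Agg" backend so that it runs without a display. The data, title, and axis labels are illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened

import pandas as pd

# Illustrative stand-in for the tutorial's data.
sample = pd.DataFrame({
    "team": ["E", "C", "A", "C", "D"],
    "Q1": [89, 36, 57, 93, 65],
    "Q2": [21, 37, 60, 96, 49],
    "Q3": [24, 37, 18, 71, 61],
    "Q4": [64, 57, 84, 78, 86],
})

# One line per team, quarters along the x-axis.
totals_by_quarter = sample.groupby("team")[["Q1", "Q2", "Q3", "Q4"]].sum().T
ax = totals_by_quarter.plot(title="Quarterly totals by team")
ax.set_xlabel("quarter")
ax.set_ylabel("total score")

# The figure can then be written to disk, e.g. ax.get_figure().savefig("totals.png")
```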

Exporting Data

df.to_excel('team-done.xlsx')
df.to_csv('team-done.csv')

Because these paths are relative, the files are written to the current working directory, which in Jupyter is normally the notebook's directory.
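A quick way to check an export is to read it back. The sketch below writes an illustrative frame to a temporary directory (so nothing is left behind) and reloads it. The file name matches the tutorial's, while the temporary directory is an assumption made for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Illustrative stand-in for the tutorial's data, indexed by name.
sample = pd.DataFrame({
    "name": ["Liver", "Arry", "Ack"],
    "team": ["E", "C", "A"],
    "Q1": [89, 36, 57],
}).set_index("name")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "team-done.csv"

    # The index ('name') is written as the first column by default;
    # pass index=False to leave it out.
    sample.to_csv(path)

    # Tell read_csv which column to restore as the index.
    restored = pd.read_csv(path, index_col="name")

print(restored)
```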
