Master Pandas: A Step‑by‑Step Guide to Data Analysis with Python
This tutorial introduces Pandas, the Python library for data manipulation and analysis. It covers installation, data import, inspection, cleaning, indexing, selection, sorting, grouping, transformation, statistical functions, visualization, and exporting, each illustrated with short code examples.
Pandas Overview
Pandas is a third‑party Python library designed for flexible data processing and analysis, especially for numeric and time‑series data, but also capable of handling textual data. It was created by Wes McKinney in 2008 and its name derives from the econometrics term “panel data”.
Python Introduction
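Python is a powerful, easy-to-learn interpreted language with rich data structures, cross-platform support, and extensive use in data science, machine learning, and AI. The original advice was to start with Python 3.6 or later; current pandas releases require a newer Python 3 version, so beginners should install a currently supported release.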
Installation and Import
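pip install pandas matplotlib openpyxl
The openpyxl package is included because pandas needs it to read and write .xlsx files.
For slower networks, use a mirror closer to you, such as the Tsinghua University PyPI mirror for users in China:
pip install pandas matplotlib openpyxl -i https://pypi.tuna.tsinghua.edu.cn/simple
Import the library in a Jupyter Notebook:
import pandas as pd
Dataset Preparation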
The tutorial uses a sample Excel file, team.xlsx, containing students' quarterly scores. The file can be downloaded from https://www.gairuo.com/file/data/dataset/team.xlsx. The key columns are name, team, and Q1–Q4.
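If the download is unavailable, a tiny stand-in with the same columns is enough to follow along. The sketch below is an assumption, not the real file: the names are ones that appear in this tutorial's examples, but the team labels and all scores are invented for illustration.

```python
import pandas as pd

# Invented rows that only mimic the layout of team.xlsx (name, team, Q1..Q4).
sample = pd.DataFrame(
    {
        "name": ["Liver", "Ben", "Eorge", "Alexander"],
        "team": ["E", "C", "C", "A"],
        "Q1": [80, 95, 92, 70],
        "Q2": [60, 65, 90, 75],
        "Q3": [50, 55, 70, 85],
        "Q4": [70, 75, 80, 65],
    }
)

print(sample.shape)          # (rows, columns)
print(list(sample.columns))  # column names in file order
```

Because the values are made up, results computed on this frame will not match results from the real dataset.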
Reading Data
df = pd.read_excel('https://www.gairuo.com/file/data/dataset/team.xlsx')
# or df = pd.read_excel('team.xlsx')
# For CSV files use pd.read_csv()
The DataFrame df now holds the data.
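The CSV route mentioned in the comment can be tried without any file on disk. This is a minimal sketch, assuming a two-row CSV string with invented scores; `csv_text` and `df_csv` are hypothetical names used only here.

```python
import io

import pandas as pd

# Invented CSV text standing in for a real file on disk.
csv_text = """name,team,Q1,Q2,Q3,Q4
Liver,E,80,60,50,70
Ben,C,95,65,55,75
"""

# read_csv accepts a path, a URL, or a file-like object such as StringIO.
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.dtypes)  # text columns and integer score columns are inferred
```

Reading from a string keeps the example self-contained; with a real file, the path or URL goes where the `StringIO` object is.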
Viewing Data
df.head() # first 5 rows
df.tail() # last 5 rows
df.sample(5) # random 5 rows
Data Verification
df.shape # (rows, columns)
df.info() # index, dtypes, memory usage
df.describe() # statistical summary
df.dtypes # column types
df.columns # column names
Setting Index
df.set_index('name', inplace=True)
Data Selection
Column selection
df['Q1'] # single column
df[['team','Q1']] # multiple columns
df.loc[:, ['team','Q1']]
Row selection
df[df.index == 'Liver'] # by index value
df[0:3] # first three rows
df.iloc[:10, :] # first ten rows
Label-based selection
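One detail worth checking by hand is that `.loc` slices include both endpoints, while `.iloc` slices stop before the end position. The sketch below assumes a small stand-in frame indexed by name; the scores are invented and only the layout follows team.xlsx.

```python
import pandas as pd

# Invented scores; only the column layout follows team.xlsx.
df = pd.DataFrame(
    {
        "team": ["E", "C", "C", "A"],
        "Q1": [80, 95, 92, 70],
        "Q2": [60, 65, 90, 75],
        "Q3": [50, 55, 70, 85],
        "Q4": [70, 75, 80, 65],
    },
    index=pd.Index(["Liver", "Ben", "Eorge", "Alexander"], name="name"),
)

# .loc slices by label and includes BOTH endpoints.
by_label = df.loc["Ben":"Eorge", "Q1":"Q2"]

# .iloc slices by position and excludes the stop position.
by_position = df.iloc[1:3, 1:3]

print(by_label)
print(by_label.equals(by_position))
```

Both selections pick the same two rows and two columns, which is why the label slice needs no "plus one" adjustment.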
df.loc['Ben', 'Q1':'Q4']
df.loc['Eorge':'Alexander', 'team':'Q4']
Conditional filtering
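Filtering works by building a boolean Series (one True/False per row) and passing it to the frame. A minimal sketch on an invented stand-in frame; the variable names `high_q1`, `in_team_c`, `both`, and `either` are hypothetical.

```python
import pandas as pd

# Invented scores; only the column layout follows team.xlsx.
df = pd.DataFrame(
    {
        "team": ["E", "C", "C", "A"],
        "Q1": [80, 95, 92, 70],
        "Q2": [60, 65, 90, 75],
    },
    index=pd.Index(["Liver", "Ben", "Eorge", "Alexander"], name="name"),
)

high_q1 = df["Q1"] > 90        # boolean Series, one flag per row
in_team_c = df["team"] == "C"  # another boolean Series

# & is element-wise AND, | is element-wise OR.
both = df[high_q1 & in_team_c]

# Written inline, each comparison needs its own parentheses,
# because & and | bind more tightly than > and ==.
either = df[(df["Q1"] > 90) | (df["team"] == "A")]

print(both.index.tolist())
print(either.index.tolist())
```

Naming the masks first tends to be easier to read than one long inline expression once there are more than two conditions.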
df[df.Q1 > 90]
df[df.team == 'C']
df[(df['Q1'] > 90) & (df['team'] == 'C')]
Sorting
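Two points about sorting are easy to miss: `sort_values` returns a new frame rather than changing `df`, and a list of `ascending` flags applies one direction per sort key. The following sketch assumes an invented stand-in frame.

```python
import pandas as pd

# Invented scores; only the column layout follows team.xlsx.
df = pd.DataFrame(
    {
        "team": ["E", "C", "C", "A"],
        "Q1": [80, 95, 92, 70],
    },
    index=pd.Index(["Liver", "Ben", "Eorge", "Alexander"], name="name"),
)

# Sort by team A..Z, then by Q1 from high to low inside each team.
ranked = df.sort_values(["team", "Q1"], ascending=[True, False])

print(ranked)
print(df.index.tolist())  # the original row order is unchanged
```

To keep the sorted order, assign the result to a name, as done with the hypothetical `ranked` here.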
df.sort_values(by='Q1')
df.sort_values(by='Q1', ascending=False)
df.sort_values(['team','Q1'], ascending=[True,False])
Group-by Aggregation
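Group-by results are easiest to trust when the input is small enough to add up by hand. This sketch assumes an invented stand-in frame with three teams; `summary` and `totals` are hypothetical names.

```python
import pandas as pd

# Invented scores; only the column layout follows team.xlsx.
df = pd.DataFrame(
    {
        "team": ["E", "C", "C", "A"],
        "Q1": [80, 95, 92, 70],
        "Q2": [60, 65, 90, 75],
        "Q3": [50, 55, 70, 85],
        "Q4": [70, 75, 80, 65],
    },
    index=pd.Index(["Liver", "Ben", "Eorge", "Alexander"], name="name"),
)

# A different aggregation per column, one result row per team.
summary = df.groupby("team").agg(
    {"Q1": "sum", "Q2": "count", "Q3": "mean", "Q4": "max"}
)

# Per-team totals for the quarter columns, then transposed so that
# quarters become rows and teams become columns.
totals = df.groupby("team")[["Q1", "Q2", "Q3", "Q4"]].sum()

print(summary)
print(totals.T)
```

Team C has two members here, so its Q1 sum is 95 + 92 and its Q2 count is 2.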
df.groupby('team').sum()
df.groupby('team').mean()
df.groupby('team').agg({'Q1':'sum','Q2':'count','Q3':'mean','Q4':'max'})
Data Transformation
df.groupby('team').sum().T
Adding Columns
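When deriving a row total, it helps to select the quarter columns once and compute from that selection, so text columns and previously added helper columns cannot leak into the sum. A minimal sketch on an invented stand-in frame:

```python
import pandas as pd

# Invented scores; only the column layout follows team.xlsx.
df = pd.DataFrame(
    {
        "team": ["E", "C", "C", "A"],
        "Q1": [80, 95, 92, 70],
        "Q2": [60, 65, 90, 75],
        "Q3": [50, 55, 70, 85],
        "Q4": [70, 75, 80, 65],
    },
    index=pd.Index(["Liver", "Ben", "Eorge", "Alexander"], name="name"),
)

# Select the four quarter columns by label (both endpoints included).
quarters = df.loc[:, "Q1":"Q4"]

df["total"] = quarters.sum(axis=1)  # row-wise sum of Q1..Q4
df["avg"] = quarters.mean(axis=1)   # row-wise mean of Q1..Q4

print(df[["total", "avg"]])
```

Using `mean(axis=1)` instead of dividing by a hard-coded 4 keeps the average correct if the number of score columns changes.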
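df['one'] = 1  # constant column
df['total'] = df['Q1'] + df['Q2'] + df['Q3'] + df['Q4']
# equivalent, using a row-wise apply over the quarter columns
df['total'] = df.loc[:, 'Q1':'Q4'].apply(lambda x: sum(x), axis=1)
# equivalent and shorter; the slice keeps 'team', 'one' and 'total' out of the sum
df['total'] = df.loc[:, 'Q1':'Q4'].sum(axis=1)
df['avg'] = df['total'] / 4
Statistical Functions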
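Most of the statistics in this section only make sense for numeric columns, and the frame also holds the text column team. One way to handle that, shown here as a sketch on an invented stand-in frame, is to select the numeric columns first with `select_dtypes` and compute on that subset.

```python
import pandas as pd

# Invented scores; only the column layout follows team.xlsx.
df = pd.DataFrame(
    {
        "team": ["E", "C", "C", "A"],
        "Q1": [80, 95, 92, 70],
        "Q2": [60, 65, 90, 75],
        "Q3": [50, 55, 70, 85],
        "Q4": [70, 75, 80, 65],
    },
    index=pd.Index(["Liver", "Ben", "Eorge", "Alexander"], name="name"),
)

# Keep only numeric columns so text such as 'team' is left out.
numeric = df.select_dtypes(include="number")

col_means = numeric.mean()        # one value per column
row_means = numeric.mean(axis=1)  # one value per student
corr = numeric.corr()             # pairwise correlation matrix

print(col_means)
print(row_means)
print(corr.shape)
```

The same subset can be reused for `median`, `std`, and `var`.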
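# 'team' holds text; pandas 2.x raises on it unless numeric_only=True is passed
df.mean(numeric_only=True)          # column means
df.mean(axis=1, numeric_only=True)  # row means
df.corr(numeric_only=True)          # pairwise correlation
df.count()                          # non-null values per column
df.max()
df.min()
df.median(numeric_only=True)
df.std(numeric_only=True)
df.var(numeric_only=True)
df.mode()
Visualization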
# line plot of Q1
df['Q1'].plot()
# line plot for a specific student
df.loc['Ben','Q1':'Q4'].plot()
# bar and horizontal bar
df.loc['Ben','Q1':'Q4'].plot.bar()
df.loc['Ben','Q1':'Q4'].plot.barh()
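# one line per team across the four quarters
# (the quarter columns are selected explicitly so added helper columns are not plotted)
df.groupby('team')[['Q1','Q2','Q3','Q4']].sum().T.plot()
# pie chart of team sizes
df.groupby('team').count().Q1.plot.pie()
Exporting Data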
df.to_excel('team-done.xlsx')
df.to_csv('team-done.csv')
The exported files are saved in the current working directory, which for a Jupyter notebook is normally the notebook's own folder.
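A quick way to confirm an export is to read the file back and compare. The sketch below assumes an invented stand-in frame and writes to a temporary directory instead of the notebook folder; `out_path` and `restored` are hypothetical names.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Invented scores; only the column layout follows team.xlsx.
df = pd.DataFrame(
    {
        "team": ["E", "C", "C", "A"],
        "Q1": [80, 95, 92, 70],
        "Q2": [60, 65, 90, 75],
    },
    index=pd.Index(["Liver", "Ben", "Eorge", "Alexander"], name="name"),
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "team-done.csv"
    # The index is written as the first column, under its name.
    df.to_csv(out_path)
    # index_col restores that column as the index when reading back.
    restored = pd.read_csv(out_path, index_col="name")

print(restored.head())
```

Without `index_col`, the names would come back as an ordinary column and a new integer index would be created.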
