
Master Pandas in Python: Install, Explore, Analyze, and Visualize Data Quickly

This article introduces Python and the Pandas library. It walks through installation, loading Excel data, and core DataFrame operations (selection, sorting, grouping and aggregation, transposing, adding columns, and statistical analysis), then covers visualization with Matplotlib and exporting the results. It is aimed at beginner to intermediate users.

Python Crawling & Data Mining

Python Overview

Python is a powerful, easy‑to‑learn interpreted language with rich data structures, cross‑platform support, and extensive use in data science, AI, and big‑data fields.

Pandas Introduction

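Pandas is a third-party Python library for flexible data manipulation and analysis that handles numeric, time-series, and text data. Wes McKinney began developing it in 2008. When this article was written the current release was 1.2.1; newer releases (including the 2.x series) have appeared since, so check which version you actually have installed.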

Installation

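# openpyxl is the package pandas uses to read and write .xlsx files
pip install pandas matplotlib openpyxl
# or install from the Tsinghua University PyPI mirror (usually faster from mainland China)
pip install pandas matplotlib openpyxl -i https://pypi.tuna.tsinghua.edu.cn/simple

Then import Pandas in Python under its conventional alias:

import pandas as pd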
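A quick way to confirm the installation worked is to import both libraries and print their versions. This is worth doing because the behaviour of some functions has changed between major pandas releases, so knowing your version helps when an example does not behave as described.

```python
# Check that pandas and Matplotlib import correctly and report their versions.
import matplotlib
import pandas as pd

print('pandas', pd.__version__)
print('matplotlib', matplotlib.__version__)
```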

Loading a Dataset

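Read an Excel file into a DataFrame (df); pd.read_csv works the same way for CSV files:

df = pd.read_excel('https://www.gairuo.com/file/data/dataset/team.xlsx')
df = df.set_index('name')  # use the name column as row labels, as the later examples assume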

df.head() and df.tail() show the first and last five rows, and df.sample(5) shows five randomly chosen rows.
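Because the example workbook is fetched over the network, the sketch below builds a small in-memory stand-in instead. Its layout (a name column, a team column and quarterly scores Q1 to Q4) is assumed to match team.xlsx, and every value is invented for illustration. It shows the first things worth checking after loading: size, first and last rows, a random sample and the column types.

```python
import pandas as pd

# Hypothetical stand-in for team.xlsx: assumed columns, invented values.
df = pd.DataFrame({
    'name': ['Liver', 'Amy', 'Cole', 'Dana', 'Eli', 'Ben'],
    'team': ['E', 'C', 'A', 'C', 'D', 'E'],
    'Q1': [89, 36, 57, 93, 65, 21],
    'Q2': [21, 37, 60, 96, 49, 46],
    'Q3': [24, 37, 18, 71, 61, 79],
    'Q4': [64, 57, 84, 78, 86, 88],
})
df = df.set_index('name')  # names become the row labels

print(df.shape)                       # (number of rows, number of columns)
print(df.head(3))                     # first 3 rows (head() defaults to 5)
print(df.tail(2))                     # last 2 rows
print(df.sample(2, random_state=0))   # 2 random rows; random_state makes the pick repeatable
df.info()                             # dtype and non-null count for each column
```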

Figure 1: Pandas and Python relationship diagram

Basic Operations

Pandas covers the everyday data-handling workflow:

Reading data from Excel, CSV, SQL databases, the clipboard, and other sources.

Merging, splitting, and cleaning data (removing duplicates, filling missing values, handling outliers).

Indexing, working with large datasets, inserting columns, and flexible querying.

Grouping, aggregation, and pivot-table-style calculations.

Transposing and reshaping with stack/unstack.

Adding derived columns such as totals and averages.

Statistical functions: mean, describe, corr, count, max, min, median, std, var, and mode.
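A few of the cleaning and statistics calls listed above, sketched on a tiny hypothetical table that contains one duplicated row and one missing score. Filling the gap with the column mean is only one possible strategy, chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: Amy's row appears twice and Ben's Q1 score is missing.
raw = pd.DataFrame({
    'name': ['Liver', 'Amy', 'Amy', 'Ben'],
    'team': ['E', 'C', 'C', 'E'],
    'Q1': [89, 36, 36, np.nan],
})

clean = raw.drop_duplicates()                      # keep the first of any identical rows
clean = clean.fillna({'Q1': clean['Q1'].mean()})   # replace the missing Q1 with the column mean

print(clean['Q1'].describe())                      # count, mean, std, min, quartiles, max
print(clean['Q1'].median(), clean['Q1'].std())
print(clean['team'].value_counts())                # number of rows per team
```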

Data Selection

Column selection:

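df['Q1']   # bracket access works for any column name
df.Q1      # attribute shortcut; only for names that are valid Python identifiers and not DataFrame methods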
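Single brackets return one column as a Series, while a list of names inside the brackets returns a DataFrame with just those columns. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical two-row table with 'name' as the index.
df = pd.DataFrame(
    {'team': ['E', 'C'], 'Q1': [89, 36], 'Q2': [21, 37]},
    index=pd.Index(['Liver', 'Amy'], name='name'),
)

q1 = df['Q1']              # Series
q1_attr = df.Q1            # the same Series via attribute access
subset = df[['Q1', 'Q2']]  # DataFrame with two columns, in the order given

print(type(q1).__name__, type(subset).__name__)  # Series DataFrame
```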

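Row selection by label or position (the label example assumes name is the index):

df[df.index == 'Liver']   # rows whose index label is 'Liver' (same result as df.loc[['Liver']])
df[0:3]                   # first three rows by position
df.iloc[:10, :]           # first ten rows, all columns, by position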
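.loc selects by index label and .iloc by integer position. One detail that often surprises newcomers: a label slice with .loc includes its end label, while a position slice excludes its end position. The names and scores below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {'team': ['E', 'C', 'A', 'E'],
     'Q1': [89, 36, 57, 21],
     'Q2': [21, 37, 60, 46]},
    index=pd.Index(['Liver', 'Amy', 'Cole', 'Ben'], name='name'),
)

liver = df.loc['Liver']             # one row by label, returned as a Series
first_two = df.iloc[0:2]            # positions 0 and 1; position 2 is excluded
amy_to_cole = df.loc['Amy':'Cole']  # labels Amy through Cole, end label included
ben_q = df.loc['Ben', 'Q1':'Q2']    # one row plus a label slice of columns
```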

Conditional filtering:

df[df.Q1 > 90]                                # rows where Q1 is above 90
df[(df['Q1'] > 90) & (df['team'] == 'C')]     # both conditions: combine with &, parenthesize each comparison
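The same compound filter can also be written with DataFrame.query, which takes the condition as a string and can read more naturally once conditions get long; isin is handy for matching several values at once. A sketch with invented values:

```python
import pandas as pd

df = pd.DataFrame(
    {'team': ['E', 'C', 'C', 'C'], 'Q1': [89, 36, 93, 95]},
    index=pd.Index(['Liver', 'Amy', 'Dana', 'Eli'], name='name'),
)

# Boolean-mask version
masked = df[(df['Q1'] > 90) & (df['team'] == 'C')]

# Equivalent expression string; plain 'and' is allowed inside query()
queried = df.query("Q1 > 90 and team == 'C'")

# Rows whose team is any of the listed values
c_or_e = df[df['team'].isin(['C', 'E'])]
```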
Figure 2: Reading data in Jupyter Notebook

Sorting

df.sort_values(by='Q1')                                  # ascending by Q1
df.sort_values(by='Q1', ascending=False)                 # descending by Q1
df.sort_values(['team','Q1'], ascending=[True, False])   # team A to Z, then highest Q1 first within each team
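sort_values returns a new, sorted DataFrame and leaves df unchanged unless you reassign the result. For "top N" questions, nlargest is a shortcut. Hypothetical values:

```python
import pandas as pd

df = pd.DataFrame(
    {'team': ['E', 'C', 'A', 'C', 'E'], 'Q1': [89, 36, 57, 93, 21]},
    index=pd.Index(['Liver', 'Amy', 'Cole', 'Dana', 'Ben'], name='name'),
)

# Team ascending, then Q1 descending within each team
ranked = df.sort_values(['team', 'Q1'], ascending=[True, False])
print(ranked.index.tolist())  # ['Cole', 'Dana', 'Amy', 'Liver', 'Ben']

top2 = df.nlargest(2, 'Q1')   # the two rows with the highest Q1
```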
Figure 3: DataFrame after setting name as index

Grouping and Aggregation

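df.groupby('team').sum(numeric_only=True)    # per-team totals; numeric_only skips text columns
df.groupby('team').mean(numeric_only=True)   # per-team averages
df.groupby('team').agg({'Q1':'sum','Q2':'count','Q3':'mean','Q4':'max'})   # a different aggregation per column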
Figure 4: Grouped average per team
Figure 5: Different aggregation methods per column
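When the dictionary form makes the result columns hard to tell apart, named aggregation (available since pandas 0.25) lets you choose the output column names yourself. The output names and the data here are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['E', 'C', 'A', 'C', 'E'],
    'Q1': [89, 36, 57, 93, 21],
    'Q2': [21, 37, 60, 96, 46],
})

# Each keyword becomes an output column: (input column, aggregation function)
summary = df.groupby('team').agg(
    q1_total=('Q1', 'sum'),
    q2_count=('Q2', 'count'),
    q2_mean=('Q2', 'mean'),
)
print(summary)
```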

Transformation

df.groupby('team').sum(numeric_only=True).T   # .T transposes: quarters become rows, teams become columns
Figure 6: Transposed aggregated data
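Transposing the grouped totals turns the quarters into rows and the teams into columns, which is why plotting the transposed table draws one line per team. stack() reshapes further into a long Series indexed by (team, quarter), and unstack() reverses it. A sketch with invented values:

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['E', 'C', 'A', 'C', 'E'],
    'Q1': [89, 36, 57, 93, 21],
    'Q2': [21, 37, 60, 96, 46],
})

totals = df.groupby('team').sum()  # rows: teams, columns: Q1, Q2
flipped = totals.T                 # rows: Q1, Q2, columns: teams

long = totals.stack()              # Series indexed by (team, quarter)
wide_again = long.unstack()        # back to the team x quarter table
```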

Adding Columns

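df['one'] = 1                                            # constant column: the scalar is broadcast to every row
df['total'] = df['Q1'] + df['Q2'] + df['Q3'] + df['Q4']  # row-wise sum of the four quarters
df['avg'] = df['total'] / 4                              # average quarterly score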
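Listing the quarter columns once and reducing along axis=1 (across each row) avoids repeating the column names. Note that sum and mean skip missing values by default, whereas the chain of + operators yields NaN when any quarter is missing. assign is shown as a non-mutating alternative. The data are hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    {'Q1': [89, 36, 21], 'Q2': [21, 37, 46], 'Q3': [24, 37, 79], 'Q4': [64, 57, 88]},
    index=pd.Index(['Liver', 'Amy', 'Ben'], name='name'),
)

quarters = ['Q1', 'Q2', 'Q3', 'Q4']
df['total'] = df[quarters].sum(axis=1)   # row-wise total
df['avg'] = df[quarters].mean(axis=1)    # row-wise average

# assign() returns a new DataFrame with the extra column; df itself is untouched
ranked = df.assign(rank=df['total'].rank(ascending=False))
```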

Visualization

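Quick plots use the Pandas built-in .plot() method, which draws with Matplotlib. In Jupyter, run each plot in its own cell so that each gets its own chart; in a plain Python script, call plt.show() to display the figures:

import matplotlib.pyplot as plt

df['Q1'].plot()                                      # line plot of everyone's Q1 score
df.loc['Ben','Q1':'Q4'].plot()                       # Ben's four quarterly scores as a line
df.loc['Ben','Q1':'Q4'].plot.bar()                   # the same scores as vertical bars
df.loc['Ben','Q1':'Q4'].plot.barh()                  # the same scores as horizontal bars
df.groupby('team').sum(numeric_only=True).T.plot()   # one line per team across the quarters
df.groupby('team').count().Q1.plot.pie()             # share of members in each team
plt.show()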
Figure 7: Line plot of Q1 scores
Figure 8: Ben's quarterly scores line plot
Figure 9: Bar chart of Ben's scores
Figure 10: Horizontal bar chart
Figure 11: Multiple line plot per team
Figure 12: Pie chart of team member counts
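Outside Jupyter, for example in a script on a server without a display, charts can be written to an image file instead of shown. The sketch below assumes Matplotlib's non-interactive Agg backend and a hypothetical output file name; it draws a bar chart and a pie chart side by side from invented data and saves them.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files, open no window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one person's quarterly scores and a team label per member
ben = pd.Series([21, 46, 79, 88], index=['Q1', 'Q2', 'Q3', 'Q4'], name='Ben')
teams = pd.Series(['E', 'C', 'A', 'C', 'E', 'E'], name='team')

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(8, 3))
ben.plot.bar(ax=ax_bar, title="Ben's scores")               # ax= picks the target axes
teams.value_counts().plot.pie(ax=ax_pie, autopct='%1.0f%%')  # members per team, with percentages
ax_pie.set_title('Members per team')

out_path = os.path.join(tempfile.gettempdir(), 'team_charts.png')  # hypothetical file name
fig.tight_layout()
fig.savefig(out_path, dpi=100)
plt.close(fig)  # release the figure's memory
```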

Export

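df.to_excel('team-done.xlsx')   # Excel output (needs openpyxl); the index is written as the first column
df.to_csv('team-done.csv')      # CSV output; pass index=False to leave the index out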
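Because the index is written as the first column by default, reading an exported file back usually needs index_col to restore it. A small sketch using an in-memory buffer instead of a real file (values invented):

```python
import io

import pandas as pd

df = pd.DataFrame(
    {'team': ['E', 'C'], 'Q1': [89, 36]},
    index=pd.Index(['Liver', 'Amy'], name='name'),
)

buffer = io.StringIO()
df.to_csv(buffer)                                 # header row: name,team,Q1
buffer.seek(0)
restored = pd.read_csv(buffer, index_col='name')  # turn the name column back into the index

csv_text = df.to_csv(index=False)                 # with no path, to_csv returns a string
print(csv_text)
```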
Figure 13: Sample illustration
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
