Master Pandas in Python: Install, Explore, Analyze, and Visualize Data Quickly

This article introduces Python and the Pandas library, then walks through installation, loading Excel data, and the core DataFrame operations: selection, sorting, grouping and aggregation, transposition, adding columns, and summary statistics. It closes with visualization through Matplotlib and exporting results, giving a beginner-to-intermediate tour of the library.

Python Crawling & Data Mining

Mar 27, 2022

Python Overview

Python is a powerful, easy‑to‑learn interpreted language with rich data structures, cross‑platform support, and extensive use in data science, AI, and big‑data fields.

Pandas Introduction

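Pandas is a third-party Python library for flexible data manipulation and analysis of numeric, time-series, and text data. Wes McKinney started developing it in 2008. The source cites 1.2.1 as the current version; newer releases have shipped since, so check pd.__version__ to see what is installed.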

Installation

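pip install pandas matplotlib openpyxl
# openpyxl is the engine pandas uses to read and write .xlsx files
# or install from the Tsinghua University PyPI mirror (faster inside mainland China)
pip install pandas matplotlib openpyxl -i https://pypi.tuna.tsinghua.edu.cn/simple

# then, in Python:
import pandas as pd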

Loading a Dataset

Read an Excel file into a DataFrame (conventionally named df); pd.read_csv does the same for CSV files:

df = pd.read_excel('https://www.gairuo.com/file/data/dataset/team.xlsx')

df.head() shows the first five rows, df.tail() the last five, and df.sample(5) five randomly chosen rows.
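To try these calls without downloading the workbook, a small in-memory table works as a stand-in. This is a minimal sketch: the column names (name, team, Q1 to Q4) mirror the ones used in this article, but the six rows are made up for illustration and are not the contents of team.xlsx.

```python
import pandas as pd

# Hypothetical stand-in for team.xlsx: same column names, invented rows.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee", "Eve", "Fay"],
        "team": ["A", "B", "A", "C", "B", "C"],
        "Q1": [90, 36, 93, 65, 24, 95],
        "Q2": [80, 37, 71, 49, 91, 60],
        "Q3": [70, 24, 50, 61, 88, 77],
        "Q4": [60, 57, 90, 86, 40, 82],
    }
)

first_rows = df.head(3)                     # first 3 rows instead of the default 5
last_rows = df.tail(2)                      # last 2 rows
random_rows = df.sample(2, random_state=0)  # 2 random rows, seeded for repeatability

print(first_rows)
print(last_rows)
print(random_rows)
print(df.shape)   # (rows, columns)
print(df.dtypes)  # column types inferred by pandas
```

Passing random_state makes sample() return the same rows on every run, which helps when sharing a notebook.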

Figure 1: Pandas and Python relationship diagram

Basic Operations

Read from Excel, CSV, SQL, clipboard, etc.

Merge, split, clean (deduplicate, fill missing, handle outliers).

Indexing, handling large datasets, column insertion, flexible queries.

Group‑by, aggregation, pivot‑like calculations.

Transpose, stack/unstack.

Add columns, compute totals and averages.

Statistical functions: mean, describe, corr, count, max, min, median, std, var, mode.
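A few of the operations listed above (deduplication, filling missing values, summary statistics) can be sketched on a tiny table. The data and the choice to fill gaps with the column mean are assumptions made for illustration; the right fill strategy depends on the dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows: one exact duplicate (Bob) and one missing score (Cid).
raw = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Bob", "Cid"],
        "team": ["A", "B", "B", "A"],
        "Q1": [90.0, 36.0, 36.0, np.nan],
    }
)

# Drop exact duplicate rows and renumber the index from 0.
clean = raw.drop_duplicates().reset_index(drop=True)

# Fill the missing score with the mean of the remaining scores (one option of many).
clean["Q1"] = clean["Q1"].fillna(clean["Q1"].mean())

print(clean)
print(clean["Q1"].describe())  # count, mean, std, min, quartiles, max
```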

Data Selection

Column selection:

df['Q1']
df.Q1

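Row selection by label or position (the label lookup assumes the name column has been made the index, for example with df = df.set_index('name')):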

df[df.index == 'Liver']
df[0:3]
df.iloc[:10, :]

Conditional filtering:

df[df.Q1 > 90]
df[(df['Q1'] > 90) & (df['team'] == 'C')]
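The selection styles above can be compared side by side on a small table. This sketch uses invented rows with the article's column names and sets name as the index so that label lookups work; df.query is shown as an alternative spelling of the same boolean filter.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the article.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee", "Eve", "Fay"],
        "team": ["A", "B", "A", "C", "B", "C"],
        "Q1": [90, 36, 93, 65, 24, 95],
        "Q2": [80, 37, 71, 49, 91, 60],
    }
).set_index("name")

ann_q1 = df.loc["Ann", "Q1"]   # single value by row label and column label
first_two = df.iloc[0:2]       # first two rows by position

# Boolean masks: wrap each condition in parentheses and combine with & (and) or | (or).
high = df[df["Q1"] > 90]
high_in_c = df[(df["Q1"] > 90) & (df["team"] == "C")]

# The same filter written as a query string.
high_in_c_query = df.query("Q1 > 90 and team == 'C'")

print(high)
print(high_in_c)
```

The parentheses matter because & binds more tightly than comparison operators in Python.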

Figure 2: Reading data in Jupyter Notebook

Sorting

df.sort_values(by='Q1')
df.sort_values(by='Q1', ascending=False)
df.sort_values(['team','Q1'], ascending=[True, False])
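The following sketch shows what these sort calls return on a small invented table. sort_values returns a new DataFrame and leaves the original untouched; nlargest is added here as a shortcut for "sort descending and take the top n" and is not part of the original article.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the article.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee", "Eve", "Fay"],
        "team": ["A", "B", "A", "C", "B", "C"],
        "Q1": [90, 36, 93, 65, 24, 95],
    }
)

by_q1 = df.sort_values(by="Q1")  # ascending by default

# Team A to C, and within each team the highest Q1 first.
by_team_then_q1 = df.sort_values(["team", "Q1"], ascending=[True, False])

top2 = df.nlargest(2, "Q1")  # two highest Q1 scores

print(by_q1["name"].tolist())
print(by_team_then_q1[["team", "name", "Q1"]])
print(top2)
```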

Figure 3: DataFrame after setting name as index

Grouping and Aggregation

df.groupby('team').sum(numeric_only=True)
df.groupby('team').mean(numeric_only=True)
df.groupby('team').agg({'Q1':'sum','Q2':'count','Q3':'mean','Q4':'max'})
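The per-column aggregation above can also be written with named aggregation, which lets each output column carry a descriptive name. This is a sketch on invented rows; the output names (q1_total, members, q3_mean, q4_best) are hypothetical labels chosen for this example.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the article.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee", "Eve", "Fay"],
        "team": ["A", "B", "A", "C", "B", "C"],
        "Q1": [90, 36, 93, 65, 24, 95],
        "Q2": [80, 37, 71, 49, 91, 60],
        "Q3": [70, 24, 50, 61, 88, 77],
        "Q4": [60, 57, 90, 86, 40, 82],
    }
)

quarters = ["Q1", "Q2", "Q3", "Q4"]

# Selecting the numeric columns first keeps the text column out of the sum.
totals = df.groupby("team")[quarters].sum()

# Named aggregation: new_column=(source_column, function).
summary = df.groupby("team").agg(
    q1_total=("Q1", "sum"),
    members=("name", "count"),
    q3_mean=("Q3", "mean"),
    q4_best=("Q4", "max"),
)

print(totals)
print(summary)
```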

Figure 5: Different aggregation methods per column

Transformation

Transpose the grouped result with .T so that teams become columns and quarters become rows:

df.groupby('team').sum().T
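A short sketch of reshaping a grouped result, again on invented rows. It shows the transpose used above and, as an addition not covered in this section, stack and unstack for moving between wide and long layouts.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the article.
df = pd.DataFrame(
    {
        "team": ["A", "B", "A", "C", "B", "C"],
        "Q1": [90, 36, 93, 65, 24, 95],
        "Q2": [80, 37, 71, 49, 91, 60],
        "Q3": [70, 24, 50, 61, 88, 77],
        "Q4": [60, 57, 90, 86, 40, 82],
    }
)

totals = df.groupby("team").sum()  # rows: teams, columns: quarters

wide = totals.T            # rows: quarters, columns: teams
long = totals.stack()      # Series indexed by (team, quarter)
restored = long.unstack()  # back to one row per team

print(wide)
print(long.head())
```

The transposed layout is convenient for plotting, because .plot() draws one line per column.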

Adding Columns

df['one'] = 1
df['total'] = df['Q1'] + df['Q2'] + df['Q3'] + df['Q4']
df['avg'] = df['total'] / 4
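The same totals and averages can be computed row-wise without spelling out each column, which scales better when there are many columns. This sketch uses invented rows; the rank column and the use of assign are additions for illustration.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the article.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee", "Eve", "Fay"],
        "Q1": [90, 36, 93, 65, 24, 95],
        "Q2": [80, 37, 71, 49, 91, 60],
        "Q3": [70, 24, 50, 61, 88, 77],
        "Q4": [60, 57, 90, 86, 40, 82],
    }
)

quarters = ["Q1", "Q2", "Q3", "Q4"]

# axis=1 aggregates across the columns of each row.
df["total"] = df[quarters].sum(axis=1)
df["avg"] = df[quarters].mean(axis=1)

# assign returns a new DataFrame and leaves df itself unchanged.
ranked = df.assign(rank=df["total"].rank(ascending=False).astype(int))

print(ranked.sort_values("rank"))
```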

Visualization

Pandas' built-in .plot() method, which wraps Matplotlib, produces quick plots:

df['Q1'].plot()
df.loc['Ben','Q1':'Q4'].plot()
df.loc['Ben','Q1':'Q4'].plot.bar()
df.groupby('team').sum().T.plot()
df.groupby('team').count().Q1.plot.pie()
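Outside a notebook, the plots need a Matplotlib figure to draw on and an explicit save step. The sketch below makes several assumptions: invented rows, a headless "Agg" backend so it runs without a display, and an in-memory PNG buffer instead of a file on disk. The row is cast to float before plotting because a row taken from a mixed-type DataFrame can come back with object dtype.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; in Jupyter this line is not needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; only the column names come from the article.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee", "Eve", "Fay"],
        "team": ["A", "B", "A", "C", "B", "C"],
        "Q1": [90, 36, 93, 65, 24, 95],
        "Q2": [80, 37, 71, 49, 91, 60],
        "Q3": [70, 24, 50, 61, 88, 77],
        "Q4": [60, 57, 90, 86, 40, 82],
    }
).set_index("name")

quarters = ["Q1", "Q2", "Q3", "Q4"]
fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# One person's scores across the four quarters.
df.loc["Ann", quarters].astype(float).plot(ax=axes[0], title="Ann by quarter")

# One line per team: quarters on the x-axis after the transpose.
df.groupby("team")[quarters].sum().T.plot(ax=axes[1], title="Team totals")

# Share of members per team.
df["team"].value_counts().plot.pie(ax=axes[2], title="Members per team")

fig.tight_layout()
png = io.BytesIO()
fig.savefig(png, format="png")  # pass a filename instead to write to disk
plt.close(fig)
```

Passing ax= to each call places the three charts on one figure instead of opening a new figure per plot.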

Figure 8: Ben's quarterly scores line plot

Figure 12: Pie chart of team member counts

Export

df.to_excel('team-done.xlsx')
df.to_csv('team-done.csv')
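A quick way to confirm that an export preserves the data is to write it out and read it back. This sketch uses invented rows and an in-memory text buffer in place of team-done.csv, so it leaves no file behind.

```python
import io

import pandas as pd

# Hypothetical rows; only the column names come from the article.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid"],
        "team": ["A", "B", "A"],
        "Q1": [90, 36, 93],
    }
).set_index("name")

buffer = io.StringIO()
df.to_csv(buffer)  # the index (name) is written as the first column
csv_text = buffer.getvalue()

# Read it back, telling pandas which column holds the index.
restored = pd.read_csv(io.StringIO(csv_text), index_col="name")

print(csv_text)
print(restored)
```

If the index is only a default row number, passing index=False to to_csv or to_excel keeps it out of the exported file.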
