Master Pandas in Python: Install, Explore, Analyze, and Visualize Data Quickly
This article introduces Python and the Pandas library, then walks through installation, loading Excel data, and core DataFrame operations (selection, sorting, grouping, aggregation, transformation, and adding columns), followed by statistical analysis, visualization with Matplotlib, and exporting results. It is aimed at beginner-to-intermediate readers.
Python Overview
Python is a powerful, easy‑to‑learn interpreted language with rich data structures, cross‑platform support, and extensive use in data science, AI, and big‑data fields.
Pandas Introduction
Pandas is a third-party Python library for flexible data manipulation and analysis, handling numeric, time-series, and text data. It was created by Wes McKinney in 2008; version 1.2.1 was current when this article was written.
Installation
pip install pandas matplotlib
# or use a mirror in mainland China (Tsinghua University)
pip install pandas matplotlib -i https://pypi.tuna.tsinghua.edu.cn/simple
Import the library in Python:
import pandas as pd
Loading a Dataset
Read an Excel or CSV file into a DataFrame (df):
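df = pd.read_excel('https://www.gairuo.com/file/data/dataset/team.xlsx')
With recent Pandas versions, reading .xlsx files also requires the openpyxl package (pip install openpyxl).
df.head(), df.tail(), and df.sample(5) display the first rows, the last rows, and five random rows, respectively.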
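To try the loading and preview calls without downloading anything, here is a minimal sketch that reads a small CSV from memory. The rows are made up for illustration, and the column layout (name, team, Q1 to Q4) is an assumption about what team.xlsx looks like, not something taken from the file itself:

```python
import io

import pandas as pd

# Made-up sample data; the column layout is an assumption about team.xlsx.
csv_text = """name,team,Q1,Q2,Q3,Q4
Ann,A,95,60,70,80
Ben,C,93,96,71,78
Cai,C,36,37,37,57
Dee,B,65,49,61,86
Eli,A,57,60,18,84
Fay,B,88,72,90,66
"""

# read_csv accepts any file-like object, so an in-memory buffer works here.
df = pd.read_csv(io.StringIO(csv_text))

first_two = df.head(2)                        # first 2 rows
last_two = df.tail(2)                         # last 2 rows
random_three = df.sample(3, random_state=0)   # 3 random rows, reproducible

print(df.shape)   # (number of rows, number of columns)
print(first_two)
```

Passing random_state to sample() makes the random selection repeatable, which helps when comparing output between runs.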
Basic Operations
Read from Excel, CSV, SQL, clipboard, etc.
Merge, split, clean (deduplicate, fill missing, handle outliers).
Indexing, handling large datasets, column insertion, flexible queries.
Group‑by, aggregation, pivot‑like calculations.
Transpose, stack/unstack.
Add columns, compute totals and averages.
Statistical functions: mean, describe, corr, count, max, min, median, std, var, mode.
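A few of the operations listed above (deduplication, filling missing values, summary statistics) can be sketched on a tiny frame. The names and values are invented for illustration, and filling with the column mean is just one possible choice:

```python
import pandas as pd

# Made-up data with one duplicated row and one missing value.
raw = pd.DataFrame({
    "name": ["Ann", "Ben", "Ben", "Cai"],
    "Q1": [90.0, 80.0, 80.0, float("nan")],
})

# Drop rows that are exact duplicates of an earlier row (the second Ben).
clean = raw.drop_duplicates()

# Fill the missing Q1 value with the mean of the remaining Q1 values.
clean = clean.fillna({"Q1": clean["Q1"].mean()})

print(clean)
print(clean["Q1"].describe())   # count, mean, std, min, quartiles, max
```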
Data Selection
Column selection:
df['Q1']
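df.Q1
Row selection by label or position (the label-based examples here and in the Visualization section assume the names have been set as the index, e.g. df = df.set_index('name') if that column is called name):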
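As a rough sketch of the difference between label-based and position-based row selection, the following uses a small made-up frame whose index holds the names. The data and the index name are assumptions for illustration only:

```python
import pandas as pd

# Made-up frame; names are used as the index so rows can be looked up by label.
df = pd.DataFrame(
    {"team": ["A", "C", "C", "B"], "Q1": [95, 93, 36, 65], "Q2": [60, 96, 37, 49]},
    index=pd.Index(["Ann", "Ben", "Cai", "Dee"], name="name"),
)

by_label = df.loc["Ben"]                  # one row as a Series, found by index label
by_mask = df[df.index == "Ben"]           # same row, kept as a one-row DataFrame
first_two = df[0:2]                       # slice by position: rows 0 and 1
block = df.iloc[:3, :2]                   # first 3 rows, first 2 columns, by position
label_slice = df.loc["Ben", "Q1":"Q2"]    # label slices include both endpoints

print(by_mask)
print(label_slice)
```

Note that position slices exclude the end position, while label slices with .loc include the end label. On the team data, the equivalent lookups are: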
df[df.index == 'Liver']
df[0:3]
df.iloc[:10, :]
Conditional filtering:
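Conditional filtering works by building a boolean Series and passing it to the DataFrame. A minimal sketch on made-up data (the names and scores are invented, not taken from team.xlsx):

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cai", "Dee"],
    "team": ["A", "C", "C", "B"],
    "Q1": [95, 93, 36, 65],
})

# One condition: rows where Q1 is above 90.
high = df[df["Q1"] > 90]

# Two conditions combined with &. Each condition needs its own parentheses
# because & binds more tightly than the comparison operators.
high_c = df[(df["Q1"] > 90) & (df["team"] == "C")]

# The same filter written as a query string.
same = df.query("Q1 > 90 and team == 'C'")

print(high)
print(high_c)
```

Against the team data, the filters look like this: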
df[df.Q1 > 90]
df[(df['Q1'] > 90) & (df['team'] == 'C')]
Sorting
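sort_values() returns a new, sorted DataFrame and leaves the original untouched. A minimal sketch on a made-up frame (names and scores are invented for illustration):

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cai", "Dee"],
    "team": ["A", "C", "C", "B"],
    "Q1": [95, 93, 36, 65],
})

by_q1 = df.sort_values(by="Q1")                         # ascending by Q1
by_q1_desc = df.sort_values(by="Q1", ascending=False)   # descending by Q1

# Sort by team A to Z, then by Q1 from high to low within each team.
by_team_then_q1 = df.sort_values(["team", "Q1"], ascending=[True, False])

print(by_team_then_q1)
```

The same calls on the team data: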
df.sort_values(by='Q1')
df.sort_values(by='Q1', ascending=False)
df.sort_values(['team','Q1'], ascending=[True, False])
Grouping and Aggregation
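groupby() splits the rows by a key column and then applies an aggregation to each group. A minimal sketch on made-up data follows; numeric_only=True is used here as a precaution because this sample keeps a text column (name) as a regular column, which is an assumption about the data layout:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cai", "Dee", "Eli"],
    "team": ["A", "C", "C", "B", "A"],
    "Q1": [95, 93, 36, 65, 57],
    "Q2": [60, 96, 37, 49, 60],
})

# Per-team totals and averages over the numeric columns only.
totals = df.groupby("team").sum(numeric_only=True)
means = df.groupby("team").mean(numeric_only=True)

# A different aggregation per column: total of Q1, number of Q2 entries.
summary = df.groupby("team").agg({"Q1": "sum", "Q2": "count"})

print(totals)
print(summary)
```

The same calls on the team data: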
df.groupby('team').sum()
df.groupby('team').mean()
df.groupby('team').agg({'Q1':'sum','Q2':'count','Q3':'mean','Q4':'max'})
Transformation
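Transposing with .T swaps rows and columns, so a per-team summary with one row per team becomes one row per quarter. A small sketch on made-up data (values invented for illustration):

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "team": ["A", "C", "C", "B", "A"],
    "Q1": [95, 93, 36, 65, 57],
    "Q2": [60, 96, 37, 49, 60],
})

totals = df.groupby("team").sum()   # rows: teams, columns: quarters
wide = totals.T                     # rows: quarters, columns: teams

print(totals)
print(wide)
```

On the team data: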
df.groupby('team').sum().T
Adding Columns
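Assigning to a new column name creates the column, and arithmetic between columns is applied row by row. A minimal sketch on made-up scores, which also shows sum(axis=1) as an alternative to adding the columns by hand:

```python
import pandas as pd

# Made-up scores for illustration.
df = pd.DataFrame({
    "name": ["Ann", "Ben"],
    "Q1": [95, 93],
    "Q2": [60, 96],
    "Q3": [70, 71],
    "Q4": [80, 78],
})

df["one"] = 1                                              # same constant in every row
df["total"] = df["Q1"] + df["Q2"] + df["Q3"] + df["Q4"]    # row-wise sum
df["avg"] = df["total"] / 4                                # row-wise average

# Equivalent total computed across the four quarter columns at once.
quarters = ["Q1", "Q2", "Q3", "Q4"]
total_alt = df[quarters].sum(axis=1)

print(df)
```

On the team data: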
df['one'] = 1
df['total'] = df['Q1'] + df['Q2'] + df['Q3'] + df['Q4']
df['avg'] = df['total'] / 4
Visualization
Quick plots use the built-in .plot() method in Pandas, which draws with Matplotlib:
df['Q1'].plot()
df.loc['Ben','Q1':'Q4'].plot()
df.loc['Ben','Q1':'Q4'].plot.bar()
df.groupby('team').sum().T.plot()
df.groupby('team').count().Q1.plot.pie()
Export
df.to_excel('team-done.xlsx')
df.to_csv('team-done.csv')
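One way to check an export is to write the data out and read it back. The sketch below does a CSV round trip through an in-memory buffer rather than a file on disk; the data is made up, and the index name (name) is an assumption for illustration:

```python
import io

import pandas as pd

# Made-up data with the names as the index.
df = pd.DataFrame({"name": ["Ann", "Ben"], "Q1": [95, 93]}).set_index("name")

buffer = io.StringIO()
df.to_csv(buffer)      # the index is written as the first column by default
buffer.seek(0)         # rewind so the buffer can be read from the start

# Read it back, restoring the first column as the index.
restored = pd.read_csv(buffer, index_col="name")

# Pass index=False to leave the index out of the exported text.
without_index = df.to_csv(index=False)

print(restored)
```

to_excel() follows the same pattern for .xlsx output, though it needs an Excel writer package such as openpyxl to be installed.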
