Getting Started with Pandas: Installation, DataFrames, and Basic Data Analysis in Python
This tutorial introduces Pandas, a powerful Python data‑analysis library, covering installation, importing, creating DataFrames from various sources, basic inspection, selection, filtering, sorting, grouping, handling missing values, and a practical stock‑price analysis example with code snippets.
Pandas is a powerful Python data‑analysis library widely used for data cleaning, processing, and analysis. It provides convenient data structures and tools that simplify handling tabular data.
Installation
Install Pandas via pip:
pip install pandas
Importing Pandas
After installation, import the library using the common alias pd:
import pandas as pd
Creating a DataFrame
A DataFrame is Pandas' primary data structure, similar to an Excel sheet or SQL table. It can be created from lists, dictionaries, CSV files, etc.
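Since lists are one of the sources mentioned, here is a minimal sketch that builds the same three-person table from a list of rows. Each inner list is one record, and the column labels are passed separately through the columns parameter. The variable names rows and df_from_rows are only illustrative.

```python
import pandas as pd

# Each inner list is one row; its values must follow the order of the column labels
rows = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago'],
]

# columns= assigns the labels; the row index defaults to 0, 1, 2
df_from_rows = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])
print(df_from_rows)
```

A list of dictionaries, such as [{'Name': 'Alice', 'Age': 25, 'City': 'New York'}, ...], also works. In that case the dictionary keys become the column names.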
From a Dictionary
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
From a CSV File
Assuming a file data.csv with the following content:
Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
Read it with:
df = pd.read_csv('data.csv')
print(df)
Output:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
Basic DataFrame Inspection
df.head(): view the first rows (five by default; pass a number such as df.head(10) for more or fewer).
df.tail(): view the last rows (also five by default).
df.columns: list the column names.
df.dtypes: show the data type of each column.
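To see these calls side by side, here is a minimal sketch. It rebuilds the sample table so that it runs on its own. The exact dtype names printed for the text columns depend on the installed pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# head()/tail() take an optional row count; here two rows from the top, one from the bottom
print(df.head(2))
print(df.tail(1))

# columns is an Index object; tolist() converts it to a plain Python list
print(df.columns.tolist())

# dtypes is a Series mapping each column name to its dtype
print(df.dtypes)
```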
Data Selection and Filtering
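Select a single column (the result is a Series): df['Name']
Select multiple columns (the result is a DataFrame): df[['Name', 'Age']]
Conditional filtering keeps only the rows where the condition is True:
# With the sample data, only Charlie (Age 35) passes this filter
filtered_df = df[df['Age'] > 30]
Sorting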
Sort by a column using sort_values:
sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)
Grouping
Group data and compute aggregates with groupby:
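# numeric_only=True skips the text column (Name); without it, pandas 2.0+ raises a TypeError
grouped_df = df.groupby('City').mean(numeric_only=True)
print(grouped_df)
Missing-Value Handling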
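Check for missing values with df.isnull(), which returns a DataFrame of True/False flags. df.isnull().sum() counts the missing entries in each column.
Fill missing values with fillna. Here, missing ages are replaced with 0:
df['Age'] = df['Age'].fillna(0)
Practical Example: Stock Data Analysis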
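This walkthrough assumes a stock_data.csv file. The text does not describe its layout beyond the Date and Close columns the code uses. To try the steps without real market data, a minimal sketch can generate a small file first. The five dates and closing prices below are invented for illustration and do not come from any real stock.

```python
import pandas as pd

# Hypothetical sample: five consecutive business days with made-up closing prices
sample = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=5, freq='B'),
    'Close': [100.0, 102.0, 101.0, 104.0, 103.5],
})

# index=False keeps the row labels out of the file, so it holds only Date and Close
sample.to_csv('stock_data.csv', index=False)
print(pd.read_csv('stock_data.csv'))
```

With this file, the first row of the Change column computed later is NaN, because there is no previous day to compare it against.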
Given a CSV file stock_data.csv containing daily stock prices (the code below expects at least a Date column and a Close column), read and analyze it:
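# parse_dates turns Date into datetimes; sorting ensures pct_change compares consecutive days
df = pd.read_csv('stock_data.csv', parse_dates=['Date'])
df = df.sort_values('Date')
print(df)
Calculate the daily percentage change relative to the previous row: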
df['Change'] = df['Close'].pct_change() * 100
print(df)
Plot the closing-price trend (requires matplotlib, installed separately with pip install matplotlib):
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 5))
plt.plot(df['Date'], df['Close'], marker='o')
plt.title('Stock Closing Price Trend')
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Conclusion
This article showed how to use Pandas for data analysis in Python. It covered installation, DataFrame creation, basic inspection, selection and filtering, sorting, grouping, missing-value handling, and a practical stock-price analysis example.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
