Introduction to NumPy, pandas, and Matplotlib for Python Data Analysis
This article introduces Python’s core data‑analysis stack—NumPy for fast multidimensional arrays, pandas for labeled DataFrames, and Matplotlib for interactive plotting—while showing how to set up a Jupyter/VS Code environment, perform basic indexing, slicing, and visualisation, and clean log files with pandas.
This article provides a brief introduction to two widely used Python data‑processing libraries, NumPy and pandas, and shows how they can be combined with Matplotlib, Jupyter, and VS Code for interactive data analysis.
Supporting Tools
The typical environment includes Python 3 (Python 2 is no longer maintained), NumPy, pandas, Matplotlib, Jupyter, IPython, and Visual Studio Code. On macOS the setup is straightforward: install the required packages with pip.
pip install numpy pandas matplotlib jupyter
After installing the packages, you can use VS Code with the Python extension to edit, run, and debug scripts. Below is a sample script that runs in VS Code’s Jupyter integration; the #%% marker on the first line starts a notebook-style cell.
#%%
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
x = np.linspace(0, 20, 100)
plt.plot(x, np.sin(x))
plt.show()
NumPy
NumPy is the foundational package for scientific computing in Python. It offers a fast, efficient multi‑dimensional array object (ndarray), vectorized operations, file I/O for array‑based datasets, linear algebra, Fourier transforms, random number generation, and tools for integrating C/C++/Fortran code.
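A few of these capabilities can be sketched in a handful of lines. The snippet below is only an illustration: the matrix, the sampled signal, and the seed are made-up values, not data from this article.

```python
import numpy as np

# Linear algebra: solve the 2x2 system A @ x = b (illustrative values).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(x)

# Fourier transform: find the dominant frequency of a sampled sine wave.
t = np.arange(64) / 64.0               # 64 samples covering one second
signal = np.sin(2 * np.pi * 5 * t)     # 5 Hz sine
spectrum = np.abs(np.fft.rfft(signal))
peak_hz = int(np.argmax(spectrum))     # bins are 1 Hz apart for this sampling
print(peak_hz)

# Random numbers: a seeded generator gives reproducible draws.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)
print(samples.shape)
```

Each call operates on whole arrays at once, which is what lets NumPy avoid explicit Python loops.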
ndarray: Multi‑dimensional Array
An ndarray behaves like a large container for numerical data, allowing element‑wise arithmetic just as with scalars.
# Demo: arrays and vectorized operations
import numpy as np
data = np.array([[0.9, 0.3, 0.4],
[0.4, 0.6, 0.7]])
# array([[0.9, 0.3, 0.4],
# [0.4, 0.6, 0.7]])
print(data * 10) # multiply each element by 10
print(data + data) # element‑wise addition
print(np.zeros(10))
print(np.zeros((3, 6)))
Basic Indexing and Slicing
Indexing and slicing are essential for data manipulation.
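# Slicing
import numpy as np
arr = np.arange(10)
print(arr[5]) # element at index 5 (the sixth element)
print(arr[5:8]) # elements at indices 5 through 7
arr[5:8] = 12 # assign 12 to every element of the slice, in place
arr2d = np.random.randn(7, 4)
print(arr2d < 0) # boolean mask
arr2d[arr2d < 0] = 0 # set negative values to 0
Note that a basic slice such as arr[5:8] is a view of the original array, not a copy, so modifications made through it affect the source data. Call .copy() on the slice when an independent array is needed.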
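The view behavior described above can be checked directly. The following is a minimal sketch; the variable names view and independent are chosen here for illustration.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]                  # basic slicing returns a view on arr's data
view[:] = 12                     # writes through to arr
print(arr)                       # indices 5, 6 and 7 now hold 12

independent = arr[5:8].copy()    # an explicit copy owns its own data
independent[:] = -1              # arr is left untouched
print(arr)

# np.shares_memory reports whether two arrays overlap in memory.
print(np.shares_memory(arr, view))
print(np.shares_memory(arr, independent))
```

Views keep slicing cheap on large arrays, at the cost of making accidental in-place edits easy, so an explicit copy is the safer choice whenever the original must stay intact.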
pandas
pandas provides high‑level data structures for structured data. The most common is DataFrame, a column‑oriented 2‑D table with labeled rows and columns, combining NumPy’s performance with spreadsheet‑like flexibility.
Basic Data Construction
Series: a one‑dimensional labeled array.
# Demo
from pandas import Series
obj = Series([4, 7, -5, 3]) # default integer index 0..3
print(obj)
obj2 = Series([4, 7, -5, 3], index=['a','b','c','d']) # explicit labels
print(obj2)
print(obj2[obj2 > 0]) # keep only the positive values
DataFrame: a tabular data structure (example code omitted for brevity).
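Since the DataFrame example is omitted, here is a minimal sketch of what one might look like. The column names and values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data: each dict key becomes a column.
df = pd.DataFrame({
    "city": ["Beijing", "Shanghai", "Shenzhen"],
    "year": [2020, 2021, 2022],
    "visits": [120, 95, 143],
})
print(df)

print(df["visits"])              # a single column is returned as a Series
busy = df[df["visits"] > 100]    # rows selected with a boolean mask
print(busy)
print(df["visits"].mean())       # column-wise aggregation
```

The boolean-mask selection works the same way as it does for a Series or a NumPy array, which is why the three libraries combine so easily.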
matplotlib
matplotlib integrates with IPython/Jupyter to provide interactive visualisation.
#%%
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 20, 100)
plt.plot(x, np.sin(x))
plt.show()
IPython
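IPython is the interactive computing environment that ties together the scientific Python stack. It provides an enhanced interactive shell and the Python kernel used by Jupyter. The web‑based notebook, the Qt‑based GUI console, and the parallel and distributed computing tools that originally shipped with IPython are now maintained as separate packages under the Jupyter and ipyparallel projects.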
SciPy
SciPy is a collection of packages for solving standard scientific‑computing problems such as integration, differential equations, and linear algebra.
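As a rough sketch of those three problem areas, the snippet below uses scipy.integrate and scipy.linalg. The integrand, the differential equation, and the matrix are illustrative choices, not examples from the original article, and SciPy must be installed separately (pip install scipy).

```python
import numpy as np
from scipy import integrate, linalg

# Integration: the integral of sin(x) from 0 to pi is exactly 2.
value, abs_err = integrate.quad(np.sin, 0.0, np.pi)
print(value)

# Differential equations: dy/dt = -y with y(0) = 1, so y(1) = exp(-1).
sol = integrate.solve_ivp(lambda t, y: -y, (0.0, 1.0), [1.0],
                          rtol=1e-8, atol=1e-10)
y_end = sol.y[0, -1]
print(y_end)

# Linear algebra: solve A @ x = b for x.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)
print(x)
```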
Practical Example: Cleaning and Analyzing Log Data
The following script demonstrates how to read a pipe‑separated log file, extract columns, filter rows, and compute value counts using pandas.
import pandas as pd
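# Current pandas rejects duplicate entries in names, so the repeated 'unknown' labels are numbered
col_name = ['ip','time','action','action-rsp','guid','qq','xx','QUA','unknown_1','fl-list','unknown_2','unknown_3','unknown_4','unknown_5','unknown_6']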
# Read the log file
df = pd.read_table('cache_data_15043.txt', sep='|', names=col_name)
# Extract columns
guids = df['guid']
qq = df['qq']
# Filter rows where action == 'get'
get_actions = df[df['action'] == 'get']
# Count response codes
print(get_actions['action-rsp'].value_counts())
The example shows how to quickly turn raw log text into a structured DataFrame for further analysis.
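To try the same steps without the original log file, the sketch below feeds a few made-up lines through io.StringIO. The five-column layout and every value in it are assumptions for illustration; the real file has more fields.

```python
import io
import pandas as pd

# Hypothetical log lines in a reduced five-column layout (illustration only).
raw_log = io.StringIO(
    "10.0.0.1|2017-09-01 10:00:00|get|0|guid-a\n"
    "10.0.0.2|2017-09-01 10:00:01|set|0|guid-b\n"
    "10.0.0.1|2017-09-01 10:00:02|get|1|guid-a\n"
    "10.0.0.3|2017-09-01 10:00:03|get|0|guid-c\n"
)
columns = ["ip", "time", "action", "action-rsp", "guid"]

# read_csv accepts any file-like object; sep="|" splits on the pipe character.
df = pd.read_csv(raw_log, sep="|", names=columns, parse_dates=["time"])

# Keep only 'get' requests, then count each response code.
get_actions = df[df["action"] == "get"]
rsp_counts = get_actions["action-rsp"].value_counts()
print(rsp_counts)

# Requests per client IP, another common first look at log data.
per_ip = df.groupby("ip").size()
print(per_ip)
```

Working from an in-memory sample like this makes it easy to confirm the column layout and filters before pointing the script at a large file.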
Tencent Cloud Developer
