
Introduction to NumPy, pandas, and Matplotlib for Python Data Analysis

This article introduces Python’s core data‑analysis stack—NumPy for fast multidimensional arrays, pandas for labeled DataFrames, and Matplotlib for interactive plotting—while showing how to set up a Jupyter/VS Code environment, perform basic indexing, slicing, and visualisation, and clean log files with pandas.

Tencent Cloud Developer

This article provides a brief introduction to two widely used Python data‑processing libraries, NumPy and pandas, and shows how they can be combined with Matplotlib, Jupyter, and VS Code for interactive data analysis.

Supporting Tools

The typical environment includes Python 3 (or the legacy Python 2), NumPy, pandas, Matplotlib, Jupyter, IPython, and Visual Studio Code. On macOS the setup is straightforward: install the required packages with pip.

pip install numpy pandas matplotlib jupyter

After installing the packages, you can use VS Code with the Python extension to edit, run, and debug scripts. Below is a sample script that runs in VS Code’s Jupyter integration.

#%%
import matplotlib.pyplot as plt
import numpy as np

# Plot sin(x) at 100 evenly spaced points between 0 and 20
x = np.linspace(0, 20, 100)
plt.plot(x, np.sin(x))
plt.show()

NumPy

NumPy is the foundational package for scientific computing in Python. It offers a fast, efficient multi‑dimensional array object (ndarray), vectorized operations, file I/O for array‑based datasets, linear algebra, Fourier transforms, random number generation, and tools for integrating C/C++/Fortran code.

ndarray: Multi‑dimensional Array

An ndarray behaves like a large container for numerical data, allowing element‑wise arithmetic just as with scalars.

# Demo: arrays and vectorized operations
import numpy as np
data = np.array([[0.9, 0.3, 0.4],
                 [0.4, 0.6, 0.7]])
# array([[0.9, 0.3, 0.4],
#        [0.4, 0.6, 0.7]])
print(data * 10)         # multiply each element by 10
print(data + data)       # element-wise addition
print(np.zeros(10))      # 1-D array of ten zeros
print(np.zeros((3, 6)))  # 3x6 array of zeros
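
To make the "container" idea more concrete, the following minimal sketch inspects an array's shape and element type and checks that a vectorized expression gives the same result as an explicit Python loop. The variable names scaled and looped are illustrative and do not come from the original article.

```python
import numpy as np

data = np.array([[0.9, 0.3, 0.4],
                 [0.4, 0.6, 0.7]])

# Every ndarray carries its dimensions and a single element type.
print(data.shape)  # (2, 3)
print(data.dtype)  # float64

# Vectorized: one expression is applied to every element.
scaled = data * 10

# The equivalent explicit loop, shown only for comparison.
looped = np.empty_like(data)
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        looped[i, j] = data[i, j] * 10

print(np.allclose(scaled, looped))  # True
```

The vectorized form is shorter and avoids running the loop in the Python interpreter, which is where most of NumPy's speed advantage comes from.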

Basic Indexing and Slicing

Indexing and slicing are essential for data manipulation.

# Slicing
arr = np.arange(10)
print(arr[5])          # 6th element (index 5)
print(arr[5:8])        # elements at indices 5, 6, and 7
arr[5:8] = 12          # assign 12 to every element of the slice
arr2d = np.random.randn(7, 4)  # 7x4 array of standard-normal samples
print(arr2d < 0)       # boolean mask
arr2d[arr2d < 0] = 0   # set negative values to 0

Note that a slice is a view of the original array, so modifications affect the source data.
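
The view behaviour can be checked directly. This is a small sketch, with variable names chosen for illustration, that contrasts a basic slice (a view) with an explicit copy made by .copy():

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]   # basic slice: shares memory with arr
view[:] = 12      # writes through to arr
print(arr.tolist())  # [0, 1, 2, 3, 4, 12, 12, 12, 8, 9]

independent = arr[5:8].copy()  # explicit copy: separate memory
independent[:] = -1            # arr is not affected
print(arr[5:8].tolist())       # [12, 12, 12]

print(np.shares_memory(arr, view))         # True
print(np.shares_memory(arr, independent))  # False
```

When a slice has to be modified without touching the source array, calling .copy() first is the usual way to do it.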

pandas

pandas provides high‑level data structures for structured data. The most common is DataFrame, a column‑oriented 2‑D table with labeled rows and columns, combining NumPy’s performance with spreadsheet‑like flexibility.

Basic Data Construction

Series: a one‑dimensional labeled array.

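# Demo: constructing a Series
from pandas import Series

obj = Series([4, 7, -5, 3])  # default integer index 0..3
print(obj)
obj2 = Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])  # explicit labels
print(obj2)
print(obj2[obj2 > 0])        # keep only the positive values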

DataFrame: a tabular data structure with labeled rows and columns.
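
As a minimal sketch of what constructing and querying a DataFrame could look like (the column names and values are invented for illustration and do not come from the original article):

```python
import pandas as pd

# Build a DataFrame from a dict of equal-length columns.
df = pd.DataFrame({
    'city': ['Shenzhen', 'Beijing', 'Shanghai'],
    'requests': [120, 85, 240],
    'errors': [3, 0, 12],
})

print(df.shape)              # (3, 3)
print(df['requests'].sum())  # 445

# Derive a new column from existing ones (element-wise division).
df['error_rate'] = df['errors'] / df['requests']

# Boolean filtering works the same way as with a Series.
busy = df[df['requests'] > 100]
print(busy['city'].tolist())  # ['Shenzhen', 'Shanghai']
```

Each column of the DataFrame is itself a Series, so the Series operations shown earlier apply column by column.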

matplotlib

matplotlib integrates with IPython/Jupyter to provide interactive visualisation.

#%%
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 20, 100)
plt.plot(x, np.sin(x))
plt.show()
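
A slightly fuller sketch, using Matplotlib's object-oriented interface, adds a second curve, axis labels, a title, and a legend. The Agg backend and the output file name sine_cosine.png are assumptions made so the script also runs without a display; inside Jupyter or VS Code you would normally drop the backend line and let the figure render inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 20, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Sine and cosine")
ax.legend()

fig.savefig("sine_cosine.png")  # hypothetical output file name
```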

IPython

IPython is the interactive computing environment that ties together the scientific Python stack. It offers a web‑based notebook, a Qt‑based GUI console, and support for parallel and distributed computing.

SciPy

SciPy is a collection of packages for solving standard scientific‑computing problems such as integration, differential equations, and linear algebra.
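
As a brief illustration of two of these problem areas, the sketch below integrates sin(x) numerically and solves a small linear system. The matrix A and vector b are arbitrary example values.

```python
import numpy as np
from scipy import integrate, linalg

# Numerical integration: the integral of sin(x) from 0 to pi is exactly 2.
value, abs_error = integrate.quad(np.sin, 0, np.pi)
print(round(value, 6))  # 2.0

# Linear algebra: solve A @ x = b for x.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)
print(np.allclose(A @ x, b))  # True
```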

Practical Example: Cleaning and Analyzing Log Data

The following script demonstrates how to read a pipe‑separated log file, extract columns, filter rows, and compute value counts using pandas.

import pandas as pd

# Recent pandas versions raise ValueError on duplicate entries in `names`,
# so the unidentified fields are numbered.
col_name = ['ip', 'time', 'action', 'action-rsp', 'guid', 'qq', 'xx', 'QUA',
            'unknown1', 'fl-list', 'unknown2', 'unknown3', 'unknown4',
            'unknown5', 'unknown6']
# Read the pipe-separated log file
df = pd.read_table('cache_data_15043.txt', sep='|', names=col_name)
# Extract columns
guids = df['guid']
qq = df['qq']
# Filter rows where action == 'get'
get_actions = df[df['action'] == 'get']
# Count response codes
print(get_actions['action-rsp'].value_counts())

The example shows how to quickly turn raw log text into a structured DataFrame for further analysis.
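
Because the log file itself is not included, the same steps can be tried on an in-memory sample. In the sketch below the log lines, the reduced five-column layout, and all values are made up for illustration; only the pandas calls mirror the script above.

```python
import io
import pandas as pd

# Made-up sample data standing in for a real log file.
raw_log = io.StringIO(
    "10.0.0.1|2017-09-01 10:00:00|get|0|guid-a\n"
    "10.0.0.2|2017-09-01 10:00:01|set|0|guid-b\n"
    "10.0.0.1|2017-09-01 10:00:02|get|-1|guid-a\n"
    "10.0.0.3|2017-09-01 10:00:03|get|0|guid-c\n"
)
columns = ['ip', 'time', 'action', 'action-rsp', 'guid']

# Parse the pipe-separated text into a DataFrame.
df = pd.read_csv(raw_log, sep='|', names=columns)

# Keep only 'get' requests, then count each response code.
get_actions = df[df['action'] == 'get']
counts = get_actions['action-rsp'].value_counts()
print(counts.to_dict())               # {0: 2, -1: 1}

# Number of distinct GUIDs that issued a 'get'.
print(get_actions['guid'].nunique())  # 2
```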
