
Numerical Computing, Data Analysis, Machine Learning, and Data Visualization with Python Libraries

This article presents practical examples and code snippets for using Python libraries such as NumPy, Pandas, SciPy, Statsmodels, Dask, Vaex, Modin, CuPy, Scikit‑learn, TensorFlow, PyTorch, XGBoost, LightGBM, and various visualization tools to perform efficient numerical computation, data processing, machine‑learning modeling, and interactive visual analytics.

Test Development Learning Exchange

Numerical Computing and Data Analysis

Examples demonstrate how to use NumPy for array operations, Pandas for data cleaning and transformation, SciPy for statistical analysis and optimization, Statsmodels for regression modeling, Dask and Vaex for large‑scale parallel data processing, Modin for distributed Pandas‑like operations, and CuPy for GPU‑accelerated calculations.

import numpy as np

# Create a random multi-dimensional array
data = np.random.randn(1000, 10)
# Mean of each column (axis=0 aggregates over rows)
column_means = np.mean(data, axis=0)
print("Column Means:", column_means)
# Standard deviation of each row (axis=1 aggregates over columns)
row_stds = np.std(data, axis=1)
print("Row Standard Deviations:", row_stds)
# Matrix multiplication of a 10x5 matrix by a 5x10 matrix
matrix_a = np.random.randn(10, 5)
matrix_b = np.random.randn(5, 10)
result = np.dot(matrix_a, matrix_b)
print("Matrix Multiplication Result:\n", result)
# Broadcasting: the bias must match the trailing dimension of data (10 columns)
bias = np.arange(1, 11)
data_with_bias = data + bias
print("Data with Bias Added:\n", data_with_bias)
import pandas as pd
import numpy as np

# Create a DataFrame containing missing values
data = {'A': [1, 2, np.nan, 4], 'B': [5, np.nan, np.nan, 8], 'C': [9, 10, 11, 12]}
df = pd.DataFrame(data)
print("Original DataFrame:\n", df)
# Drop every row that contains a NaN
df_cleaned = df.dropna()
print("DataFrame after dropping rows with NaN:\n", df_cleaned)
# Fill NaN values with each column's mean
df_filled = df.fillna(df.mean())
print("DataFrame after filling NaN with mean:\n", df_filled)
# Derive a new column from existing ones
df['D'] = df['A'] * df['C']
print("DataFrame with new column D:\n", df)
# Group by column A (rows with NaN keys are excluded) and average each group
grouped = df.groupby('A').mean()
print("Grouped by A and calculated mean:\n", grouped)
# Left-join another table on the shared key column A
other_data = pd.DataFrame({'A': [1, 2, 3], 'E': [10, 20, 30]})
merged_df = pd.merge(df, other_data, on='A', how='left')
print("Merged DataFrame:\n", merged_df)
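The paragraph above also mentions SciPy for statistical analysis and optimization, which the snippets so far do not cover. A minimal sketch, using synthetic data, a two-sample t-test from scipy.stats, and SciPy's built-in Rosenbrock test function for the optimizer:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(42)

# Two-sample t-test: are the means of two synthetic samples different?
sample_a = rng.normal(loc=0.0, scale=1.0, size=500)
sample_b = rng.normal(loc=0.3, scale=1.0, size=500)
t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# Minimize the Rosenbrock function (global minimum at [1, 1]) with BFGS
result = optimize.minimize(optimize.rosen, x0=np.array([1.3, 0.7]), method="BFGS")
print("Optimum found at:", result.x)
```

With a true mean difference of 0.3 and 500 observations per group, the t-test rejects the null hypothesis comfortably, and BFGS converges to the known optimum at (1, 1).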

Machine Learning

Code snippets illustrate building pipelines with Scikit‑learn, constructing neural networks with TensorFlow and Keras, training models with PyTorch, and applying gradient‑boosting algorithms using XGBoost, LightGBM, and CatBoost. Additional examples cover natural‑language processing with Hugging Face Transformers, Bayesian modeling with PyMC3, and anomaly detection with PyOD.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Load the iris dataset and hold out 20% for testing
data = load_iris()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Chain scaling and classification so preprocessing is fit only on training folds
pipeline = Pipeline([('scaler', StandardScaler()), ('classifier', RandomForestClassifier(random_state=42))])
# Grid-search hyperparameters with 5-fold cross-validation
param_grid = {'classifier__n_estimators': [50, 100, 200], 'classifier__max_depth': [None, 10, 20, 30]}
grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)
# Evaluate the best pipeline on the held-out test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
print("Classification Report:\n", classification_report(y_test, y_pred))
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.datasets import mnist

# Load MNIST and scale pixel values from [0, 255] to [0, 1]
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0
# A simple fully connected classifier: flatten 28x28 images, one hidden layer
model = Sequential([Flatten(input_shape=(28, 28)), Dense(128, activation='relu'), Dense(10, activation='softmax')])
model.compile(optimizer=Adam(learning_rate=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train for 10 epochs, holding out 20% of the training data for validation
model.fit(X_train, y_train, epochs=10, validation_split=0.2)
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_acc:.4f}")

Data Visualization

Examples show creating static and interactive visualizations using Matplotlib, Seaborn, Plotly, Bokeh, Altair, Dash, Folium, Basemap, Cartopy, and Kepler.gl, covering line plots, heatmaps, 3D maps, and web‑based dashboards.

import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
ax1.plot(x, y1, label='sin(x)', color='blue')
ax1.set_title('Sine Function')
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.legend()
ax1.grid(True)
ax2.plot(x, y2, label='cos(x)', color='red')
ax2.set_title('Cosine Function')
ax2.set_xlabel('x')
ax2.set_ylabel('y')
ax2.legend()
ax2.grid(True)
fig.suptitle('Trigonometric Functions', fontsize=16)
plt.show()
import plotly.express as px
import pandas as pd

# Interactive scatter plot of the built-in iris dataset
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species", size='petal_length', hover_data=['petal_width'], title='Iris Dataset - Sepal Dimensions')
# Overlay an OLS trendline (Plotly computes it via statsmodels); .data[1] is the fitted-line trace
fig.add_trace(px.scatter(df, x="sepal_width", y="sepal_length", trendline="ols").data[1])
fig.update_layout(xaxis_title="Sepal Width", yaxis_title="Sepal Length", legend_title="Species")
fig.show()
Tags: Machine Learning, Python, Data Analysis, Data Visualization, Pandas, NumPy