Essential Python Libraries for Data Processing, Visualization, and Machine Learning
This article introduces ten essential Python libraries—including SciPy, Matplotlib, Plotly, Scikit‑learn, TensorFlow, spaCy, BeautifulSoup, OpenPyXL, Feather/Parquet, and SQLAlchemy—detailing their primary uses for scientific computing, visualization, machine learning, deep learning, NLP, web scraping, Excel handling, efficient data storage, and ORM, with practical code examples.
1. SciPy
Purpose: scientific computing. SciPy, built on NumPy, provides algorithms for optimization, linear algebra, integration, interpolation, and more.
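Beyond the optimization example below, the integration routines are just as direct. A quick sketch using `scipy.integrate.quad` (assumes SciPy is installed):

```python
import numpy as np
from scipy import integrate

# Integrate sin(x) from 0 to pi; the exact answer is 2
value, abs_error = integrate.quad(np.sin, 0, np.pi)
print(value)      # close to 2.0
print(abs_error)  # estimated absolute error, very small
```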
import numpy as np
from scipy import optimize

# Minimize a simple function
def f(x):
    return x**2 + 10 * np.sin(x)

result = optimize.minimize(f, x0=0)
print(result.x)  # Output: [-1.30644995]

2. Matplotlib and Seaborn
Purpose: data visualization. Matplotlib is a widely used plotting library; Seaborn builds on it to provide a higher‑level statistical interface.
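A bare Matplotlib plot takes only a few lines; a minimal sketch (the Agg backend is used so it runs headless, writing the figure to disk instead of opening a window):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; works without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.legend()
fig.savefig("sine.png")  # save instead of plt.show()
```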
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
sns.scatterplot(x="total_bill", y="tip", data=tips)
plt.show()

3. Plotly
Purpose: interactive data visualization, especially for web applications.
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species")
fig.show()

4. Scikit-learn
Purpose: machine learning. Provides a broad range of supervised and unsupervised algorithms, plus tools for preprocessing, model selection, and evaluation.
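The preprocessing and model-selection tools compose cleanly. As a brief sketch, a scaler and a classifier can be chained in a pipeline and cross-validated in a few lines:

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Chain standardization and a classifier so both are refit inside each CV fold
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())  # mean accuracy across the 5 folds
```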
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)
clf = KNeighborsClassifier()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # prints the classification accuracy

5. TensorFlow and PyTorch
Purpose: deep learning. Both frameworks offer flexible APIs for building and training neural networks, with GPU acceleration.
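The Keras example below has a close PyTorch counterpart; a minimal sketch of the same architecture (assumes torch is installed; the training loop is omitted for brevity):

```python
import torch
from torch import nn

# Same shape as the Keras model: flatten -> 128 ReLU -> dropout -> 10 logits
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(128, 10),
)

loss_fn = nn.CrossEntropyLoss()  # expects raw logits, like from_logits=True
optimizer = torch.optim.Adam(model.parameters())

# One forward pass on a dummy batch of four 28x28 "images"
logits = model(torch.randn(4, 28, 28))
print(logits.shape)  # torch.Size([4, 10])
```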
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)

6. NLTK and spaCy
Purpose: natural language processing. NLTK offers classic tools; spaCy focuses on speed and production‑ready pipelines.
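A few of NLTK's classic tools work out of the box with no corpus downloads; a minimal sketch of tokenization and stemming:

```python
from nltk.tokenize import wordpunct_tokenize
from nltk.stem import PorterStemmer

# Regex-based tokenizer: no punkt model download required
tokens = wordpunct_tokenize("The cats are running quickly.")
print(tokens)  # ['The', 'cats', 'are', 'running', 'quickly', '.']

stemmer = PorterStemmer()
print([stemmer.stem(t) for t in tokens])
```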
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for ent in doc.ents:
    print(ent.text, ent.label_)
# Output: Apple ORG, U.K. GPE, $1 billion MONEY

7. Beautiful Soup and Scrapy
Purpose: web data extraction. Beautiful Soup parses HTML/XML; Scrapy is a full‑featured crawling framework.
from bs4 import BeautifulSoup
import requests
response = requests.get('https://example.com')
soup = BeautifulSoup(response.text, 'html.parser')
for title in soup.find_all('title'):
    print(title.string)

8. OpenPyXL and XlsxWriter
Purpose: reading and writing Excel files. OpenPyXL works with existing .xlsx files; XlsxWriter creates new files with complex formatting.
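Creating a new file with XlsxWriter is similarly brief; a sketch that writes one formatted header row and a data row:

```python
import xlsxwriter

workbook = xlsxwriter.Workbook("report.xlsx")
worksheet = workbook.add_worksheet()

bold = workbook.add_format({"bold": True})  # reusable cell format
worksheet.write("A1", "Item", bold)
worksheet.write("B1", "Cost", bold)
worksheet.write_row("A2", ["Rent", 1000])

workbook.close()  # the file is only finalized on close
```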
import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
sheet = wb.active
for row in sheet.iter_rows(values_only=True):
    print(row)

9. Feather and Parquet
Purpose: efficient columnar storage formats for large datasets, compatible with Pandas and many languages.
import pandas as pd
import pyarrow.parquet as pq
df = pd.DataFrame({'one': [1, 2, 3], 'two': ['a', 'b', 'c']})
df.to_parquet('example.parquet')
table = pq.read_table('example.parquet')
print(table.to_pandas())

10. SQLAlchemy
Purpose: database ORM. Allows object‑oriented interaction with relational databases and supports multiple back‑ends.
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
new_user = User(name='Alice')
session.add(new_user)
session.commit()
users = session.query(User).all()
for user in users:
    print(user.name)

These libraries collectively cover the spectrum from basic data handling to advanced analysis and visualization. Selecting the appropriate tools based on project requirements can greatly improve efficiency, code quality, and scalability.