Overview of Common Python Libraries for Artificial Intelligence with Code Examples
This article gives a brief but comprehensive introduction to popular Python libraries used in artificial intelligence, including NumPy, OpenCV, scikit-image, Pillow, SimpleCV, Mahotas, Ilastik, scikit-learn, SciPy, NLTK, spaCy, LibROSA, pandas, Matplotlib, Seaborn, Orange, PyBrain, TensorFlow, PyTorch, Theano, Keras, Caffe, MXNet, PaddlePaddle, and CNTK, with code snippets and usage examples for each.
1. NumPy
NumPy (Numerical Python) is an extension library for Python that supports large multi-dimensional arrays and matrix operations, providing many mathematical functions. Its core is written in C, making array operations much faster than pure Python code.
import numpy as np
import math
import random
import time
start = time.time()
for i in range(10):
    list_1 = list(range(1, 10000))
    for j in range(len(list_1)):
        list_1[j] = math.sin(list_1[j])
print("Pure Python took {}s".format(time.time()-start))
start = time.time()
for i in range(10):
    list_1 = np.arange(1, 10000)  # np.arange already returns an ndarray
    list_1 = np.sin(list_1)
print("NumPy took {}s".format(time.time()-start))
The output shows that NumPy is significantly faster than pure Python:
Pure Python took 0.017444372177124023s
NumPy took 0.001619577407836914s
2. OpenCV
OpenCV is a cross‑platform computer‑vision library written in C/C++ with a Python interface, offering many common image‑processing algorithms.
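The averaging filter in the example below (cv.filter2D with a 5×5 kernel of ones divided by 25) simply replaces each pixel with the mean of its neighbourhood. As a quick sanity check of that idea, here is a naive pure-NumPy sketch on a made-up array (loops for clarity, not OpenCV's optimized implementation):

```python
import numpy as np

def mean_filter(img, k=5):
    # Naive k x k mean filter over the valid interior; borders are left unchanged.
    h, w = img.shape
    r = k // 2
    out = img.astype(float).copy()
    for i in range(r, h - r):
        for j in range(r, w - r):
            out[i, j] = img[i - r:i + r + 1, j - r:j + r + 1].mean()
    return out

img = np.arange(49, dtype=float).reshape(7, 7)  # a linear ramp
smoothed = mean_filter(img)
# Averaging a symmetric neighbourhood of a linear ramp leaves the centre unchanged.
print(smoothed[3, 3])  # 24.0
```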
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('h89817032p0.png')
kernel = np.ones((5,5),np.float32)/25
dst = cv.filter2D(img,-1,kernel)
blur_1 = cv.GaussianBlur(img,(5,5),0)
blur_2 = cv.bilateralFilter(img,9,75,75)
plt.figure(figsize=(10,10))
plt.subplot(221),plt.imshow(img[:,:,::-1]),plt.title('Original')
plt.subplot(222),plt.imshow(dst[:,:,::-1]),plt.title('Averaging')
plt.subplot(223),plt.imshow(blur_1[:,:,::-1]),plt.title('Gaussian')
plt.subplot(224),plt.imshow(blur_2[:,:,::-1]),plt.title('Bilateral')
plt.show()
3. scikit-image
scikit‑image, built on SciPy, processes images as NumPy arrays and provides functions such as rescale, resize, and downscale_local_mean.
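Of these, downscale_local_mean is the easiest to demystify: it averages non-overlapping blocks. The core idea can be sketched in plain NumPy (without skimage's edge padding for sizes that don't divide evenly):

```python
import numpy as np

def block_mean(img, fy, fx):
    # Average non-overlapping fy x fx blocks, like skimage's downscale_local_mean
    # (minus its padding); excess rows/columns are cropped so blocks tile exactly.
    h, w = img.shape
    h2, w2 = h - h % fy, w - w % fx
    return img[:h2, :w2].reshape(h2 // fy, fy, w2 // fx, fx).mean(axis=(1, 3))

img = np.arange(64, dtype=float).reshape(8, 8)
small = block_mean(img, 4, 4)
print(small.shape)  # (2, 2)
```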
from skimage import data, color, io
from skimage.transform import rescale, resize, downscale_local_mean
from matplotlib import pyplot as plt
image = color.rgb2gray(io.imread('h89817032p0.png'))
image_rescaled = rescale(image, 0.25, anti_aliasing=False)
image_resized = resize(image, (image.shape[0]//4, image.shape[1]//4), anti_aliasing=True)
image_downscaled = downscale_local_mean(image, (4,3))
plt.figure(figsize=(20,20))
plt.subplot(221),plt.imshow(image, cmap='gray'),plt.title('Original')
plt.subplot(222),plt.imshow(image_rescaled, cmap='gray'),plt.title('Rescaled')
plt.subplot(223),plt.imshow(image_resized, cmap='gray'),plt.title('Resized')
plt.subplot(224),plt.imshow(image_downscaled, cmap='gray'),plt.title('Downscaled')
plt.show()
4. Pillow (PIL)
Pillow is the actively maintained fork of the Python Imaging Library (PIL) and works with Python 3.x, providing a simple API for image creation and manipulation.
5. Pillow example – generating a captcha
from PIL import Image, ImageDraw, ImageFont, ImageFilter
import random
def rndChar():
    # Random uppercase letter A-Z.
    return chr(random.randint(65, 90))
def rndColor():
    # Bright colour for the background noise.
    return (random.randint(64, 255), random.randint(64, 255), random.randint(64, 255))
def rndColor2():
    # Darker colour for the captcha text.
    return (random.randint(32, 127), random.randint(32, 127), random.randint(32, 127))
width = 60*6
height = 60*6
image = Image.new('RGB', (width, height), (255,255,255))
font = ImageFont.truetype('/usr/share/fonts/wps-office/simhei.ttf', 60)
draw = ImageDraw.Draw(image)
for x in range(width):
    for y in range(height):
        draw.point((x, y), fill=rndColor())
for t in range(6):
    draw.text((60*t+10, 150), rndChar(), font=font, fill=rndColor2())
image = image.filter(ImageFilter.BLUR)
image.save('code.jpg', 'jpeg')
6. SimpleCV
SimpleCV is an open-source framework for building computer-vision applications, providing high-level access to libraries such as OpenCV. Note that SimpleCV is no longer actively maintained and runs only on Python 2.
from SimpleCV import Image, Color, Display
img = Image('http://i.imgur.com/lfAeZ4n.png')
feats = img.findKeypoints()
feats.draw(color=Color.RED)
img.show()
output = img.applyLayers()
output.save('juniperfeats.png')
7. Mahotas
Mahotas is a fast computer‑vision library built on NumPy, offering over 100 image‑processing functions.
import numpy as np
import mahotas
import mahotas.demos
from mahotas.thresholding import soft_threshold
from matplotlib import pyplot as plt
f = mahotas.demos.load('lena', as_grey=True)
f = f[128:,128:]
plt.gray()
print("Fraction of zeros in original image: {}".format(np.mean(f==0)))
plt.imshow(f)
plt.show()
8. Ilastik
Ilastik provides user‑friendly machine‑learning based image analysis for segmentation, classification, tracking, and counting without requiring deep ML expertise.
9. Scikit‑learn
Scikit‑learn is a free machine‑learning library for Python offering classification, regression, clustering, and many algorithms such as SVM, random forest, and K‑means.
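Before the scikit-learn example, the algorithm it benchmarks is worth a sketch: K-means alternates an assignment step (each point goes to its nearest centre) and an update step (each centre moves to the mean of its points). A minimal NumPy version on made-up two-blob data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated blobs and deliberately poor initial centres.
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
centers = np.array([[1.0, 1.0], [4.0, 4.0]])

for _ in range(5):  # a few Lloyd iterations
    # Assignment step: distance from every point to every centre.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: each centre becomes the mean of its assigned points.
    centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print(centers)  # centres end up near (0, 0) and (5, 5)
```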
import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import MiniBatchKMeans, KMeans
from sklearn.metrics.pairwise import pairwise_distances_argmin
from sklearn.datasets import make_blobs
np.random.seed(0)
batch_size = 45
centers = [[1,1],[-1,-1],[1,-1]]
n_clusters = len(centers)
X, labels_true = make_blobs(n_samples=3000, centers=centers, cluster_std=0.7)
k_means = KMeans(init='k-means++', n_clusters=3, n_init=10)
t0 = time.time()
k_means.fit(X)
t_batch = time.time() - t0
mbk = MiniBatchKMeans(init='k-means++', n_clusters=3, batch_size=batch_size, n_init=10, max_no_improvement=10, verbose=0)
t0 = time.time()
mbk.fit(X)
t_mini_batch = time.time() - t0
# Plotting code omitted for brevity
10. SciPy
SciPy provides efficient numerical routines such as integration, interpolation, optimization, and special functions.
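Two of those routine families in action, integration and optimization (a quick sketch with toy functions; the drumhead example below exercises the special-function module):

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: the integral of sin(x) from 0 to pi is exactly 2.
area, err = integrate.quad(np.sin, 0, np.pi)
print(round(area, 6))  # 2.0

# Scalar optimization: the minimum of (x - 3)^2 is at x = 3.
res = optimize.minimize_scalar(lambda x: (x - 3) ** 2)
print(round(res.x, 6))  # 3.0
```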
from scipy import special
import matplotlib.pyplot as plt
import numpy as np
def drumhead_height(n, k, distance, angle, t):
    kth_zero = special.jn_zeros(n, k)[-1]
    return np.cos(t) * np.cos(n*angle) * special.jn(n, distance*kth_zero)
theta = np.r_[0:2*np.pi:50j]
radius = np.r_[0:1:50j]
x = np.array([r * np.cos(theta) for r in radius])
y = np.array([r * np.sin(theta) for r in radius])
z = np.array([drumhead_height(1,1,r,theta,0.5) for r in radius])
fig = plt.figure()
ax = fig.add_axes(rect=(0,0.05,0.95,0.95), projection='3d')
ax.plot_surface(x, y, z, rstride=1, cstride=1, cmap='RdBu_r', vmin=-0.5, vmax=0.5)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
plt.show()
11. NLTK
NLTK is a library for natural language processing, providing corpora, tokenizers, taggers, and parsers.
import nltk
from nltk.corpus import treebank
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('treebank')
sentence = """At eight o'clock on Thursday morning Arthur didn't feel very good."""
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)
entities = nltk.chunk.ne_chunk(tagged)
12. spaCy
spaCy is a free, open‑source library for advanced NLP in Python, suitable for building large‑scale information extraction or preprocessing pipelines.
import spacy
texts = ["Net income was $9.4 million compared to the prior year of $2.7 million.",
         "Revenue exceeded twelve billion dollars, with a loss of $1b."]
nlp = spacy.load("en_core_web_sm")
for doc in nlp.pipe(texts, disable=["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer"]):
    print([(ent.text, ent.label_) for ent in doc.ents])
13. LibROSA
LibROSA is a Python library for music and audio analysis, offering tools for beat tracking and feature extraction.
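The frame-to-seconds conversion used at the end of the example is plain arithmetic: each analysis frame advances hop_length samples, so time = frames * hop_length / sr. A sketch using librosa's default parameters (sr=22050, hop_length=512):

```python
import numpy as np

def frames_to_time(frames, sr=22050, hop_length=512):
    # Each frame index advances hop_length samples; divide by the sample rate
    # to convert samples to seconds (matches librosa.frames_to_time's defaults).
    return np.asarray(frames) * hop_length / sr

beat_frames = np.array([0, 43, 86])
print(frames_to_time(beat_frames))  # approximately [0.0, 0.998, 1.997] seconds
```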
import librosa
filename = librosa.example('nutcracker')
y, sr = librosa.load(filename)
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
print('Estimated tempo: {:.2f} beats per minute'.format(tempo))
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
14. Pandas
Pandas is a fast, powerful, flexible, and easy‑to‑use open‑source data analysis and manipulation tool for Python.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
ts = ts.cumsum()
df = pd.DataFrame(np.random.randn(1000,4), index=ts.index, columns=list("ABCD"))
df = df.cumsum()
df.plot()
plt.show()
15. Matplotlib
Matplotlib is Python’s plotting library that provides a MATLAB‑like API for creating publication‑quality figures.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0.1, 2*np.pi, 100)
plt.plot(x, x)
plt.plot(x, np.square(x))
plt.plot(x, np.log(x))
plt.plot(x, np.sin(x))
plt.show()
16. Seaborn
Seaborn builds on Matplotlib to provide a higher‑level interface for statistical graphics.
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="ticks")
df = sns.load_dataset("penguins")
sns.pairplot(df, hue="species")
plt.show()
17. Orange
Orange is an open‑source data‑mining and machine‑learning suite with a visual programming front‑end and a Python library.
$ pip install orange3
$ orange-canvas
18. PyBrain
PyBrain is a modular machine‑learning library for Python, offering tools for reinforcement learning, neural networks, and more.
from pybrain.structure import FeedForwardNetwork
n = FeedForwardNetwork()
from pybrain.structure import LinearLayer, SigmoidLayer
inLayer = LinearLayer(2)
hiddenLayer = SigmoidLayer(3)
outLayer = LinearLayer(1)
n.addInputModule(inLayer)
n.addModule(hiddenLayer)
n.addOutputModule(outLayer)
from pybrain.structure import FullConnection
in_to_hidden = FullConnection(inLayer, hiddenLayer)
hidden_to_out = FullConnection(hiddenLayer, outLayer)
n.addConnection(in_to_hidden)
n.addConnection(hidden_to_out)
n.sortModules()
19. MILK
MILK (Machine Learning Toolkit) provides various classifiers such as SVMs, K‑NN, random forests, and decision trees.
import numpy as np
import milk
features = np.random.rand(100,10)
labels = np.zeros(100)
features[50:] += .5
labels[50:] = 1
learner = milk.defaultclassifier()
model = learner.train(features, labels)
example = np.random.rand(10)
print(model.apply(example))
example2 = np.random.rand(10) + .5
print(model.apply(example2))
20. TensorFlow
TensorFlow is an open‑source machine‑learning platform; this example builds a CNN using TensorFlow 2.x.
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
train_images, test_images = train_images/255.0, test_images/255.0
model = models.Sequential()
model.add(layers.Conv2D(32, (3,3), activation='relu', input_shape=(32,32,3)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64, (3,3), activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64, (3,3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
model.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))
21. PyTorch
PyTorch is a flexible deep‑learning framework that supports dynamic computation graphs.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose
import matplotlib.pyplot as plt
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 10)  # raw logits: CrossEntropyLoss applies softmax itself
        )
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
model = NeuralNetwork().to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
22. Theano
Theano allows defining, optimizing, and efficiently evaluating mathematical expressions involving multi‑dimensional arrays.
import theano
import theano.tensor as T
x = T.dvector('x')
y = x ** 2
J, updates = theano.scan(lambda i, y, x: T.grad(y[i], x), sequences=T.arange(y.shape[0]), non_sequences=[y, x])
f = theano.function([x], J, updates=updates)
print(f([4,4]))
23. Keras
Keras is a high‑level neural‑network API written in Python, capable of running on top of TensorFlow, CNTK, or Theano.
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
# x_train and y_train are NumPy arrays of training samples and one-hot labels
model.fit(x_train, y_train, epochs=5, batch_size=32)
24. Caffe
Caffe's successor, Caffe2, has been merged into PyTorch; existing Caffe2 APIs still work, but PyTorch is now the recommended interface.
25. MXNet
MXNet is a deep‑learning framework designed for efficiency and flexibility, supporting both symbolic and imperative programming.
import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn
from mxnet import autograd as ag
import mxnet.ndarray as F
mnist = mx.test_utils.get_mnist()
batch_size = 100
train_data = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'], batch_size, shuffle=True)
val_data = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)
class Net(gluon.Block):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.conv1 = nn.Conv2D(20, kernel_size=(5,5))
        self.pool1 = nn.MaxPool2D(pool_size=(2,2), strides=(2,2))
        self.conv2 = nn.Conv2D(50, kernel_size=(5,5))
        self.pool2 = nn.MaxPool2D(pool_size=(2,2), strides=(2,2))
        self.fc1 = nn.Dense(500)
        self.fc2 = nn.Dense(10)
    def forward(self, x):
        x = self.pool1(F.tanh(self.conv1(x)))
        x = self.pool2(F.tanh(self.conv2(x)))
        x = x.reshape((0, -1))
        x = F.tanh(self.fc1(x))
        x = F.tanh(self.fc2(x))
        return x
net = Net()
ctx = [mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()]
net.initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.03})
metric = mx.metric.Accuracy()
softmax_cross_entropy_loss = gluon.loss.SoftmaxCrossEntropyLoss()
for epoch in range(10):
    train_data.reset()
    for batch in train_data:
        data = gluon.utils.split_and_load(batch.data[0], ctx_list=ctx, batch_axis=0)
        label = gluon.utils.split_and_load(batch.label[0], ctx_list=ctx, batch_axis=0)
        outputs = []
        with ag.record():
            for x, y in zip(data, label):
                z = net(x)
                loss = softmax_cross_entropy_loss(z, y)
                loss.backward()
                outputs.append(z)
        metric.update(label, outputs)
        trainer.step(batch.data[0].shape[0])
    name, acc = metric.get()
    metric.reset()
    print('training acc at epoch %d: %s=%f' % (epoch, name, acc))
26. PaddlePaddle
PaddlePaddle is an open‑source deep‑learning platform from Baidu, offering a complete suite of tools and models.
import paddle
import numpy as np
from paddle.nn import Conv2D, MaxPool2D, Linear
import paddle.nn.functional as F
class LeNet(paddle.nn.Layer):
    def __init__(self, num_classes=1):
        super(LeNet, self).__init__()
        self.conv1 = Conv2D(in_channels=1, out_channels=6, kernel_size=5)
        self.max_pool1 = MaxPool2D(kernel_size=2, stride=2)
        self.conv2 = Conv2D(in_channels=6, out_channels=16, kernel_size=5)
        self.max_pool2 = MaxPool2D(kernel_size=2, stride=2)
        self.conv3 = Conv2D(in_channels=16, out_channels=120, kernel_size=4)
        self.fc1 = Linear(in_features=120, out_features=64)
        self.fc2 = Linear(in_features=64, out_features=num_classes)
    def forward(self, x):
        x = self.conv1(x)
        x = F.sigmoid(x)
        x = self.max_pool1(x)
        x = F.sigmoid(x)
        x = self.conv2(x)
        x = self.max_pool2(x)
        x = self.conv3(x)
        x = paddle.reshape(x, [x.shape[0], -1])
        x = self.fc1(x)
        x = F.sigmoid(x)
        x = self.fc2(x)
        return x
27. CNTK
Microsoft Cognitive Toolkit (CNTK) is a deep‑learning framework that describes neural networks as directed graphs.
NDLNetworkBuilder=[
    run=ndlLR
    ndlLR=[
        SDim=$dimension$
        LDim=1
        features=Input(SDim, 1)
        labels=Input(LDim, 1)
        B0=Parameter(4)
        W0=Parameter(4, SDim)
        B=Parameter(LDim)
        W=Parameter(LDim, 4)
        t0=Times(W0, features)
        z0=Plus(t0, B0)
        s0=Sigmoid(z0)
        t=Times(W, s0)
        z=Plus(t, B)
        s=Sigmoid(z)
        LR=Logistic(labels, s)
        EP=SquareError(labels, s)
        FeatureNodes=(features)
        LabelNodes=(labels)
        CriteriaNodes=(LR)
        EvalNodes=(EP)
        OutputNodes=(s,t,z,s0,W0)
    ]
]