Building an Image Classification Model with Transformers and TensorFlow: Theory, Code, and Practice
This article explains how to build a complete image-classification pipeline with computer-vision techniques and deep-learning frameworks — the Hugging Face transformers library and TensorFlow — covering the underlying RGB and CNN principles, model architecture, data preparation, training, and inference, with runnable Python code.
Although ChatGPT popularized AI, much of the commercial value lies in computer-vision applications: AI can now generate clothing images, game monsters, and book illustrations. To ground the discussion, the article first walks through a low-cost image-classification demo built with the transformers library.
Code example to load a pretrained ResNet model and classify a local image:

from transformers import AutoImageProcessor, ResNetForImageClassification
import torch
from PIL import Image

# Load the processor and pretrained weights from the local "model" folder
processor = AutoImageProcessor.from_pretrained("model")
model = ResNetForImageClassification.from_pretrained("model")

# Open the test image and convert it to model inputs
image = Image.open("pics/dog.jpeg")
inputs = processor(image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# The highest logit corresponds to the predicted class
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])

The project structure includes a model folder containing the pretrained weights and a pics folder for test images.
After the demo, the article dives into the fundamentals of image representation (RGB channels) and explains how a digital image can be described by a matrix of pixel values. It then introduces Convolutional Neural Networks (CNNs), describing convolution layers, pooling layers, and fully‑connected layers, using VGG‑19 as an example.
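To make the RGB representation concrete, here is a minimal sketch (using NumPy; the tiny 2×2 image is an illustrative assumption, not from the article) that builds an image as a height × width × 3 matrix and reads back one of its channels:

```python
import numpy as np

# A 2x2 RGB image: each pixel is an [R, G, B] triple in 0-255
image = np.array([
    [[255, 0, 0], [0, 255, 0]],      # red pixel,  green pixel
    [[0, 0, 255], [255, 255, 255]],  # blue pixel, white pixel
], dtype=np.uint8)

print(image.shape)             # (2, 2, 3): height, width, channels
red_channel = image[:, :, 0]   # the matrix of red intensities only
print(red_channel)
```

A real photo loaded from disk has exactly the same layout, just with far more pixels.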
Key concepts such as a 3×3 convolution kernel, stride, padding, and max‑pooling are illustrated with diagrams. The role of dropout, flattening, and the training process is also discussed.
Next, a concrete implementation is presented for classifying weather photos (sunny, cloudy, rain). The directory layout is:
main.py            # program file
[datasets]         # dataset folder
|--- [cloudy]
|--- [rain]
|--- [shine]

TensorFlow 2.6 is used to build the model:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

model = Sequential([
    # Normalize pixel values from 0-255 to 0-1
    layers.Rescaling(1./255, input_shape=(200, 200, 3)),
    # Three convolution + max-pooling blocks
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    # Regularize, then classify into the 3 weather categories
    layers.Dropout(0.2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(3)
])

The training dataset is loaded with tf.keras.utils.image_dataset_from_directory, resized to 200×200, and batched at 24 images per step. The model is compiled with the Adam optimizer and sparse categorical cross-entropy loss, then trained for 10 epochs while saving checkpoints.
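A quick back-of-the-envelope check (illustrative, not from the article) shows how the spatial size shrinks through the stack: the same-padding convolutions preserve the 200×200 size, while each of the three MaxPooling2D layers halves it, which is what determines the size of the flattened vector feeding the Dense layers:

```python
# Trace the feature-map size through the model above.
size = 200
for _ in range(3):
    # Conv2D with padding='same' keeps the size; MaxPooling2D halves it
    size //= 2
print(size)                    # -> 25: spatial size after the third pool

flattened = size * size * 64   # 64 channels from the last Conv2D
print(flattened)               # -> 40000 inputs to the Dense(128) layer
```

This is why the Flatten layer hands the classifier a 40,000-element vector rather than the raw 120,000 input pixels.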
train_ds = tf.keras.utils.image_dataset_from_directory(
    "datasets", image_size=(200, 200), batch_size=24)

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Save the weights after every epoch
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath="tf2/checkpoint", save_weights_only=True)

model.fit(train_ds, epochs=10, callbacks=[cp_callback])

After training, the model reportedly achieves over 98% accuracy on the training set. For inference, the saved weights are loaded and a new image is processed:
import numpy as np

# image_dataset_from_directory assigns labels in alphabetical folder order
names = ["cloudy", "rain", "shine"]

model.load_weights("tf2/checkpoint")

img = tf.keras.utils.load_img("test.png", target_size=(200, 200))
img_array = tf.keras.utils.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)   # add a batch dimension

predictions = model.predict(img_array)
score = tf.nn.softmax(predictions[0])      # convert raw logits to probabilities
print("Classification {}, Score {:.2f}".format(names[np.argmax(score)], 100 * np.max(score)))

The article concludes with a link to a more robust GitHub repository and encourages readers to experiment with their own data without relying on paid APIs.
Rare Earth Juejin Tech Community