Building an Image Classification Model with Transformers and TensorFlow: Theory, Code, and Practice
This article explains how to build a complete image-classification pipeline with computer-vision techniques and deep-learning frameworks — the Hugging Face transformers library and TensorFlow — covering the underlying RGB and CNN principles, model architecture, data preparation, training, and inference, with runnable Python code.
Although ChatGPT popularized AI, much of the commercial value lies in computer-vision applications: AI can now generate clothing images, game monsters, and book illustrations. To ground the discussion, the article first walks through a low-cost image-classification demo built with the transformers library.
Code example to load a pretrained ResNet model and classify a local image:

from transformers import AutoImageProcessor, ResNetForImageClassification
import torch
from PIL import Image

# Load the processor and pretrained weights from the local "model" folder
processor = AutoImageProcessor.from_pretrained("model")
model = ResNetForImageClassification.from_pretrained("model")

# Open the test image and convert it to model inputs
image = Image.open("pics/dog.jpeg")
inputs = processor(image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# The highest logit corresponds to the predicted class
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])

The project structure includes a model folder containing the pretrained weights and a pics folder for test images.
After the demo, the article dives into the fundamentals of image representation (RGB channels) and explains how a digital image can be described by a matrix of pixel values. It then introduces Convolutional Neural Networks (CNNs), describing convolution layers, pooling layers, and fully‑connected layers, using VGG‑19 as an example.
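To make the RGB representation concrete, here is a minimal sketch (using NumPy; the tiny 2×2 image is an illustrative assumption, not from the article) that builds an image as a height × width × 3 matrix and reads back one of its channels:

```python
import numpy as np

# A 2x2 RGB image: each pixel is an [R, G, B] triple in 0-255
image = np.array([
    [[255, 0, 0], [0, 255, 0]],      # red pixel,  green pixel
    [[0, 0, 255], [255, 255, 255]],  # blue pixel, white pixel
], dtype=np.uint8)

print(image.shape)             # (2, 2, 3): height, width, channels
red_channel = image[:, :, 0]   # the matrix of red intensities only
print(red_channel)
```

A real photo loaded from disk has exactly the same layout, just with far more pixels.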
Key concepts such as a 3×3 convolution kernel, stride, padding, and max‑pooling are illustrated with diagrams. The role of dropout, flattening, and the training process is also discussed.
Next, a concrete implementation is presented for classifying weather photos (sunny, cloudy, rain). The directory layout is:
main.py            # program file
[datasets]         # dataset folder
|--- [cloudy]
|--- [rain]
|--- [shine]

TensorFlow 2.6 is used to build the model:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

model = Sequential([
    # Normalize pixel values from 0-255 to 0-1
    layers.Rescaling(1./255, input_shape=(200, 200, 3)),
    # Three convolution + max-pooling blocks
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    # Regularize, then classify into the 3 weather categories
    layers.Dropout(0.2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(3)
])

The training dataset is loaded with tf.keras.utils.image_dataset_from_directory, resized to 200×200, and batched at 24 images per step. The model is compiled with the Adam optimizer and sparse categorical cross-entropy loss, then trained for 10 epochs while saving checkpoints.
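A quick back-of-the-envelope check (illustrative, not from the article) shows how the spatial size shrinks through the stack: the same-padding convolutions preserve the 200×200 size, while each of the three MaxPooling2D layers halves it, which is what determines the size of the flattened vector feeding the Dense layers:

```python
# Trace the feature-map size through the model above.
size = 200
for _ in range(3):
    # Conv2D with padding='same' keeps the size; MaxPooling2D halves it
    size //= 2
print(size)                    # -> 25: spatial size after the third pool

flattened = size * size * 64   # 64 channels from the last Conv2D
print(flattened)               # -> 40000 inputs to the Dense(128) layer
```

This is why the Flatten layer hands the classifier a 40,000-element vector rather than the raw 120,000 input pixels.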
train_ds = tf.keras.utils.image_dataset_from_directory(
    "datasets", image_size=(200, 200), batch_size=24)

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Save the weights after every epoch
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath="tf2/checkpoint", save_weights_only=True)

model.fit(train_ds, epochs=10, callbacks=[cp_callback])

After training, the model reportedly achieves over 98% accuracy on the training set. For inference, the saved weights are loaded and a new image is processed:
import numpy as np

# image_dataset_from_directory assigns labels in alphabetical folder order
names = ["cloudy", "rain", "shine"]

model.load_weights("tf2/checkpoint")

img = tf.keras.utils.load_img("test.png", target_size=(200, 200))
img_array = tf.keras.utils.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)   # add a batch dimension

predictions = model.predict(img_array)
score = tf.nn.softmax(predictions[0])      # convert raw logits to probabilities
print("Classification {}, Score {:.2f}".format(names[np.argmax(score)], 100 * np.max(score)))

The article concludes with a link to a more robust GitHub repository and encourages readers to experiment with their own data without relying on paid APIs.
Rare Earth Juejin Tech Community