Optimizing Machine Learning Models for Edge Devices with TensorFlow Lite

This article explains how to convert a TensorFlow image‑classification model to TensorFlow Lite, apply different quantization techniques, benchmark the resulting models on a Raspberry Pi 4, and compare latency, size, and accuracy to demonstrate the trade‑offs of edge AI deployment.


TensorFlow Lite (TF Lite) is an open‑source, cross‑platform framework that enables running machine‑learning models on mobile, embedded, and IoT devices. The article first introduces TF Lite and the two ways to obtain a TF Lite model, focusing on converting an existing TensorFlow model.

Advantages of TensorFlow Lite on Edge Devices

Low latency because inference occurs locally.

Data privacy since no network transmission is required.

No dependency on internet connectivity.

Small model footprint suitable for resource‑constrained hardware.

Reduced power consumption.

Building a Baseline Image Classifier

The example trains a simple cat‑vs‑dog classifier using the cats_vs_dogs dataset from tensorflow_datasets. The data is split 70%/20%/10% for training, validation, and testing. Images are resized to 224×224 and batched with a size of 16. Prefetching with a buffer of 10 improves pipeline throughput.

# Importing necessary libraries and packages.
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
from tensorflow.keras.models import Model
import tensorflow_model_optimization as tfmot
from tensorflow.keras.layers import Dropout, Dense, BatchNormalization
%load_ext tensorboard

# Loading the CatvsDog dataset.
(train_ds, val_ds, test_ds), info = tfds.load('cats_vs_dogs',
    split=['train[:70%]', 'train[70%:90%]', 'train[90%:]'],
    shuffle_files=True, as_supervised=True, with_info=True)

# Inspect dataset information.
print('Number of Classes: ' + str(info.features['label'].num_classes))
print('Classes : ' + str(info.features['label'].names))
NUM_TRAIN_IMAGES = tf.data.experimental.cardinality(train_ds).numpy()
print('Training Images: ' + str(NUM_TRAIN_IMAGES))
NUM_VAL_IMAGES = tf.data.experimental.cardinality(val_ds).numpy()
print('Validation Images: ' + str(NUM_VAL_IMAGES))
NUM_TEST_IMAGES = tf.data.experimental.cardinality(test_ds).numpy()
print('Testing Images: ' + str(NUM_TEST_IMAGES))
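The resize-to-224×224, batch-of-16, and prefetch-buffer-of-10 steps described above are not shown in the loading snippet. A minimal pipeline sketch (the `build_pipeline` helper name is ours, not from the source) could look like:

```python
import tensorflow as tf

IMG_SIZE = 224
BATCH_SIZE = 16
PREFETCH_BUFFER = 10

def preprocess(image, label):
    # Resize each image to the 224x224 input that EfficientNet-B0 expects.
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image, label

def build_pipeline(ds):
    # Resize in parallel, batch with size 16, and keep 10 batches prefetched.
    ds = ds.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.batch(BATCH_SIZE)
    return ds.prefetch(PREFETCH_BUFFER)
```

Applied as `train_ds = build_pipeline(train_ds)` (and likewise for `val_ds` and `test_ds`) before training.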

EfficientNet‑B0 pretrained on ImageNet is used as the base model (include_top=False, input shape 224×224×3, pooling='max'). Additional dense, batch‑normalization, and dropout layers are added, and all layers are set trainable.

# Defining the model architecture.
efnet = tf.keras.applications.EfficientNetB0(
    include_top=False, weights='imagenet', input_shape=(224, 224, 3), pooling='max')
for layer in efnet.layers:
    layer.trainable = True

x = Dense(512, activation='relu')(efnet.output)
x = BatchNormalization()(x)
x = Dense(64, activation='relu')(x)
x = Dropout(0.2)(x)
predictions = Dense(2, activation='softmax')(x)
model = Model(inputs=efnet.input, outputs=predictions)

# Compile the model.
model.compile(optimizer=tf.keras.optimizers.Adam(0.0001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=['accuracy'])
model.summary()

Training runs for 15 epochs with the prepared datasets. After training, the baseline Keras model achieves 98.53 % test accuracy.

# Train the model. The original snippet referenced `batch_size` and `callback`
# without defining them; batch_size here matches the input pipeline, and the
# callback list is illustrative (e.g. TensorBoard, whose extension is loaded above).
batch_size = 16
callback = [tf.keras.callbacks.TensorBoard(log_dir='./logs')]
model.fit(train_ds, epochs=15,
    steps_per_epoch=(NUM_TRAIN_IMAGES // batch_size),
    validation_data=val_ds,
    validation_steps=(NUM_VAL_IMAGES // batch_size),
    shuffle=False, callbacks=callback)

# Evaluate the model.
_, baseline_model_accuracy = model.evaluate(test_ds, verbose=0)
print('Baseline test accuracy:', baseline_model_accuracy*100)

Quantization Techniques

Three TF Lite quantization methods are explored: Float‑16, Dynamic Range (8‑bit weights), and Integer (full 8‑bit for weights and activations). Each method converts the Keras model with tf.lite.TFLiteConverter, saves the .tflite file, and evaluates accuracy using a custom evaluate() function that runs inference with the TF Lite interpreter.
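The evaluate() helper below expects `test_images` and `test_labels` as NumPy arrays, which the article never constructs explicitly. One way to materialize them from the batched `test_ds` (a sketch; the `dataset_to_arrays` name is ours):

```python
import numpy as np
import tensorflow as tf

def dataset_to_arrays(ds):
    # Unbatch so we iterate single (image, label) pairs, then collect
    # everything into two NumPy arrays for the TF Lite evaluation loop.
    images, labels = [], []
    for image, label in ds.unbatch():
        images.append(image.numpy())
        labels.append(label.numpy())
    return np.stack(images), np.array(labels)
```

Used as `test_images, test_labels = dataset_to_arrays(test_ds)`.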

# Evaluation helper.
def evaluate(interpreter):
    prediction = []
    input_index = interpreter.get_input_details()[0]['index']
    output_index = interpreter.get_output_details()[0]['index']
    input_format = interpreter.get_input_details()[0]['dtype']
    for i, test_image in enumerate(test_images):
        if i % 100 == 0:
            print(f'Evaluated on {i} results so far.')
        test_image = np.expand_dims(test_image, axis=0).astype(input_format)
        interpreter.set_tensor(input_index, test_image)
        interpreter.invoke()
        output = interpreter.get_tensor(output_index)
        predicted_label = np.argmax(output[0])
        prediction.append(predicted_label)
    prediction = np.array(prediction)
    accuracy = (prediction == test_labels).mean()
    return accuracy

Float‑16 Quantization

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_fp16_model = converter.convert()
with open('/content/fp_16_model.tflite', 'wb') as f:
    f.write(tflite_fp16_model)

interpreter = tf.lite.Interpreter('/content/fp_16_model.tflite')
interpreter.allocate_tensors()
fp16_acc = evaluate(interpreter)
print('Float 16 Quantized TFLite Model Test Accuracy:', fp16_acc*100)
print('Baseline Keras Model Test Accuracy:', baseline_model_accuracy*100)

Result: 98.58 % vs. 98.53 % baseline.

Dynamic Range Quantization

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
with open('/content/dynamic_quant_model.tflite', 'wb') as f:
    f.write(tflite_quant_model)

interpreter = tf.lite.Interpreter('/content/dynamic_quant_model.tflite')
interpreter.allocate_tensors()
dynamic_acc = evaluate(interpreter)
print('Dynamically Quantized TFLite Model Test Accuracy:', dynamic_acc*100)
print('Baseline Keras Model Test Accuracy:', baseline_model_accuracy*100)

Result: 98.15 % vs. 98.53 % baseline.

Integer Quantization

def representative_data_gen():
    # Batch each calibration sample so it matches the model's
    # (1, 224, 224, 3) input signature.
    for input_value in tf.data.Dataset.from_tensor_slices(test_images).batch(1).take(100):
        yield [input_value]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
int_quant_model = converter.convert()
with open('/content/int_quant_model.tflite', 'wb') as f:
    f.write(int_quant_model)

interpreter = tf.lite.Interpreter('/content/int_quant_model.tflite')
interpreter.allocate_tensors()
int_acc = evaluate(interpreter)
print('Integer Quantized TFLite Model Test Accuracy:', int_acc*100)
print('Baseline Keras Model Test Accuracy:', baseline_model_accuracy*100)

Result: 92.82 % vs. 98.53 % baseline, showing a noticeable accuracy drop.
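One caveat: with `inference_input_type = tf.uint8`, the evaluate() helper's plain dtype cast is only an approximation; strictly, float inputs should be mapped through the input tensor's quantization parameters. A sketch of that conversion (the helper name is ours; `scale` and `zero_point` come from `interpreter.get_input_details()[0]['quantization']`):

```python
import numpy as np

def quantize_input(float_image, scale, zero_point):
    # Map float pixel values into the uint8 domain the integer model expects,
    # using the input tensor's quantization parameters.
    quantized = float_image / scale + zero_point
    return np.clip(np.round(quantized), 0, 255).astype(np.uint8)
```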

Raspberry Pi 4 Evaluation

All three TF Lite models were benchmarked on a Raspberry Pi 4 (4 GB RAM). Accuracy, model size, and inference time were measured over 100 random test images.
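The measurement described above can be reproduced with a small harness like the following (a sketch; the `benchmark` helper and its structure are ours, not from the source):

```python
import os
import time
import numpy as np
import tensorflow as tf

def benchmark(model_path, images, runs=100):
    # Load a .tflite model and measure mean per-image inference latency
    # over `runs` randomly chosen test images, plus the file size in MB.
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    input_detail = interpreter.get_input_details()[0]
    output_index = interpreter.get_output_details()[0]['index']
    indices = np.random.choice(len(images), size=runs)
    start = time.perf_counter()
    for i in indices:
        sample = np.expand_dims(images[i], axis=0).astype(input_detail['dtype'])
        interpreter.set_tensor(input_detail['index'], sample)
        interpreter.invoke()
        _ = interpreter.get_tensor(output_index)
    mean_latency_s = (time.perf_counter() - start) / runs
    size_mb = os.path.getsize(model_path) / 2**20
    return mean_latency_s, size_mb
```

Calling `benchmark('/content/fp_16_model.tflite', test_images)` on each of the three models yields the latency and size figures compared below.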

Float‑16 quantization slightly improves accuracy; dynamic range shows a minor drop; integer quantization reduces accuracy by about six percentage points.

Model size shrinks 6× for Float‑16, and about 10× for both dynamic range and integer quantization.

Inference latency improves roughly 2.5× for Float‑16 and dynamic range models, and 3.5× for the integer‑quantized model.

Conclusion

TensorFlow Lite enables substantial reductions in model size and latency on edge hardware. Float‑16 quantization offers the best accuracy‑preserving trade‑off, while integer quantization provides the greatest speedup at the cost of noticeable accuracy loss, making it suitable for ultra‑low‑power microcontrollers.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Python · Edge AI · TensorFlow Lite · Raspberry Pi · Model Quantization · EfficientNet
Written by Code DAO

We deliver AI algorithm tutorials and the latest news, curated by a team of researchers from Peking University, Shanghai Jiao Tong University, Central South University, and leading AI companies such as Huawei, Kuaishou, and SenseTime. Join us in the AI alchemy—making life better!
