Deploy Efficient Text Classification on Android with TensorFlow Lite
This guide walks you through the end‑to‑end process of building, training, converting, and deploying a TensorFlow Lite text‑classification model on Android, covering data preparation, model selection, performance trade‑offs, and integration using the TFLite Task Library.
TensorFlow Lite Overview
TensorFlow Lite (TFL) is Google’s lightweight machine‑learning framework designed for resource‑constrained mobile and embedded devices. It provides a compact model format and an optimized inference engine that runs efficiently on Android, iOS, and Linux.
Machine Learning for Text Classification
Text classification is widely used for spam‑SMS detection and negative‑review filtering. Effective models require a sufficiently large, manually labeled dataset and appropriate algorithms such as Naïve Bayes, SVM, or deep‑learning networks. Feature engineering (keywords, sentiment cues, syntactic structures) and regular model updates are essential for maintaining accuracy.
Advantages of TensorFlow Lite
High‑performance inference: leverages device hardware accelerators (GPU, DSP) for fast predictions.
Lightweight models: model‑size reduction techniques (quantization, pruning) keep files small for storage‑limited devices.
Flexible deployment: supports Android, iOS, embedded, and edge devices via a unified API.
Developer‑friendly tools: comprehensive documentation and conversion utilities simplify the workflow.
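To make the quantization point above concrete, here is a minimal pure‑Python sketch of affine 8‑bit quantization, where a float is approximated as (q − zero_point) × scale. The function names and the sample weights are hypothetical, and the real converter applies further refinements (for example per‑channel scales) that this sketch leaves out.

```python
def quantize_int8(weights):
    """Map floats to int8 values using one (scale, zero_point) pair."""
    # Extend the range so that 0.0 is always exactly representable.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    # Fall back to 1.0 when every weight is zero, to avoid dividing by zero.
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = round(-128 - lo / scale)
    quantized = [
        max(-128, min(127, round(w / scale) + zero_point)) for w in weights
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the int8 values."""
    return [(q - zero_point) * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [-0.62, -0.1, 0.0, 0.27, 0.91]
quantized, scale, zero_point = quantize_int8(weights)
restored = dequantize(quantized, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 values:", quantized)
print("max round-trip error:", max_error)
# Each weight now needs 1 byte instead of 4 bytes for float32.
print("bytes before/after:", 4 * len(weights), len(weights))
```

The round‑trip error stays within one quantization step (scale), which is the price paid for storing each weight in a quarter of the space.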
TFL Usage Steps
The core pipeline consists of four stages:
train model → save model → convert model → run inference. This flow is illustrated in the diagram below.
Model Selection
For image‑based tasks, MobileNet‑V2 is a common choice. For text classification, TFL Model Maker supports several architectures:
MobileBERT – optimized for mobile devices but requires longer training.
Average Word‑Embedding – fast, lightweight, suitable when high accuracy is not critical.
BERT‑Base – higher accuracy at the cost of larger model size.
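As a rough illustration of why the average word‑embedding option is so light, the sketch below looks up a vector per token, averages them, and applies a single linear layer with softmax. The embeddings, class weights, and function names are all made up for the example; a trained model learns these values and may include additional layers.

```python
import math

UNKNOWN = "<UNKNOWN>"

# Hypothetical 3-dimensional embeddings; a trained model learns these values.
EMBEDDINGS = {
    "great": [0.9, 0.1, 0.3],
    "boring": [-0.8, 0.2, -0.1],
    "movie": [0.0, 0.5, 0.1],
    UNKNOWN: [0.0, 0.0, 0.0],
}

# Hypothetical linear-layer weights, one row per class.
CLASS_WEIGHTS = {
    "negative": [-1.5, 0.0, -0.5],
    "positive": [1.5, 0.0, 0.5],
}


def average_embedding(tokens):
    """Average the embedding vectors of all tokens in the sentence."""
    dim = len(EMBEDDINGS[UNKNOWN])
    if not tokens:
        return [0.0] * dim
    vectors = [EMBEDDINGS.get(t, EMBEDDINGS[UNKNOWN]) for t in tokens]
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(sentence):
    """Return a {label: probability} mapping for the sentence."""
    avg = average_embedding(sentence.lower().split())
    labels = list(CLASS_WEIGHTS)
    logits = [
        sum(w * x for w, x in zip(CLASS_WEIGHTS[label], avg))
        for label in labels
    ]
    return dict(zip(labels, softmax(logits)))


print(classify("great movie"))
print(classify("boring movie"))
```

Because word order is discarded by the averaging step, inference is a handful of additions and multiplications, which fits the "fast but less accurate" profile described above.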
Local Model Training
Set up the environment (Python, TFLite Model Maker, and supporting libraries):
sudo apt -y install libportaudio2
pip install -q tflite-model-maker
pip uninstall tflite_support_nightly
pip install tflite_support_nightly
Import required packages:
import numpy as np
import os
from tflite_model_maker import model_spec, text_classifier
from tflite_model_maker.config import ExportFormat
from tflite_model_maker.text_classifier import AverageWordVecSpec, DataLoader
import tensorflow as tf
Download and prepare the SST‑2 dataset (GLUE benchmark) containing 67,349 training and 872 test movie‑review sentences. Convert TSV to CSV and replace label values with human‑readable strings:
data_dir = tf.keras.utils.get_file(
    fname='SST-2.zip',
    origin='https://dl.fbaipublicfiles.com/glue/data/SST-2.zip',
    extract=True)
data_dir = os.path.join(os.path.dirname(data_dir), 'SST-2')
# Convert TSV to CSV and map labels
import pandas as pd
def replace_label(original_file, new_file):
    # Read the tab-separated source file
    df = pd.read_csv(original_file, sep='\t')
    # Map numeric labels to readable class names
    label_map = {0: 'negative', 1: 'positive'}
    df.replace({'label': label_map}, inplace=True)
    # Write the result as CSV without the index column
    df.to_csv(new_file, index=False)
replace_label(os.path.join(data_dir, 'train.tsv'), 'train.csv')
replace_label(os.path.join(data_dir, 'dev.tsv'), 'dev.csv')
Select a model specification and load the data:
# Choose average word‑embedding spec
spec = model_spec.get('average_word_vec')
train_data = DataLoader.from_csv(
    filename='train.csv',
    text_column='sentence',
    label_column='label',
    model_spec=spec,
    is_training=True)
test_data = DataLoader.from_csv(
    filename='dev.csv',
    text_column='sentence',
    label_column='label',
    model_spec=spec,
    is_training=False)
Train the model (10 epochs in this example):
model = text_classifier.create(train_data, model_spec=spec, epochs=10)
Switch to MobileBERT if higher accuracy is needed:
mb_spec = model_spec.get('mobilebert_classifier')
# Note: reload train_data and test_data with model_spec=mb_spec first,
# because each spec preprocesses the text differently.
model = text_classifier.create(train_data, model_spec=mb_spec, epochs=10)
Training logs show progressive loss reduction and accuracy improvement, reaching ~86% accuracy after 10 epochs.
Model Evaluation
Evaluate the exported TFLite model on the test set:
accuracy = model.evaluate_tflite('mobilebert/model.tflite', test_data)
print('TFLite model accuracy:', accuracy)
Typical output: {'accuracy': 0.9048}.
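For readers who want to sanity‑check the reported number, the sketch below shows the plain definition of accuracy (correct predictions divided by total examples) on a toy list, then works backwards from 0.9048. The helper name and the toy data are hypothetical, and the back‑of‑envelope count assumes the figure was measured on the 872‑sentence dev set described earlier.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy data, made up for illustration only.
predicted = ["positive", "negative", "positive", "positive"]
truth = ["positive", "negative", "negative", "positive"]
toy_accuracy = accuracy(predicted, truth)

# Assumption: the reported accuracy refers to the 872 dev sentences.
test_size = 872
reported_accuracy = 0.9048
approx_correct = round(reported_accuracy * test_size)

print("toy accuracy:", toy_accuracy)
print("approx. correct sentences:", approx_correct, "of", test_size)
```

Under that assumption, the reported accuracy corresponds to roughly 789 of the 872 sentences being classified correctly.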
Exporting the Model
Export the trained model in the desired format (the default for the average word‑embedding spec is a floating‑point TFLite file):
model.export(export_dir='average_word_vec')
Supported export formats include ExportFormat.TFLITE, ExportFormat.LABEL, ExportFormat.VOCAB, and ExportFormat.SAVED_MODEL.
Android Integration
Add the following Gradle dependencies:
implementation 'org.tensorflow:tensorflow-lite-task-text:0.4.0'
implementation 'org.tensorflow:tensorflow-lite-gpu-delegate-plugin:0.4.0'
implementation 'org.tensorflow:tensorflow-lite-gpu:2.9.0'
Initialize the classifier using the TFLite Task Library:
val baseOptions = BaseOptions.builder()
    .setNumThreads(4)
    .build()
val options = NLClassifier.NLClassifierOptions.builder()
    .setBaseOptions(baseOptions)
    .build()
val classifier = NLClassifier.createFromFileAndOptions(context, "average_word_vec/model.tflite", options)
Run inference on a sentence:
// NLClassifier.classify returns a List<Category>; pick the highest-scoring entry
val results = classifier.classify(sentence)
val topCategory = results.maxByOrNull { it.score }
println("truth: $label → predict: ${topCategory?.label}")
Result Demonstration
On‑device screenshots show the average‑word‑vector model classifying a positive review with ~64% confidence in ~1 ms, while the same input using MobileBERT yields higher confidence but takes ~173 ms. For a negative review, the lightweight model mistakenly predicts “positive” (64% confidence), whereas MobileBERT correctly identifies it as negative (78% confidence), illustrating the trade‑off between speed and accuracy.
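One way to act on this trade‑off is to measure latency and then choose the most accurate model that fits a latency budget. The sketch below is a desktop‑side illustration only: the helper names are hypothetical, the stand‑in classifier is a placeholder, and the latency figures are the approximate on‑device values quoted above rather than new measurements.

```python
import statistics
import time


def measure_latency_ms(classify_fn, sentence, warmup=3, runs=20):
    """Median wall-clock time of classify_fn(sentence) in milliseconds."""
    # Warm-up calls keep one-off initialization cost out of the samples.
    for _ in range(warmup):
        classify_fn(sentence)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        classify_fn(sentence)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def pick_model(latencies_ms, preference, budget_ms):
    """Return the first model in preference order that fits the budget."""
    for name in preference:
        if latencies_ms.get(name, float("inf")) <= budget_ms:
            return name
    return None


# Approximate on-device latencies quoted in the text above.
latencies_ms = {"average_word_vec": 1.0, "mobilebert": 173.0}
# Most accurate first, following the comparison in the text.
preference = ["mobilebert", "average_word_vec"]

print(pick_model(latencies_ms, preference, budget_ms=50))
print(pick_model(latencies_ms, preference, budget_ms=200))

# Placeholder classifier, used only to demonstrate the timing helper.
stand_in_latency = measure_latency_ms(lambda s: s.lower().split(), "a fine film")
print("stand-in latency (ms):", stand_in_latency)
```

Using the median rather than the mean keeps an occasional slow run, for example from garbage collection, from skewing the result.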
Abbreviations
Abbreviation – Full Form
TFL – TensorFlow Lite
SVM – Support Vector Machine
CNN – Convolutional Neural Network
GPU – Graphics Processing Unit
DSP – Digital Signal Processor
BERT – Bidirectional Encoder Representations from Transformers
ML – Machine Learning
SST‑2 – Stanford Sentiment Treebank Version 2
GLUE – General Language Understanding Evaluation
CSV – Comma‑Separated Values
TSV – Tab‑Separated Values
OPPO Amber Lab
Centered on user data security and privacy, we conduct research and open our tech capabilities to developers, building an information‑security fortress for partners and users and safeguarding OPPO device security.
