Deploying YOLO V3 with TensorFlow Serving: Environment Setup, Model Conversion, Service Deployment, and Performance Comparison
This article explains how to prepare the Docker environment, install TensorFlow Serving (CPU and GPU versions), convert a YOLO V3 checkpoint to SavedModel, deploy the model as a service, warm‑up and manage versions, invoke it via gRPC and HTTP, and compare CPU versus GPU inference performance.
Background

In UI automation testing, recognizing interface controls is fundamental; classic computer‑vision models like YOLO can be transferred to this domain. After training, the model must be served in production, and TensorFlow Serving provides a simple way to manage the model lifecycle.
Environment Preparation
TensorFlow Serving is recommended to run inside a Docker container, so Docker must be installed first.
Install TensorFlow Serving (CPU version)
Pull the CPU image:
docker pull tensorflow/serving

Run the container:

docker run -p 8500:8500 -p 8501:8501 --mount "type=bind,source=/home/test/yolo,target=/models/yolo" -e MODEL_NAME=yolo -t tensorflow/serving:latest

Install TensorFlow Serving (GPU version)
Install nvidia‑docker:
sudo apt-get install -y nvidia-docker2

Pull the GPU image (the tag must match the one used in the run command below):

docker pull tensorflow/serving:latest-gpu

Run the GPU container:

docker run -p 8500:8500 -p 8501:8501 --runtime=nvidia --mount "type=bind,source=/home/test/yolo,target=/models/yolo" -e MODEL_NAME=yolo -t tensorflow/serving:latest-gpu

YOLO Model Format Conversion
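TensorFlow Serving discovers models through a fixed directory convention: each integer-named subdirectory of the mounted base path (/home/test/yolo above) holds one SavedModel version. A minimal sketch of that layout, using a temporary directory as a stand-in for the real base path:

```python
import os
import tempfile

# Sketch of the directory layout TensorFlow Serving expects under the
# mounted model base path; each integer subdirectory is one model version.
base = tempfile.mkdtemp()  # stands in for /home/test/yolo
version_dir = os.path.join(base, "1")
os.makedirs(os.path.join(version_dir, "variables"))
# A real export writes these files via SavedModelBuilder:
open(os.path.join(version_dir, "saved_model.pb"), "wb").close()

# Print the resulting tree, e.g. 1/ -> saved_model.pb, variables/
for root, dirs, files in os.walk(base):
    depth = root[len(base):].count(os.sep)
    print("  " * depth + os.path.basename(root) + "/")
    for f in sorted(files):
        print("  " * (depth + 1) + f)
```

The export step in the conversion script should therefore write to a numbered path such as /home/test/yolo/1 so the server can pick the version up.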
The original YOLO V3 TensorFlow checkpoint (ckpt) must be converted to the SavedModel format required by TensorFlow Serving. The conversion script builds a session, restores the checkpoint, defines input/output signatures, and saves the model:
import tensorflow as tf
# Model utilities from the YOLO V3 training project; module paths may
# differ depending on the repository used.
from model import yolov3
from utils.nms_utils import gpu_nms

with tf.Session() as sess:
    # Input: a batch of JPEG/PNG-encoded image strings
    string_inp = tf.placeholder(tf.string, shape=(None,))
    imgs_map = tf.map_fn(tf.image.decode_image, string_inp, dtype=tf.uint8)
    imgs_map.set_shape((None, None, None, 3))
    imgs = tf.image.resize_images(imgs_map, [416, 416])
    imgs = tf.reshape(imgs, (-1, 416, 416, 3))
    img_float = tf.cast(imgs, dtype=tf.float32) / 255

    yolo_model = yolov3(num_class, anchors)
    with tf.variable_scope('yolov3'):
        pred_feature_maps = yolo_model.forward(img_float, False)
    pred_boxes, pred_confs, pred_probs = yolo_model.predict(pred_feature_maps)
    pred_scores = pred_confs * pred_probs
    boxes, scores, labels = gpu_nms(pred_boxes, pred_scores, num_class,
                                    max_boxes=200, score_thresh=0.3, nms_thresh=0.45)

    saver = tf.train.Saver()
    saver.restore(sess, "./data/darknet_weights/yolov3.ckpt")

    builder = tf.saved_model.builder.SavedModelBuilder(export_path)
    tensor_info_input = tf.saved_model.utils.build_tensor_info(string_inp)
    tensor_info_output1 = tf.saved_model.utils.build_tensor_info(boxes)
    tensor_info_output2 = tf.saved_model.utils.build_tensor_info(scores)
    tensor_info_output3 = tf.saved_model.utils.build_tensor_info(labels)

    prediction_signature = tf.saved_model.signature_def_utils.build_signature_def(
        inputs={'images': tensor_info_input},
        outputs={"boxes": tensor_info_output1, "scores": tensor_info_output2, "labels": tensor_info_output3},
        method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)

    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={'predict_images': prediction_signature,
                           tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: prediction_signature})
    builder.save()
    print('Done exporting!')

YOLO Service Deployment and Warm‑up
Because TensorFlow lazily loads some components, the first prediction request can be slow. A warm‑up request is therefore generated ahead of time and written as a TFRecord file named tf_serving_warmup_requests, which TensorFlow Serving replays when it loads the model:
# coding=utf-8
import tensorflow as tf
from tensorflow_serving.apis import model_pb2, predict_pb2, prediction_log_pb2

def main():
    data = open('./data/dog.jpg', 'rb').read()
    with tf.io.TFRecordWriter("tf_serving_warmup_requests") as writer:
        request = predict_pb2.PredictRequest(
            model_spec=model_pb2.ModelSpec(name="yolo", signature_name='predict_images'),
            inputs={"images": tf.make_tensor_proto([data])})
        log = prediction_log_pb2.PredictionLog(
            predict_log=prediction_log_pb2.PredictLog(request=request))
        writer.write(log.SerializeToString())

if __name__ == "__main__":
    main()

The generated file is placed in the model’s assets.extra directory; TensorFlow Serving automatically loads it on startup.
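As a concrete sketch of that placement (paths here are illustrative; adapt them to the actual export directory), the file just needs to end up at <version_dir>/assets.extra/tf_serving_warmup_requests:

```python
import os
import shutil
import tempfile

# Sketch: copy the warm-up TFRecord into the served version's assets.extra
# directory. Temp dirs stand in for the real paths.
export_dir = tempfile.mkdtemp()          # stands in for /home/test/yolo/1
assets_extra = os.path.join(export_dir, "assets.extra")
os.makedirs(assets_extra, exist_ok=True)

# Stand-in for the file produced by the warm-up script above.
warmup_src = os.path.join(tempfile.mkdtemp(), "tf_serving_warmup_requests")
with open(warmup_src, "wb") as f:
    f.write(b"")  # placeholder contents

shutil.copy(warmup_src, os.path.join(assets_extra, "tf_serving_warmup_requests"))
```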
Version Management
TensorFlow Serving can host multiple model versions. Adding a new version directory triggers automatic loading of the new version and unloading of the old one.
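By default the server serves the highest-numbered version directory; a small sketch of that numeric selection (directory names are illustrative):

```python
import os
import tempfile

# Sketch of TensorFlow Serving's default version policy: with no explicit
# model config, the highest-numbered version subdirectory is served.
base = tempfile.mkdtemp()  # stands in for /models/yolo
for v in ("1", "2", "10"):
    os.makedirs(os.path.join(base, v))

# Compare numerically, not lexicographically ("10" < "2" as strings).
versions = sorted(int(d) for d in os.listdir(base) if d.isdigit())
latest = versions[-1]
print(latest)  # → 10
```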
Service Invocation
Two calling methods are provided:
gRPC (default port 8500):
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

def test_grpc(img_path):
    channel = grpc.insecure_channel('10.18.131.58:8500')
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    data = open(img_path, 'rb').read()
    request = predict_pb2.PredictRequest()
    request.model_spec.name = "yolo"
    request.model_spec.signature_name = "predict_images"
    request.inputs['images'].CopyFrom(tf.make_tensor_proto([data]))
    result = stub.Predict(request, 10.0)  # second argument is the timeout in seconds
    return result

HTTP (default port 8501):

import base64
import json
import requests

def test_http(img_path):
    with open(img_path, "rb") as image_file:
        # decode to str so the payload is JSON-serializable
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
    params = {"inputs": {"images": [{"b64": encoded_string}]}}
    data = json.dumps(params)
    rep = requests.post("http://10.18.131.58:8501/v1/models/yolo:predict", data=data)
    return rep.text

CPU & GPU Performance Comparison
Using GPU acceleration, a single request is processed in about 50 ms, whereas the CPU‑only setup takes roughly 280 ms, yielding a 5–6× speed improvement.
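These latency figures can be reproduced with a simple wall-clock harness; `predict` below is a placeholder for the gRPC or HTTP call, and a real benchmark should discard the first (warm-up) request:

```python
import time

def predict():
    # Placeholder for stub.Predict(...) or requests.post(...);
    # sleeps 1 ms to simulate a fast inference call.
    time.sleep(0.001)

def mean_latency_ms(fn, runs=20):
    # Average wall-clock time per call, in milliseconds.
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1000.0

print("avg latency: %.1f ms" % mean_latency_ms(predict))
```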
References
https://www.tensorflow.org/tfx/guide/serving
https://www.tensorflow.org/guide/saved_model
360 Quality & Efficiency
360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.