
Deploying YOLO V3 with TensorFlow Serving: Environment Setup, Model Conversion, Service Deployment, and Performance Comparison

This article explains how to prepare the Docker environment, install TensorFlow Serving (CPU and GPU versions), convert a YOLO V3 checkpoint to SavedModel, deploy the model as a service, warm‑up and manage versions, invoke it via gRPC and HTTP, and compare CPU versus GPU inference performance.

360 Quality & Efficiency

Background

In UI automation testing, recognizing interface controls is a fundamental task, and classic computer-vision models such as YOLO transfer well to this domain. Once trained, the model must be served in production, and TensorFlow Serving offers a simple way to manage the model lifecycle.

Environment Preparation

Running TensorFlow Serving inside a Docker container is the recommended setup, so install Docker first.

Install TensorFlow Serving (CPU version)

Pull the CPU image:

docker pull tensorflow/serving

Run the container:

docker run -p 8500:8500 -p 8501:8501 --mount "type=bind,source=/home/test/yolo,target=/models/yolo" -e MODEL_NAME=yolo -t tensorflow/serving:latest
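Once the container is up, the REST model-status endpoint on port 8501 (GET /v1/models/yolo) reports whether the model loaded. A small sketch of parsing that response; the sample body below is illustrative, but it follows TensorFlow Serving's actual status format:

```python
import json

def parse_model_status(body):
    """Extract (version, state) pairs from TF Serving's model-status JSON."""
    status = json.loads(body)
    return [(int(v["version"]), v["state"])
            for v in status.get("model_version_status", [])]

# Typical response shape from GET http://localhost:8501/v1/models/yolo
sample = ('{"model_version_status": [{"version": "1", "state": "AVAILABLE",'
          ' "status": {"error_code": "OK", "error_message": ""}}]}')
print(parse_model_status(sample))  # [(1, 'AVAILABLE')]
```

A state of AVAILABLE means the model is ready to accept prediction requests.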

Install TensorFlow Serving (GPU version)

Install nvidia‑docker:

sudo apt-get install -y nvidia-docker2

Pull the GPU image:

docker pull tensorflow/serving:latest-gpu

Run the GPU container:

docker run -p 8500:8500 -p 8501:8501 --runtime=nvidia --mount "type=bind,source=/home/test/yolo,target=/models/yolo" -e MODEL_NAME=yolo -t tensorflow/serving:latest-gpu

YOLO Model Format Conversion

The original YOLO V3 TensorFlow checkpoint (ckpt) must be converted to the SavedModel format required by TensorFlow Serving. The conversion script builds a session, restores the checkpoint, defines input/output signatures, and saves the model:

# Assumes yolov3, gpu_nms, num_class, anchors, and export_path come from the
# YOLO V3 TensorFlow training code; only tf itself is imported here.
import tensorflow as tf

with tf.Session() as sess:
    # Input: a batch of raw image bytes, decoded and resized to 416x416
    string_inp = tf.placeholder(tf.string, shape=(None,))
    imgs_map = tf.map_fn(tf.image.decode_image, string_inp, dtype=tf.uint8)
    imgs_map.set_shape((None, None, None, 3))
    imgs = tf.image.resize_images(imgs_map, [416, 416])
    imgs = tf.reshape(imgs, (-1, 416, 416, 3))
    img_float = tf.cast(imgs, dtype=tf.float32) / 255
    yolo_model = yolov3(num_class, anchors)
    with tf.variable_scope('yolov3'):
        pred_feature_maps = yolo_model.forward(img_float, False)
    pred_boxes, pred_confs, pred_probs = yolo_model.predict(pred_feature_maps)
    pred_scores = pred_confs * pred_probs
    boxes, scores, labels = gpu_nms(pred_boxes, pred_scores, num_class, max_boxes=200, score_thresh=0.3, nms_thresh=0.45)
    # Restore the trained weights, then export the graph as a SavedModel
    saver = tf.train.Saver()
    saver.restore(sess, "./data/darknet_weights/yolov3.ckpt")
    builder = tf.saved_model.builder.SavedModelBuilder(export_path)
    tensor_info_input = tf.saved_model.utils.build_tensor_info(string_inp)
    tensor_info_output1 = tf.saved_model.utils.build_tensor_info(boxes)
    tensor_info_output2 = tf.saved_model.utils.build_tensor_info(scores)
    tensor_info_output3 = tf.saved_model.utils.build_tensor_info(labels)
    prediction_signature = tf.saved_model.signature_def_utils.build_signature_def(
        inputs={'images': tensor_info_input},
        outputs={"boxes": tensor_info_output1, "scores": tensor_info_output2, "labels": tensor_info_output3},
        method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)
    builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.SERVING],
                                         signature_def_map={'predict_images': prediction_signature,
                                                          tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: prediction_signature})
    builder.save()
    print('Done exporting!')

YOLO Service Deployment and Warm‑up

Because TensorFlow loads parts of the graph lazily, the first prediction request can be slow. To pre-warm the model, a warm-up request is generated and stored as a TFRecord file named tf_serving_warmup_requests:

# coding=utf-8
import tensorflow as tf
from tensorflow_serving.apis import model_pb2, predict_pb2, prediction_log_pb2

def main():
    with open('./data/dog.jpg', 'rb') as f:
        data = f.read()
    with tf.io.TFRecordWriter("tf_serving_warmup_requests") as writer:
        request = predict_pb2.PredictRequest(
            model_spec=model_pb2.ModelSpec(name="yolo", signature_name='predict_images'),
            inputs={"images": tf.make_tensor_proto([data])})
        log = prediction_log_pb2.PredictionLog(predict_log=prediction_log_pb2.PredictLog(request=request))
        writer.write(log.SerializeToString())

if __name__ == "__main__":
    main()

The generated file is placed in the model version's assets.extra directory; TensorFlow Serving replays it automatically when the model is loaded.
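That placement step can be sketched as a small helper; install_warmup and its paths are illustrative, not from the original:

```python
import os
import shutil

def install_warmup(version_dir, warmup_file="tf_serving_warmup_requests"):
    """Copy the warm-up TFRecord into <version_dir>/assets.extra,
    where TensorFlow Serving looks for it at model-load time."""
    extra_dir = os.path.join(version_dir, "assets.extra")
    os.makedirs(extra_dir, exist_ok=True)
    shutil.copy(warmup_file, extra_dir)
    return os.path.join(extra_dir, os.path.basename(warmup_file))
```

Here version_dir is the numbered SavedModel directory (e.g. /home/test/yolo/1), so the warm-up file ends up next to saved_model.pb.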

Version Management

TensorFlow Serving can host multiple model versions. Dropping a new integer-named version directory into the model's base path triggers automatic loading of the new version and unloading of the old one.
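The directory convention can be illustrated with a throwaway base path (the paths here are hypothetical): TensorFlow Serving scans the base path for integer-named subdirectories and, by default, serves the highest version:

```python
import os
import tempfile

# TensorFlow Serving watches the model base path for integer-named
# version subdirectories; by default it serves the highest number
# and unloads older versions as new ones appear.
base = tempfile.mkdtemp()
for version in (1, 2):
    os.makedirs(os.path.join(base, str(version)))

versions = sorted(int(d) for d in os.listdir(base))
print(versions)  # [1, 2] -> version 2 would be served
```

With the Docker commands above, the base path is the mounted /models/yolo, so releasing version 2 is simply `cp -r export/2 /home/test/yolo/2` on the host.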

Service Invocation

Two calling methods are provided:

gRPC (default port 8500):

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

def test_grpc(img_path):
    channel = grpc.insecure_channel('10.18.131.58:8500')
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    data = open(img_path, 'rb').read()
    request = predict_pb2.PredictRequest()
    request.model_spec.name = "yolo"
    request.model_spec.signature_name = "predict_images"
    request.inputs['images'].CopyFrom(tf.make_tensor_proto([data]))
    result = stub.Predict(request, 10.0)
    return result

HTTP/REST (default port 8501):

import base64
import json
import requests

def test_http(img_path):
    with open(img_path, "rb") as image_file:
        # b64encode returns bytes; decode so json.dumps can serialize it
        encoded_string = base64.b64encode(image_file.read()).decode("utf-8")
    params = {"inputs": {"images": [{"b64": encoded_string}]}}
    data = json.dumps(params)
    rep = requests.post("http://10.18.131.58:8501/v1/models/yolo:predict", data=data)
    return rep.text
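The REST response mirrors the signature's named outputs under an "outputs" key. A minimal parsing sketch; the sample body below is illustrative:

```python
import json

def parse_predict_response(body):
    """Pull the named outputs out of a :predict REST response."""
    outputs = json.loads(body)["outputs"]
    return outputs["boxes"], outputs["scores"], outputs["labels"]

# Illustrative response for one detected box
sample = ('{"outputs": {"boxes": [[10.0, 20.0, 110.0, 220.0]],'
          ' "scores": [0.91], "labels": [3]}}')
boxes, scores, labels = parse_predict_response(sample)
print(labels)  # [3]
```

The gRPC client gets the same three tensors back as TensorProtos in result.outputs instead of JSON.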

CPU & GPU Performance Comparison

Using GPU acceleration, a single request is processed in about 50 ms, whereas the CPU‑only setup takes roughly 280 ms, yielding a 5–6× speed improvement.
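Numbers like these are easy to reproduce with a small timing loop around either client function; average_latency_ms is a hypothetical helper, not part of the original code:

```python
import time

def average_latency_ms(call, warmup=2, runs=20):
    """Average wall-clock latency of `call` in milliseconds.
    A few warm-up calls are discarded first so lazy initialization
    does not skew the measurement."""
    for _ in range(warmup):
        call()
    start = time.perf_counter()
    for _ in range(runs):
        call()
    return (time.perf_counter() - start) / runs * 1000.0

# e.g. average_latency_ms(lambda: test_grpc("./data/dog.jpg"))
```

Comparing the same image against the CPU and GPU containers with this harness gives a like-for-like latency figure.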

References

https://www.tensorflow.org/tfx/guide/serving

https://www.tensorflow.org/guide/saved_model

Tags: Docker, Python, AI, Model Deployment, GPU, TensorFlow Serving, YOLO
Written by

360 Quality & Efficiency

360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.
