Deploying YOLO V3 with TensorFlow Serving: Environment Setup, Model Conversion, Service Deployment, and Performance Comparison
This article explains how to prepare the Docker environment, install TensorFlow Serving (CPU and GPU versions), convert a YOLO V3 checkpoint to SavedModel, deploy the model as a service, warm‑up and manage versions, invoke it via gRPC and HTTP, and compare CPU versus GPU inference performance.
Background

In UI automation testing, recognizing interface controls is fundamental; classic computer‑vision models like YOLO can be transferred to this domain. After training, the model must be served in production, and TensorFlow Serving provides a simple way to manage the model lifecycle.
Environment Preparation
TensorFlow Serving is recommended to run inside a Docker container, so Docker must be installed first.
Install TensorFlow Serving (CPU version)
Pull the CPU image:
docker pull tensorflow/serving

Run the container:

docker run -p 8500:8500 -p 8501:8501 --mount "type=bind,source=/home/test/yolo,target=/models/yolo" -e MODEL_NAME=yolo -t tensorflow/serving:latest

Install TensorFlow Serving (GPU version)
Install nvidia‑docker:
sudo apt-get install -y nvidia-docker2

Pull the GPU image (the tag must match the one used in the run command below):

docker pull tensorflow/serving:latest-gpu

Run the GPU container:

docker run -p 8500:8500 -p 8501:8501 --runtime=nvidia --mount "type=bind,source=/home/test/yolo,target=/models/yolo" -e MODEL_NAME=yolo -t tensorflow/serving:latest-gpu

YOLO Model Format Conversion
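TensorFlow Serving discovers models through a fixed directory convention: each integer-named subdirectory of the mounted base path (/home/test/yolo above) holds one SavedModel version. A minimal sketch of that layout, using a temporary directory as a stand-in for the real base path:

```python
import os
import tempfile

# Sketch of the directory layout TensorFlow Serving expects under the
# mounted model base path; each integer subdirectory is one model version.
base = tempfile.mkdtemp()  # stands in for /home/test/yolo
version_dir = os.path.join(base, "1")
os.makedirs(os.path.join(version_dir, "variables"))
# A real export writes these files via SavedModelBuilder:
open(os.path.join(version_dir, "saved_model.pb"), "wb").close()

# Print the resulting tree, e.g. 1/ -> saved_model.pb, variables/
for root, dirs, files in os.walk(base):
    depth = root[len(base):].count(os.sep)
    print("  " * depth + os.path.basename(root) + "/")
    for f in sorted(files):
        print("  " * (depth + 1) + f)
```

The export step in the conversion script should therefore write to a numbered path such as /home/test/yolo/1 so the server can pick the version up.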
The original YOLO V3 TensorFlow checkpoint (ckpt) must be converted to the SavedModel format required by TensorFlow Serving. The conversion script builds a session, restores the checkpoint, defines input/output signatures, and saves the model:
import tensorflow as tf
# Model utilities from the YOLO V3 training project; module paths may
# differ depending on the repository used.
from model import yolov3
from utils.nms_utils import gpu_nms

with tf.Session() as sess:
    # Input: a batch of JPEG/PNG-encoded image strings
    string_inp = tf.placeholder(tf.string, shape=(None,))
    imgs_map = tf.map_fn(tf.image.decode_image, string_inp, dtype=tf.uint8)
    imgs_map.set_shape((None, None, None, 3))
    imgs = tf.image.resize_images(imgs_map, [416, 416])
    imgs = tf.reshape(imgs, (-1, 416, 416, 3))
    img_float = tf.cast(imgs, dtype=tf.float32) / 255

    yolo_model = yolov3(num_class, anchors)
    with tf.variable_scope('yolov3'):
        pred_feature_maps = yolo_model.forward(img_float, False)
    pred_boxes, pred_confs, pred_probs = yolo_model.predict(pred_feature_maps)
    pred_scores = pred_confs * pred_probs
    boxes, scores, labels = gpu_nms(pred_boxes, pred_scores, num_class,
                                    max_boxes=200, score_thresh=0.3, nms_thresh=0.45)

    saver = tf.train.Saver()
    saver.restore(sess, "./data/darknet_weights/yolov3.ckpt")

    builder = tf.saved_model.builder.SavedModelBuilder(export_path)
    tensor_info_input = tf.saved_model.utils.build_tensor_info(string_inp)
    tensor_info_output1 = tf.saved_model.utils.build_tensor_info(boxes)
    tensor_info_output2 = tf.saved_model.utils.build_tensor_info(scores)
    tensor_info_output3 = tf.saved_model.utils.build_tensor_info(labels)

    prediction_signature = tf.saved_model.signature_def_utils.build_signature_def(
        inputs={'images': tensor_info_input},
        outputs={"boxes": tensor_info_output1, "scores": tensor_info_output2, "labels": tensor_info_output3},
        method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)

    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={'predict_images': prediction_signature,
                           tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: prediction_signature})
    builder.save()
    print('Done exporting!')

YOLO Service Deployment and Warm‑up
Because TensorFlow lazily loads some components, the first prediction request can be slow. A warm‑up request is therefore generated ahead of time and written as a TFRecord file named tf_serving_warmup_requests, which TensorFlow Serving replays when it loads the model:
# coding=utf-8
import tensorflow as tf
from tensorflow_serving.apis import model_pb2, predict_pb2, prediction_log_pb2

def main():
    data = open('./data/dog.jpg', 'rb').read()
    with tf.io.TFRecordWriter("tf_serving_warmup_requests") as writer:
        request = predict_pb2.PredictRequest(
            model_spec=model_pb2.ModelSpec(name="yolo", signature_name='predict_images'),
            inputs={"images": tf.make_tensor_proto([data])})
        log = prediction_log_pb2.PredictionLog(
            predict_log=prediction_log_pb2.PredictLog(request=request))
        writer.write(log.SerializeToString())

if __name__ == "__main__":
    main()

The generated file is placed in the model’s assets.extra directory; TensorFlow Serving automatically loads it on startup.
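As a concrete sketch of that placement (paths here are illustrative; adapt them to the actual export directory), the file just needs to end up at <version_dir>/assets.extra/tf_serving_warmup_requests:

```python
import os
import shutil
import tempfile

# Sketch: copy the warm-up TFRecord into the served version's assets.extra
# directory. Temp dirs stand in for the real paths.
export_dir = tempfile.mkdtemp()          # stands in for /home/test/yolo/1
assets_extra = os.path.join(export_dir, "assets.extra")
os.makedirs(assets_extra, exist_ok=True)

# Stand-in for the file produced by the warm-up script above.
warmup_src = os.path.join(tempfile.mkdtemp(), "tf_serving_warmup_requests")
with open(warmup_src, "wb") as f:
    f.write(b"")  # placeholder contents

shutil.copy(warmup_src, os.path.join(assets_extra, "tf_serving_warmup_requests"))
```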
Version Management
TensorFlow Serving can host multiple model versions. Adding a new version directory triggers automatic loading of the new version and unloading of the old one.
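By default the server serves the highest-numbered version directory; a small sketch of that numeric selection (directory names are illustrative):

```python
import os
import tempfile

# Sketch of TensorFlow Serving's default version policy: with no explicit
# model config, the highest-numbered version subdirectory is served.
base = tempfile.mkdtemp()  # stands in for /models/yolo
for v in ("1", "2", "10"):
    os.makedirs(os.path.join(base, v))

# Compare numerically, not lexicographically ("10" < "2" as strings).
versions = sorted(int(d) for d in os.listdir(base) if d.isdigit())
latest = versions[-1]
print(latest)  # → 10
```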
Service Invocation
Two calling methods are provided:
gRPC (default port 8500):
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

def test_grpc(img_path):
    channel = grpc.insecure_channel('10.18.131.58:8500')
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    data = open(img_path, 'rb').read()
    request = predict_pb2.PredictRequest()
    request.model_spec.name = "yolo"
    request.model_spec.signature_name = "predict_images"
    request.inputs['images'].CopyFrom(tf.make_tensor_proto([data]))
    result = stub.Predict(request, 10.0)  # second argument is the timeout in seconds
    return result

HTTP (default port 8501):

import base64
import json
import requests

def test_http(img_path):
    with open(img_path, "rb") as image_file:
        # decode to str so the payload is JSON-serializable
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
    params = {"inputs": {"images": [{"b64": encoded_string}]}}
    data = json.dumps(params)
    rep = requests.post("http://10.18.131.58:8501/v1/models/yolo:predict", data=data)
    return rep.text

CPU & GPU Performance Comparison
Using GPU acceleration, a single request is processed in about 50 ms, whereas the CPU‑only setup takes roughly 280 ms, yielding a 5–6× speed improvement.
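These latency figures can be reproduced with a simple wall-clock harness; `predict` below is a placeholder for the gRPC or HTTP call, and a real benchmark should discard the first (warm-up) request:

```python
import time

def predict():
    # Placeholder for stub.Predict(...) or requests.post(...);
    # sleeps 1 ms to simulate a fast inference call.
    time.sleep(0.001)

def mean_latency_ms(fn, runs=20):
    # Average wall-clock time per call, in milliseconds.
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1000.0

print("avg latency: %.1f ms" % mean_latency_ms(predict))
```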
References
https://www.tensorflow.org/tfx/guide/serving
https://www.tensorflow.org/guide/saved_model
360 Quality & Efficiency
360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.