Deploying YOLO V3 with TensorFlow Serving: Environment Setup, Model Conversion, Service Deployment, and Performance Comparison
This article explains how to prepare the Docker environment, install TensorFlow Serving (CPU and GPU versions), convert a YOLO V3 checkpoint to the SavedModel format, deploy the model as a service, warm it up and manage versions, invoke it via gRPC and HTTP, and compare CPU and GPU inference performance.
Background
In UI automation testing, recognizing interface controls is fundamental, and classic computer-vision models such as YOLO can be transferred to this domain. After training, the model must be served in production, and TensorFlow Serving provides a simple way to manage the model lifecycle.
Environment Preparation
Running TensorFlow Serving inside a Docker container is the recommended approach, so install Docker first.
Install TensorFlow Serving (CPU version)
Pull the CPU image:
docker pull tensorflow/serving
Run the container:
docker run -p 8500:8500 -p 8501:8501 --mount "type=bind,source=/home/test/yolo,target=/models/yolo" -e MODEL_NAME=yolo -t tensorflow/serving:latest
Install TensorFlow Serving (GPU version)
Install nvidia-docker:
sudo apt-get install -y nvidia-docker2
Pull the GPU image (the latest-gpu tag, which is the one the run command uses):
docker pull tensorflow/serving:latest-gpu
Run the GPU container:
docker run -p 8500:8500 -p 8501:8501 --runtime=nvidia --mount "type=bind,source=/home/test/yolo,target=/models/yolo" -e MODEL_NAME=yolo -t tensorflow/serving:latest-gpu
YOLO Model Format Conversion
The original YOLO V3 TensorFlow checkpoint (ckpt) must be converted to the SavedModel format required by TensorFlow Serving. The conversion script builds a session, restores the checkpoint, defines input/output signatures, and saves the model:
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.placeholder)

# yolov3, gpu_nms, num_class, anchors, and export_path are assumed to come from
# the YOLO V3 training project; they are not defined in this snippet.

with tf.Session() as sess:
    # Accept a batch of encoded image bytes so clients can send raw image files.
    string_inp = tf.placeholder(tf.string, shape=(None,))
    # Decode each image, then resize the batch to the 416x416 network input.
    imgs_map = tf.map_fn(tf.image.decode_image, string_inp, dtype=tf.uint8)
    imgs_map.set_shape((None, None, None, 3))
    imgs = tf.image.resize_images(imgs_map, [416, 416])
    imgs = tf.reshape(imgs, (-1, 416, 416, 3))
    img_float = tf.cast(imgs, dtype=tf.float32) / 255

    # Build the inference graph and apply non-maximum suppression.
    yolo_model = yolov3(num_class, anchors)
    with tf.variable_scope('yolov3'):
        pred_feature_maps = yolo_model.forward(img_float, False)
    pred_boxes, pred_confs, pred_probs = yolo_model.predict(pred_feature_maps)
    pred_scores = pred_confs * pred_probs
    boxes, scores, labels = gpu_nms(pred_boxes, pred_scores, num_class, max_boxes=200, score_thresh=0.3, nms_thresh=0.45)

    # Restore the trained weights from the checkpoint.
    saver = tf.train.Saver()
    saver.restore(sess, "./data/darknet_weights/yolov3.ckpt")

    # Describe the input and output tensors of the serving signature.
    builder = tf.saved_model.builder.SavedModelBuilder(export_path)
    tensor_info_input = tf.saved_model.utils.build_tensor_info(string_inp)
    tensor_info_output1 = tf.saved_model.utils.build_tensor_info(boxes)
    tensor_info_output2 = tf.saved_model.utils.build_tensor_info(scores)
    tensor_info_output3 = tf.saved_model.utils.build_tensor_info(labels)
    prediction_signature = tf.saved_model.signature_def_utils.build_signature_def(
        inputs={'images': tensor_info_input},
        outputs={"boxes": tensor_info_output1, "scores": tensor_info_output2, "labels": tensor_info_output3},
        method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)

    # Register the signature under a custom key and as the default serving signature.
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            'predict_images': prediction_signature,
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: prediction_signature})
    builder.save()
    print('Done exporting!')
YOLO Service Deployment and Warm-up
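Once one of the containers from the installation steps is running with the exported model mounted, the load state of the model can be checked through TensorFlow Serving's REST model status endpoint (GET /v1/models/<name>). The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the host and port are assumptions that should match the docker run command.

```python
import json
import urllib.request


def model_status_url(host, model_name, port=8501):
    """Build the REST model status URL for a served model."""
    return f"http://{host}:{port}/v1/models/{model_name}"


def available_versions(status_json):
    """Return the version numbers reported as AVAILABLE in a status response."""
    status = json.loads(status_json)
    return sorted(
        int(entry["version"])
        for entry in status.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    )


def fetch_available_versions(host, model_name, port=8501, timeout=5.0):
    """Query a running server and return its AVAILABLE versions."""
    url = model_status_url(host, model_name, port)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return available_versions(resp.read().decode("utf-8"))
```

For example, fetch_available_versions("localhost", "yolo") returns an empty list while the model is still loading, and the loaded version numbers once it is ready.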
Because TensorFlow lazily loads components, the first prediction request can be slow. To pre-load the model, a warm-up request is generated and stored as a TFRecord file named tf_serving_warmup_requests:
# coding=utf-8
import tensorflow as tf
from tensorflow_serving.apis import model_pb2, predict_pb2, prediction_log_pb2

def main():
    data = open('./data/dog.jpg', 'rb').read()
    with tf.io.TFRecordWriter("tf_serving_warmup_requests") as writer:
        request = predict_pb2.PredictRequest(
            model_spec=model_pb2.ModelSpec(name="yolo", signature_name='predict_images'),
            inputs={"images": tf.make_tensor_proto([data])})
        log = prediction_log_pb2.PredictionLog(predict_log=prediction_log_pb2.PredictLog(request=request))
        writer.write(log.SerializeToString())

if __name__ == "__main__":
    main()
The generated file is placed in the model's assets.extra directory; TensorFlow Serving automatically loads it on startup.
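The placement step can be sketched with the standard library. The helper name install_warmup_file and the version number in the usage note are hypothetical; the file name tf_serving_warmup_requests and the assets.extra directory inside the version directory are what TensorFlow Serving looks for.

```python
import shutil
from pathlib import Path

# File name that TensorFlow Serving expects for warm-up data.
WARMUP_FILENAME = "tf_serving_warmup_requests"


def install_warmup_file(warmup_file, model_base_dir, version):
    """Copy a warm-up TFRecord into <model_base_dir>/<version>/assets.extra/."""
    version_dir = Path(model_base_dir) / str(version)
    if not version_dir.is_dir():
        raise FileNotFoundError(f"model version directory not found: {version_dir}")
    target_dir = version_dir / "assets.extra"
    target_dir.mkdir(exist_ok=True)
    target = target_dir / WARMUP_FILENAME
    shutil.copyfile(warmup_file, target)
    return target
```

Assuming the exported model lives in version directory 1 under the mounted path, install_warmup_file("tf_serving_warmup_requests", "/home/test/yolo", 1) would put the file where the server can find it.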
Version Management
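TensorFlow Serving can host multiple model versions, each stored in its own numbered subdirectory of the model directory. With the default version policy, adding a new version directory triggers automatic loading of the new version and unloading of the old one.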
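As an illustration of the numbered-directory convention, the sketch below lists the versions present under a model directory and picks the number for the next export. The function names are hypothetical, and it assumes version directories have purely numeric names. Versions are compared as integers, so 10 ranks above 2.

```python
from pathlib import Path


def list_versions(model_base_dir):
    """Return the numeric version directories under model_base_dir, ascending."""
    versions = []
    for entry in Path(model_base_dir).iterdir():
        # Ignore files and directories whose names are not plain integers.
        if entry.is_dir() and entry.name.isdigit():
            versions.append(int(entry.name))
    return sorted(versions)


def next_version(model_base_dir):
    """Return the version number to use for the next export."""
    versions = list_versions(model_base_dir)
    return versions[-1] + 1 if versions else 1
```

Exporting the next SavedModel to the directory named by next_version("/home/test/yolo") would make it the highest version, which is the one served under the default policy.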
Service Invocation
Two calling methods are provided:
gRPC (default port 8500):
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

def test_grpc(img_path):
    channel = grpc.insecure_channel('10.18.131.58:8500')
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    with open(img_path, 'rb') as image_file:
        data = image_file.read()
    request = predict_pb2.PredictRequest()
    request.model_spec.name = "yolo"
    request.model_spec.signature_name = "predict_images"
    request.inputs['images'].CopyFrom(tf.make_tensor_proto([data]))
    # The second argument is the request timeout in seconds.
    result = stub.Predict(request, 10.0)
    return result
HTTP (default port 8501):
import base64
import json
import requests

def test_http(img_path):
    with open(img_path, "rb") as image_file:
        # b64encode returns bytes; decode to str so json.dumps can serialize it.
        encoded_string = base64.b64encode(image_file.read()).decode("utf-8")
    params = {"inputs": {"images": [{"b64": encoded_string}]}}
    data = json.dumps(params)
    rep = requests.post("http://10.18.131.58:8501/v1/models/yolo:predict", data=data)
    return rep.text
CPU & GPU Performance Comparison
Using GPU acceleration, a single request is processed in about 50 ms, whereas the CPU‑only setup takes roughly 280 ms, yielding a 5–6× speed improvement.
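The article does not describe how these latencies were measured. One possible way to collect comparable numbers is sketched below: a small timing harness that runs a few untimed warm-up calls, then times repeated calls. The function names and the warmup and runs defaults are assumptions, not the original methodology.

```python
import statistics
import time


def measure_latency_ms(call, warmup=3, runs=20):
    """Time repeated invocations of call() and summarize latency in milliseconds."""
    # Untimed calls so lazy initialization does not skew the samples.
    for _ in range(warmup):
        call()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def speedup(cpu_ms, gpu_ms):
    """Ratio of CPU latency to GPU latency."""
    return cpu_ms / gpu_ms
```

Passing a client call such as lambda: test_grpc("./data/dog.jpg") as the call argument, once against each deployment, gives two summaries to compare. With the figures reported above, speedup(280, 50) evaluates to 5.6.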
References
https://www.tensorflow.org/tfx/guide/serving
https://www.tensorflow.org/guide/saved_model
360 Quality & Efficiency
360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.
