Deploying TensorFlow 2.x Models with TensorFlow Serving: Concepts, Setup, and Usage
This guide explains the core concepts of TensorFlow Serving, shows how to prepare Docker images, save TensorFlow 2.x models in various formats, configure version policies, warm‑up models, start the service, and invoke it via gRPC or HTTP with complete code examples.
This guide follows on from the earlier article on deploying TensorFlow 1.x models with TensorFlow Serving and describes how to deploy TensorFlow 2.x models.
Core concepts of TensorFlow Serving include:
Servables – abstracted model services typically exposed via HTTP REST and gRPC servers.
Sources – discover models from directories and create servable streams.
Loader – API for loading and unloading servables.
Aspired version – the set of versions that should be loaded, provided by sources and managed by the manager.
Manager – controls the full lifecycle of servables, applying a version policy.
VersionPolicy – governs version transitions; the built-in policies are Availability Preserving (load the new version before unloading the old one, so at least one version is always available; the default) and Resource Preserving (unload the old version before loading the new one, so two versions never hold resources at once).
ServableHandler – handles client requests for a specific servable.
The servable lifecycle proceeds as follows: a source creates a loader for a specific version, notifies the manager, the manager decides (based on the version policy) whether to load a new version or unload an old one, allocates resources, and returns a handle to clients.
TensorFlow 2.x model deployment
Prepare the TensorFlow Serving environment using Docker images. Pull the CPU image:
docker pull tensorflow/serving:latest

Pull the GPU image (requires nvidia-docker):

docker pull tensorflow/serving:latest-gpu

Save models in one of the supported formats:
Checkpoint format:

model.save_weights("./xxx.ckpt", save_format="tf")

H5 format:

model.save("./xxx.h5")
model.save_weights("./xxx.h5", save_format="h5")

SavedModel format (recommended for TF Serving):

model.save("./xxx", save_format="tf")
tf.saved_model.save(obj, "./xxx")

The exported SavedModel directory contains saved_model.pb alongside the variables/ and assets/ subdirectories.
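To make the SavedModel layout concrete, here is a minimal sketch (the Adder module, the "demo" model name, and version 1 are illustrative stand-ins, not from the article) that exports into the model_name/version/ directory structure TensorFlow Serving scans:

```python
import os
import tempfile
import tensorflow as tf

# Illustrative model: a tf.Module with a single serving function.
class Adder(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
    def __call__(self, x):
        return x + 1.0

# TensorFlow Serving expects <base_path>/<model_name>/<version>/.
export_dir = os.path.join(tempfile.mkdtemp(), "demo", "1")
tf.saved_model.save(Adder(), export_dir)

# The export contains saved_model.pb plus a variables/ subdirectory.
print(sorted(os.listdir(export_dir)))
```

Pointing the serving container's base_path at the parent "demo" directory lets the Source discover version 1 automatically.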
Warm‑up model
Because TensorFlow lazily loads components, the first inference can be slow. Generate a TFRecord warm‑up file and place it in assets.extra of the model version:
# coding=utf-8
import tensorflow as tf
from tensorflow_serving.apis import model_pb2, predict_pb2, prediction_log_pb2

def main():
    # Write one warm-up request as a PredictionLog record.
    with tf.io.TFRecordWriter("tf_serving_warmup_requests") as writer:
        request = predict_pb2.PredictRequest(
            model_spec=model_pb2.ModelSpec(name="demo", signature_name="serving_default"),
            inputs={"x": tf.make_tensor_proto(["warm"]),
                    "y": tf.make_tensor_proto(["up"])})
        log = prediction_log_pb2.PredictionLog(
            predict_log=prediction_log_pb2.PredictLog(request=request))
        writer.write(log.SerializeToString())

if __name__ == "__main__":
    main()

Copy the generated tf_serving_warmup_requests file into the model's assets.extra directory.
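The placement step can be sketched in shell. The paths below are illustrative (a temp directory stands in for the real model base path, and an empty placeholder stands in for the generated TFRecord):

```shell
BASE="$(mktemp -d)"              # stands in for the real model base path
mkdir -p "$BASE/demo/1/assets.extra"

: > tf_serving_warmup_requests   # placeholder for the generated TFRecord
cp tf_serving_warmup_requests "$BASE/demo/1/assets.extra/"

ls "$BASE/demo/1/assets.extra"
```

On startup, the model server replays the records found under assets.extra before marking the version as available.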
Model maintenance
Configure version policies via a model.config file. Example for loading specific versions 1 and 2:
model_config_list {
  config {
    name: "demo"
    base_path: "/models/demo"
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
  }
}

Start the container with the config file:
docker run -p 8500:8500 -p 8501:8501 \
--mount "type=bind,source=/home/test/ybq/model/demo,target=/models/demo" \
-e MODEL_NAME=demo \
tensorflow/serving:latest \
--model_config_file=/models/demo/model.config \
    --model_config_file_poll_wait_seconds=60

For multiple models, list several config blocks in the same file and mount the parent directory.
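Such a multi-model config might look like the sketch below. The second model's name and path are assumptions for illustration; it also shows the built-in latest policy, which keeps only the newest N versions loaded, as an alternative to specific:

```
model_config_list {
  config {
    name: "demo"
    base_path: "/models/demo"
    model_platform: "tensorflow"
  }
  config {
    name: "demo2"
    base_path: "/models/demo2"
    model_platform: "tensorflow"
    model_version_policy {
      latest {
        num_versions: 2
      }
    }
  }
}
```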
Service start
CPU model deployment command:
docker run -p 8500:8500 -p 8501:8501 \
--mount "type=bind,source=/home/test/ybq/model/demo,target=/models/demo" \
    -e MODEL_NAME=demo tensorflow/serving:latest

GPU model deployment command (requires --runtime nvidia):
docker run -p 8500:8500 -p 8501:8501 \
--runtime nvidia \
--mount "type=bind,source=/home/test/ybq/model/demo,target=/models/demo" \
    -e MODEL_NAME=demo tensorflow/serving:latest-gpu

Service call
Two interfaces are available: gRPC (default port 8500) and HTTP (default port 8501). Example gRPC client:
# coding=utf-8
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

def test_grpc():
    channel = grpc.insecure_channel("127.0.0.1:8500")
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = "demo"
    request.model_spec.signature_name = "test_concat"
    request.inputs["a"].CopyFrom(tf.make_tensor_proto("xxx"))
    result = stub.Predict(request, 10.0)  # 10-second timeout
    return result.outputs

Example HTTP client:
# coding=utf-8
import json
import requests

def test_http():
    # Serialize the request body exactly once.
    data = json.dumps({
        "signature_name": "test_concat",
        "inputs": {"a": "xxx"}
    })
    resp = requests.post(
        "http://127.0.0.1:8501/v1/models/demo/versions/1:predict",
        data=data)
    return resp.text

Reference: TensorFlow Serving Documentation
360 Tech Engineering