Deploying TensorFlow 2.x Models with TensorFlow Serving: Architecture, Setup, and Usage
This article explains the core concepts of TensorFlow Serving, shows how to prepare the environment with Docker, convert TensorFlow 2.x models to the SavedModel format, configure version policies, warm‑up the service, and invoke predictions via gRPC or HTTP interfaces.
The article continues a previous guide on TensorFlow Serving by focusing on TensorFlow 2.x model deployment. It first introduces the architecture of TensorFlow Serving, describing key components such as servables, sources, loaders, aspired versions, managers, version policies, and servable handlers.
Core Concepts
Servables: the central abstraction, typically a loaded model version, that clients query through the HTTP/gRPC servers.
Sources: discover models from directories and create servable streams.
Loader: API for loading and unloading servables.
Aspired Version: set of servables that should be loaded.
Manager: controls the lifecycle of servables according to version policies.
VersionPolicy: Availability Preserving (load new version before unloading old) and Resource Preserving (avoid loading two versions simultaneously).
ServableHandler: handles client requests.
TensorFlow Serving Environment Preparation
Install TensorFlow Serving using Docker images:

docker pull tensorflow/serving:latest

For GPU support, first install NVIDIA Docker on Ubuntu:

sudo apt-get install -y nvidia-docker2

Then pull the GPU image:

docker pull tensorflow/serving:latest-devel-gpu

Model Saving Formats (TensorFlow 2.x)
TensorFlow 2.x offers several ways to save a model:

model.save_weights("./xxx.ckpt", save_format="tf")   # checkpoint, weights only
model.save_weights("./xxx.h5", save_format="h5")     # HDF5, weights only
model.save("./xxx.h5")                               # full model, HDF5
model.save("./xxx", save_format="tf")                # full model, SavedModel
tf.saved_model.save(obj, "./xxx")                    # low-level SavedModel export
model.to_json()                                      # architecture only, as JSON

TensorFlow Serving supports the SavedModel format; the directory contains a .pb file (graph), a variables folder (weights), and an assets folder (extra files).
Example Model Export
# coding=utf-8
import tensorflow as tf

class TestTFServing(tf.Module):
    def __init__(self):
        # A string variable cannot be trained, so mark it non-trainable.
        self.x = tf.Variable("hello", dtype=tf.string, trainable=False)

    @tf.function(input_signature=[tf.TensorSpec(shape=[], dtype=tf.string)])
    def concat_str(self, a):
        # tf.string tensors do not support "+"; join them explicitly,
        # and update the variable in place instead of rebinding the attribute.
        self.x.assign(tf.strings.join([self.x, a]))
        return self.x

    @tf.function(input_signature=[tf.TensorSpec(shape=[], dtype=tf.string)])
    def cp_str(self, b):
        self.x.assign(b)
        return self.x

if __name__ == '__main__':
    demo = TestTFServing()
    tf.saved_model.save(demo, "model\\test\\1",
                        signatures={"test_assign": demo.cp_str, "test_concat": demo.concat_str})

Another example shows a custom Keras model saved in TensorFlow format:
# coding=utf-8
import tensorflow as tf

class DenseNet(tf.keras.Model):
    def __init__(self):
        super().__init__()

    def build(self, input_shape):
        self.dense1 = tf.keras.layers.Dense(15, activation='relu')
        self.dense2 = tf.keras.layers.Dense(10, activation='relu')
        self.dense3 = tf.keras.layers.Dense(1, activation='sigmoid')
        super().build(input_shape)

    def call(self, x):
        x = self.dense1(x)
        x = self.dense2(x)
        x = self.dense3(x)
        return x

if __name__ == '__main__':
    model = DenseNet()
    model.build(input_shape=(None, 15))
    model.summary()
    inputs = tf.random.uniform(shape=(10, 15))
    model._set_inputs(inputs=inputs)  # TF 2.0 needs this to trace the input signature before saving
    model.save(".\\model\\test\\2", save_format="tf")
    tf.keras.models.save_model(model, ".\\model\\test\\3", save_format="tf")

Service Startup
Run the CPU version container:
docker run -p 8500:8500 -p 8501:8501 \
    --mount "type=bind,source=/home/test/ybq/model/demo,target=/models/demo" \
    -e MODEL_NAME=demo tensorflow/serving:latest

Run the GPU version container:

docker run -p 8500:8500 -p 8501:8501 --runtime nvidia \
    --mount "type=bind,source=/home/test/ybq/model/demo,target=/models/demo" \
    -e MODEL_NAME=demo tensorflow/serving:latest-gpu

Warm-up Model
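Once a container is up, you can verify that the model loaded by querying TensorFlow Serving's REST model-status endpoint, GET /v1/models/&lt;name&gt;. A minimal sketch, assuming the host, port, and model name from the docker run commands above:

```python
import json
import urllib.request

def status_url(host="127.0.0.1", port=8501, model="demo"):
    # TensorFlow Serving exposes GET /v1/models/<name> for model status.
    return f"http://{host}:{port}/v1/models/{model}"

def check_model_status(host="127.0.0.1", port=8501, model="demo"):
    with urllib.request.urlopen(status_url(host, port, model), timeout=5) as resp:
        # Each entry reports a version number and a state such as "AVAILABLE".
        return json.load(resp)["model_version_status"]
```

A state of "AVAILABLE" for a version means that version is loaded and serving requests.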
Generate a TFRecord warm‑up file and place it in assets.extra of the model version:
# coding=utf-8
import tensorflow as tf
from tensorflow_serving.apis import model_pb2, predict_pb2, prediction_log_pb2

def main():
    with tf.io.TFRecordWriter("tf_serving_warmup_requests") as writer:
        request = predict_pb2.PredictRequest(
            model_spec=model_pb2.ModelSpec(name="demo", signature_name='serving_default'),
            inputs={"x": tf.make_tensor_proto(["warm"]), "y": tf.make_tensor_proto(["up"])})
        log = prediction_log_pb2.PredictionLog(
            predict_log=prediction_log_pb2.PredictLog(request=request))
        writer.write(log.SerializeToString())

if __name__ == "__main__":
    main()

Model Maintenance
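After generating the file, it must end up at &lt;model_base&gt;/&lt;version&gt;/assets.extra/tf_serving_warmup_requests so that Serving replays the recorded requests when it loads that version. A small helper sketch; the directory layout is an assumption matching the examples above:

```python
import os
import shutil

def install_warmup_file(model_dir, version, src="tf_serving_warmup_requests"):
    # TensorFlow Serving looks for warm-up requests in the version's
    # assets.extra directory, e.g. /models/demo/1/assets.extra/.
    dest_dir = os.path.join(model_dir, str(version), "assets.extra")
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, "tf_serving_warmup_requests")
    shutil.copy(src, dest)
    return dest
```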
Configure version policies via a model.config file:

model_config_list {
  config {
    name: "demo"
    base_path: "/models/demo"
    model_platform: "tensorflow"
    model_version_policy {
      specific { versions: 1 versions: 2 }
    }
  }
}

For multiple models, list multiple config blocks in the same file.
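As a sketch, a two-model file could look like the following; the second model's name and path are hypothetical and would need to match a second mounted model directory:

```
model_config_list {
  config {
    name: "demo"
    base_path: "/models/demo"
    model_platform: "tensorflow"
  }
  config {
    name: "demo2"              # hypothetical second model
    base_path: "/models/demo2"
    model_platform: "tensorflow"
  }
}
```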
Start serving with the config file:
docker run -p 8500:8500 -p 8501:8501 \
    --mount "type=bind,source=/home/test/ybq/model/demo,target=/models/demo" \
    tensorflow/serving:latest \
    --model_config_file=/models/demo/model.config \
    --model_config_file_poll_wait_seconds=60

(The MODEL_NAME environment variable is unnecessary here: when --model_config_file is given, model names come from the config file.)

Service Invocation
TensorFlow Serving offers gRPC (port 8500) and HTTP (port 8501) endpoints. Use saved_model_cli to inspect model signatures.
gRPC client example:
# coding=utf-8
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

def test_grpc():
    # The gRPC endpoint listens on port 8500.
    channel = grpc.insecure_channel('127.0.0.1:8500')
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = "demo"
    request.model_spec.signature_name = "test_concat"
    request.inputs['a'].CopyFrom(tf.make_tensor_proto("xxx"))
    result = stub.Predict(request, 10.0)  # 10-second timeout
    return result.outputs

HTTP client example:
# coding=utf-8
import json
import requests

def test_http():
    params = json.dumps({"signature_name": "test_concat", "inputs": {"a": "xxx"}})
    # The REST predict endpoint is /v1/models/<name>/versions/<version>:predict.
    rep = requests.post('http://127.0.0.1:8501/v1/models/demo/versions/1:predict', data=params)
    return rep.text

The guide concludes with references to the official TensorFlow Serving documentation.
360 Quality & Efficiency
360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.