Tutorial: Freezing a TensorFlow Model and Deploying It with Flask
This tutorial explains how to freeze a trained TensorFlow model, describes the required checkpoint files, provides Python code for converting the graph to a frozen .pb file, and discusses common performance and memory issues when serving the model with a Flask web server.
In this tutorial the author, a developer responsible for image processing at Beike, introduces the process of freezing a trained TensorFlow model and deploying it as a web service using the Python Flask framework.
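The article begins by showing how to define placeholders for the input data and labels, then explains why freezing is needed. The training checkpoint holds every variable together with the full training graph (loss, gradient, and optimizer ops), but inference only needs the forward graph and its trained weights. Freezing prunes the graph down to the nodes required for the outputs and converts the variables into constants, so that structure and weights fit in a single .pb file called a frozen graph.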
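To make the idea concrete, here is a toy, framework-free model of what freezing does. It is a minimal sketch, not TensorFlow's implementation: the graph is a plain dict (a hypothetical representation) mapping each node name to its op type and inputs, and `freeze` keeps only the nodes reachable backward from the requested outputs, then replaces every `Variable` node with a `Const` node holding the checkpointed value. This mirrors, in spirit, what `convert_variables_to_constants` does to a GraphDef.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy graph: node name -> (op type, list of input node names).
Graph = Dict[str, Tuple[str, List[str]]]


def reachable(graph: Graph, outputs: List[str]) -> Set[str]:
    """Collect every node the outputs depend on by walking inputs backward."""
    seen: Set[str] = set()
    stack = list(outputs)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(graph[name][1])
    return seen


def freeze(graph: Graph, values: Dict[str, float], outputs: List[str]):
    """Prune to the inference subgraph and turn variables into constants."""
    frozen: Graph = {}
    consts: Dict[str, float] = {}
    for name in reachable(graph, outputs):
        op, inputs = graph[name]
        if op == "Variable":
            # The trained value is baked into the graph itself.
            frozen[name] = ("Const", [])
            consts[name] = values[name]
        else:
            frozen[name] = (op, inputs)
    return frozen, consts


training_graph: Graph = {
    "x": ("Placeholder", []),          # input data
    "y_true": ("Placeholder", []),     # labels: only needed for training
    "W": ("Variable", []),
    "y_pred": ("MatMul", ["x", "W"]),
    "loss": ("SquaredDifference", ["y_pred", "y_true"]),
    "W_grad": ("Gradient", ["loss", "W"]),
    "train_op": ("ApplyGradientDescent", ["W", "W_grad"]),
}

frozen, consts = freeze(training_graph, {"W": 0.5}, ["y_pred"])
print(sorted(frozen))  # ['W', 'x', 'y_pred']
print(consts)          # {'W': 0.5}
```

The label placeholder, the loss, and the gradient and training ops all disappear because `y_pred` does not depend on them.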
TensorFlow writes four files when it saves a checkpoint during training:
- model-ckpt.meta: the serialized MetaGraphDef containing the full graph definition.
- model-ckpt.data-00000-of-00001: the values of all variables (weights, biases, etc.).
- model-ckpt.index: an immutable table mapping each tensor name to metadata describing where its value is stored in the data file(s).
- checkpoint: a small text file recording the path of the most recent checkpoint (and of any other retained checkpoints).
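The `checkpoint` file is a small text-format protocol buffer naming the latest checkpoint prefix, and the other three files share that prefix. The sketch below uses only the standard library: it reads `model_checkpoint_path` from such a file and derives the companion file names, assuming a single data shard (`-00000-of-00001`). The helper names are hypothetical; in real TensorFlow code, `tf.train.latest_checkpoint(checkpoint_dir)` performs the lookup for you.

```python
import os
import re
import tempfile
from typing import Dict, List


def read_checkpoint_state(ckpt_dir: str) -> Dict[str, List[str]]:
    """Parse the text-format `checkpoint` file into key -> list of values."""
    state: Dict[str, List[str]] = {}
    with open(os.path.join(ckpt_dir, "checkpoint")) as f:
        for line in f:
            # Only quoted string fields are kept; numeric timestamp fields are skipped.
            m = re.match(r'\s*(\w+):\s*"(.*)"\s*$', line)
            if m:
                state.setdefault(m.group(1), []).append(m.group(2))
    return state


def checkpoint_files(prefix: str) -> List[str]:
    """Files that make up one checkpoint, assuming a single data shard."""
    return [
        prefix + ".meta",                 # MetaGraphDef (graph structure)
        prefix + ".index",                # tensor name -> metadata table
        prefix + ".data-00000-of-00001",  # variable values
    ]


with tempfile.TemporaryDirectory() as d:
    # Write a sample `checkpoint` file like the one the Saver produces.
    with open(os.path.join(d, "checkpoint"), "w") as f:
        f.write('model_checkpoint_path: "model-ckpt"\n'
                'all_model_checkpoint_paths: "model-ckpt"\n')
    state = read_checkpoint_state(d)
    latest = state["model_checkpoint_path"][0]
    files = checkpoint_files(latest)

print(latest)  # model-ckpt
print(files)
```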
Freezing code is provided, which imports the meta graph, restores the checkpoint, converts variables to constants, and writes the frozen graph to estate_model.pb:
```python
import argparse

import tensorflow as tf  # TensorFlow 1.x API
from tensorflow.python.framework import graph_util

parser = argparse.ArgumentParser()
parser.add_argument('--meta', required=True, type=str,
                    help='input model checkpoint meta data file (.meta)')
parser.add_argument('--prefix', required=True, type=str,
                    help='input model data prefix')
FLAGS, unparsed = parser.parse_known_args()

# Name(s) of the output node(s) to keep; comma-separated if there are several.
output_node_names = "y_pred"
output_graph = "estate_model.pb"

# Rebuild the graph from the .meta file, dropping device placements.
saver = tf.train.import_meta_graph(FLAGS.meta, clear_devices=True)
graph = tf.get_default_graph()
input_graph_def = graph.as_graph_def()

# Use ONE session: the variables must be restored into the same session
# that convert_variables_to_constants reads their values from.
with tf.Session() as sess:
    saver.restore(sess, FLAGS.prefix)
    output_graph_def = graph_util.convert_variables_to_constants(
        sess, input_graph_def, output_node_names.split(","))

# Serialize the frozen GraphDef to a single .pb file.
with tf.gfile.GFile(output_graph, "wb") as f:
    f.write(output_graph_def.SerializeToString())
```

After freezing, the article discusses performance problems encountered during deployment, such as long inference latency caused by reloading the model and copying all of its parameters into GPU memory on every request. The fix is to load the frozen graph once when the server starts and reuse a single long-lived session for all requests.
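The load-once idea can be sketched without TensorFlow. In this minimal sketch `FrozenModel` is a hypothetical stand-in: its constructor plays the role of the expensive part (in TF 1.x, parsing the .pb with `tf.GraphDef().ParseFromString`, calling `tf.import_graph_def`, and opening a `tf.Session(graph=...)`), and `predict` plays the role of `sess.run`. `functools.lru_cache` makes `get_model()` build the model once per process, so every request reuses the same instance.

```python
import functools

LOAD_COUNT = 0  # counts how many times the expensive load actually runs


class FrozenModel:
    """Hypothetical stand-in for a frozen graph plus its long-lived session."""

    def __init__(self, path: str):
        global LOAD_COUNT
        LOAD_COUNT += 1   # expensive step: read .pb, build graph, open session
        self.path = path
        self.weight = 2.0  # pretend this value came from the frozen graph

    def predict(self, x: float) -> float:
        # Cheap per-request work, analogous to sess.run(y_pred, feed_dict=...).
        return self.weight * x


@functools.lru_cache(maxsize=None)
def get_model(path: str = "estate_model.pb") -> FrozenModel:
    """Build the model on first use, then return the cached instance."""
    return FrozenModel(path)


def handle_request(x: float) -> float:
    # A Flask view would call this; it never reloads the model.
    return get_model().predict(x)


results = [handle_request(x) for x in (1.0, 2.0, 3.0)]
print(results)     # [2.0, 4.0, 6.0]
print(LOAD_COUNT)  # 1
```

`lru_cache` does not stop two threads from running the constructor concurrently on the very first calls, so in a threaded Flask server it is safer to call `get_model()` once at startup, before `app.run()`.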
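Memory leaks caused by repeated calls to tf.image.decode_image are also examined. Profiling each request with time.time() and resource.getrusage shows memory usage climbing steadily: every call to tf.image.decode_image inside the request handler adds new ops to the graph, so the graph grows with every request. The fix is to move image decoding out of the per-request path, so that the graph is built once and no new ops are created while serving.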
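Both the profiling approach and the leak mechanism can be sketched with the standard library. `profile_call` (a hypothetical helper) brackets a call with `time.time()` and reads `resource.getrusage(resource.RUSAGE_SELF).ru_maxrss`, which is Unix-only, reports the peak resident set size, and is measured in kilobytes on Linux but bytes on macOS. `ToyGraph` is a hypothetical stand-in for a TF 1.x default graph: the leaky handler adds a new decode node on every request, as calling `tf.image.decode_image` inside the handler does, while the fixed handler reuses a node built once at startup.

```python
import resource
import time


def profile_call(fn, *args):
    """Run fn(*args); return (result, seconds elapsed, peak RSS after the call)."""
    start = time.time()
    result = fn(*args)
    elapsed = time.time() - start
    peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return result, elapsed, peak_rss


class ToyGraph:
    """Hypothetical stand-in for a graph whose node list only ever grows."""

    def __init__(self):
        self.nodes = []

    def add_node(self, op: str) -> int:
        self.nodes.append(op)
        return len(self.nodes) - 1


graph = ToyGraph()


def leaky_handler(image_bytes: bytes) -> int:
    # Anti-pattern: a new decode op is created for every request.
    return graph.add_node("DecodeImage")


DECODE_NODE = graph.add_node("DecodeImage")  # built once at startup


def fixed_handler(image_bytes: bytes) -> int:
    # Fix: reuse the existing op; only the input data changes per request.
    return DECODE_NODE


for _ in range(100):
    profile_call(leaky_handler, b"jpeg-bytes")
print(len(graph.nodes))  # 101: one startup node plus one per request

before = len(graph.nodes)
for _ in range(100):
    _, seconds, rss = profile_call(fixed_handler, b"jpeg-bytes")
print(len(graph.nodes) - before)  # 0: no growth
```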
Finally, the tutorial shows how to serve multiple models from one process by giving each model its own graph and session, and making every request explicitly use the session (and graph) that belongs to its model. The example functions run_graph1 and run_graph2 each load, preprocess, and run inference for a different model.
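In TF 1.x the usual pattern is one `tf.Graph()` per model, importing each frozen GraphDef under `with graph.as_default():`, and one `tf.Session(graph=graph)` per graph; each request then calls `sess.run` on the session that owns the tensors it fetches. The standard-library sketch below models only that routing rule. `IsolatedModel` and `ModelRegistry` are hypothetical names: each model keeps a private namespace of nodes, and fetching a name from the wrong model fails, much as TensorFlow rejects a tensor that is not an element of the session's graph. The two registered models loosely correspond to the article's run_graph1 and run_graph2.

```python
from typing import Callable, Dict


class IsolatedModel:
    """Hypothetical model owning its own graph namespace and 'session'."""

    def __init__(self, name: str, nodes: Dict[str, Callable[[float], float]]):
        self.name = name
        self._nodes = dict(nodes)  # private to this model, never shared

    def run(self, fetch: str, x: float) -> float:
        if fetch not in self._nodes:
            raise KeyError(f"{fetch!r} is not an element of graph {self.name!r}")
        return self._nodes[fetch](x)


class ModelRegistry:
    """Routes each request to the model (graph + session) it names."""

    def __init__(self):
        self._models: Dict[str, IsolatedModel] = {}

    def register(self, model: IsolatedModel) -> None:
        self._models[model.name] = model

    def run(self, model_name: str, fetch: str, x: float) -> float:
        return self._models[model_name].run(fetch, x)


registry = ModelRegistry()
# Both models expose an output called "y_pred", yet they never collide.
registry.register(IsolatedModel("graph1", {"features": lambda x: x * 0.5,
                                           "y_pred": lambda x: x + 1}))
registry.register(IsolatedModel("graph2", {"y_pred": lambda x: x * 10}))

print(registry.run("graph1", "y_pred", 2.0))  # 3.0
print(registry.run("graph2", "y_pred", 2.0))  # 20.0
```

Keeping the graph and session together in one object is the design point: a request handler cannot accidentally run one model's session against another model's tensors.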
Overall, the guide offers a complete workflow from training to freezing, performance tuning, and multi‑model deployment for TensorFlow models in a Flask‑based web service.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Beike Product & Technology
As Beike's official product and technology account, we are committed to building a platform for sharing Beike's product and technology insights, targeting internet/O2O developers and product professionals. We share high-quality original articles, tech salon events, and recruitment information weekly. Welcome to follow us.
