Tag: TensorFlow Serving


iQIYI Technical Product Team
Sep 24, 2021 · Artificial Intelligence

Memory Leak Diagnosis and Fixes for TensorFlow Serving in iQIYI’s Deep Learning Platform

The iQIYI deep-learning platform identified two memory leaks in TensorFlow Serving: an executor map that kept accumulating signature strings because request inputs arrived in an unordered map, and an uncontrolled surge of gRPC threads under heavy load. The team submitted upstream patches that sort the inputs and cap the thread count, eliminating OOM crashes and stabilizing production.
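The input-sorting idea summarized above can be sketched in a few lines of Python (all names here are illustrative; the actual fix lives in TensorFlow Serving's C++ code). If an executor cache is keyed by a string built from the request's input names in arrival order, logically identical requests produce ever more distinct keys and the cache grows without bound; sorting the names first yields one canonical key per input set.

```python
def executor_key(input_names):
    # Unsorted join would make the key depend on arrival order, so an
    # unordered input map produces unbounded distinct keys over time.
    # Sorting yields one canonical key per input set.
    return ",".join(sorted(input_names))

cache = {}
# Two logically identical requests whose inputs arrive in different orders:
for order in (["user_id", "item_id"], ["item_id", "user_id"]):
    cache.setdefault(executor_key(order), object())

assert len(cache) == 1  # one executor cached, not one per arrival order
```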

AI infrastructure · Memory Leak · TensorFlow Serving
0 likes · 10 min read
iQIYI Technical Product Team
Nov 27, 2020 · Artificial Intelligence

Optimizing TensorFlow Serving Model Hot‑Update to Eliminate Latency Spikes in CTR Recommendation Systems

By adding model warm‑up files, separating load/unload threads, switching to the Jemalloc allocator, and isolating TensorFlow’s parameter memory from RPC request buffers, iQIYI’s engineers reduced TensorFlow Serving hot‑update latency spikes in high‑throughput CTR recommendation services from over 120 ms to about 2 ms, eliminating jitter.
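The load/serve separation behind that hot-update optimization can be sketched with a toy Python class (class and method names are hypothetical, not iQIYI's actual code): the new model version is loaded on a dedicated background thread, and the serving path only ever performs a single atomic reference read, so requests never block on loading or touch half-initialized parameters.

```python
import threading

class HotSwapModel:
    """Toy sketch of the hot-update pattern: load off the serving path,
    then swap one reference (attribute reads/writes are atomic in CPython)."""

    def __init__(self, model):
        self._model = model

    def predict(self, x):
        return self._model(x)        # serving path: no locks, no loading work

    def hot_update(self, load_fn):
        def _load_and_swap():
            new_model = load_fn()    # heavy work happens on this thread
            self._model = new_model  # atomic reference swap
        t = threading.Thread(target=_load_and_swap)
        t.start()
        return t

# usage: swap v1 (x + 1) for v2 (x * 10) while serving stays available
server = HotSwapModel(lambda x: x + 1)
assert server.predict(1) == 2
server.hot_update(lambda: (lambda x: x * 10)).join()
assert server.predict(1) == 10
```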

AI inference · Model Hot Update · TensorFlow Serving
0 likes · 11 min read
360 Tech Engineering
Aug 17, 2020 · Artificial Intelligence

Deploying TensorFlow 2.x Models with TensorFlow Serving: Concepts, Setup, and Usage

This guide explains the core concepts of TensorFlow Serving, shows how to prepare Docker images, save TensorFlow 2.x models in various formats, configure version policies, warm‑up models, start the service, and invoke it via gRPC or HTTP with complete code examples.
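The HTTP invocation mentioned above follows TensorFlow Serving's standard REST API: a POST to `/v1/models/<name>:predict` with a JSON body whose `instances` field holds the input rows. A minimal request-construction sketch (host, port, and model name are placeholders):

```python
import json

model_name = "my_model"  # placeholder; must match the served model's name
url = f"http://localhost:8501/v1/models/{model_name}:predict"

# One row of three float features, in the row-oriented "instances" format.
payload = json.dumps({"instances": [[1.0, 2.0, 5.0]]})

# To actually send it (requires a running server):
# import urllib.request
# req = urllib.request.Request(url, payload.encode(),
#                              {"Content-Type": "application/json"})
# predictions = json.load(urllib.request.urlopen(req))["predictions"]
```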

Docker · HTTP · Model Deployment
0 likes · 11 min read
360 Quality &amp; Efficiency
Aug 14, 2020 · Artificial Intelligence

Deploying TensorFlow 2.x Models with TensorFlow Serving: Architecture, Setup, and Usage

This article explains the core concepts of TensorFlow Serving, shows how to prepare the environment with Docker, convert TensorFlow 2.x models to the SavedModel format, configure version policies, warm‑up the service, and invoke predictions via gRPC or HTTP interfaces.
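The version policy mentioned above is set in TensorFlow Serving's `models.config` file (the `ModelServerConfig` proto in text format). A fragment pinning two specific versions side by side might look like this (model name and path are placeholders):

```protobuf
model_config_list {
  config {
    name: "ctr_model"              # placeholder model name
    base_path: "/models/ctr_model"
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 2
        versions: 3                # serve versions 2 and 3 simultaneously
      }
    }
  }
}
```

The default policy serves only the latest version; `specific` is what allows, for example, keeping an old version live while a new one is validated.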

Docker · HTTP · Model Deployment
0 likes · 11 min read
360 Quality &amp; Efficiency
Dec 6, 2019 · Artificial Intelligence

Deploying YOLO V3 with TensorFlow Serving: Environment Setup, Model Conversion, Service Deployment, and Performance Comparison

This article explains how to prepare the Docker environment, install TensorFlow Serving (CPU and GPU versions), convert a YOLO V3 checkpoint to SavedModel, deploy the model as a service, warm‑up and manage versions, invoke it via gRPC and HTTP, and compare CPU versus GPU inference performance.
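For the CPU-versus-GPU comparison that article runs, the measurement itself can be sketched with a small stdlib-only helper (the function name and report format are illustrative, not the article's code): time repeated calls and report mean and tail latency, since tail latency is usually what differs most between the two deployments.

```python
import statistics
import time

def benchmark(fn, n=100):
    """Time n calls to fn and report mean and ~p99 latency in milliseconds."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return {"mean_ms": statistics.mean(samples),
            "p99_ms": samples[int(0.99 * (n - 1))]}

# usage: compare two inference callables, e.g. a CPU and a GPU gRPC stub
stats = benchmark(lambda: sum(range(1000)), n=50)
```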

AI · Docker · GPU
0 likes · 9 min read
58 Tech
Nov 21, 2018 · Artificial Intelligence

Design and Implementation of the 58 Deep Learning Online Prediction Service

This article describes the architecture, components, and deployment strategies of the 58 deep learning online prediction service, covering TensorFlow‑Serving, custom model serving, traffic forwarding, load balancing, GPU configuration, resource monitoring, and the supporting web management platform.
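The traffic-forwarding and load-balancing component described above can be illustrated with a weighted round-robin sketch in Python (backend addresses and weights are made up; this is the general technique, not the 58 platform's actual implementation): each backend receives requests in proportion to its weight, so a GPU node can take a larger share than a CPU node.

```python
import itertools

def weighted_round_robin(backends):
    """backends: list of (address, weight) pairs.
    Returns an infinite iterator of addresses, weight-proportional."""
    expanded = [addr for addr, weight in backends for _ in range(weight)]
    return itertools.cycle(expanded)

# usage: the GPU backend gets twice the traffic of the CPU backend
rr = weighted_round_robin([("gpu-1:8500", 2), ("cpu-1:8500", 1)])
first_three = [next(rr) for _ in range(3)]
```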

GPU · Kubernetes · Load Balancing
0 likes · 15 min read
Architecture Digest
Jul 27, 2018 · Artificial Intelligence

Comprehensive Guide to Deploying Deep Learning Models in Production

This article provides a step‑by‑step tutorial on deploying trained deep‑learning models to production, covering client‑server architecture, load balancing with Nginx, using Gunicorn and Flask, cloud platform choices, autoscaling, CI/CD pipelines, and additional tools such as TensorFlow Serving and Docker.
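The Nginx load-balancing step in that tutorial can be illustrated with a minimal reverse-proxy fragment (addresses, ports, and the `/predict` path are placeholders):

```nginx
# Distribute prediction traffic across two Gunicorn/Flask backends.
upstream prediction_backends {
    least_conn;                # route each request to the least-busy backend
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
}

server {
    listen 80;
    location /predict {
        proxy_pass http://prediction_backends;
    }
}
```

Using `least_conn` rather than the default round-robin helps when model inference times vary widely between requests.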

API · Cloud Computing · Docker
0 likes · 11 min read