Tagged articles

TensorFlow Serving

10 articles

Sep 24, 2021 · Artificial Intelligence

Memory Leak Diagnosis and Fixes for TensorFlow Serving in iQIYI’s Deep Learning Platform

The iQIYI deep‑learning platform team identified two memory‑leak problems in TensorFlow Serving: an executor map that kept accumulating new string keys because request inputs arrived as unordered maps, and an uncontrolled surge of gRPC threads under heavy load. The team submitted upstream patches that sort the inputs and cap the thread count, which eliminated OOM crashes and stabilized production.

Performance Optimization · TensorFlow Serving · grpc

10 min read

iQIYI Technical Product Team

Nov 27, 2020 · Artificial Intelligence

Optimizing TensorFlow Serving Model Hot‑Update to Eliminate Latency Spikes in CTR Recommendation Systems

By adding model warm‑up files, separating the model load and unload threads, switching to the jemalloc allocator, and isolating TensorFlow’s parameter memory from RPC request buffers, iQIYI’s engineers reduced TensorFlow Serving hot‑update latency spikes in high‑throughput CTR recommendation services from over 120 ms to about 2 ms, eliminating jitter.

Model Hot Update · TensorFlow Serving · Warmup

11 min read

360 Tech Engineering

Aug 17, 2020 · Artificial Intelligence

Deploying TensorFlow 2.x Models with TensorFlow Serving: Concepts, Setup, and Usage

This guide explains the core concepts of TensorFlow Serving and shows, with complete code examples, how to prepare Docker images, save TensorFlow 2.x models in various formats, configure version policies, warm up models, start the service, and invoke it via gRPC or HTTP.

Docker · HTTP · Model Deployment

11 min read

360 Quality & Efficiency

Aug 14, 2020 · Artificial Intelligence

Deploying TensorFlow 2.x Models with TensorFlow Serving: Architecture, Setup, and Usage

This article explains the core concepts of TensorFlow Serving and shows how to prepare the environment with Docker, convert TensorFlow 2.x models to the SavedModel format, configure version policies, warm up the service, and invoke predictions via gRPC or HTTP interfaces.

Docker · HTTP · Model Deployment

11 min read

Jike Tech Team

Jul 15, 2020 · Artificial Intelligence

How Embedding-Based Recall Boosted Interaction by 33% in a Live Feed

This article details how Jike's recommendation team moved from Spark to TensorFlow, introduced a two‑tower embedding model for recall, deployed it with TensorFlow Serving and Elasticsearch, and achieved a 33.75% lift in user interaction on its Dynamic Square feed.

Elasticsearch · Embedding · TensorFlow Serving

9 min read

360 Quality & Efficiency

Dec 6, 2019 · Artificial Intelligence

Deploying YOLO V3 with TensorFlow Serving: Environment Setup, Model Conversion, Service Deployment, and Performance Comparison

This article explains how to prepare the Docker environment, install TensorFlow Serving (CPU and GPU versions), convert a YOLO V3 checkpoint to SavedModel, deploy the model as a service, warm it up and manage its versions, invoke it via gRPC and HTTP, and compare CPU and GPU inference performance.

AI · Docker · GPU

9 min read

Meituan Technology Team

Feb 21, 2019 · Artificial Intelligence

Deep Learning-Based ETA Estimation in Meituan's Delivery System

Meituan’s delivery ETA system evolved from linear regression to DeepFM. Along the way, the team enriched user, rider, merchant, and spatiotemporal features, combined an asymmetric loss with business rules to favor early‑arrival estimates, and added a tail‑adjustment term for long‑tail cases. On the engineering side, training data is assembled into TFRecords with Spark, models are trained with multi‑GPU TensorFlow, and inference runs on a remote TensorFlow Java service with a TP99 latency under 5 ms.

ETA · Long Tail · TensorFlow Serving

15 min read

58 Tech

Nov 21, 2018 · Artificial Intelligence

Design and Implementation of the 58 Deep Learning Online Prediction Service

This article describes the architecture, components, and deployment strategies of the 58 deep learning online prediction service, covering TensorFlow‑Serving, custom model serving, traffic forwarding, load balancing, GPU configuration, resource monitoring, and the supporting web management platform.

GPU · TensorFlow Serving · load balancing

15 min read

Meituan Technology Team

Oct 11, 2018 · Artificial Intelligence

Deploying and Optimizing TensorFlow Serving for High‑Performance CTR Prediction

Meituan’s user‑growth team built a Wide & Deep CTR prediction model, trained it offline on Spark‑generated TFRecords, and deployed it with TensorFlow Serving on YARN. The team then applied request‑side multithreading, offline one‑hot preprocessing, XLA JIT compilation, and dedicated model‑loading threads, cutting end‑to‑end latency from about 18 ms to about 6 ms and eliminating latency spikes during model switches.

Model Deployment · TensorFlow Serving · distributed training

15 min read

Architecture Digest

Jul 27, 2018 · Artificial Intelligence

Comprehensive Guide to Deploying Deep Learning Models in Production

This article provides a step‑by‑step tutorial on deploying trained deep‑learning models to production. It covers client‑server architecture, load balancing with Nginx, serving with Gunicorn and Flask, cloud platform choices, autoscaling, CI/CD pipelines, and additional tools such as TensorFlow Serving and Docker.

API · Cloud Computing · Docker

11 min read