Tagged articles
iQIYI Technical Product Team
Sep 24, 2021 · Artificial Intelligence

Memory Leak Diagnosis and Fixes for TensorFlow Serving in iQIYI’s Deep Learning Platform

The iQIYI deep-learning platform team identified two TensorFlow Serving memory-leak problems: an executor map whose string keys accumulated because input maps were unordered, and an uncontrolled surge in gRPC threads under heavy load. The team submitted upstream patches that sort inputs and cap thread counts, eliminating OOM crashes and stabilizing production.

Performance Optimization · TensorFlow Serving · gRPC
10 min read
iQIYI Technical Product Team
Nov 27, 2020 · Artificial Intelligence

Optimizing TensorFlow Serving Model Hot‑Update to Eliminate Latency Spikes in CTR Recommendation Systems

By adding model warm‑up files, separating load/unload threads, switching to the Jemalloc allocator, and isolating TensorFlow’s parameter memory from RPC request buffers, iQIYI’s engineers reduced TensorFlow Serving hot‑update latency spikes in high‑throughput CTR recommendation services from over 120 ms to about 2 ms, eliminating jitter.

Model Hot Update · TensorFlow Serving · Warmup
11 min read
Jike Tech Team
Jul 15, 2020 · Artificial Intelligence

How Embedding-Based Recall Boosted Interaction by 33% in a Live Feed

This article details how Jike's recommendation team upgraded from Spark to TensorFlow, introduced a twin‑tower embedding model for recall, deployed it with TensorFlow Serving and Elasticsearch, and achieved a 33.75% lift in user interaction on the dynamic square.

Deep Learning · Elasticsearch · Embedding
9 min read
360 Quality & Efficiency
Dec 6, 2019 · Artificial Intelligence

Deploying YOLO V3 with TensorFlow Serving: Environment Setup, Model Conversion, Service Deployment, and Performance Comparison

This article explains how to prepare the Docker environment, install TensorFlow Serving (CPU and GPU versions), convert a YOLO V3 checkpoint to SavedModel, deploy the model as a service, warm it up and manage versions, invoke it via gRPC and HTTP, and compare CPU versus GPU inference performance.

AI · Docker · GPU
9 min read
Meituan Technology Team
Feb 21, 2019 · Artificial Intelligence

Deep Learning-Based ETA Estimation in Meituan's Delivery System

Meituan’s delivery ETA system progressed from linear regression to DeepFM, enriching user, rider, merchant, and spatiotemporal features. It uses an asymmetric loss and business-rule integration to favor early arrivals, and adds a tail-adjustment term. On the engineering side, it relies on Spark-assembled TFRecords, multi-GPU TensorFlow training, and remotely served TensorFlow Java inference that achieves sub-5 ms TP99 latency.

ETA · Long Tail · TensorFlow Serving
15 min read
58 Tech
Nov 21, 2018 · Artificial Intelligence

Design and Implementation of the 58 Deep Learning Online Prediction Service

This article describes the architecture, components, and deployment strategies of the 58 deep learning online prediction service, covering TensorFlow‑Serving, custom model serving, traffic forwarding, load balancing, GPU configuration, resource monitoring, and the supporting web management platform.

GPU · TensorFlow Serving · load balancing
15 min read
Meituan Technology Team
Oct 11, 2018 · Artificial Intelligence

Deploying and Optimizing TensorFlow Serving for High‑Performance CTR Prediction

Meituan’s user‑growth team built a Wide‑Deep CTR prediction model, trained offline with Spark‑generated TFRecords, and deployed it via TensorFlow Serving on YARN, then applied request‑side multithreading, offline one‑hot preprocessing, XLA JIT compilation, and dedicated loading threads to cut end‑to‑end latency from ~18 ms to ~6 ms and eliminate model‑switch spikes.

Distributed Training · Model Deployment · TensorFlow Serving
15 min read
Architecture Digest
Jul 27, 2018 · Artificial Intelligence

Comprehensive Guide to Deploying Deep Learning Models in Production

This article provides a step‑by‑step tutorial on deploying trained deep‑learning models to production, covering client‑server architecture, load balancing with Nginx, using Gunicorn and Flask, cloud platform choices, autoscaling, CI/CD pipelines, and additional tools such as TensorFlow Serving and Docker.

API · Docker · Model Deployment
11 min read