Tag

ModelServing

Zhuanzhuan Tech
Oct 16, 2024 · Artificial Intelligence

Optimizing TorchServe Inference Service Architecture for High‑Performance AI Deployment

This article details the engineering practice of optimizing TorchServe‑based AI inference services, covering background challenges, framework selection, GPU‑accelerated Torch‑TRT integration, CPU‑side preprocessing improvements, and deployment on Kubernetes to achieve higher throughput and lower resource consumption.

GPUOptimization · Kubernetes · ModelServing

Alimama Tech
Jan 11, 2023 · Artificial Intelligence

Risk Detection Model Service Framework and Acceleration for Alibaba Content Risk Control

Alibaba’s new RiskDetection service framework replaces the bulky Inference‑kgb engine with a Triton‑based, Python‑driven kernel that unifies multiple back ends and standardizes tensor APIs. It accelerates image, text, and video risk models via HighService and EAS, providing real‑time content risk control, scalable caching and batching, and significant GPU speedups for Double‑11 promotions.

AI · BackendIntegration · InferenceEngine

DataFunTalk
Aug 3, 2022 · Artificial Intelligence

Building a Complete Machine Learning Application with OpenMLDB and OneFlow: JD High‑Potential User Purchase Intent Prediction

This tutorial demonstrates how to use OpenMLDB together with OneFlow to build an end‑to‑end machine‑learning pipeline for predicting high‑potential JD users' purchase intent, covering environment setup, data loading, SQL table creation, offline feature extraction, DeepFM model training, model serving, online feature extraction, deployment, and real‑time inference.

Docker · FeatureEngineering · ModelServing