Tagged articles

ModelServing

4 articles · Page 1 of 1

Oct 16, 2024 · Artificial Intelligence

Optimizing TorchServe Inference Service Architecture for High‑Performance AI Deployment

This article details the engineering practice of optimizing TorchServe‑based AI inference services, covering background challenges, framework selection, GPU‑accelerated Torch‑TRT integration, CPU‑side preprocessing improvements, and deployment on Kubernetes to achieve higher throughput and lower resource consumption.

GPUOptimization · Kubernetes · ModelServing

0 likes · 17 min read

Architect

Dec 21, 2023 · Artificial Intelligence

How Baidu Scales Content Understanding to Trillion‑Level Data: Architecture, Cost & Efficiency Strategies

Baidu applies deep content understanding to trillions of web items. Its pipeline keeps compute cost and latency in check through elastic resource pooling, Python‑based model‑service frameworks, multi‑stage scheduling, HTAP storage, and batch‑compute optimizations, supporting both real‑time and offline AI services at web scale.

AI · BatchProcessing · CloudNative

0 likes · 18 min read

Alimama Tech

Jan 11, 2023 · Artificial Intelligence

Risk Detection Model Service Framework and Acceleration for Alibaba Content Risk Control

Alibaba’s new RiskDetection service framework replaces the bulky Inference‑kgb engine with a Triton‑based, Python‑driven kernel that unifies multiple back‑ends and standardizes tensor APIs. It accelerates image, text, and video risk models via HighService and EAS, delivering real‑time content risk control, scalable caching and batching, and significant GPU speedups for Double‑11 promotions.

AI · BackendIntegration · InferenceEngine

0 likes · 25 min read

DataFunTalk

Aug 3, 2022 · Artificial Intelligence

Building a Complete Machine Learning Application with OpenMLDB and OneFlow: JD High‑Potential User Purchase Intent Prediction

This tutorial demonstrates how to use OpenMLDB together with OneFlow to build an end‑to‑end machine‑learning pipeline for predicting high‑potential JD users' purchase intent, covering environment setup, data loading, SQL table creation, offline feature extraction, DeepFM model training, model serving, online feature extraction, deployment, and real‑time inference.

Docker · FeatureEngineering · ModelServing

0 likes · 22 min read
