Tagged articles
4 articles
Zhuanzhuan Tech
Oct 16, 2024 · Artificial Intelligence

Optimizing TorchServe Inference Service Architecture for High‑Performance AI Deployment

This article details the engineering practice of optimizing TorchServe‑based AI inference services, covering background challenges, framework selection, GPU‑accelerated Torch‑TRT integration, CPU‑side preprocessing improvements, and deployment on Kubernetes to achieve higher throughput and lower resource consumption.

GPU Optimization · Kubernetes · Model Serving
17 min read
Architect
Dec 21, 2023 · Artificial Intelligence

How Baidu Scales Content Understanding to Trillion‑Level Data: Architecture, Cost & Efficiency Strategies

Baidu processes trillions of web items by building a deep‑content‑understanding pipeline that tackles massive compute cost and latency through elastic resource pooling, Python‑based model‑service frameworks, multi‑stage scheduling, HTAP storage, and batch‑compute optimizations, enabling real‑time and offline AI services at web scale.

AI · Batch Processing · Cloud Native
18 min read
Alimama Tech
Jan 11, 2023 · Artificial Intelligence

Risk Detection Model Service Framework and Acceleration for Alibaba Content Risk Control

Alibaba’s new RiskDetection service framework replaces the bulky Inference‑kgb engine with a Triton‑based, Python‑driven kernel that unifies multiple back‑ends, standardizes tensor APIs, and accelerates image, text, and video risk models via HighService and EAS, delivering real‑time content risk control, scalable caching/batching, and significant GPU speedups for Double‑11 promotions.

AI · Backend Integration · Inference Engine
25 min read
DataFunTalk
Aug 3, 2022 · Artificial Intelligence

Building a Complete Machine Learning Application with OpenMLDB and OneFlow: JD High‑Potential User Purchase Intent Prediction

This tutorial demonstrates how to use OpenMLDB together with OneFlow to build an end‑to‑end machine‑learning pipeline for predicting high‑potential JD users' purchase intent, covering environment setup, data loading, SQL table creation, offline feature extraction, DeepFM model training, model serving, online feature extraction, deployment, and real‑time inference.

Docker · Feature Engineering · Model Serving
22 min read