Tag

MLOps


DeWu Technology
May 9, 2025 · Artificial Intelligence

Growth Story of a Technical Lead: Building a One‑Stop Large‑Model Training and Inference Platform at Dewu

Meng, a former Tencent and Alibaba engineer, led the development of Dewu’s one‑stop large‑model training and inference platform. The work cut integration costs, created a shared GPU pool and CI/CD pipeline, and added a Milvus vector database, while his self‑directed learning helped boost business value and user experience and set a roadmap for future RAG and cloud‑native optimizations.

AI Platform, MLOps, career development
18 min read
Bitu Technology
Mar 21, 2025 · Backend Development

Optimizing Redis Latency for an Online Feature Store: A Batch Query Case Study

This article describes how Tubi improved the latency of its Redis‑backed online feature store for machine‑learning inference by analyzing query patterns, measuring client‑side bottlenecks, and applying optimizations such as binary Avro encoding, MGET usage, virtual partitioning, and parallel deserialization to meet a sub‑10 ms SLA.

Batch Query, Feature Store, MLOps
9 min read
DataFunSummit
Jan 11, 2025 · Artificial Intelligence

Generative AI Applications, MLOps, and LLMOps: A Comprehensive Overview

This article presents a detailed overview of generative AI lifecycle management, covering practical use cases such as email summarization, the roles of providers, fine‑tuners and consumers, MLOps/LLMOps processes, retrieval‑augmented generation, efficient fine‑tuning methods like PEFT, and Amazon Bedrock services for model deployment and monitoring.

Amazon Bedrock, LLMOps, MLOps
14 min read
DeWu Technology
Dec 11, 2024 · Artificial Intelligence

MLOps Practices for Improving Order Fulfillment Timeliness

The supply‑chain team applied core MLOps practices (versioning, testing, automated reproducible pipelines, deployment monitoring, and documentation) to eliminate data leakage, ensure online consistency, and accelerate model upgrades. Using traffic replay, FaaS‑based decoupling, and approval workflows, the team cut order‑fulfillment times, reduced costs, and enabled business teams to adopt reliable AI models at scale.

Automation, Data Versioning, MLOps
18 min read
Baidu Geek Talk
Oct 30, 2024 · Cloud Computing

Baidu Cloud Infrastructure for AI-Native Era

Baidu Intelligent Cloud outlines how its evolving, high-performance infrastructure (rapid 3-minute instance provisioning, over 200 GB bandwidth, elastic computing, specialized storage, and AI-driven MLOps tools) enables AI-native model training and deployment in fast-growing sectors such as automotive and finance, supporting the industry’s shift to AI-centric cloud services.

AI infrastructure, Cloud Computing, Distributed Systems
9 min read
DataFunTalk
Jun 11, 2024 · Artificial Intelligence

Intelligent Risk Control: Concepts, Challenges, and Integrated Operational Architecture for Banking

This article explores the concept of intelligent risk control in banking, detailing its AI‑driven architecture, current challenges such as external data costs and model‑deployment friction, and proposes an integrated operational framework that leverages big data, knowledge graphs, and MLOps to enhance risk detection and decision‑making.

Artificial Intelligence, Big Data, MLOps
14 min read
DataFunTalk
Jun 4, 2024 · Artificial Intelligence

Building an Integrated Intelligent Risk Control System for Banking

The article explores the concept, challenges, and future directions of intelligent banking risk control, emphasizing data integration, AI-driven modeling, feature engineering, MLOps, knowledge graphs, and large‑model applications to create a unified, automated risk management platform.

AI, Big Data, MLOps
10 min read
DevOps
Mar 3, 2024 · Operations

How Generative AI is Transforming DevOps: Benefits, Challenges, and Best Practices

Since 2022, generative AI has become a pervasive trend, and this article explores its integration into DevOps, outlining the technology’s advantages, limitations, emerging trends, and best practices while highlighting how AI‑driven automation reshapes software engineering workflows.

AI ethics, Artificial Intelligence, Automation
10 min read
Didi Tech
Jan 25, 2024 · Artificial Intelligence

Ray-native XGBoost Training Platform: Architecture, Performance, and Technical Challenges

Didi’s new Ray‑native XGBoost training platform replaces the fault‑prone Spark solution with a fully Pythonic, fault‑tolerant architecture that leverages Ray’s autoscaling and gang‑scheduling, delivering 2–6× speedups, reduced failure rates, efficient sparse‑vector handling, scalable hyper‑parameter search, and improved resource utilization for large‑scale machine‑learning workloads.

Hyperparameter Optimization, MLOps, Ray
20 min read
DataFunSummit
Oct 7, 2023 · Artificial Intelligence

MLOps Implementation in Network Intelligence: Jiutian Platform Overview

This article presents the Jiutian Network Intelligence platform’s MLOps implementation at China Mobile, detailing its AI engineering workflow, platform functional and technical architecture, technology selections, model deployment, monitoring, and operational challenges, and shares insights on scaling AI services across 31 provinces.

AI Engineering, MLOps, Network Intelligence
20 min read
Cloud Native Technology Community
Jul 27, 2023 · Artificial Intelligence

Kubeflow Overview: CNCF‑Incubated MLOps Platform on Kubernetes

Kubeflow is an open‑source, CNCF‑incubated project that provides a Kubernetes‑native MLOps platform integrating notebooks, training operators, AutoML (Katib), pipelines, and model serving (KServe) to streamline the development, deployment, and scaling of machine learning models across diverse frameworks.

AI, CNCF, Kubeflow
7 min read
Cloud Native Technology Community
Jun 28, 2023 · Artificial Intelligence

Building and Deploying Custom Large Language Models with Alauda Cloud‑Native MLOps

This article explains how enterprises can use the Alauda MLOps platform to quickly set up, fine‑tune, and deploy private large language models on cloud‑native infrastructure, covering notebook preparation, GPU allocation, model download, inference service creation, distributed training pipelines, and Docker image building.

AI, Fine-tuning, MLOps
9 min read
DataFunSummit
Jun 24, 2023 · Artificial Intelligence

From Model to Service: Alibaba Cloud Machine Learning PAI One‑Stop Model Development and Deployment Practice

This article presents an end‑to‑end overview of Alibaba Cloud’s Machine Learning PAI platform, detailing the three‑stage ML workflow, challenges in model development, the role of pre‑trained and open‑source models, PAI’s architecture, a hands‑on demo, and MLOps best practices for efficient model deployment.

AI Platform, Alibaba Cloud, MLOps
11 min read
DataFunSummit
Mar 30, 2023 · Artificial Intelligence

MindAlpha: A High‑Performance Distributed Machine Learning Platform for Advertising

The article introduces MindAlpha, a high‑performance distributed machine‑learning platform built for large‑scale, sparse ad‑tech workloads, detailing its architecture, MLOps pipeline, Spark integration, sync/async training strategies, CPU/GPU choices, model‑splitting techniques, and future directions such as model pruning and AutoML.

AI, Ad Tech, MLOps
10 min read
Tencent Advertising Technology
Mar 30, 2023 · Artificial Intelligence

Tencent's Taiji Machine Learning Platform: End-to-End MLOps for Advertising

Tencent’s Taiji machine learning platform, a cloud‑native, distributed parameter‑server system, provides end‑to‑end MLOps for advertising by integrating data ingestion, feature engineering, model training, evaluation, deployment, and monitoring, supporting massive models up to billions of parameters while improving efficiency, scalability, and resource management.

MLOps, Model Deployment, advertising
18 min read
DataFunTalk
Mar 25, 2023 · Artificial Intelligence

ZhongAn Financial Real‑Time Feature Platform: MLOps Practices, Architecture and Anti‑Fraud Applications

This article presents ZhongAn Financial’s end‑to‑end MLOps workflow and real‑time feature platform architecture, detailing team roles, data pipelines, Flink‑based processing, TableStore storage, anti‑fraud feature design, and answers to common implementation questions, offering a comprehensive guide for building scalable, low‑latency ML services in finance.

Data Engineering, MLOps, Real-time Features
25 min read
AntTech
Mar 13, 2023 · Artificial Intelligence

Thoughts on the Next‑Generation AI Infrastructure: Green and Shared Model‑as‑a‑Service

In this conference talk, He Zhengyu of Ant Group outlines the challenges of large‑model AI and proposes a green, shared, model‑centric infrastructure built on foundation models, cloud‑native MLOps, and Model‑as‑a‑Service (MaaS) to lower cost and accelerate AI adoption across industries.

AI infrastructure, Foundation Models, Green computing
14 min read
DataFunSummit
Feb 21, 2023 · Artificial Intelligence

Practices and Reflections on Building an AI Platform at Zhongyuan Bank

This article details Zhongyuan Bank's AI platform construction, covering its objectives, its MLOps-driven design, and core modules such as data ingestion, processing, model development, training, evaluation, deployment, and monitoring, as well as resource orchestration with Kubernetes and Docker and the accompanying ModelOps governance framework.

AI, Cloud Computing, Data Governance
22 min read
Efficient Ops
Jan 16, 2023 · Artificial Intelligence

How MLOps Is Transforming AI Production in China: Trends, Tools, and Standards

This report examines how MLOps is accelerating AI production in China, highlighting industry adoption across sectors, the booming tool ecosystem, the rise of feature platforms, enhanced observability, performance needs for large models, AI asset management, and the emerging national standards and evaluation results.

AI Engineering, AI standards, FeatureOps
8 min read