Tagged articles

MLOps

77 articles · Page 1 of 1

Jun 6, 2026 · Artificial Intelligence

How to Turn Large‑Model Testing into Trustworthy Production: A Deep Dive

The article analyses why traditional deterministic testing fails for probabilistic large models, proposes a four‑dimensional D‑R‑A‑M testing framework, and shows how an MLOps pipeline can turn AI failures into measurable, traceable risk controls for large‑scale deployment.

AI testing · MLOps · Risk Management

0 likes · 7 min read

Digital Planet

May 29, 2026 · Industry Insights

5 Essential Skills Data Professionals Must Master in 2026

In the AI‑driven era of 2026, data professionals need to focus on five high‑impact capabilities—data governance, practical large‑model usage, MLOps, data storytelling, and AI compliance—to stay indispensable, with each skill backed by industry reports, job growth data, and concrete learning pathways.

2026 Trends · AI Skills · AI compliance

0 likes · 13 min read

Woodpecker Software Testing

Apr 15, 2026 · Artificial Intelligence

How AI Testing Tools Redefine Performance Optimization: A New Paradigm

Amid exploding large‑model deployments, AI teams struggle with slow test feedback, but AI‑native testing tools—through intelligent load modeling, inference‑layer root‑cause analysis, and self‑healing loops—demonstrate concrete latency reductions, resource savings, and faster issue remediation.

AI testing · MLOps · Observability

0 likes · 6 min read

Woodpecker Software Testing

Apr 10, 2026 · Artificial Intelligence

2026 Model Evaluation Reaches the Cost‑Benefit Threshold

In 2026, model evaluation has become the pivotal bottleneck in AI engineering, with exploding compute, data‑compliance, and tooling costs forcing a shift from labor‑intensive testing to quantifiable business value, and three levers—dynamic granularity, synthetic data loops, and evaluation‑as‑a‑service—offering a path to a cost‑benefit inflection point.

AI compliance · Dynamic Granularity · Evaluation as a Service

0 likes · 7 min read

Woodpecker Software Testing

Apr 3, 2026 · Artificial Intelligence

Why 80% of AI Projects Fail: Bridging Model Evaluation from Theory to Real‑World Impact

The article explains that most AI project failures stem from unrealistic evaluation rather than model intelligence, and outlines concrete practices—business‑aligned metrics, scenario sandboxes, human‑in‑the‑loop reviews, and auditable documentation—to make model evaluation truly actionable.

AI Deployment · AI Reliability · MLOps

0 likes · 7 min read

Woodpecker Software Testing

Mar 18, 2026 · Artificial Intelligence

From Concept to Production: How AI-Driven Testing Becomes Real-World Practice

The article examines why most companies are still at the proof‑of‑concept stage for AI‑enabled testing, outlines three practical pillars—data, scenario selection, and closed‑loop feedback—and warns of common pseudo‑AI pitfalls through concrete industry case studies.

AI testing · Continuous Integration · Data Governance

0 likes · 8 min read

MeowKitty Programming

Mar 12, 2026 · Industry Insights

Will AI Replace 85% of Basic Coding Jobs? A 20‑Year Veteran Reveals Who Will Be Cut

A veteran programmer analyses how AI tools like Copilot and Claude boost development speed, reshape job structures, risk skill erosion, and force a shift toward AI‑augmented roles, while offering concrete data and practical advice for staying relevant over the next five years.

AI programming · AI tools · Industry Trends

0 likes · 10 min read

AI Info Trend

Mar 10, 2026 · Industry Insights

Southeast Asia’s AI Surge: Opportunities, Challenges, and the 2026 Roadmap

McKinsey’s report reveals that AI is moving from pilot projects to large‑scale deployment across Southeast Asia, driven by youthful, mobile‑first populations and massive cloud investments, yet talent shortages, integration complexity, and unclear ROI remain the biggest hurdles for enterprises.

2026 Strategy · AI · Agentic AI

0 likes · 7 min read

Woodpecker Software Testing

Mar 6, 2026 · Artificial Intelligence

How RAG Testing Teams Can Successfully Transform in 2024

With RAG becoming the backbone of enterprise AI, traditional API‑UI testing misses critical semantic errors, leading to high hallucination rates; this article outlines why conventional methods fail and presents a three‑pillar transformation—skill rebuilding, process reengineering, and advanced tooling—backed by real‑world case studies.

AI testing · Hallucination · LLM

0 likes · 9 min read

Woodpecker Software Testing

Mar 1, 2026 · Artificial Intelligence

Automating Regression Tests for TensorRT Inference Services

The article outlines a comprehensive, repeatable regression testing framework for TensorRT inference pipelines, covering engine build validation, functional correctness against golden outputs, performance monitoring, common pitfalls, and CI/CD integration to ensure model updates remain both fast and reliable.

INT8 Quantization · MLOps · Performance Regression

0 likes · 12 min read

Woodpecker Software Testing

Feb 26, 2026 · Artificial Intelligence

How to Test Large Language Models: From Functional Correctness to Trustworthiness

The article examines why traditional deterministic testing fails for probabilistic LLMs and outlines a new testing paradigm that emphasizes safety, robustness, controllability, and explainability, illustrated with real‑world cases and a step‑by‑step MLOps workflow.

AI testing · MLOps · Prompt Engineering

0 likes · 7 min read

Yunqi AI+

Feb 13, 2026 · Artificial Intelligence

Key Challenges When Enterprises Deploy AI-in-the-Loop

The article outlines a four‑layer framework—process, technology, risk, and culture—to help enterprises implement AI‑in‑the‑loop safely, ensuring AI assists decisions while humans retain final authority, with concrete governance, data, and organizational practices.

AI-in-the-loop · Human-AI Collaboration · MLOps

0 likes · 7 min read

21CTO

Oct 6, 2025 · Artificial Intelligence

How to Become an AI Engineer: Skills, Workflow, and Career Path

This guide explains what AI engineering entails, outlines the end‑to‑end workflow from problem definition and data preparation through model development, deployment, and monitoring, and highlights the essential programming, cloud, and MLOps skills, career tracks, emerging trends, and salary outlook for aspiring AI engineers.

AI Engineering · Cloud Computing · MLOps

0 likes · 11 min read

DevOps Cloud Academy

Sep 25, 2025 · Artificial Intelligence

How to Build Scalable MLOps Infrastructure for Enterprise AI Success

This article explains what MLOps is, why a robust MLOps framework is essential for businesses, outlines its core components, compares MLOps with AIOps, details the benefits of investing in MLOps, and provides a step‑by‑step guide to designing enterprise‑grade AI MLOps infrastructure.

AI Infrastructure · Governance · MLOps

0 likes · 17 min read

DevOps Cloud Academy

Sep 21, 2025 · Artificial Intelligence

How to Deploy Machine Learning Models Efficiently: A Complete Guide

This guide explains what model deployment is, why it matters, the various deployment types, readiness criteria, best practices, common challenges, real‑world case studies, and the most popular tools and platforms for deploying machine learning models in production.

AI · CI/CD · MLOps

0 likes · 20 min read

Alibaba Cloud Infrastructure

Jun 27, 2025 · Cloud Native

Why Argo Workflows Is the Leading Cloud‑Native Engine for AI & Data Pipelines

Argo Workflows, the top‑rated CNCF project, extends Kubernetes to orchestrate AI, ML, and data pipelines with a scalable, cloud‑native architecture, offering powerful scheduling, Python SDK support, and new plugins for Spark, Ray, and PyTorch.

AI · Argo Workflows · Cloud Native

0 likes · 9 min read

Alibaba Cloud Observability

Jun 16, 2025 · Artificial Intelligence

Mastering AI Application Observability: From Metrics to Full‑Stack Tracing

This article explains why cost and performance are critical in the AI era, outlines the three main pain points of AI application development, and details a full‑stack observability solution—including architecture layers, key metrics like TTFT and TPOT, OpenTelemetry tracing, and practical tips for frameworks such as Dify—integrated into Alibaba Cloud CloudMonitor 2.0.

AI application monitoring · AI observability · LLM Performance

0 likes · 21 min read

Architect's Alchemy Furnace

Jun 4, 2025 · Artificial Intelligence

What Is an AI Engineer? Roles, Skills, and the Future of LLM‑Powered Systems

This article examines the evolving role of the AI engineer, contrasting it with AI researchers, ML engineers, and software engineers, outlines essential skills such as prompt engineering, MLOps, and data integration, and predicts how AI engineering will become a pivotal, high‑demand discipline in the coming years.

AI Engineering · AI Systems · Agentic RAG

0 likes · 17 min read

Software Engineering 3.0 Era

May 27, 2025 · Artificial Intelligence

Claude 4.0’s Unexpected Code Flood: Intentional Strategy or Model Quirk?

The article examines why Claude 4.0 suddenly generates large amounts of code, evaluates the strategic value of training vertical AI models, forecasts visual large‑model adoption in automated testing, and proposes a phased AI‑engineering capability roadmap for teams of different sizes.

AI Engineering · Claude 4.0 · MLOps

0 likes · 8 min read

Alibaba Cloud Developer

May 14, 2025 · Artificial Intelligence

Deploy Alibaba’s Qwen3 LLM in 10 Minutes with Bailian Platform

Learn how to quickly set up Alibaba Cloud’s Bailian platform to call the open-source Qwen3 large language model, explore its cost‑effective performance, dual‑mode reasoning, multilingual support, and enhanced agent capabilities, and follow step‑by‑step instructions for API key configuration, Cherry Studio integration, and tool‑calling setup.

AI Deployment · Alibaba Cloud · MLOps

0 likes · 6 min read

DeWu Technology

May 9, 2025 · Artificial Intelligence

Growth Story of a Technical Lead: Building a One‑Stop Large‑Model Training and Inference Platform at Dewu

Meng, a former Tencent and Alibaba engineer, led Dewu’s one‑stop large‑model training and inference platform, cutting integration costs, creating a shared GPU pool and CI/CD pipeline, building a Milvus vector database, and driving self‑directed learning that boosted business value, improved user experience, and set a roadmap for future RAG and cloud‑native optimizations.

AI platform · MLOps · career development

0 likes · 18 min read

Ops Development & AI Practice

Apr 19, 2025 · Industry Insights

Is AI Splitting into Two Worlds? Building Models vs Building Apps

The article analyzes the emerging divide in the AI ecosystem where a few giants focus on resource‑intensive large‑model research and training while most companies shift to leveraging existing models for application development, outlining the implications, challenges, and strategic advice for developers and enterprises.

AI · Application Development · Industry Analysis

0 likes · 10 min read

Bitu Technology

Mar 21, 2025 · Backend Development

Optimizing Redis Latency for an Online Feature Store: A Batch Query Case Study

This article describes how Tubi improved the latency of its Redis‑backed online feature store for machine‑learning inference by analyzing query patterns, measuring client‑side bottlenecks, and applying optimizations such as binary Avro encoding, MGET usage, virtual partitioning, and parallel deserialization to meet a sub‑10 ms SLA.

Feature Store · Latency · MLOps

0 likes · 9 min read

DataFunSummit

Jan 19, 2025 · Artificial Intelligence

Understanding MLOps and LMOps: Evolution, Engineering Practices, and Future Trends for Large Models

This article reviews the development of MLOps, introduces the emerging LMOps framework for large‑model engineering, outlines key architectural components, discusses current challenges and industry trends, and presents future directions and standardization efforts in AI operations.

AI Engineering · AI Ops · LMOps

0 likes · 18 min read

DataFunSummit

Jan 11, 2025 · Artificial Intelligence

Generative AI Applications, MLOps, and LLMOps: A Comprehensive Overview

This article presents a detailed overview of generative AI lifecycle management, covering practical use cases such as email summarization, the roles of providers, fine‑tuners and consumers, MLOps/LLMOps processes, retrieval‑augmented generation, efficient fine‑tuning methods like PEFT, and Amazon Bedrock services for model deployment and monitoring.

Amazon Bedrock · LLMOps · MLOps

0 likes · 14 min read

DeWu Technology

Dec 11, 2024 · Artificial Intelligence

MLOps Practices for Improving Order Fulfillment Timeliness

The supply‑chain team leveraged core MLOps practices—versioning, testing, automated reproducible pipelines, deployment monitoring, and documentation—to eliminate data leakage, ensure online consistency, and accelerate model upgrades, using traffic‑replay, FAAS‑based decoupling, and approval workflows, ultimately cutting order‑fulfillment times, reducing costs, and enabling business teams to adopt reliable AI models at scale.

Automation · MLOps · Model Deployment

0 likes · 18 min read

Baidu Geek Talk

Oct 30, 2024 · Cloud Computing

Baidu Cloud Infrastructure for AI-Native Era

Baidu Intelligent Cloud outlines how its evolving, high-performance infrastructure—featuring rapid 3-minute instance provisioning, over 200 GB bandwidth, elastic computing, specialized storage, and AI-driven MLOps tools—enables AI-native model training and deployment across booming sectors such as automotive and finance, supporting the industry’s shift to AI-centric cloud services.

Case Studies · Cloud Computing · MLOps

0 likes · 9 min read

Baidu Intelligent Cloud Tech Hub

Oct 28, 2024 · Cloud Native

How Baidu Smart Cloud Reinvents Cloud‑Native Infrastructure for the AI‑Native Era

The talk outlines Baidu Smart Cloud's comprehensive cloud‑native redesign—including ultra‑elastic compute, AI‑focused storage, high‑performance networking, AI‑driven operations, and edge‑distributed services—illustrated with automotive and fintech case studies that demonstrate how enterprises can accelerate digital transformation in the AI‑native age.

AI Infrastructure · Data Lake · MLOps

0 likes · 12 min read

DataFunTalk

Jun 11, 2024 · Artificial Intelligence

Intelligent Risk Control: Concepts, Challenges, and Integrated Operational Architecture for Banking

This article explores the concept of intelligent risk control in banking, detailing its AI‑driven architecture, current challenges such as external data costs and model‑deployment friction, and proposes an integrated operational framework that leverages big data, knowledge graphs, and MLOps to enhance risk detection and decision‑making.

Knowledge Graph · MLOps · artificial-intelligence

0 likes · 14 min read

21CTO

Jun 7, 2024 · Artificial Intelligence

10 Essential Tools for Building a Modern AI Data Lake Architecture

This article outlines ten critical components of a modern data lake reference architecture for AI/ML, detailing each function, the supporting vendor tools and open‑source libraries, and how they enable scalable storage, MLOps, distributed training, model hubs, vector search, and data visualization.

AI · Data Lake · MLOps

0 likes · 14 min read

DataFunTalk

Jun 4, 2024 · Artificial Intelligence

Building an Integrated Intelligent Risk Control System for Banking

The article explores the concept, challenges, and future directions of intelligent banking risk control, emphasizing data integration, AI-driven modeling, feature engineering, MLOps, knowledge graphs, and large‑model applications to create a unified, automated risk management platform.

AI · Knowledge Graph · MLOps

0 likes · 10 min read

Architects' Tech Alliance

May 24, 2024 · Industry Insights

What Drives AI's Future? A Four‑Layer Industry Framework Explained

This article breaks down the AI ecosystem into four layers—AI hardware and cloud services, model and algorithm advances, MLOps middleware, and B2B/B2C applications—highlighting how hardware cost reductions, cloud integration, model breakthroughs, and middleware providers shape the market and adoption speed.

AI · AI Applications · Industry Analysis

0 likes · 6 min read

DevOps

Mar 3, 2024 · Operations

How Generative AI is Transforming DevOps: Benefits, Challenges, and Best Practices

Since 2022, generative AI has become a pervasive trend, and this article explores its integration into DevOps, outlining the technology’s advantages, limitations, emerging trends, and best practices while highlighting how AI‑driven automation reshapes software engineering workflows.

AI ethics · GenAI · MLOps

0 likes · 10 min read

Didi Tech

Jan 25, 2024 · Artificial Intelligence

Ray-native XGBoost Training Platform: Architecture, Performance, and Technical Challenges

Didi’s new Ray‑native XGBoost training platform replaces the fault‑prone Spark solution with a fully Pythonic, fault‑tolerant architecture that leverages Ray’s autoscaling and gang‑scheduling, delivering 2–6× speedups, reduced failure rates, efficient sparse‑vector handling, scalable hyper‑parameter search, and improved resource utilization for large‑scale machine‑learning workloads.

MLOps · Ray · XGBoost

0 likes · 20 min read

Baidu Geek Talk

Dec 6, 2023 · Industry Insights

From MLOps to LMOps: Challenges and Solutions for Large‑Model Operations

This article reviews the evolution from MLOps to LMOps, outlines the core concepts, challenges, and key technologies such as large‑model inference optimization, prompt engineering, and context‑length extension, and offers a forward‑looking perspective on the future of AI operations.

AI Operations · LMOps · MLOps

0 likes · 23 min read

Baidu Intelligent Cloud Tech Hub

Nov 15, 2023 · Artificial Intelligence

From MLOps to LMOps: Tackling Large Model Challenges and Solutions

This article reviews the evolution from MLOps to LMOps, outlines the fundamentals, challenges, and key technologies of large‑model operations—including inference optimization, prompt engineering, and context‑length extension—and presents Baidu AI Cloud's platform solutions and future outlook.

LMOps · MLOps · Prompt Engineering

0 likes · 23 min read

DataFunSummit

Oct 7, 2023 · Artificial Intelligence

MLOps Implementation in Network Intelligence: Jiutian Platform Overview

This article presents the Jiutian Network Intelligence platform’s MLOps implementation at China Mobile, detailing its AI engineering workflow, platform functional and technical architecture, technology selections, model deployment, monitoring, and operational challenges, and shares insights on scaling AI services across 31 provinces.

AI Engineering · MLOps · Network Intelligence

0 likes · 20 min read

Baidu Intelligent Cloud Tech Hub

Aug 8, 2023 · Artificial Intelligence

Unlocking LMOps: How Enterprises Can Master Large Model Operations

This article explains the evolution from traditional machine learning to the current large‑model era, introduces LMOps concepts and key technologies, compares them with MLOps, and showcases Baidu Cloud's Qianfan platform as a practical solution for building, deploying, and managing large language models in industry.

AI Operations · Baidu Cloud · LMOps

0 likes · 22 min read

Cloud Native Technology Community

Jul 27, 2023 · Artificial Intelligence

Kubeflow Overview: CNCF‑Incubated MLOps Platform on Kubernetes

Kubeflow is an open‑source, CNCF‑incubated project that provides a Kubernetes‑native MLOps platform integrating notebooks, training operators, AutoML (Katib), pipelines, and model serving (KServe) to streamline the development, deployment, and scaling of machine learning models across diverse frameworks.

AI · CNCF · Kubeflow

0 likes · 7 min read

Cloud Native Technology Community

Jun 28, 2023 · Artificial Intelligence

Building and Deploying Custom Large Language Models with Alauda Cloud‑Native MLOps

This article explains how enterprises can use the Alauda MLOps platform to quickly set up, fine‑tune, and deploy private large language models on cloud‑native infrastructure, covering notebook preparation, GPU allocation, model download, inference service creation, distributed training pipelines, and Docker image building.

AI · Large Language Model · MLOps

0 likes · 9 min read

DataFunSummit

Jun 24, 2023 · Artificial Intelligence

From Model to Service: Alibaba Cloud Machine Learning PAI One‑Stop Model Development and Deployment Practice

This article presents an end‑to‑end overview of Alibaba Cloud’s Machine Learning PAI platform, detailing the three‑stage ML workflow, challenges in model development, the role of pre‑trained and open‑source models, PAI’s architecture, a hands‑on demo, and MLOps best practices for efficient model deployment.

Alibaba Cloud · MLOps · Model Deployment

0 likes · 11 min read

DataFunSummit

Mar 30, 2023 · Artificial Intelligence

MindAlpha: A High‑Performance Distributed Machine Learning Platform for Advertising

The article introduces MindAlpha, a high‑performance distributed machine‑learning platform built for large‑scale, sparse ad‑tech workloads, detailing its architecture, MLOps pipeline, Spark integration, sync/async training strategies, CPU/GPU choices, model‑splitting techniques, and future directions such as model pruning and AutoML.

AI · Ad Tech · MLOps

0 likes · 10 min read

Tencent Advertising Technology

Mar 30, 2023 · Artificial Intelligence

Tencent's Taiji Machine Learning Platform: End-to-End MLOps for Advertising

Tencent’s Taiji machine learning platform, a cloud‑native, distributed parameter‑server system, provides end‑to‑end MLOps for advertising by integrating data ingestion, feature engineering, model training, evaluation, deployment, and monitoring, supporting massive models up to billions of parameters while improving efficiency, scalability, and resource management.

MLOps · Machine Learning Platform · Model Deployment

0 likes · 18 min read

DataFunTalk

Mar 25, 2023 · Artificial Intelligence

ZhongAn Financial Real‑Time Feature Platform: MLOps Practices, Architecture and Anti‑Fraud Applications

This article presents ZhongAn Financial’s end‑to‑end MLOps workflow and real‑time feature platform architecture, detailing team roles, data pipelines, Flink‑based processing, TableStore storage, anti‑fraud feature design, and answers to common implementation questions, offering a comprehensive guide for building scalable, low‑latency ML services in finance.

Data Engineering · Flink · MLOps

0 likes · 25 min read

Smart Era Software Development

Mar 16, 2023 · Artificial Intelligence

10 Essential Elements of Machine Learning System Architecture

The article outlines ten core components—data and feature pipelines, feature store, training and retraining pipelines, metadata store, serving infrastructure, production monitoring, reusable ML pipelines, workflow orchestration, CI/CT/CD, and end‑to‑end quality control—that together form a scalable, reliable architecture for modern machine‑learning systems.

MLOps · feature engineering · machine learning

0 likes · 7 min read

AntTech

Mar 13, 2023 · Artificial Intelligence

Thoughts on the Next‑Generation AI Infrastructure: Green and Shared Model‑as‑a‑Service

In this conference talk, He Zhengyu of Ant Group outlines the challenges of large‑model AI, proposes a green, shared, model‑centric infrastructure built on foundation models, cloud‑native MLOps, and Model‑as‑a‑Service (MaaS) to lower cost and accelerate AI adoption across industries.

AI Infrastructure · Cloud Native · Foundation Models

0 likes · 14 min read

DataFunSummit

Feb 21, 2023 · Artificial Intelligence

Practices and Reflections on Building an AI Platform at Zhongyuan Bank

This article details Zhongyuan Bank's AI platform construction, covering its objectives, MLOps-driven design, core modules such as data ingestion, processing, model development, training, evaluation, deployment, monitoring, as well as resource orchestration with Kubernetes and Docker, and the accompanying ModelOps governance framework.

AI · Cloud Computing · Data Governance

0 likes · 22 min read

Efficient Ops

Jan 16, 2023 · Artificial Intelligence

How MLOps Is Transforming AI Production in China: Trends, Tools, and Standards

This report examines how MLOps is accelerating AI production in China, highlighting industry adoption across sectors, the booming tool ecosystem, the rise of feature platforms, enhanced observability, performance needs for large models, AI asset management, and the emerging national standards and evaluation results.

AI Engineering · AI standards · FeatureOps

0 likes · 8 min read

GuanYuan Data Tech Team

Dec 29, 2022 · Artificial Intelligence

How AI Transforms DTC Supply‑Chain Replenishment: From Safety‑Stock Theory to Real‑World Deployment

This article explains how AI‑driven forecasting, joint error‑distribution safety‑stock calculations, and MLOps‑backed simulation are combined to optimize DTC replenishment, improve inventory days and stock‑out rates, and address practical deployment challenges in fast‑growing e‑commerce supply chains.

AI · MLOps · Simulation

0 likes · 21 min read

AntTech

Dec 26, 2022 · Artificial Intelligence

AntSec MLOps: Building a Scalable, Automated, and Trustworthy AI Risk‑Control Platform

This article describes the challenges, overall architecture, data development, model monitoring, continuous training, security‑trustworthiness, and future roadmap of Ant Security's intelligent risk‑control platform, illustrating how AI, big data, and cloud computing are integrated to create a scalable, automated MLOps solution for dynamic fraud detection and mitigation.

AI · Automation · MLOps

0 likes · 28 min read

GuanYuan Data Tech Team

Dec 1, 2022 · Artificial Intelligence

Why MLOps Is the Key to Scalable AI Projects

This article explains the concept, significance, and practical case studies of MLOps—showing how integrating DevOps principles with data and machine learning creates reliable, automated pipelines for data quality, model monitoring, error analysis, and continuous integration, ultimately accelerating AI delivery.

AI Engineering · Continuous Integration · MLOps

0 likes · 15 min read

Efficient Ops

Nov 29, 2022 · Artificial Intelligence

How MLOps is Revolutionizing AI Development: Baidu’s Flagship Platform Insights

This article examines how China’s AI strategy and newly released MLOps standards are driving AI engineering, featuring Baidu Cloud’s flagship-level platform, its evaluation results, practical benefits, challenges, and future directions for MLOps in enterprise AI development.

Baidu · Enterprise AI · MLOps

0 likes · 10 min read

Laiye Technology Team

Nov 23, 2022 · Artificial Intelligence

Design and Practices of a Data‑Driven OCR Testing System

The article describes Laiye's shift to a data‑driven deep‑learning workflow and presents the design, macro‑ and micro‑analysis features, visual diff tools, distributed tracing, and code examples of their OCR testing system that accelerate model evaluation and iterative optimization.

AI · Data‑Driven · MLOps

0 likes · 11 min read

DataFunSummit

Nov 18, 2022 · Artificial Intelligence

DataFun Summit 2022: AI Foundations, Large‑Scale Model Training, and AI Infrastructure

The DataFun Summit 2022 brings together leading AI researchers and industry experts to discuss deep‑learning frameworks, ultra‑large model training, AI chips, compilers, MLOps, and end‑to‑end AI infrastructure, offering live streaming of six thematic forums and dozens of technical talks.

AI · AI Infrastructure · MLOps

0 likes · 30 min read

Efficient Ops

Nov 7, 2022 · Artificial Intelligence

Unlocking AI Project Success with the New MLOps Maturity Assessment

This article outlines the background, standards, evaluation items, process, and registration details of a newly launched MLOps development management maturity assessment designed to accelerate AI model delivery and improve operational efficiency across teams.

AI Operations · MLOps · Model Deployment

0 likes · 6 min read

DataFunTalk

Oct 27, 2022 · Artificial Intelligence

Data‑Centric AI and MLOps: A Case Study of Smart‑Cabin Applications in the Automotive Industry

The talk by Magic Data’s founder Zhang Qingqing outlines the shift from model‑centric to data‑centric AI, introduces Data‑Centric MLOps methodology, and demonstrates its automotive smart‑cabin application, highlighting data quality requirements, collaborative workflow, and performance gains across speech, live‑social and navigation scenarios.

AI Applications · Automotive AI · Data-centric AI

0 likes · 9 min read

Efficient Ops

Oct 26, 2022 · Artificial Intelligence

Unveiling China’s AI Model Delivery Standard: Boosting MLOps and AI Engineering

China’s 14th Five-Year Plan and 2035 Vision prioritize AI, prompting a shift from proof‑of‑concept to product deployment; the newly released Model Delivery standard, part of the Model/MLOps maturity model, defines five maturity levels and a reusable pipeline to boost AI engineering across industries.

AI · AI Engineering · China

0 likes · 5 min read

vivo Internet Technology

Oct 9, 2022 · Artificial Intelligence

vivo Machine Learning Platform: Architecture Design and Practice

vivo’s machine‑learning platform, built for its massive app‑store and e‑commerce ecosystem, streamlines data processing, model training, and deployment through quota‑based resource management, a custom ultra‑large‑scale TensorFlow‑vlps framework, OpenAPI‑driven training, and Jupyter‑integrated interactive development, boosting efficiency for billions of samples and features.

MLOps · Machine Learning Platform · Model Deployment

0 likes · 12 min read

Efficient Ops

Aug 22, 2022 · Operations

What Were the Key Takeaways from the 2022 GOPS Global Operations Conference in Shenzhen?

The 2022 GOPS Global Operations Conference in Shenzhen gathered over a thousand attendees for two days of 18 sessions, featuring more than 80 speakers who shared insights on DevOps, cloud native, AI engineering, MLOps, and industry‑specific operational practices across finance, telecom, and technology sectors.

2022 · AI · Cloud Native

0 likes · 13 min read

Efficient Ops

Aug 9, 2022 · Operations

How ICBC Accelerated Digital Transformation with XOps: From DevOps to MLOps

ICBC’s software development center outlines its multi‑year journey adopting XOps practices—DevOps, DevSecOps, DataOps, MLOps, AIOps, ChatOps and BizDevOps—to boost development efficiency, enhance security, accelerate data‑driven AI, and cut costs, showcasing measurable improvements in release frequency, defect rates, and operational automation.

AIOps · DataOps · DigitalTransformation

0 likes · 13 min read

Efficient Ops

Aug 2, 2022 · Artificial Intelligence

How MLOps Boosted AI Service Delivery at Agricultural Bank of China

In a detailed interview, the Agricultural Bank of China's R&D center explains how its AI service platform achieved a Level‑3 leading rating in the national MLOps maturity assessment, and how MLOps practices have accelerated model development, improved quality, reduced risk, and driven scalable AI adoption across financial services.

MLOps · ModelOps · financial AI

0 likes · 10 min read

Cloud Native Technology Community

Jul 21, 2022 · Cloud Native

Simplify Kubeflow Deployment with kubeflow-chart: A Step‑by‑Step Guide

This article analyzes the difficulties of using vanilla Kubeflow for MLOps, introduces the kubeflow‑chart Helm chart that streamlines deployment and integrates tools like SQLFlow and kfpdist, and provides detailed installation commands and a roadmap of upcoming components for a full cloud‑native AI platform.

AI platform · Cloud Native · Kubeflow

0 likes · 12 min read

NetEase Cloud Music Tech Team

Jul 6, 2022 · Industry Insights

Inside NetEase Cloud Music’s MLOps: Scaling AI with VK, ECI, and Ceph

This article details NetEase Cloud Music’s four‑layer machine‑learning platform architecture, covering resource provisioning with Virtual Kubelet and Alibaba Cloud ECI, Ceph storage optimizations, TensorFlow migration, large‑scale graph neural network support, and end‑to‑end workflow tooling that together enable efficient, cost‑effective AI development and deployment.

Ceph · GPU · Graph Neural Network

0 likes · 24 min read

DataFunSummit

Jun 30, 2022 · Artificial Intelligence

MLOps Practices on the Beike Inference Platform: Architecture, Evolution, and Future Plans

This article presents a comprehensive overview of Beike's machine learning platform and its inference service, detailing the platform's architecture, GPU virtualization, cloud‑native migration, MLOps implementation, and future roadmap to achieve cost‑effective, automated AI model deployment at scale.

AI · Cloud Native · GPU virtualization

0 likes · 13 min read

NetEase Cloud Music Tech Team

Jun 29, 2022 · Artificial Intelligence

Music FeatureBox: A Custom Feature Store for Machine Learning at NetEase Cloud Music

Music FeatureBox is NetEase Cloud Music’s custom feature store that centralizes metadata, unifies offline and online feature storage across multiple engines, provides a cross‑language DSL for extraction, ensures training‑inference data consistency, and offers built‑in monitoring, thereby streamlining feature engineering and accelerating the platform’s machine‑learning lifecycle.

DataHub · Feature Store · MLOps

0 likes · 17 min read

Efficient Ops

Jun 12, 2022 · Artificial Intelligence

Unlocking AI Success: A Deep Dive into the Model/MLOps Capability Maturity Framework

This article explains the world's first AI model development management standard, the Model/MLOps Capability Maturity Model (Part 1: Development Management), detailing its structure, key domains such as requirement management, test case design, and project planning, and how organizations can assess and improve their AI engineering capabilities.

AI Governance · Capability Maturity Model · MLOps

0 likes · 9 min read

DataFunTalk

May 31, 2022 · Artificial Intelligence

Using DolphinScheduler OpenMLDB Task for End‑to‑End MLOps Workflow

This article introduces the DolphinScheduler OpenMLDB Task, explains how it integrates OpenMLDB's feature platform into DolphinScheduler workflows to create a complete MLOps pipeline, and provides a step‑by‑step demonstration using the TalkingData ad‑fraud detection dataset from Kaggle.

DolphinScheduler · MLOps · OpenMLDB

0 likes · 7 min read

Efficient Ops

Apr 29, 2022 · Artificial Intelligence

China’s First AI Model Development Standard – Highlights from the AI Engineering Forum

The AI Engineering Online Forum, co‑hosted by the China Academy of Information and Communications Technology, unveiled the industry’s first AI Model Development and Management (Model/MLOps) maturity standard, featured expert insights from finance, telecom, and tech leaders, and showcased practical MLOps implementations across banking, Huawei, and AI startups.

AI · Forum · MLOps

0 likes · 6 min read

Code DAO

Apr 26, 2022 · Artificial Intelligence

Building an Open-Source ML Pipeline – Part 1: Data Ingestion & Storage

This article walks through building the first stage of an open‑source MLOps pipeline (data ingestion and storage) by outlining requirements, selecting tools such as Argo Workflows, MinIO and Great Expectations, showing how to set up a minikube cluster, and providing Python scripts and an Argo CronWorkflow to extract, transform, and load OpenAQ air‑quality data into MinIO.

Argo Workflows · MLOps · OpenAQ

0 likes · 10 min read

Efficient Ops

Apr 24, 2022 · Artificial Intelligence

How ModelOps and MLOps Accelerate AI Project Development

ModelOps and MLOps are transforming AI engineering by introducing continuous training, integration, and deployment, which streamline development cycles, standardize model management, and enable ongoing monitoring to enhance inference accuracy and maximize the business value generated by AI models.

AI Engineering · Continuous Deployment · MLOps

0 likes · 1 min read

DataFunTalk

Jan 25, 2022 · Cloud Native

Model Deployment Challenges and a Seldon‑Based Cloud‑Native Solution

This article analyzes the complexities of deploying machine‑learning models in production, outlines the limitations of the existing ABox architecture, and details a comprehensive cloud‑native redesign using Seldon on Kubernetes—including custom HDFS initializers, GPU management, logging, and resource monitoring—to streamline operations and enable unified CPU/GPU model serving.

Cloud Native · GPU · MLOps

0 likes · 12 min read

dbaplus Community

Jan 8, 2022 · Artificial Intelligence

How Ctrip Streamlined ML Model Development and Deployment with MLOps

This article explains how Ctrip tackled the long, costly ML model development‑to‑deployment pipeline by adopting and extending MLflow for full lifecycle management, covering model persistence, tracking, serving, custom pyfunc models, Dockerized deployment, scaling, and performance monitoring.

Docker · FastAPI · MLOps

0 likes · 14 min read

360 Tech Engineering

Jul 2, 2021 · Artificial Intelligence

DGL Operator: A Kubernetes‑Native Solution for Distributed Graph Neural Network Training

The article introduces DGL Operator, an open‑source Kubernetes‑based controller that automates the lifecycle of distributed graph neural network training with DGL, explains its terminology, challenges of native DGL distribution, and provides detailed architecture, workflow, and YAML/CLI examples for easy deployment.

AI · DGL · Graph Neural Networks

0 likes · 18 min read

DevOps

Feb 9, 2021 · Operations

Choosing Between DataOps, MLOps, and AIOps: A Guide for Data Teams

The article examines how data teams can select the appropriate Ops framework—DataOps, MLOps, or AIOps—by comparing their origins, principles, responsibilities, and tooling, and stresses that cultural principles outweigh technology choices for efficient delivery of data and machine‑learning products.

AIOps · Data Engineering · DataOps

0 likes · 12 min read

Top Architect

Jan 29, 2021 · Operations

Choosing Between DataOps, MLOps, and AIOps: Principles, Practices, and the X‑Ops Culture

The article explains the origins and differences of DevOps, DataOps, MLOps and AIOps, outlines their shared seven principles, and provides practical guidance on adopting the right X‑Ops culture to accelerate data‑driven and machine‑learning‑powered software delivery.

AIOps · DataOps · MLOps

0 likes · 11 min read

Youzan Coder

Jun 17, 2020 · Artificial Intelligence

Sunfish: An Integrated AI Platform for Model Training and Online Service Deployment at Youzan

Sunfish is Youzan’s integrated AI platform that unifies visual drag‑and‑drop model training, notebook‑based algorithm development, automated model management and one‑click publishing with a low‑latency, high‑availability “small‑box” inference service, enabling end‑to‑end deep‑learning workflows from data exploration to online recommendation and risk‑control deployment.

AI platform · MLOps · Model Training

0 likes · 17 min read

360 Zhihui Cloud Developer

Apr 23, 2019 · Artificial Intelligence

Mastering Kubeflow: Deploy AI Workflows on Kubernetes Step‑by‑Step

This article introduces Kubeflow, a Kubernetes‑based machine‑learning platform, outlines the typical ML lifecycle, details core components, explains why Kubernetes benefits AI workloads, and provides a step‑by‑step guide for installing and accessing Kubeflow’s services, concluding with its industry impact.

AI platform · Kubeflow · MLOps

0 likes · 7 min read
