How Huolala’s Dolphin Platform Accelerates AI Model Delivery with Cloud‑Native Automation
This article describes how Huolala built a cloud‑native AI development platform called Dolphin to overcome low model delivery efficiency and poor compute‑resource utilization, detailing its architecture, one‑stop workflow, resource‑pooling, observability, and future roadmap for scaling AI across the company.
1. Introduction
As AI technology matures, it has become a key productivity tool across industries. Huolala, an internet logistics technology company, has deepened its adoption of AI in recent years, achieving notable results in AI pricing, AI marketing, AI customer service, and AI security.
2. Challenges in Putting AI Capabilities into Production
Despite widespread AI adoption across Huolala's business lines, model development still suffers from low delivery efficiency and poor compute-resource utilization. To address these problems, the team built a full-link AI development service system covering data processing, model development, training, deployment, and online inference, forming a low-threshold, high-performance, cloud-native AI platform.
2.1 Model delivery efficiency is low
The end-to-end workflow spans data collection, data processing, model development, training, and deployment. It is complex and fragmented across multiple platforms, which leads to data silos, manual copying of large model files, and environment configurations that cannot be reused.
2.2 Compute resource utilization is low
GPU resources are managed separately by individual teams, causing uneven utilization. Deploying multiple workloads on shared nodes can improve usage, but manual scheduling cannot adjust resources dynamically, leaving significant room for optimization.
3. Dolphin Platform Overview
Dolphin is a low‑threshold, highly available cloud‑native AI development platform for algorithm and engineering teams. It integrates data processing, model development, training, deployment, and online inference into a one‑stop closed loop.
3.1 One‑stop AI development
3.1.1 Data and model management
Through distributed storage, engineers select datasets and models directly on the platform without manual uploads or copies, eliminating data islands.
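To make this concrete, here is a minimal Python sketch of how such a registry-backed selection could work. The Registry class, the artifact names, and the dfs:// URI scheme are illustrative stand-ins, not Dolphin's actual SDK; the point is that engineers resolve a storage URI instead of copying files around.

```python
# Illustrative stand-in for a platform artifact registry backed by
# distributed storage. All names and URIs here are hypothetical.
from dataclasses import dataclass

@dataclass
class Artifact:
    name: str
    version: str
    uri: str  # e.g. a path on the shared distributed file system

class Registry:
    """In-memory stand-in for the platform's dataset/model registry."""
    def __init__(self) -> None:
        self._items: dict[tuple[str, str], Artifact] = {}

    def register(self, artifact: Artifact) -> None:
        self._items[(artifact.name, artifact.version)] = artifact

    def resolve(self, name: str, version: str) -> Artifact:
        # Returns a storage URI; no file is copied to the caller's host.
        return self._items[(name, version)]

registry = Registry()
registry.register(Artifact("orders-2024q1", "v3", "dfs://datasets/orders-2024q1/v3"))

dataset = registry.resolve("orders-2024q1", "v3")
print(f"mounting {dataset.uri} read-only into the training container")
```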
3.1.2 Image management
The platform provides built-in images (e.g., Triton, TensorRT-LLM, vLLM) and supports custom images defined via Dockerfile, ensuring consistent runtime environments across development, training, and deployment.
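As a rough illustration, the snippet below builds a custom image from a user-supplied Dockerfile using docker-py. The base image tag, registry host, and directory layout are assumptions; only the docker.from_env and images.build calls are real docker-py APIs.

```python
# Minimal sketch: build a custom runtime image from a Dockerfile so every
# run starts from the same pinned environment. Tags/paths are illustrative.
import pathlib
import docker  # pip install docker

DOCKERFILE = """\
FROM nvcr.io/nvidia/tritonserver:24.01-py3
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
"""

def build_custom_image(context_dir: str, tag: str) -> None:
    """Write the Dockerfile into the build context and build the image."""
    ctx = pathlib.Path(context_dir)
    (ctx / "Dockerfile").write_text(DOCKERFILE)
    client = docker.from_env()
    image, _logs = client.images.build(path=str(ctx), tag=tag)
    print(f"built {image.id} as {tag}")

# build_custom_image("./build-ctx", "registry.example.com/dolphin/triton-custom:1.0")
```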
3.1.3 One‑click model deployment
Using Kubernetes Deployments, engineers configure a start command, request compute resources, and choose an image to publish a service in a few clicks, with automatic scaling to absorb traffic spikes. The flow is summarized in the four steps below, followed by a code sketch.
Step 1 – Publish configuration
Step 2 – Adjust compute resources
Step 3 – Choose image and version
Step 4 – Scale up/down
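The sketch below shows roughly what these four steps could look like against the Kubernetes API, using the official Python client. The namespace, labels, replica counts, and resource sizes are illustrative placeholders rather than Dolphin's real defaults, and the HorizontalPodAutoscaler stands in for whatever autoscaling policy the platform applies.

```python
# Hedged sketch of the "one-click deploy" flow with the kubernetes client.
from kubernetes import client, config

def deploy_model(name: str, image: str, command: list[str], gpus: int = 1) -> None:
    config.load_kube_config()  # in-cluster code would use load_incluster_config()

    container = client.V1Container(
        name=name,
        image=image,                      # Step 3: chosen image and version
        command=command,                  # Step 1: start command from the publish config
        resources=client.V1ResourceRequirements(  # Step 2: requested compute
            limits={"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": str(gpus)},
        ),
    )
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name=name, labels={"app": name}),
        spec=client.V1DeploymentSpec(
            replicas=2,
            selector=client.V1LabelSelector(match_labels={"app": name}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": name}),
                spec=client.V1PodSpec(containers=[container]),
            ),
        ),
    )
    client.AppsV1Api().create_namespaced_deployment("model-serving", deployment)

    # Step 4: an autoscaler handles traffic-driven scale up/down.
    hpa = client.V2HorizontalPodAutoscaler(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V2HorizontalPodAutoscalerSpec(
            scale_target_ref=client.V2CrossVersionObjectReference(
                api_version="apps/v1", kind="Deployment", name=name),
            min_replicas=2,
            max_replicas=10,
            metrics=[client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(type="Utilization",
                                                 average_utilization=70),
                ),
            )],
        ),
    )
    client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
        "model-serving", hpa)
```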
3.2 Compute resource management
Resource pools are created on Kubernetes, offering GPU node pools with physical isolation for development, training, and inference.
Fine‑grained GPU sharing supports allocations as small as 128 MiB, while large models can span multiple GPUs.
Idle resources are automatically reclaimed after a predefined usage window.
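A brief sketch of both ideas, under stated assumptions: the extended resource key below mimics GPU-sharing device plugins (the real key depends on the scheduler Dolphin uses), and the reclaim loop is a toy stand-in for the platform's idle-resource reaper.

```python
# Hypothetical resource keys and a simplified idle-reclaim check.
import datetime as dt

# Fractional GPU request a pod might carry under a sharing scheduler
# ("example.com/gpu-mem" is an assumed, not real, resource name):
gpu_share_limits = {"example.com/gpu-mem": "128Mi"}   # smallest grant
large_model_limits = {"nvidia.com/gpu": "4"}          # whole GPUs for big models

IDLE_WINDOW = dt.timedelta(hours=2)  # assumed usage window before reclaim

def find_reclaimable(workloads: dict[str, dt.datetime]) -> list[str]:
    """Return workloads whose last activity is older than the idle window."""
    now = dt.datetime.now(dt.timezone.utc)
    return [name for name, last_used in workloads.items()
            if now - last_used > IDLE_WINDOW]

workloads = {
    "notebook-alice": dt.datetime.now(dt.timezone.utc) - dt.timedelta(hours=5),
    "train-job-42": dt.datetime.now(dt.timezone.utc) - dt.timedelta(minutes=10),
}
print(find_reclaimable(workloads))  # -> ['notebook-alice']
```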
3.3 Stability construction
Observability is achieved by aggregating cluster, service, and gateway metrics, logs, and traces, enabling rapid issue detection.
Service monitoring provides real‑time health checks for deployed models.
High availability is built through redundant deployments and failover mechanisms.
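On the service side, a deployed model might export metrics along the lines of the sketch below, which the platform's scrapers then aggregate with cluster and gateway signals. The metric names and port are assumptions; the prometheus_client calls themselves are real.

```python
# Minimal sketch of an inference process exposing Prometheus metrics
# for the platform's monitoring to scrape. pip install prometheus_client
import random
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("inference_requests_total", "Inference requests", ["status"])
LATENCY = Histogram("inference_latency_seconds", "Inference latency")

def handle_request() -> None:
    with LATENCY.time():                      # records request latency
        time.sleep(random.uniform(0.01, 0.05))  # stand-in for model.forward()
    REQUESTS.labels(status="ok").inc()

if __name__ == "__main__":
    start_http_server(9100)  # scrapers pull from :9100/metrics
    while True:
        handle_request()
```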
4. Dolphin Platform Applications
General AI solutions (image detection, natural language processing, speech synthesis) are productized for quick integration without additional development.
The large‑model marketplace offers pre‑trained models and configurable fine‑tuning pipelines, simplifying deployment of big models.
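As a hypothetical example of what a configurable fine-tuning request could look like, the sketch below defines a job spec and a stand-in submit call. The field names, base model ID, and dataset URI are all illustrative, not the marketplace's actual schema.

```python
# Hypothetical fine-tuning job spec; every name here is illustrative.
from dataclasses import dataclass, field

@dataclass
class FineTuneJob:
    base_model: str                 # picked from the marketplace catalog
    dataset_uri: str                # resolved from shared distributed storage
    method: str = "lora"            # e.g. full fine-tune vs. parameter-efficient
    epochs: int = 3
    gpus: int = 2
    hyperparams: dict = field(default_factory=lambda: {"lr": 2e-4, "batch_size": 16})

def submit(job: FineTuneJob) -> str:
    """Stand-in for the platform call that schedules the job on a training pool."""
    job_id = f"ft-{abs(hash((job.base_model, job.dataset_uri))) % 10_000}"
    print(f"scheduled {job.method} fine-tune of {job.base_model} as {job_id}")
    return job_id

submit(FineTuneJob(base_model="example-7b-instruct",
                   dataset_uri="dfs://datasets/support-dialogs/v1"))
```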
5. Future Roadmap
Plans include expanding AI capabilities to more business units, improving GPU allocation efficiency, and enriching the large‑model infrastructure and market.