How Huolala’s Dolphin Platform Accelerates AI Model Delivery with Cloud‑Native Automation
This article describes how Huolala built a cloud‑native AI development platform called Dolphin to overcome low model delivery efficiency and poor compute‑resource utilization, detailing its architecture, one‑stop workflow, resource‑pooling, observability, and future roadmap for scaling AI across the company.
1. Introduction
As AI technology matures, it has become a key productivity tool across industries. Huolala, an internet logistics technology company, has deepened its adoption of AI in recent years, achieving notable results in AI pricing, AI marketing, AI customer service, and AI security.
2. Challenges in Putting AI Capabilities into Production
Despite widespread AI adoption across Huolala's business lines, model development still suffers from low delivery efficiency and poor compute-resource utilization. To address these problems, the team built a full-link AI development service system covering data processing, model development, training, deployment, and online inference, forming a low-threshold, high-performance, cloud-native AI platform.
2.1 Model delivery efficiency is low
The end-to-end workflow spans data collection, data processing, model development, training, and deployment. It is complex and fragmented across multiple platforms, which leads to data silos, manual copying of large model files, and environment configurations that cannot be reused.
2.2 Compute resource utilization is low
GPU resources are managed separately by individual teams, causing uneven utilization. Deploying multiple workloads on shared nodes can improve usage, but manual scheduling cannot adjust resources dynamically, leaving significant room for optimization.
3. Dolphin Platform Overview
Dolphin is a low‑threshold, highly available cloud‑native AI development platform for algorithm and engineering teams. It integrates data processing, model development, training, deployment, and online inference into a one‑stop closed loop.
3.1 One‑stop AI development
3.1.1 Data and model management
Through distributed storage, engineers select datasets and models directly on the platform without manual uploads or copies, eliminating data islands.
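To make this concrete, here is a minimal Python sketch of how such a registry-backed selection could work. The Registry class, the artifact names, and the dfs:// URI scheme are illustrative stand-ins, not Dolphin's actual SDK; the point is that engineers resolve a storage URI instead of copying files around.

```python
# Illustrative stand-in for a platform artifact registry backed by
# distributed storage. All names and URIs here are hypothetical.
from dataclasses import dataclass

@dataclass
class Artifact:
    name: str
    version: str
    uri: str  # e.g. a path on the shared distributed file system

class Registry:
    """In-memory stand-in for the platform's dataset/model registry."""
    def __init__(self) -> None:
        self._items: dict[tuple[str, str], Artifact] = {}

    def register(self, artifact: Artifact) -> None:
        self._items[(artifact.name, artifact.version)] = artifact

    def resolve(self, name: str, version: str) -> Artifact:
        # Returns a storage URI; no file is copied to the caller's host.
        return self._items[(name, version)]

registry = Registry()
registry.register(Artifact("orders-2024q1", "v3", "dfs://datasets/orders-2024q1/v3"))

dataset = registry.resolve("orders-2024q1", "v3")
print(f"mounting {dataset.uri} read-only into the training container")
```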
3.1.2 Image management
The platform provides built-in images (e.g., Triton, TensorRT-LLM, vLLM) and supports custom images defined via Dockerfile, ensuring consistent runtime environments across development, training, and deployment.
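As a rough illustration, the snippet below builds a custom image from a user-supplied Dockerfile using docker-py. The base image tag, registry host, and directory layout are assumptions; only the docker.from_env and images.build calls are real docker-py APIs.

```python
# Minimal sketch: build a custom runtime image from a Dockerfile so every
# run starts from the same pinned environment. Tags/paths are illustrative.
import pathlib
import docker  # pip install docker

DOCKERFILE = """\
FROM nvcr.io/nvidia/tritonserver:24.01-py3
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
"""

def build_custom_image(context_dir: str, tag: str) -> None:
    """Write the Dockerfile into the build context and build the image."""
    ctx = pathlib.Path(context_dir)
    (ctx / "Dockerfile").write_text(DOCKERFILE)
    client = docker.from_env()
    image, _logs = client.images.build(path=str(ctx), tag=tag)
    print(f"built {image.id} as {tag}")

# build_custom_image("./build-ctx", "registry.example.com/dolphin/triton-custom:1.0")
```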
3.1.3 One‑click model deployment
Using Kubernetes Deployments, engineers configure a start command, request compute resources, and choose an image to publish a service in a few clicks, with automatic scaling to absorb traffic spikes. The flow is summarized in the four steps below, followed by a code sketch.
Step 1 – Publish configuration
Step 2 – Adjust compute resources
Step 3 – Choose image and version
Step 4 – Scale up/down
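The sketch below shows roughly what these four steps could look like against the Kubernetes API, using the official Python client. The namespace, labels, replica counts, and resource sizes are illustrative placeholders rather than Dolphin's real defaults, and the HorizontalPodAutoscaler stands in for whatever autoscaling policy the platform applies.

```python
# Hedged sketch of the "one-click deploy" flow with the kubernetes client.
from kubernetes import client, config

def deploy_model(name: str, image: str, command: list[str], gpus: int = 1) -> None:
    config.load_kube_config()  # in-cluster code would use load_incluster_config()

    container = client.V1Container(
        name=name,
        image=image,                      # Step 3: chosen image and version
        command=command,                  # Step 1: start command from the publish config
        resources=client.V1ResourceRequirements(  # Step 2: requested compute
            limits={"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": str(gpus)},
        ),
    )
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name=name, labels={"app": name}),
        spec=client.V1DeploymentSpec(
            replicas=2,
            selector=client.V1LabelSelector(match_labels={"app": name}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": name}),
                spec=client.V1PodSpec(containers=[container]),
            ),
        ),
    )
    client.AppsV1Api().create_namespaced_deployment("model-serving", deployment)

    # Step 4: an autoscaler handles traffic-driven scale up/down.
    hpa = client.V2HorizontalPodAutoscaler(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V2HorizontalPodAutoscalerSpec(
            scale_target_ref=client.V2CrossVersionObjectReference(
                api_version="apps/v1", kind="Deployment", name=name),
            min_replicas=2,
            max_replicas=10,
            metrics=[client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(type="Utilization",
                                                 average_utilization=70),
                ),
            )],
        ),
    )
    client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
        "model-serving", hpa)
```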
3.2 Compute resource management
Resource pools are created on Kubernetes, offering GPU node pools with physical isolation for development, training, and inference.
Fine‑grained GPU sharing supports allocations as small as 128 MiB, while large models can span multiple GPUs.
Idle resources are automatically reclaimed after a predefined usage window.
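A brief sketch of both ideas, under stated assumptions: the extended resource key below mimics GPU-sharing device plugins (the real key depends on the scheduler Dolphin uses), and the reclaim loop is a toy stand-in for the platform's idle-resource reaper.

```python
# Hypothetical resource keys and a simplified idle-reclaim check.
import datetime as dt

# Fractional GPU request a pod might carry under a sharing scheduler
# ("example.com/gpu-mem" is an assumed, not real, resource name):
gpu_share_limits = {"example.com/gpu-mem": "128Mi"}   # smallest grant
large_model_limits = {"nvidia.com/gpu": "4"}          # whole GPUs for big models

IDLE_WINDOW = dt.timedelta(hours=2)  # assumed usage window before reclaim

def find_reclaimable(workloads: dict[str, dt.datetime]) -> list[str]:
    """Return workloads whose last activity is older than the idle window."""
    now = dt.datetime.now(dt.timezone.utc)
    return [name for name, last_used in workloads.items()
            if now - last_used > IDLE_WINDOW]

workloads = {
    "notebook-alice": dt.datetime.now(dt.timezone.utc) - dt.timedelta(hours=5),
    "train-job-42": dt.datetime.now(dt.timezone.utc) - dt.timedelta(minutes=10),
}
print(find_reclaimable(workloads))  # -> ['notebook-alice']
```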
3.3 Stability construction
Observability is achieved by aggregating cluster, service, and gateway metrics, logs, and traces, enabling rapid issue detection.
Service monitoring provides real‑time health checks for deployed models.
High availability is built through redundant deployments and failover mechanisms.
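On the service side, a deployed model might export metrics along the lines of the sketch below, which the platform's scrapers then aggregate with cluster and gateway signals. The metric names and port are assumptions; the prometheus_client calls themselves are real.

```python
# Minimal sketch of an inference process exposing Prometheus metrics
# for the platform's monitoring to scrape. pip install prometheus_client
import random
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("inference_requests_total", "Inference requests", ["status"])
LATENCY = Histogram("inference_latency_seconds", "Inference latency")

def handle_request() -> None:
    with LATENCY.time():                      # records request latency
        time.sleep(random.uniform(0.01, 0.05))  # stand-in for model.forward()
    REQUESTS.labels(status="ok").inc()

if __name__ == "__main__":
    start_http_server(9100)  # scrapers pull from :9100/metrics
    while True:
        handle_request()
```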
4. Dolphin Platform Applications
General AI solutions (image detection, natural language processing, speech synthesis) are productized for quick integration without additional development.
The large‑model marketplace offers pre‑trained models and configurable fine‑tuning pipelines, simplifying deployment of big models.
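As a hypothetical example of what a configurable fine-tuning request could look like, the sketch below defines a job spec and a stand-in submit call. The field names, base model ID, and dataset URI are all illustrative, not the marketplace's actual schema.

```python
# Hypothetical fine-tuning job spec; every name here is illustrative.
from dataclasses import dataclass, field

@dataclass
class FineTuneJob:
    base_model: str                 # picked from the marketplace catalog
    dataset_uri: str                # resolved from shared distributed storage
    method: str = "lora"            # e.g. full fine-tune vs. parameter-efficient
    epochs: int = 3
    gpus: int = 2
    hyperparams: dict = field(default_factory=lambda: {"lr": 2e-4, "batch_size": 16})

def submit(job: FineTuneJob) -> str:
    """Stand-in for the platform call that schedules the job on a training pool."""
    job_id = f"ft-{abs(hash((job.base_model, job.dataset_uri))) % 10_000}"
    print(f"scheduled {job.method} fine-tune of {job.base_model} as {job_id}")
    return job_id

submit(FineTuneJob(base_model="example-7b-instruct",
                   dataset_uri="dfs://datasets/support-dialogs/v1"))
```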
5. Future Roadmap
Plans include expanding AI capabilities to more business units, improving GPU allocation efficiency, and enriching the large‑model infrastructure and market.