How to Build a Production‑Ready AI Platform with Alibaba Cloud SAE & SLS
This article walks through the architectural bottlenecks of scaling Dify AI applications, explains how Alibaba Cloud Serverless Application Engine (SAE) and Log Service (SLS) jointly provide a fully managed, elastic compute base and storage‑separated logging layer, and offers step‑by‑step deployment, performance‑tuning, and analytics guidance for achieving up to 500 QPS with low cost.
Background and Challenges
Dify is a micro‑service‑based AI workflow platform that combines API services, workers, a web front‑end, a KV cache, and relational and vector databases. In a demo environment, Docker Compose with PostgreSQL suffices, but in production the database becomes a performance bottleneck under heavy log writes, and the overall micro‑service architecture brings high operational complexity, resource waste, and limited throughput (≈10 QPS).
Solution Overview: SAE + SLS
The joint solution leverages Alibaba Cloud Serverless Application Engine (SAE) for elastic compute orchestration and Log Service (SLS) for "store‑compute separation" of massive workflow logs. Together they deliver a high‑elasticity, low‑cost, fully managed Dify runtime.
SAE – Fully Managed Compute
One‑click full‑stack delivery: Pre‑built templates deploy the entire micro‑service cluster, automatically integrating SLS (log storage), Tablestore (vector storage), Redis, and RDS for PostgreSQL.
Enterprise‑grade high availability: Instances span multiple zones with health checks and self‑healing, supporting canary releases.
Second‑level compute elasticity: Auto‑scaling based on CPU, memory, or QPS, scaling workers up during inference peaks and releasing idle resources during troughs.
Deep performance tuning: SAE patches Redis adapters, fixes slow SQL, and aligns resource specs, boosting throughput from 10 QPS to 500 QPS (≈50×).
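The QPS‑based scaling rule above can be sketched as a simple control function. This is a minimal illustration only, not SAE's actual algorithm; the per‑instance capacity of 50 QPS and the min/max bounds are hypothetical values:

```python
import math

def desired_instances(current_qps: float,
                      qps_per_instance: float = 50.0,
                      min_instances: int = 2,
                      max_instances: int = 20) -> int:
    """Pick an instance count that keeps per-instance load at or below
    the configured QPS target, clamped to [min, max] bounds."""
    needed = math.ceil(current_qps / qps_per_instance)
    return max(min_instances, min(max_instances, needed))

# Inference peak: 500 QPS / 50 QPS-per-instance -> 10 instances.
# Trough: 10 QPS still keeps the configured minimum of 2 instances warm.
```

The same shape applies to CPU‑ or memory‑based rules: measure, divide by a per‑instance target, clamp to configured bounds.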
SLS – Scalable Log Storage
Extreme storage elasticity: As a SaaS service, SLS scales out within seconds, with no manual sharding or capacity planning.
Load isolation: Append‑only writes avoid random I/O and lock contention, supporting tens of thousands of TPS.
Tiered low‑cost retention: Hot data is kept for real‑time analysis; cold data is archived at costs far below SSD databases.
Built‑in OLAP analytics: SQL queries, visual dashboards, and alerts turn raw logs into actionable business insights.
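The append‑only write pattern behind that load isolation can be illustrated with a toy log store (purely conceptual; SLS is a managed service with its own storage engine). Every write extends the tail of the log, so there are no in‑place updates, no random I/O, and no row‑lock contention:

```python
import json
import time

class AppendOnlyLog:
    """Toy append-only store: writes only ever extend the tail, which is
    what lets log systems sustain very high write TPS."""
    def __init__(self):
        self._records = []            # stand-in for a sequential file

    def append(self, **fields) -> int:
        record = {"__time__": int(time.time()), **fields}
        self._records.append(json.dumps(record))   # sequential append only
        return len(self._records) - 1              # offset of the record

    def read(self, offset: int) -> dict:
        return json.loads(self._records[offset])

log = AppendOnlyLog()
offset = log.append(node_type="llm", status="succeeded")
```

Contrast this with a relational table, where each workflow‑log insert competes with reads for pages, indexes, and locks; that contention is exactly what made PostgreSQL the bottleneck in the default Dify setup.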
Deployment Steps
Step 1: Choose a deployment template
In the SAE console, select the "Dify Community – Serverless Deployment" template.
Step 2: Configure parameters and specifications
Three templates are available: Dify High‑Performance, High‑Availability, and Test. For high‑concurrency production, pick the Dify High‑Performance version, which uses optimized api and plugin‑daemon images.
Example configuration values:
SERVER_WORKER_AMOUNT=1
SERVER_WORKER_CONNECTIONS=10

These defaults cap single‑node throughput; the production matrix (derived from full‑link load tests) provides higher worker counts and larger DB connection pools matched to expected traffic.
Step 3: Submit and access the service
After submission, SAE automatically provisions the services and links them to cloud resources. The generated endpoint ${EXTERNAL-IP}:${PORT} can be opened in a browser to start using Dify.
Performance Tuning Details
Database connection bottleneck
Default Dify settings (SERVER_WORKER_AMOUNT=1, SERVER_WORKER_CONNECTIONS=10) cap single‑node QPS. Raising these values without adjusting the database leads to connection exhaustion. The production matrix aligns API concurrency, PostgreSQL connection pool size, and component resources for each traffic tier.
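The arithmetic behind that alignment can be sketched as follows. All numbers here are hypothetical (the real matrix comes from Alibaba's load tests), and `pg_connections_needed` is an illustrative helper, not part of Dify: the API tier's potential connection demand is roughly replicas × workers × per‑worker pool size, and it must stay under PostgreSQL's `max_connections` with headroom for Celery workers and background jobs:

```python
def pg_connections_needed(api_replicas: int,
                          worker_amount: int,
                          pool_size_per_worker: int,
                          background_reserve: int = 20) -> int:
    """Rough upper bound on PostgreSQL connections the API tier can open,
    plus a reserve for workers, migrations, and monitoring."""
    return api_replicas * worker_amount * pool_size_per_worker + background_reserve

# Dify defaults: 1 replica x 1 worker x 10 connections -> a tiny footprint.
# A hypothetical production tier of 10 replicas x 4 workers x 30 connections
# needs 1220 connections -- far past a default max_connections of a few
# hundred, which is why worker counts and DB pools must be raised together.
```

This is why the template bundles tuned env values with matching RDS specs rather than letting users raise `SERVER_WORKER_AMOUNT` in isolation.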
Redis single‑point limitation
High‑frequency reads/writes by dify‑plugin‑daemon saturate a single Redis node, causing latency spikes beyond 200 QPS. The solution migrates Redis to a clustered deployment with read/write separation, eliminating the single‑node choke and enabling smooth scaling to 500 QPS.
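The read/write separation can be sketched as a small client‑side router. This is illustrative only: the stub clients stand in for real Redis connections, and a production deployment would use a managed Redis cluster or a client library's replica support rather than hand‑rolled routing:

```python
import itertools

class ReadWriteRouter:
    """Route writes to the primary and spread reads across replicas,
    removing the single-node hotspot."""
    def __init__(self, primary, replicas):
        self._primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin reads

    def set(self, key, value):
        return self._primary.set(key, value)        # writes -> primary

    def get(self, key):
        return next(self._replicas).get(key)        # reads -> next replica

# Stub clients for illustration; in production these would be real
# connections to the primary and its read replicas.
class StubRedis:
    def __init__(self, name):
        self.name, self.data = name, {}
    def set(self, k, v):
        self.data[k] = v
    def get(self, k):
        return self.name    # return the replica name to show distribution

router = ReadWriteRouter(StubRedis("primary"),
                         [StubRedis("replica-1"), StubRedis("replica-2")])
router.set("session", "abc")
reads = [router.get("session") for _ in range(4)]
# reads alternate between replica-1 and replica-2
```

Read‑heavy components such as dify‑plugin‑daemon then fan out across replicas, while write volume on the primary stays bounded.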
Observability and Business Insights via SLS
SLS’s OLAP engine allows deep analysis of LLM usage, token consumption, latency, and user intent without predefined schemas.
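What these queries compute can be mimicked in plain Python over a handful of log records. The field names mirror Dify's `process_data` payload as used in the queries that follow; the sample values and timestamps are invented:

```python
import json
from collections import defaultdict

# Hypothetical workflow-node log lines as SLS would store them.
logs = [
    {"__time__": 60, "node_type": "llm",
     "process_data": json.dumps({"usage": {"prompt_tokens": 120,
                                           "completion_tokens": 80,
                                           "total_tokens": 200}})},
    {"__time__": 75, "node_type": "llm",
     "process_data": json.dumps({"usage": {"prompt_tokens": 50,
                                           "completion_tokens": 30,
                                           "total_tokens": 80}})},
    {"__time__": 130, "node_type": "llm",
     "process_data": json.dumps({"usage": {"prompt_tokens": 10,
                                           "completion_tokens": 10,
                                           "total_tokens": 20}})},
]

# Equivalent of: sum(total_tokens) ... group by date_trunc('minute', __time__)
per_minute = defaultdict(int)
for line in logs:
    usage = json.loads(line["process_data"])["usage"]
    minute = line["__time__"] - line["__time__"] % 60   # truncate to minute
    per_minute[minute] += usage["total_tokens"]

print(sorted(per_minute.items()))   # [(60, 280), (120, 20)]
```

The point of SLS is that this aggregation runs server‑side over billions of lines, with no schema migration when a new field such as `time_to_first_token` appears in the JSON.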
Token cost audit (Scenario A)
node_type:llm | select
sum(json_extract_long(process_data, '$.usage.prompt_tokens')) prompt_tokens,
sum("process_data.usage.completion_tokens") completion_tokens,
sum("process_data.usage.total_tokens") total_tokens,
date_trunc('minute', __time__) t
group by t
order by t
limit all

First‑token latency distribution (Scenario B)
node_type:llm | select
date_format(__time__ - __time__ % 60, '%m-%d %H:%i') as time,
approx_percentile("process_data.usage.time_to_first_token", 0.25) as Latency_p25,
approx_percentile("process_data.usage.time_to_first_token", 0.50) as Latency_p50,
approx_percentile("process_data.usage.time_to_first_token", 0.75) as Latency_p75,
approx_percentile("process_data.usage.time_to_first_token", 0.99) as Latency_p99,
min("process_data.usage.time_to_first_token") as Latency_min
group by time
order by time
limit all

User‑intent trend (Scenario C)
* and title: 用户意图识别 | select
json_extract(outputs, '$.text') as "用户意图",
count(1) as pv
group by "用户意图"Funnel analysis for error diagnosis
status:succeeded | select
title,
count(distinct workflow_run_id) cnt
group by title
order by cnt desc

Key Benefits
Full‑stack elasticity: Compute scales per‑second, storage handles bursty log volumes.
Cost efficiency: Idle resources are eliminated; tiered storage reduces expenses compared to database scaling.
Stability: Managed services, multi‑zone deployment, and I/O isolation remove single‑point failures.
Deep insight: End‑to‑end monitoring bridges infrastructure metrics with business‑level token and intent analytics.
By combining SAE’s managed compute with SLS’s log‑centric storage, developers can focus on AI application logic and prompt engineering while the platform ensures high availability, performance, and cost‑effective operation.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.