
How to Build a Production‑Ready AI Platform with Alibaba Cloud SAE & SLS

This article walks through the architectural bottlenecks of scaling Dify AI applications and explains how Alibaba Cloud Serverless Application Engine (SAE) and Log Service (SLS) jointly provide a fully managed, elastic compute base and a storage‑separated logging layer. It then offers step‑by‑step guidance on deployment, performance tuning, and log analytics for reaching up to 500 QPS at low cost.

Alibaba Cloud Native

Background and Challenges

Dify is a micro‑service‑based AI workflow platform that combines API services, background workers, a web front‑end, a KV cache, and relational and vector databases. In a demo environment, Docker Compose with PostgreSQL suffices, but in production the database becomes a performance bottleneck under heavy workflow‑log writes, and the micro‑service architecture brings high operational complexity, resource waste, and limited throughput (≈10 QPS).

Solution Overview: SAE + SLS

The joint solution leverages Alibaba Cloud Serverless Application Engine (SAE) for elastic compute orchestration and Log Service (SLS) for "store‑compute separation" of massive workflow logs. Together they deliver a high‑elasticity, low‑cost, fully managed Dify runtime.

SAE – Fully Managed Compute

One‑click full‑stack delivery: Pre‑built templates deploy the entire micro‑service cluster, automatically integrating SLS (log storage), Tablestore (vector storage), Redis, and RDS for PostgreSQL.

Enterprise‑grade high availability: Instances span multiple zones with health checks and self‑healing, supporting canary releases.

Second‑level compute elasticity: Auto‑scaling based on CPU, memory, or QPS, scaling workers up during inference peaks and releasing idle resources during troughs.

Deep performance tuning: SAE patches Redis adapters, fixes slow SQL, and aligns resource specs, boosting throughput from 10 QPS to 500 QPS (≈50×).
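SAE's exact scaling algorithm is not spelled out here, but metric‑based autoscalers generally follow a target‑tracking rule. The sketch below mirrors the formula Kubernetes HPA uses as a mental model for the second‑level elasticity described above (function name and thresholds are illustrative, not SAE's schema):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Target-tracking scale-out decision: grow the fleet until the
    per-instance metric (e.g. QPS) falls back to the configured target."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))

# Inference peak: 3 workers each handling 120 QPS against a 50 QPS target
print(desired_replicas(3, 120, 50))  # -> 8 (scale out)
# Trough: 8 workers at 10 QPS each
print(desired_replicas(8, 10, 50))   # -> 2 (release idle instances)
```

The min/max bounds matter in practice: they keep a burst of traffic from scaling past what the database tier can absorb.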

SLS – Scalable Log Storage

Extreme storage elasticity: As a managed SaaS service, SLS scales out within seconds, with no manual sharding or capacity planning.

Load isolation: Append‑only writes avoid random I/O and lock contention, supporting tens of thousands of TPS.

Tiered low‑cost retention: Hot data is kept for real‑time analysis; cold data is archived at costs far below SSD databases.

Built‑in OLAP analytics: SQL queries, visual dashboards, and alerts turn raw logs into actionable business insights.

Deployment Steps

Step 1: Choose a deployment template

In the SAE console, select the "Dify Community – Serverless Deployment" template.

Step 2: Configure parameters and specifications

Three templates are available: Dify High‑Performance, High‑Availability, and Test. For high‑concurrency production, pick the Dify High‑Performance version, which uses optimized api and plugin‑daemon images.

Example configuration values:

SERVER_WORKER_AMOUNT=1
SERVER_WORKER_CONNECTIONS=10

These defaults cap single‑node throughput; the production matrix (derived from full‑link load tests) prescribes higher worker counts and larger DB connection pools matched to expected traffic.
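The interaction between these worker settings and the database pool can be sanity‑checked with simple arithmetic. A hypothetical sizing helper follows; the 20% headroom and the worst‑case assumption that every worker connection may hold a database connection are ours, not values from the production matrix:

```python
import math

def required_db_connections(api_replicas: int, workers_per_replica: int,
                            connections_per_worker: int, headroom: float = 0.2) -> int:
    """Worst-case upper bound on PostgreSQL connections the API tier can open,
    plus headroom for workers, migrations, and monitoring."""
    peak = api_replicas * workers_per_replica * connections_per_worker
    return math.ceil(peak * (1 + headroom))

# The template defaults: 1 replica, 1 worker, 10 connections
print(required_db_connections(1, 1, 10))   # -> 12
# A hypothetical high-traffic tier: 8 replicas x 4 workers x 10 connections
print(required_db_connections(8, 4, 10))   # -> 384
```

If the second number exceeds the PostgreSQL `max_connections` setting, raising the worker values alone will trade the throughput cap for connection exhaustion.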

Step 3: Submit and access the service

After submission, SAE automatically provisions the services and links them to cloud resources. The generated endpoint ${EXTERNAL-IP}:${PORT} can be opened in a browser to start using Dify.

Performance Tuning Details

Database connection bottleneck

Default Dify settings (SERVER_WORKER_AMOUNT=1, SERVER_WORKER_CONNECTIONS=10) cap single‑node QPS. Raising these values without also tuning the database leads to connection exhaustion, so the production matrix aligns API concurrency, the PostgreSQL connection pool size, and component resources for each traffic tier.

Redis single‑point limitation

High‑frequency reads/writes by dify‑plugin‑daemon saturate a single Redis node, causing latency spikes beyond 200 QPS. The solution migrates Redis to a clustered deployment with read/write separation, eliminating the single‑node choke and enabling smooth scaling to 500 QPS.
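Wiring the daemon to a clustered Redis with a separate read endpoint is ultimately a configuration change. An illustrative environment fragment is shown below; the variable names and endpoints are hypothetical, so consult the dify‑plugin‑daemon configuration reference for the exact keys:

```shell
# Illustrative only -- variable names are hypothetical placeholders.
# Writes go to the cluster proxy endpoint; reads go to a read-only
# replica endpoint, giving the read/write separation described above.
REDIS_HOST=r-xxxxxx.redis.rds.aliyuncs.com        # cluster proxy (writes)
REDIS_PORT=6379
REDIS_READONLY_HOST=r-xxxxxx-ro.redis.rds.aliyuncs.com  # read replicas
```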

Observability and Business Insights via SLS

SLS’s OLAP engine allows deep analysis of LLM usage, token consumption, latency, and user intent without predefined schemas.

Token cost audit (Scenario A)

node_type:llm | select
  sum(json_extract_long(process_data, '$.usage.prompt_tokens')) prompt_tokens,
  sum("process_data.usage.completion_tokens") completion_tokens,
  sum("process_data.usage.total_tokens") total_tokens,
  date_trunc('minute', __time__) t
group by t
order by t
limit all
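The per‑minute token totals returned by this query map directly to spend. A minimal cost sketch follows; the per‑1k‑token prices are placeholders, not any model's actual pricing:

```python
def token_cost_usd(prompt_tokens: int, completion_tokens: int,
                   price_in_per_1k: float = 0.5, price_out_per_1k: float = 1.5) -> float:
    """Cost of one time bucket. Prompt and completion tokens are usually
    priced differently, so the audit query keeps them separate."""
    return (prompt_tokens / 1000) * price_in_per_1k \
         + (completion_tokens / 1000) * price_out_per_1k

# One minute bucket from the audit: 120k prompt tokens, 30k completion tokens
print(token_cost_usd(120_000, 30_000))  # -> 105.0
```

Joining such a calculation onto the per‑minute series makes cost spikes attributable to specific workflows and time windows.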

First‑token latency distribution (Scenario B)

node_type:llm | select
  date_format(__time__ - __time__ % 60, '%m-%d %H:%i') as time,
  approx_percentile("process_data.usage.time_to_first_token", 0.25) as Latency_p25,
  approx_percentile("process_data.usage.time_to_first_token", 0.50) as Latency_p50,
  approx_percentile("process_data.usage.time_to_first_token", 0.75) as Latency_p75,
  approx_percentile("process_data.usage.time_to_first_token", 0.99) as Latency_p99,
  min("process_data.usage.time_to_first_token") as Latency_min
group by time
order by time
limit all
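approx_percentile returns an estimate over the log stream; the nearest‑rank definition it approximates can be reproduced locally as below (the latency samples are made up):

```python
import math

def percentile(values, q):
    """Nearest-rank percentile -- the exact quantity that approx_percentile
    estimates without materializing the full sorted stream."""
    s = sorted(values)
    idx = max(0, math.ceil(q * len(s)) - 1)
    return s[idx]

# Hypothetical time_to_first_token samples for one minute bucket (ms)
latencies_ms = [120, 95, 300, 180, 150, 90, 2200, 210, 130, 160]
for q in (0.25, 0.50, 0.75, 0.99):
    print(f"p{int(q * 100)} = {percentile(latencies_ms, q)} ms")
# -> p25 = 120 ms, p50 = 150 ms, p75 = 210 ms, p99 = 2200 ms
```

Note how a single 2200 ms outlier dominates p99 while leaving the median untouched, which is why the query reports several quantiles rather than an average.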

User‑intent trend (Scenario C)

The query filters workflow nodes titled 用户意图识别 ("user intent recognition") and counts occurrences of each extracted intent (用户意图, "user intent"):

* and title: 用户意图识别 | select
  json_extract(outputs, '$.text') as "用户意图",
  count(1) as pv
group by "用户意图"

Funnel analysis for error diagnosis

Counting successful runs per workflow node shows at which step requests drop out of the pipeline:

status:succeeded | select
  title,
  count(distinct workflow_run_id) cnt
group by title
order by cnt desc
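The query returns counts in descending order; re‑sequencing them by workflow execution order turns them into a funnel whose pass‑through rates localize the failing step. A minimal sketch with hypothetical node titles and counts:

```python
def funnel_dropoff(step_counts):
    """Given (node_title, succeeded_run_count) pairs ordered by execution
    sequence, return each step's pass-through rate vs. the previous step."""
    rates = []
    for (title, count), (_, prev) in zip(step_counts[1:], step_counts):
        rates.append((title, count / prev))
    return rates

# Hypothetical counts per workflow node, in execution order
steps = [("start", 1000), ("用户意图识别", 980), ("llm", 600), ("answer", 590)]
for title, rate in funnel_dropoff(steps):
    print(f"{title}: {rate:.0%} of previous step")
```

In this made‑up example the sharp drop between intent recognition and the llm node would point the investigation at LLM invocation errors rather than at ingress or intent parsing.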

Key Benefits

Full‑stack elasticity: Compute scales per‑second, storage handles bursty log volumes.

Cost efficiency: Idle resources are eliminated; tiered storage reduces expenses compared to database scaling.

Stability: Managed services, multi‑zone deployment, and I/O isolation remove single‑point failures.

Deep insight: End‑to‑end monitoring bridges infrastructure metrics with business‑level token and intent analytics.

By combining SAE’s managed compute with SLS’s log‑centric storage, developers can focus on AI application logic and prompt engineering while the platform ensures high availability, performance, and cost‑effective operation.

Architecture diagram

Tags: cloud-native, performance-optimization, Log Service, sae, ai-infrastructure
Written by Alibaba Cloud Native

We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.