
How Alibaba Cloud Log Service Supercharges Dify’s Scaling and Cuts DB Costs

This article examines Dify’s production‑scale bottlenecks caused by heavy PostgreSQL logging, explains why a cloud‑native log service (SLS) better matches the append‑only, high‑throughput nature of workflow logs, and provides a step‑by‑step migration guide that dramatically reduces database pressure and storage cost while unlocking advanced analytics.

Alibaba Cloud Observability

Background and Scaling Challenges

Dify, a popular low‑code LLM application platform, faces severe database performance limits in large‑scale deployments because each chat request can trigger hundreds of PostgreSQL reads and writes, exhausting connection pools and causing slow queries.

Data Types and Current Bottlenecks

Metadata: tenant, app, workflow, and tool configurations stored in PostgreSQL.

Runtime logs: execution details, session history, and message records that dominate DB I/O.

File data: user uploads and knowledge‑base documents kept in object storage.

Runtime logs alone occupy about 95% of DB storage and are the primary source of high‑frequency reads and writes.

Community Optimizations (Pre‑SLS)

In‑memory databases for lightweight scenarios.

Asynchronous log writes via Celery workers.

Periodic cleanup jobs that purge old logs.

Large‑field off‑loading to object storage.

These measures improve latency, but they hit a scalability ceiling because the underlying relational model remains fundamentally mismatched with the characteristics of log data.
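To illustrate the asynchronous‑write pattern listed above, here is a minimal, hypothetical sketch (not Dify’s actual Celery task) that buffers log records and flushes them from a background thread, keeping slow log I/O off the request path:

```python
import queue
import threading


class AsyncLogWriter:
    """Buffers log records and flushes them in batches from a
    background thread, so callers never block on log persistence."""

    def __init__(self, sink, batch_size=100):
        self._queue = queue.Queue()
        self._sink = sink            # callable that persists a batch (list of records)
        self._batch_size = batch_size
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, record):
        # Non-blocking from the caller's perspective.
        self._queue.put(record)

    def _drain(self):
        batch = []
        while True:
            record = self._queue.get()
            if record is None:       # sentinel: flush remainder and stop
                break
            batch.append(record)
            if len(batch) >= self._batch_size:
                self._sink(batch)
                batch = []
        if batch:
            self._sink(batch)

    def close(self):
        self._queue.put(None)
        self._worker.join()
```

The same idea underlies Celery‑based writes: the request enqueues, a worker persists. The ceiling remains, because the worker still writes into the same relational tables.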

Root Cause: Log Characteristics vs. Relational DB

Immutable final state: Logs become read‑only archives after execution.

Schemaless, rapidly changing JSON payloads: Frequent DDL changes cause lock contention.

High‑throughput sequential writes: Massive time‑series inserts exhaust connection pools.

Attempting to store append‑only, schema‑on‑read data in PostgreSQL leads to storage bloat and performance degradation.

Why Alibaba Cloud Log Service (SLS) Fits the Scenario

Extreme elasticity: Seconds‑level auto‑scaling without manual sharding.

High write throughput: An append‑only design handles tens of thousands of TPS at low I/O cost.

Resource isolation: Logs are physically separated from core business tables, protecting transaction stability.

Low‑cost long‑term storage: High compression and tiered storage dramatically reduce SSD expenses.

Native OLAP capabilities: A built‑in inverted index and columnar storage enable fast keyword search and real‑time SQL aggregation.

Technical Solution: Plugin Architecture and SLS Integration

The migration is split into two parts: (1) refactor Dify’s core to use a pluggable repository layer, and (2) implement a SLS‑backed repository plugin.
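The repository seam can be sketched roughly as follows; the class and method names here are illustrative, not Dify’s exact interfaces. The point is that the core depends only on an abstract contract, so PostgreSQL‑ and SLS‑backed implementations are interchangeable:

```python
from abc import ABC, abstractmethod


class WorkflowExecutionRepository(ABC):
    """Storage-agnostic contract for workflow execution records.
    Concrete backends (PostgreSQL, SLS, ...) implement the same interface."""

    @abstractmethod
    def save(self, execution: dict) -> None: ...

    @abstractmethod
    def get(self, workflow_run_id: str) -> "dict | None": ...


class InMemoryWorkflowExecutionRepository(WorkflowExecutionRepository):
    """Trivial backend used here only to demonstrate the plug-in seam."""

    def __init__(self):
        self._store = {}

    def save(self, execution):
        self._store[execution["workflow_run_id"]] = execution

    def get(self, workflow_run_id):
        return self._store.get(workflow_run_id)
```

With this seam in place, selecting a backend is purely a configuration concern, as the environment variables below show.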

CORE_WORKFLOW_EXECUTION_REPOSITORY=extensions.logstore.repositories.logstore_workflow_execution_repository.LogstoreWorkflowExecutionRepository
CORE_WORKFLOW_NODE_EXECUTION_REPOSITORY=extensions.logstore.repositories.logstore_workflow_node_execution_repository.LogstoreWorkflowNodeExecutionRepository
API_WORKFLOW_NODE_EXECUTION_REPOSITORY=extensions.logstore.repositories.logstore_api_workflow_node_execution_repository.LogstoreAPIWorkflowNodeExecutionRepository
API_WORKFLOW_RUN_REPOSITORY=extensions.logstore.repositories.logstore_api_workflow_run_repository.LogstoreAPIWorkflowRunRepository
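The dotted paths above suggest the repository class is resolved at startup by dynamic import. A minimal sketch of that mechanism (Dify’s actual loader may differ):

```python
import importlib


def load_repository_class(dotted_path: str):
    """Split 'package.module.ClassName' into module path and class name,
    import the module, and return the class object."""
    module_path, class_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

For example, `load_repository_class(os.environ["CORE_WORKFLOW_EXECUTION_REPOSITORY"])` would return the configured repository class, which the core then instantiates without knowing which backend it got.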

ALIYUN_SLS_ACCESS_KEY_ID=your_access_key_id
ALIYUN_SLS_ACCESS_KEY_SECRET=your_access_key_secret
ALIYUN_SLS_ENDPOINT=cn-hangzhou.log.aliyuncs.com
ALIYUN_SLS_REGION=cn-hangzhou
ALIYUN_SLS_PROJECT_NAME=your_project_name
ALIYUN_SLS_LOGSTORE_TTL=365

LOGSTORE_DUAL_WRITE_ENABLED=false
LOGSTORE_DUAL_READ_ENABLED=true

Versioned Log Writing and Reading Strategy

Each state transition writes a new record with a nanosecond‑level log_version field. Queries select the record with the maximum log_version per workflow_run_id to obtain the latest status.
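The read side of this strategy, picking the record with the highest log_version per workflow_run_id, can be sketched in a few lines (an in‑memory illustration of what the SLS query performs at scale):

```python
def latest_per_run(records):
    """Given append-only records, each a dict with 'workflow_run_id' and a
    monotonically increasing 'log_version', return the latest state per run."""
    latest = {}
    for rec in records:
        run_id = rec["workflow_run_id"]
        current = latest.get(run_id)
        if current is None or rec["log_version"] > current["log_version"]:
            latest[run_id] = rec
    return latest
```

Because writes are append‑only, no record is ever updated in place; the maximum log_version is the single source of truth for a run’s current status.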

Configuration and Migration Steps

Create an SLS project (the plugin auto‑creates logstores and indexes).

Obtain an AccessKey with read/write permissions.

Update Dify’s .env or docker‑compose.yaml with the repository and SLS credentials shown above.

Enable dual‑write/read flags for a gradual rollout.
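Conceptually, the dual‑write flag wraps the two backends so every write goes to both while reads are served from one; a hypothetical sketch of that wrapper (not Dify’s actual implementation):

```python
class DualWriteRepository:
    """Writes to both the legacy (PostgreSQL) and new (SLS) repositories;
    reads from the new one when dual-read is enabled, else the legacy one."""

    def __init__(self, legacy, new, dual_read_enabled=True):
        self.legacy = legacy
        self.new = new
        self.dual_read_enabled = dual_read_enabled

    def save(self, execution):
        # Keep both stores in sync during the rollout window.
        self.legacy.save(execution)
        self.new.save(execution)

    def get(self, workflow_run_id):
        source = self.new if self.dual_read_enabled else self.legacy
        return source.get(workflow_run_id)
```

Once reads from the new backend are verified, the legacy writes can be switched off and the wrapper removed.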

Benefits After Migration

DB pressure reduced by >95%: The two largest tables are offloaded, freeing SSD space and lowering CPU and connection usage.

Storage cost cut by ~10×: High‑compression, tiered storage in SLS makes long‑term log retention cheap.

Data value unlocked : Real‑time dashboards, funnel analysis, anomaly detection, and ETL pipelines become native to SLS, turning logs from ops‑only data into business insights.

Use Cases and Data Value Extraction

Example SLS query for fast intent‑trend analysis (“用户意图识别” means “user intent recognition”; the alias “用户意图” means “user intent”):

* and title: 用户意图识别 and intent | select json_extract(outputs, '$.intent') as "用户意图", date_trunc('minute', __time__) t, count(1) as pv group by "用户意图", t order by t limit all

Other scenarios include funnel diagnostics, token‑usage alerts, and periodic ETL jobs that build training datasets directly from SLS.

Conclusion

Moving Dify’s workflow logs to Alibaba Cloud Log Service decouples high‑throughput, schema‑on‑read data from transactional PostgreSQL, eliminates the primary scalability bottleneck, reduces storage cost, and provides a rich analytics platform that turns operational logs into actionable business intelligence.

Architecture diagram

Tags: cloud native, PostgreSQL, Dify, log management, SLS, scaling, Alibaba Cloud Log Service