How Alibaba Cloud Log Service Supercharges Dify’s Scaling and Cuts DB Costs
This article examines the production-scale bottlenecks that heavy PostgreSQL logging creates for Dify, explains why a cloud-native log service (SLS) better matches the append-only, high-throughput nature of workflow logs, and provides a step-by-step migration guide that dramatically reduces database pressure, cuts storage cost, and unlocks advanced analytics.
Background and Scaling Challenges
Dify, a popular low‑code LLM application platform, faces severe database performance limits in large‑scale deployments because each chat request can trigger hundreds of PostgreSQL reads and writes, exhausting connection pools and causing slow queries.
Data Types and Current Bottlenecks
Metadata: tenant, app, workflow, and tool configurations stored in PostgreSQL.
Runtime logs: execution details, session history, and message records that dominate DB I/O.
File data: user uploads and knowledge-base documents kept in object storage.
Runtime logs alone occupy about 95% of DB storage and are the primary source of high‑frequency reads and writes.
Community Optimizations (Pre‑SLS)
In‑memory databases for lightweight scenarios.
Asynchronous log writes via Celery workers.
Periodic cleanup jobs that purge old logs.
Large‑field off‑loading to object storage.
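To make the asynchronous-write optimization concrete, here is a minimal in-process sketch of the idea behind Dify's Celery-based approach, using only a stdlib queue and worker thread as a stand-in for a real Celery task queue (the `AsyncLogWriter` name and `flush_fn` hook are illustrative, not Dify's actual code):

```python
import queue
import threading

class AsyncLogWriter:
    """Simplified stand-in for Celery-based async log writes: the request
    thread only enqueues; a background worker flushes to the database."""

    def __init__(self, flush_fn):
        self._queue = queue.Queue()
        self._flush_fn = flush_fn  # e.g. a batched INSERT into PostgreSQL
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, record: dict) -> None:
        # Non-blocking for the caller: log latency moves off the request path.
        self._queue.put(record)

    def _drain(self) -> None:
        while True:
            record = self._queue.get()
            if record is None:  # sentinel: stop the worker
                break
            self._flush_fn(record)

    def close(self) -> None:
        # Flush everything already queued, then stop the worker.
        self._queue.put(None)
        self._worker.join()
```

The key property is the same one Celery provides: the chat request returns without waiting on a log INSERT, but the relational write still happens eventually, which is why this alone cannot remove the underlying DB load.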
These measures improve latency but hit a scalability ceiling because the underlying relational model still mismatches log characteristics.
Root Cause: Log Characteristics vs. Relational DB
Immutable final state: logs become read-only archives after execution.
Schemaless, rapidly changing JSON payloads: frequent DDL changes cause lock contention.
High-throughput sequential writes: massive time-series inserts exhaust connection pools.
Attempting to store append‑only, schema‑on‑read data in PostgreSQL leads to storage bloat and performance degradation.
Why Alibaba Cloud Log Service (SLS) Fits the Scenario
Extreme elasticity: seconds-level auto-scaling without manual sharding.
High-write throughput: append-only design handles tens of thousands of TPS with low I/O cost.
Resource isolation: logs are physically separated from core business tables, protecting transaction stability.
Low-cost long-term storage: high compression and tiered storage dramatically reduce SSD expenses.
Native OLAP capabilities: built-in inverted index and columnar storage enable fast keyword search and real-time SQL aggregation.
Technical Solution: Plugin Architecture and SLS Integration
The migration is split into two parts: (1) refactor Dify’s core to use a pluggable repository layer, and (2) implement a SLS‑backed repository plugin.
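The pluggable repository layer can be sketched as an abstract interface plus a dotted-path loader, which is how the environment variables below appear to be consumed. The class and method names here are illustrative, not Dify's actual internals:

```python
import importlib
from abc import ABC, abstractmethod
from typing import Optional

class WorkflowExecutionRepository(ABC):
    """Illustrative repository contract; Dify selects the concrete backend
    at startup from a setting such as CORE_WORKFLOW_EXECUTION_REPOSITORY."""

    @abstractmethod
    def save(self, execution: dict) -> None: ...

    @abstractmethod
    def get(self, workflow_run_id: str) -> Optional[dict]: ...

class InMemoryWorkflowExecutionRepository(WorkflowExecutionRepository):
    """Stand-in backend; the real plugin would call the SLS SDK instead."""

    def __init__(self):
        self._store = {}

    def save(self, execution: dict) -> None:
        self._store[execution["workflow_run_id"]] = execution

    def get(self, workflow_run_id: str) -> Optional[dict]:
        return self._store.get(workflow_run_id)

def load_repository(dotted_path: str):
    """Resolve 'pkg.module.ClassName', the pattern the env vars below use."""
    module_path, class_name = dotted_path.rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls()
```

Because the core only depends on the abstract interface, swapping PostgreSQL for SLS is a configuration change rather than a code change.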
CORE_WORKFLOW_EXECUTION_REPOSITORY=extensions.logstore.repositories.logstore_workflow_execution_repository.LogstoreWorkflowExecutionRepository
CORE_WORKFLOW_NODE_EXECUTION_REPOSITORY=extensions.logstore.repositories.logstore_workflow_node_execution_repository.LogstoreWorkflowNodeExecutionRepository
API_WORKFLOW_NODE_EXECUTION_REPOSITORY=extensions.logstore.repositories.logstore_api_workflow_node_execution_repository.LogstoreAPIWorkflowNodeExecutionRepository
API_WORKFLOW_RUN_REPOSITORY=extensions.logstore.repositories.logstore_api_workflow_run_repository.LogstoreAPIWorkflowRunRepository
ALIYUN_SLS_ACCESS_KEY_ID=your_access_key_id
ALIYUN_SLS_ACCESS_KEY_SECRET=your_access_key_secret
ALIYUN_SLS_ENDPOINT=cn-hangzhou.log.aliyuncs.com
ALIYUN_SLS_REGION=cn-hangzhou
ALIYUN_SLS_PROJECT_NAME=your_project_name
ALIYUN_SLS_LOGSTORE_TTL=365
LOGSTORE_DUAL_WRITE_ENABLED=false
LOGSTORE_DUAL_READ_ENABLED=true
Versioned Log Writing and Reading Strategy
Each state transition writes a new record with a nanosecond‑level log_version field. Queries select the record with the maximum log_version per workflow_run_id to obtain the latest status.
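The append-then-select-latest strategy can be sketched in a few lines, assuming `time.time_ns()` as the nanosecond-level version source (the helper names are illustrative):

```python
import time

def append_event(log: list, workflow_run_id: str, status: str) -> None:
    """Each state transition appends a new record; nothing is updated in place."""
    log.append({
        "workflow_run_id": workflow_run_id,
        "status": status,
        "log_version": time.time_ns(),  # nanosecond-level version
    })

def latest_states(log: list) -> dict:
    """Pick the record with the maximum log_version per workflow_run_id.
    Ties go to the later append, in case two writes share a timestamp."""
    latest = {}
    for record in log:
        run_id = record["workflow_run_id"]
        if run_id not in latest or record["log_version"] >= latest[run_id]["log_version"]:
            latest[run_id] = record
    return latest
```

This mirrors the query-side logic: SLS stores every version, and readers reduce to the max-version record per run, so no UPDATE statements are ever needed.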
Configuration and Migration Steps
Create an SLS project (the plugin auto‑creates logstores and indexes).
Obtain an AccessKey with read/write permissions.
Update Dify’s .env or docker‑compose.yaml with the repository and SLS credentials shown above.
Enable dual‑write/read flags for a gradual rollout.
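One plausible semantics for the rollout flags, assuming the SLS-backed store is primary, is sketched below: dual-write mirrors new records back to PostgreSQL as a safety net, while dual-read lets queries fall back to PostgreSQL for runs that predate the migration. The class is a hedged illustration (plain dicts stand in for the SLS SDK and SQLAlchemy):

```python
class LogstoreRolloutRepository:
    """Illustrative wrapper for a gradual rollout: SLS is the primary store,
    LOGSTORE_DUAL_WRITE_ENABLED mirrors writes to PostgreSQL, and
    LOGSTORE_DUAL_READ_ENABLED enables a PostgreSQL fallback on read misses."""

    def __init__(self, sls_store, pg_store, dual_write=False, dual_read=True):
        self.sls_store = sls_store
        self.pg_store = pg_store
        self.dual_write = dual_write
        self.dual_read = dual_read

    def save(self, run_id: str, record: dict) -> None:
        self.sls_store[run_id] = record
        if self.dual_write:
            self.pg_store[run_id] = record  # safety mirror during rollout

    def get(self, run_id: str):
        hit = self.sls_store.get(run_id)
        if hit is None and self.dual_read:
            return self.pg_store.get(run_id)  # older runs only exist in PostgreSQL
        return hit
```

Once all live queries hit SLS and old PostgreSQL rows age out, both flags can be switched off and the legacy tables dropped.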
Benefits After Migration
DB pressure reduced by >95%: the two largest tables are offloaded, freeing SSD space and lowering CPU/connection usage.
Storage cost cut by ~10×: high-compression, tiered storage in SLS makes long-term log retention cheap.
Data value unlocked: real-time dashboards, funnel analysis, anomaly detection, and ETL pipelines become native to SLS, turning logs from ops-only data into business insights.
Use Cases and Data Value Extraction
Example SQL for fast intent‑trend analysis:
* and title: 用户意图识别 and intent | select json_extract(outputs, '$.intent') as "用户意图", date_trunc('minute', __time__) t, count(1) as pv group by "用户意图", t order by t limit all
(The Chinese literals filter on the node title 用户意图识别, "user intent recognition", and alias the extracted column as 用户意图, "user intent".)
Other scenarios include funnel diagnostics, token-usage alerts, and periodic ETL jobs that build training datasets directly from SLS.
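In the same spirit, a token-usage alert could aggregate hourly consumption with a query like the sketch below. The `usage` field name and `$.total_tokens` JSON path are assumptions about the log schema, not confirmed by the source, so adjust them to the fields your logstore actually records:

```sql
* | select date_trunc('hour', __time__) as t,
         sum(cast(json_extract_scalar(usage, '$.total_tokens') as bigint)) as tokens
    group by t
    order by t
    limit all
```

Wiring this query to an SLS alert rule turns runaway token spend into a notification instead of a surprise on the monthly bill.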
Conclusion
Moving Dify’s workflow logs to Alibaba Cloud Log Service decouples high‑throughput, schema‑on‑read data from transactional PostgreSQL, eliminates the primary scalability bottleneck, reduces storage cost, and provides a rich analytics platform that turns operational logs into actionable business intelligence.
Alibaba Cloud Observability