
Achieving Full‑Stack Observability for Dify Agentic Apps with Alibaba Cloud Monitoring

This guide explains the observability challenges of Dify's low‑code LLM platform, analyzes its native and third‑party monitoring capabilities, and provides a step‑by‑step solution using Alibaba Cloud's non‑intrusive Python and Go probes, Trace Link integration, and detailed deployment instructions to monitor every component from the API to plugins and sandbox.

Alibaba Cloud Developer

Dify is a popular low‑code LLM application platform that supports model integration, prompt orchestration, RAG, workflow/agent frameworks, and plugins, making Agentic app development convenient. Production‑grade Agentic apps involve dynamic elements such as session history, memory handling, tool calls, knowledge‑base retrieval, model generation, script execution, and workflow control, which introduce significant uncertainty. Observability is essential throughout the development, debugging, and operations lifecycle to connect Agentic app execution with upstream and downstream services.

Current Observability Landscape

Dify's built-in application monitoring records execution details in the Dify database. It is tightly integrated with the platform, but its analysis capabilities are limited, and the heavy database writes cause performance problems at scale.

Third‑party integrations (CloudMonitor, Langfuse, LangSmith) collect trace data from Workflow, Tool, and RAG nodes but suffer from coarse data granularity and lack of full‑chain linking.

Native OpenTelemetry (OTel) support provides tracing for Flask, HTTP, DB, Redis, and Celery, yet it does not instrument Dify's internal execution logic and cannot link with the Workflow execution details.

Full‑Stack Observability Solution

Alibaba Cloud Monitoring offers a combined solution: non-intrusive component probes (a Python probe for the API, a Go probe for the plugin engine and sandbox, and the OpenTelemetry module for Nginx) plus Dify's official application-level tracing. This approach covers all Dify components (API, Plugin-Daemon, Sandbox, Worker, and Nginx) while requiring only environment-variable configuration and minimal script changes.

The design addresses three challenges in monitoring Dify:

1. Component Complexity

Dify requests traverse the gateway, execution engine, plugin engine, code sandbox, plugin runtime, and Celery task queue, creating a complex execution chain.

2. Rapid Iteration

Dify releases new versions weekly; probes must be resilient to frequent code changes.

3. Native Monitoring Gaps

Built‑in monitoring lacks detailed analysis; third‑party tracing lacks fine‑grained data; OTel lacks full‑chain linkage.

Probe Deployment Steps

Step 1 – Obtain CloudMonitor Endpoint & License Key

Log in to CloudMonitor 2.0, go to the Access Center, and click the Dify card.

Select the data reporting region and click “Get LicenseKey”.

Record the LicenseKey and Endpoint (do not include the port).

Step 2 – Install Python Probe

# Ensure pip is up‑to‑date
python -m ensurepip --upgrade
# Uninstall conflicting OTel packages
pip3 uninstall -y opentelemetry-instrumentation-celery \
  opentelemetry-instrumentation-flask \
  opentelemetry-instrumentation-redis \
  opentelemetry-instrumentation-requests \
  opentelemetry-instrumentation-logging \
  opentelemetry-instrumentation-wsgi \
  opentelemetry-instrumentation-fastapi \
  opentelemetry-instrumentation-asgi \
  opentelemetry-instrumentation-sqlalchemy
# Install Alibaba bootstrap and the probe
pip3 config set global.index-url https://mirrors.aliyun.com/pypi/simple/ && \
pip3 config set install.trusted-host mirrors.aliyun.com
pip3 install aliyun-bootstrap && aliyun-bootstrap -a install
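
Before installing the probe, it can help to confirm that the uninstall step really removed the conflicting packages. The following is a minimal sketch using only the Python standard library; the package list is copied from the uninstall command above, and the helper name find_installed is hypothetical, not part of any Alibaba Cloud tooling.

```python
from importlib import metadata

# Package names copied from the uninstall command above.
CONFLICTING_PACKAGES = [
    "opentelemetry-instrumentation-celery",
    "opentelemetry-instrumentation-flask",
    "opentelemetry-instrumentation-redis",
    "opentelemetry-instrumentation-requests",
    "opentelemetry-instrumentation-logging",
    "opentelemetry-instrumentation-wsgi",
    "opentelemetry-instrumentation-fastapi",
    "opentelemetry-instrumentation-asgi",
    "opentelemetry-instrumentation-sqlalchemy",
]


def find_installed(packages):
    """Return (name, version) pairs for packages that are still installed."""
    found = []
    for name in packages:
        try:
            found.append((name, metadata.version(name)))
        except metadata.PackageNotFoundError:
            # Not installed in this environment, which is what we want.
            continue
    return found


if __name__ == "__main__":
    leftovers = find_installed(CONFLICTING_PACKAGES)
    for name, version in leftovers:
        print(f"still installed: {name}=={version}")
    if not leftovers:
        print("no conflicting OTel instrumentation packages found")
```

Run it with the same interpreter that Dify uses, right after the uninstall command, so that it inspects the correct site-packages directory.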

Replace the original entrypoint script (e.g., entrypoint.sh) with a version that launches the application via aliyun-instrument:

# Use aliyun‑instrument to start the server
exec aliyun-instrument gunicorn \
  --bind "${DIFY_BIND_ADDRESS:-0.0.0.0}:${DIFY_PORT:-5001}" \
  --workers ${SERVER_WORKER_AMOUNT:-1} \
  --worker-class ${SERVER_WORKER_CLASS:-gevent} \
  --worker-connections ${SERVER_WORKER_CONNECTIONS:-10} \
  --timeout ${GUNICORN_TIMEOUT:-200} \
  app:app

Step 3 – Configure Environment Variables

Set the following variables on the API container (example values shown; {token} is a placeholder). ENABLE_OTEL=true enables OTel reporting. For the worker container, set APPLICATION_NAME=dify-worker instead of dify-api.

ENABLE_OTEL=true
OTLP_TRACE_ENDPOINT=http://tracing-cn-heyuan.arms.aliyuncs.com/{token}/api/otlp/traces
OTLP_METRIC_ENDPOINT=http://tracing-cn-heyuan.arms.aliyuncs.com/{token}/api/otlp/metrics
OTEL_SAMPLING_RATE=1
APPLICATION_NAME=dify-api
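
To avoid typos when filling in these values for several containers, the variables can be generated from the endpoint and token. This is a sketch, not an official tool: the function name build_otel_env is hypothetical, and the URL layout simply mirrors the example values above. It also rejects an endpoint that contains a port, following the note in Step 1.

```python
from urllib.parse import urlsplit


def build_otel_env(endpoint, token, app_name, sampling_rate=1.0):
    """Build the Step 3 variables, mirroring the example URL pattern."""
    parts = urlsplit(endpoint)
    if not parts.scheme or not parts.hostname:
        raise ValueError("endpoint must look like http://host")
    if parts.port is not None:
        raise ValueError("endpoint must not include a port")
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")

    base = f"{parts.scheme}://{parts.hostname}/{token}/api/otlp"
    return {
        "ENABLE_OTEL": "true",
        "OTLP_TRACE_ENDPOINT": f"{base}/traces",
        "OTLP_METRIC_ENDPOINT": f"{base}/metrics",
        # The "g" format renders 1.0 as "1" and 0.1 as "0.1".
        "OTEL_SAMPLING_RATE": format(sampling_rate, "g"),
        "APPLICATION_NAME": app_name,
    }


if __name__ == "__main__":
    env = build_otel_env(
        "http://tracing-cn-heyuan.arms.aliyuncs.com", "{token}", "dify-api"
    )
    for key, value in env.items():
        print(f"{key}={value}")
```

Calling it a second time with app_name set to dify-worker yields the worker's variables with the same endpoints.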

Step 4 – Deploy and Verify

After redeploying Dify with the modified script and environment variables, trigger a few API calls. Within 1–2 minutes, the trace data appears in the CloudMonitor console under “AI Application Observability”.
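
One way to trigger those calls is a small smoke-test script. The sketch below assumes the target is a Dify chat app exposed through the service API (POST to /chat-messages with a Bearer app key); a workflow app would need a different path and payload. The base URL, the API key, and the function names are placeholders chosen for illustration.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, query, user="otel-smoke-test"):
    """Build (but do not send) one blocking chat-messages request."""
    payload = {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",
        "user": user,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_smoke_requests(base_url, api_key, count=3):
    """Send a few requests so that traces show up in the console."""
    for i in range(count):
        request = build_chat_request(base_url, api_key, f"ping {i}")
        with urllib.request.urlopen(request, timeout=60) as response:
            print(f"request {i}: HTTP {response.status}")
```

For example, send_smoke_requests("http://localhost/v1", "app-xxxx") sends three requests; substitute the base URL and app key of your own deployment.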

Monitoring Other Components

Plugin‑Daemon

Rebuild the dify-plugin-daemon Docker image with InstGo to embed the Go probe. Example Dockerfile snippet:

FROM golang:1.23-alpine AS builder
ARG VERSION=unknown
COPY . /app
WORKDIR /app
RUN wget "http://arms-apm-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/instgo/instgo-linux-amd64" -O instgo && chmod 777 instgo
RUN INSTGO_EXTRA_RULES="dify_python" ./instgo go build -o internal/core/runner/python/python.so -buildmode=c-shared cmd/lib/python/main.go && \
    INSTGO_EXTRA_RULES="dify_python" ./instgo go build -o internal/core/runner/nodejs/nodejs.so -buildmode=c-shared cmd/lib/nodejs/main.go && \
    ./instgo go build -o main cmd/server/main.go
COPY entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh
FROM ubuntu:24.04
WORKDIR /app
COPY --from=builder /app/main /app/entrypoint.sh /app/
CMD ["/bin/bash","-c","/app/entrypoint.sh"]

After rebuilding, add the following labels to the Kubernetes pod spec to enable automatic ARMS collection:

labels:
  aliyun.com/app-language: golang
  armsPilotAutoEnable: 'on'
  armsPilotCreateAppName: "dify-daemon-plugin"

Sandbox

Similarly, modify the sandbox build scripts (build/build_amd64.sh or build/build_arm64.sh) to download instgo and inject the probe into the Python and Node.js shared libraries before building the final binary.

Worker (Celery)

Set OTel environment variables (as above) for the worker container; Dify version ≥ 1.7.0 is required.

Nginx Gateway

Use the official nginx:*-otel image, load the ngx_otel_module, and configure the exporter:

load_module modules/ngx_otel_module.so;
http {
    otel_exporter {
        endpoint "${GRPC_ENDPOINT}";
        header Authentication "${GRPC_TOKEN}";
    }
    otel_trace on;
    otel_service_name "${SERVICE_NAME}";
    otel_trace_context propagate;
    ...
}
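
With otel_trace_context propagate, Nginx forwards the W3C Trace Context traceparent header to the upstream service. When debugging propagation, it is useful to decode that header and check that the trace ID seen by Nginx matches the one reported by the API. The parser below is a minimal sketch for the version 00 header layout; the names TraceParent and parse_traceparent are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# version(2 hex) - trace-id(32 hex) - parent-id(16 hex) - flags(2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Parse a traceparent header; return None if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C specification.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    sampled = bool(int(flags, 16) & 0x01)
    return TraceParent(version, trace_id, parent_id, sampled)


if __name__ == "__main__":
    example = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    print(parse_traceparent(example))
```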

Trace Linking (Trace Link)

Dify's official tracing reports a traceId for each Workflow execution. The non‑intrusive probes embed the same traceId in downstream spans, allowing the CloudMonitor UI to display a “Links” tab where users can jump between the LLM‑level trace and the infrastructure‑level trace, achieving full‑chain visibility.
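
Conceptually, the link is a join on the shared traceId. The toy model below illustrates that idea only; it is not how CloudMonitor stores or resolves links, and the field name linked_trace_id, the function index_links, and the sample span names (other than dify_plugin_execute) are assumptions made for the example.

```python
from collections import defaultdict


def index_links(app_trace_ids, probe_spans):
    """Group probe span names by the application-level trace ID they reference."""
    known = set(app_trace_ids)
    links = defaultdict(list)
    for span in probe_spans:
        linked = span.get("linked_trace_id")  # hypothetical attribute name
        if linked in known:
            links[linked].append(span["name"])
    return dict(links)


# One workflow-level trace reported by Dify's application tracing (made-up ID).
app_trace_ids = ["a" * 32]

# Spans reported by the component probes (made-up sample data).
probe_spans = [
    {"name": "POST /v1/workflows/run", "linked_trace_id": "a" * 32},
    {"name": "dify_plugin_execute", "linked_trace_id": "a" * 32},
    {"name": "GET /health", "linked_trace_id": None},
]

links = index_links(app_trace_ids, probe_spans)
for trace_id, names in links.items():
    print(trace_id, "->", names)
```

The health-check span carries no workflow traceId, so it does not appear under any link.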

Practical Use Cases

Analyzing execution‑engine exceptions by locating error spans via the Links panel and inspecting stack traces.

Identifying slow plugin calls: the trace shows the longest-duration span (dify_plugin_execute) inside the plugin runtime, enabling pinpointing of bottlenecks.

Filtering out database/Redis spans to focus on custom logic when debugging.
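
The last two use cases can also be reproduced offline on exported span data. The sketch below assumes each span is a dictionary with name, component, and duration_ms fields; that schema and the sample values are invented for illustration, and only the span name dify_plugin_execute comes from the text above.

```python
def drop_infra_spans(spans, noisy=("database", "redis")):
    """Keep only spans whose component is not in the noisy set."""
    return [span for span in spans if span["component"] not in noisy]


def slowest_span(spans):
    """Return the span with the largest duration."""
    return max(spans, key=lambda span: span["duration_ms"])


# Made-up sample data in an assumed export format.
spans = [
    {"name": "SELECT workflows", "component": "database", "duration_ms": 12},
    {"name": "GET session", "component": "redis", "duration_ms": 3},
    {"name": "dify_plugin_execute", "component": "plugin", "duration_ms": 1840},
    {"name": "workflow_node_run", "component": "workflow", "duration_ms": 95},
]

custom_spans = drop_infra_spans(spans)
bottleneck = slowest_span(custom_spans)
print([span["name"] for span in custom_spans])
print("slowest:", bottleneck["name"], bottleneck["duration_ms"], "ms")
```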

