Enable Python Probe for LLM Observability on Alibaba Cloud ACK
This guide explains how to integrate Alibaba Cloud's Python probe into a Container Service for Kubernetes (ACK) environment to monitor large language model (LLM) applications. It covers prerequisites, installation steps, Dockerfile modifications, resource permissions, and sample Python code for both the server and client components.
Background
As large language model (LLM) technology matures, many enterprises embed LLMs into their products. However, the opaque internal mechanisms of LLMs pose risks and hinder deployment. Observability of LLMs provides essential data for explainability, bias detection, risk mitigation, and performance improvement.
Application Example
A company adds an intelligent Q&A feature to its product. The architecture consists of a client sending a request to a server, which calls a chatbot that performs Retrieval-Augmented Generation (RAG) before responding. To observe this LLM workflow, the company integrates Alibaba Cloud's Python probe.
Prerequisites
Create a Container Service for Kubernetes (ACK) cluster; dedicated, managed, and serverless clusters are all supported.
Create a namespace (e.g., arms-demo) following the namespace management guide.
Verify that your Python version and framework meet the probe compatibility requirements (a quick check is sketched below).
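A minimal sanity check, assuming you only need to confirm the interpreter version and that your framework imports (the 3.8 floor comes from the Compatibility section below):

import sys

# The probe requires Python >= 3.8 (see Compatibility).
assert sys.version_info >= (3, 8), f"Python {sys.version} is too old for the probe"

# Optional: confirm the framework you plan to instrument is importable.
import fastapi
print(f"Python {sys.version.split()[0]}, FastAPI {fastapi.__version__}")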
Step 1: Install ARMS Application Monitoring Component
Log in to the Container Service console.
Select the target cluster from the cluster list.
Navigate to Operations > Component Management and search for ack-onepilot (version ≥ 3.2.4).
Click Install on the ack-onepilot card.
Accept the default parameters and confirm the installation.
Step 2: Modify Dockerfile
Install the probe and run the instrumented command:

pip3 install aliyun-bootstrap
aliyun-bootstrap -a install
aliyun-instrument python app.py

Example Dockerfile (before):
# Use Python 3.10 base image
FROM docker.m.daocloud.io/python:3.10
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY ./app.py /app/app.py
EXPOSE 8000
CMD ["python","app.py"]

Example Dockerfile (after adding the probe):
# Use official Python 3.10 base image
FROM docker.m.daocloud.io/python:3.10
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Install aliyun python probe
RUN pip3 install aliyun-bootstrap && aliyun-bootstrap -a install
COPY ./app.py /app/app.py
EXPOSE 8000
CMD ["aliyun-instrument","python","app.py"]

Step 3: Grant ARMS Resource Access
If your cluster lacks an ARMS Addon Token (on ACK this typically appears as the addon.arms.token Secret in the kube-system namespace), grant ARMS permissions manually:
Log in to the Container Service console and open the target cluster.
Go to Cluster > Resources > Worker RAM Role and add a new authorization.
Select the AliyunARMSFullAccess permission (and AliyunSTSAssumeRoleAccess for dedicated clusters) and confirm.
After installing ack-onepilot, enter the AccessKey ID and AccessKey Secret of an account that has ARMS permissions on the component's configuration page.
Step 4: Enable ARMS Monitoring for Python Application
Add the following labels to the pod template in your Deployment YAML:
labels:
  aliyun.com/app-language: python    # Required for Python apps
  armsPilotAutoEnable: 'on'
  armsPilotCreateAppName: "<your-deployment-name>"

Deploy the updated resources and restart the pods.
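For orientation, a minimal Deployment sketch with these labels in place might look like this (the name, image, and port are placeholders, not values from this guide):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: arms-python-server            # placeholder
  namespace: arms-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: arms-python-server
  template:
    metadata:
      labels:
        app: arms-python-server
        aliyun.com/app-language: python   # Required for Python apps
        armsPilotAutoEnable: 'on'
        armsPilotCreateAppName: "arms-python-server"
    spec:
      containers:
        - name: app
          image: <your-registry>/arms-python-server:latest   # placeholder
          ports:
            - containerPort: 8000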
Monitoring Results
After the container redeploys (about 1–2 minutes), view metrics in the ARMS console under Application Monitoring > Application List. The console provides call-chain analysis, trace views for microservice and LLM scenarios, and customizable alerts.
Compatibility
The probe supports Python >= 3.8 and works with common frameworks such as FastAPI.
Sample Code
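Both samples below assume their dependencies are listed in requirements.txt; an illustrative set (derived from the imports, not from this guide) might be:

fastapi
uvicorn
requests
opentelemetry-api
langchain

Note that in newer LangChain releases FakeListLLM moved to langchain_community.llms.fake, so pin a LangChain version that matches the import used in the client sample.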
arms-python-server
import os
from concurrent import futures
from logging import getLogger

import requests
import uvicorn
from fastapi import FastAPI
from opentelemetry import trace

tracer = trace.get_tracer(__name__)
_logger = getLogger(__name__)

app = FastAPI()

def call_requests():
    # Helper that calls CALL_URL (not wired to a route in this sample).
    call_url = os.getenv('CALL_URL') or 'https://www.aliyun.com'
    response = requests.get(call_url)
    response.raise_for_status()
    print(f"response code: {response.status_code} - {response.text}")

def call_client():
    # Downstream call to the client service; CLIENT_URL overrides the default.
    _logger.warning("calling client")
    call_url = os.getenv('CLIENT_URL') or 'https://www.aliyun.com'
    response = requests.get(call_url)
    return response.text

@app.get("/")
async def call():
    # Manual spans: a parent span wrapping a thread-pool span around the call.
    with tracer.start_as_current_span("parent") as rootSpan:
        rootSpan.set_attribute("parent.value", "parent")
        with futures.ThreadPoolExecutor(max_workers=2) as executor:
            with tracer.start_as_current_span("ThreadPoolExecutorTest") as span:
                span.set_attribute("future.value", "ThreadPoolExecutorTest")
                future = executor.submit(call_client)
                future.result()
    return {"data": "call"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
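A quick way to smoke-test the server sample locally (the port matches the uvicorn call above; running it under aliyun-instrument is assumed from Step 2):

import requests

# One request to the root endpoint exercises the parent span, the thread-pool
# span, and the outbound call to CLIENT_URL (or the www.aliyun.com fallback).
resp = requests.get("http://localhost:8000/")
print(resp.status_code, resp.json())  # expected: 200 {'data': 'call'}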
arms-python-client

import uvicorn
from fastapi import FastAPI
from langchain.chains import LLMChain
from langchain.llms.fake import FakeListLLM
from langchain.prompts import PromptTemplate

app = FastAPI()

# FakeListLLM returns canned responses, so the sample needs no real model.
llm = FakeListLLM(responses=["I'll callback later.", "You 'console' them!"])

template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"

@app.get("/")
def call_langchain():
    # The probe captures this chain invocation as an LLM trace.
    res = llm_chain.run(question)
    return {"data": res}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
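To trace the full server-to-client path, point the server's CLIENT_URL environment variable at the client. In the server Deployment this could look like the following (the Service name arms-python-client is a placeholder assumption):

env:
  - name: CLIENT_URL
    value: "http://arms-python-client:8000"

Each request to the server's / endpoint then fans out to the client's LangChain chain, and the probe records the whole path as a single trace.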
Related Links

https://help.aliyun.com/zh/arms/application-monitoring/user-guide/start-monitoring-python-applications/
https://help.aliyun.com/zh/arms/application-monitoring/developer-reference/python-probe-compatibility-requirements
https://help.aliyun.com/zh/arms/application-monitoring/user-guide/create-and-manage-alert-rules-in-application-monitoring-new/
Alibaba Cloud Observability
Driving continuous progress in observability technology!