How to Build End‑to‑End Observability for Large‑Model Applications on Alibaba Cloud
This guide explains how to design and implement a complete observability solution for large‑model AI services on Alibaba Cloud, covering architecture, core metrics, logging standards, demo code, log collection, dashboard design, alerting, monitoring tools, troubleshooting SOPs, and recovery procedures.
Background
Large‑model technology has rapidly matured and is being deployed across industries. As these models are integrated into applications, establishing an end‑to‑end observability system becomes increasingly critical.
Overall Framework
The observability solution for large‑model applications extends traditional monitoring by incorporating model‑specific characteristics. The overall framework follows Alibaba's 1‑5‑10 observability methodology: detect an incident within 1 minute, locate it within 5 minutes, and recover within 10 minutes.
Core Metrics
Observability metrics are divided into three categories: availability, performance, and business feedback. Key indicators include resource water level (QPM and token usage), analysis dimensions (application, module, model, workspace), and user negative feedback rate.
Resource water level: Monitor QPM and token usage; Bailian currently does not support alerts on these, so users must log and aggregate them manually.
Analysis dimensions (application, module): Log these fields to differentiate issues across multiple apps or modules sharing the same cloud account.
Analysis dimensions (model, workspace): Bailian applies flow control (rate limiting) per model and workspace combination, so log both fields to see which limit a call counts against.
User negative feedback rate: Similar to e‑commerce metrics, this reflects user experience issues.
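Because QPM and token usage have to be aggregated by hand, a small in-process counter can be a useful starting point before the numbers are moved into a log query. The following is a minimal sketch, not part of the original demo: the class `WaterLevelAggregator` and its method names are hypothetical, and it assumes each call is recorded with its model, workspace, timestamp, and total token count.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: counts calls and tokens per minute for each model and workspace. */
class WaterLevelAggregator {
    /** Totals for one (model, workspace, minute) bucket. */
    private static final class Bucket {
        long calls;
        long tokens;
    }

    private final Map<String, Bucket> buckets = new HashMap<>();

    /** Builds the bucket key; the minute is the epoch timestamp divided by 60 seconds. */
    private static String key(String model, String workspace, long epochMillis) {
        return model + "|" + workspace + "|" + (epochMillis / 60_000L);
    }

    /** Records one model call and the tokens it consumed. */
    void record(String model, String workspace, long epochMillis, long totalTokens) {
        Bucket b = buckets.computeIfAbsent(key(model, workspace, epochMillis), k -> new Bucket());
        b.calls++;
        b.tokens += totalTokens;
    }

    /** Calls recorded in the minute that contains epochMillis (QPM). */
    long qpm(String model, String workspace, long epochMillis) {
        Bucket b = buckets.get(key(model, workspace, epochMillis));
        return b == null ? 0 : b.calls;
    }

    /** Tokens recorded in the minute that contains epochMillis. */
    long tokensPerMinute(String model, String workspace, long epochMillis) {
        Bucket b = buckets.get(key(model, workspace, epochMillis));
        return b == null ? 0 : b.tokens;
    }
}
```

This sketch never evicts old buckets and is not thread-safe; a production version would need both, or the aggregation could be done in the log platform instead.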
Findings (1)
Observability for large models currently focuses on monitoring and alerting, split into business monitoring and cloud‑product monitoring.
Business Monitoring
Log custom information during model calls (prompt, model name, etc.) and use Log Service, QuickBI, DataV, or CloudMonitor to build dashboards and custom alerts.
Cloud‑Product Monitoring
Leverage Bailian's model and application observability modules together with CloudMonitor and ARMS to provide standard monitoring capabilities.
Logging Standards
Log Printing
Log Printing Specification
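Key fields to log before and after model invocation include call time, prompt, model, workspace, request ID, status code, duration, input/output tokens, error code, and error message. The demo writes one comma‑separated line per call in this order: application, module, model, workspace, request ID, status code, duration (ms), input tokens, output tokens, total tokens, error code, error message.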
<property name="LOG_PATTERN" value="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p [traceId:%X{traceId}] [%c{3}#%method():%L] [%thread] - %m%n" />
Example Java code:
// Fragment of a service method (the sample logs show ChatServiceImpl#chat()).
// app, module, model, workspace, and prompt are supplied by the surrounding class.
long t1 = System.currentTimeMillis();
try {
    Generation gen = new Generation();
    Message systemMsg = Message.builder()
            .role(Role.SYSTEM.getValue())
            .content("You are a helpful assistant.")
            .build();
    Message userMsg = Message.builder()
            .role(Role.USER.getValue())
            .content(prompt)
            .build();
    GenerationParam param = GenerationParam.builder()
            .apiKey(System.getenv("DASHSCOPE_API_KEY"))
            .model(model)
            .workspace(workspace)
            .messages(Arrays.asList(systemMsg, userMsg))
            .resultFormat(GenerationParam.ResultFormat.MESSAGE)
            .build();
    // The prompt and the full response are logged at DEBUG level only.
    logger.debug("prompt:{}", prompt);
    GenerationResult result = gen.call(param);
    long t2 = System.currentTimeMillis();
    // Field order: app, module, model, workspace, requestId, statusCode, durationMs,
    // inputTokens, outputTokens, totalTokens, errorCode, errorMessage.
    logger.info("{},{},{},{},{},{},{},{},{},{},{},{}",
            app, module, model, workspace, result.getRequestId(), 200,
            t2 - t1, result.getUsage().getInputTokens(),
            result.getUsage().getOutputTokens(), result.getUsage().getTotalTokens(), "", "");
    logger.debug("result:{}", JSON.toJSONString(result));
    return result.getOutput().getChoices().get(0).getMessage().getContent();
} catch (ApiException e) {
    // The service rejected the call: log its status code, error code, and message.
    long t2 = System.currentTimeMillis();
    logger.error("{},{},{},{},{},{},{},{},{},{},{},{}",
            app, module, model, workspace, e.getStatus().getRequestId(),
            e.getStatus().getStatusCode(), t2 - t1, 0, 0, 0,
            e.getStatus().getCode(), e.getStatus().getMessage());
    // Keep the original exception as the cause so the stack trace is not lost.
    throw new RuntimeException(e.getLocalizedMessage(), e);
} catch (InputRequiredException e) {
    // The request failed before reaching the service, so there is no request ID or status code.
    long t2 = System.currentTimeMillis();
    logger.error("{},{},{},{},{},{},{},{},{},{},{},{}",
            app, module, model, workspace, "", "", t2 - t1, 0, 0, 0,
            "InputRequired", e.getLocalizedMessage());
    throw new RuntimeException(e.getLocalizedMessage(), e);
} catch (NoApiKeyException e) {
    // No API key was configured, so there is no request ID or status code either.
    long t2 = System.currentTimeMillis();
    logger.error("{},{},{},{},{},{},{},{},{},{},{},{}",
            app, module, model, workspace, "", "", t2 - t1, 0, 0, 0,
            "NoApiKey", e.getLocalizedMessage());
    throw new RuntimeException(e.getLocalizedMessage(), e);
}
Resulting log entries:
2025-07-09 14:23:24.748 INFO [traceId:41ea65e3-0b4c-4b65-8567-325a6128eddf] [c.a.c.s.i.ChatServiceImpl#chat():67] [http-nio-8080-exec-1] - bailian-log,Chat,qwen-plus,llm-6qtsrlugmu6wanfs,6c958db4-d489-9de8-b5d4-9fc872f24ce2,200,1141,24,9,33,,
2025-07-09 08:26:33.833 ERROR [traceId:63384a39-bbfe-489b-a290-6635975fc633] [c.a.c.s.i.ChatServiceImpl#chat():72] [http-nio-8080-exec-3] - bailian-log,Chat,qwen-plus,llm-6qtsrlugmu6wanfs,83a0cb5d-07f9-9fd2-9985-1c05b339f5d5,401,38,0,0,0,InvalidApiKey,Invalid API-key provided.
Log Collection & Storage
Use Alibaba Cloud Log Service (SLS) for log ingestion. Logs can be uploaded via Logtail parsing or SDK/Log4j/Logback plugins. Create a Logstore to store logs; the demo uses auto‑generated indexes.
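Before configuring parsing on the collection side, it can help to check the field layout locally. The sketch below splits one demo log line into the twelve comma-separated fields. The class and field names (`CallLogParser`, `CallLog`) are hypothetical, and the code assumes the message follows the `] - ` separator produced by the demo log pattern and that only the last field (the error message) may itself contain commas.

```java
/** Hypothetical helper: parses one demo log line into its twelve fields. */
class CallLogParser {
    /** One parsed model call. statusCode is -1 when the log line left it empty. */
    static final class CallLog {
        final String app, module, model, workspace, requestId;
        final int statusCode;
        final long durationMs;
        final int inputTokens, outputTokens, totalTokens;
        final String errorCode, errorMessage;

        CallLog(String[] f) {
            app = f[0];
            module = f[1];
            model = f[2];
            workspace = f[3];
            requestId = f[4];
            statusCode = f[5].isEmpty() ? -1 : Integer.parseInt(f[5]);
            durationMs = Long.parseLong(f[6]);
            inputTokens = Integer.parseInt(f[7]);
            outputTokens = Integer.parseInt(f[8]);
            totalTokens = Integer.parseInt(f[9]);
            errorCode = f[10];
            errorMessage = f[11];
        }
    }

    /** Extracts the message after "] - " and splits it into exactly twelve fields. */
    static CallLog parse(String line) {
        int sep = line.indexOf("] - ");
        if (sep < 0) {
            throw new IllegalArgumentException("No message separator in: " + line);
        }
        // A limit of 12 keeps trailing empty fields and leaves commas in the last field intact.
        String[] fields = line.substring(sep + 4).split(",", 12);
        if (fields.length != 12) {
            throw new IllegalArgumentException("Expected 12 fields but found " + fields.length);
        }
        return new CallLog(fields);
    }
}
```

Splitting with a limit is a deliberate choice here: a plain `split(",")` would drop the two empty trailing fields of a successful call and would break error messages that contain commas.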
Observability Dashboard
The dashboard addresses two main questions: (1) What is the current usage and are there any issues? (2) If problems exist, where do they originate?
Key Dashboard Indicators
Business Calls: Total calls, error distribution, latency.
Resource Water Level: QPM and token usage, with overall, core‑model, and core‑model‑workspace views.
Business Feedback: User negative‑feedback rate and trend analysis.
Examples of visualizations (call volume, error codes, latency, model‑level drill‑downs) are shown in the embedded images.
Alerting
Configure alerts directly from the Log Service dashboard. Recommended alert metrics include failure count / success rate, response time, and resource water level (QPM / token usage). Alerts can be scoped to specific models, applications, or workspaces.
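The logic behind a success-rate or latency alert can be sketched in a few lines, which is useful for agreeing on thresholds before they are entered into an alert rule. This is an illustration only: `AlertEvaluator`, the rule names, and the thresholds are hypothetical, and a call is assumed to be successful when its status code is 200, as in the demo logs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: evaluates two alert rules over one window of calls. */
class AlertEvaluator {
    /** Minimal view of one logged call. */
    static final class Call {
        final int statusCode;
        final long durationMs;

        Call(int statusCode, long durationMs) {
            this.statusCode = statusCode;
            this.durationMs = durationMs;
        }
    }

    /**
     * Returns the names of the rules that fired for the window:
     * "success_rate" when the share of 200 responses is below minSuccessRate,
     * "avg_latency" when the mean duration exceeds maxAvgLatencyMs.
     */
    static List<String> evaluate(List<Call> window, double minSuccessRate, long maxAvgLatencyMs) {
        List<String> fired = new ArrayList<>();
        if (window.isEmpty()) {
            return fired; // No traffic in the window: nothing to evaluate.
        }
        long ok = 0;
        long totalMs = 0;
        for (Call c : window) {
            if (c.statusCode == 200) {
                ok++;
            }
            totalMs += c.durationMs;
        }
        double successRate = (double) ok / window.size();
        double avgLatencyMs = (double) totalMs / window.size();
        if (successRate < minSuccessRate) {
            fired.add("success_rate");
        }
        if (avgLatencyMs > maxAvgLatencyMs) {
            fired.add("avg_latency");
        }
        return fired;
    }
}
```

To scope an alert to one model, application, or workspace, the same evaluation would be run on the subset of calls that match that dimension.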
Monitoring Solutions
Bailian: Provides standard monitoring and alerting for model and application observability, including call counts, failures, average latency, and model‑level statistics.
CloudMonitor: Offers model‑centric metrics such as call volume, failure count, and average latency.
ARMS: Application‑level real‑time monitoring, tracing API call chains, success rates, and detailed request information.
All solutions are continuously evolving; stay updated with new cloud product features.
Troubleshooting SOP
Typical steps: (1) Gather incident details from alerts or user reports. (2) Identify incident type (resource water‑level, error code, latency, etc.). (3) Use logs (TraceID) and core metrics to narrow scope. (4) Drill down by application, model, or workspace. (5) Launch appropriate emergency response. (6) Verify recovery.
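The drill-down in steps (3) and (4) amounts to counting failures per dimension and looking for the value that stands out. A minimal sketch follows, assuming calls have already been parsed from the logs; `FailureDrillDown` and its `Call` type are hypothetical names, and any status code other than 200 is treated as a failure.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical helper: counts failed calls grouped by one analysis dimension. */
class FailureDrillDown {
    /** Minimal view of one logged call. */
    static final class Call {
        final String app;
        final String model;
        final String workspace;
        final int statusCode;

        Call(String app, String model, String workspace, int statusCode) {
            this.app = app;
            this.model = model;
            this.workspace = workspace;
            this.statusCode = statusCode;
        }
    }

    /** Groups calls whose status code is not 200 by the given dimension, sorted by key. */
    static Map<String, Long> failuresBy(List<Call> calls, Function<Call, String> dimension) {
        return calls.stream()
                .filter(c -> c.statusCode != 200)
                .collect(Collectors.groupingBy(dimension, TreeMap::new, Collectors.counting()));
    }
}
```

Calling `failuresBy(calls, c -> c.model)` and then `failuresBy(calls, c -> c.workspace)` narrows an incident first to a model and then to a workspace.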
Recovery
For rate‑limit incidents, possible actions include: (1) Apply client‑side throttling and retries. (2) Increase workspace quota in Bailian. (3) Submit an expansion request to Alibaba Cloud with UID, model name, and business impact details.
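Action (1), client-side retries, is commonly implemented with exponential backoff. The sketch below is one possible shape under stated assumptions: `RetryWithBackoff` is a hypothetical helper, the caller decides through the `retryable` predicate which failures count as rate limiting, and the delay simply doubles after each failed attempt with no jitter or upper bound.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical helper: retries an action with exponentially growing delays. */
class RetryWithBackoff {
    /**
     * Runs the action up to maxAttempts times. A failure is retried only when
     * retryable accepts it; otherwise, or after the last attempt, it is rethrown.
     */
    static <T> T call(Supplier<T> action, Predicate<RuntimeException> retryable,
                      int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
            }
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException ie) {
                // Restore the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while backing off", ie);
            }
            delayMs *= 2; // Double the wait before the next attempt.
        }
    }
}
```

Retries add load to a service that is already throttling, so they are usually paired with a client-side limit on concurrent or per-minute calls rather than used alone.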
Diagnostic Tools
Combine Bailian, CloudMonitor, ARMS, and custom dashboards with user logs for comprehensive diagnosis. If issues cannot be resolved by the user, Alibaba Cloud support can provide additional backend logs and tools.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
