Master Apache SkyWalking: Setup, Performance Comparison, and Advanced Tracing
This guide introduces the distributed tracing challenges of large microservice systems, explains what Apache SkyWalking is, compares it with Zipkin, Pinpoint, and CAT, reports performance test results, and walks through installation, configuration, custom tracing, log integration, alerting, and high-availability deployment.
Link Tracing Introduction
In large microservice architectures consisting of dozens or hundreds of services, common problems include:
How to connect the entire call chain and quickly locate issues?
How to clarify dependencies between microservices?
How to analyze the performance of each microservice interface?
How to trace the processing order of the whole business flow?
1. What is SkyWalking
1.1 SkyWalking Introduction
SkyWalking supports multiple languages and frameworks, including Java, Go, Node.js, and Python. It uses distributed tracing technology to monitor all internal and external calls, providing complete performance insights.
SkyWalking offers powerful features such as performance monitoring, fault diagnosis, debugging, data analysis, and alerting. It can integrate with Grafana and Elasticsearch for stronger monitoring and analysis capabilities.
In short, Apache SkyWalking is a powerful and easy‑to‑use APM tool that helps developers and DevOps teams understand application behavior and act promptly on performance problems.
Official website: https://skywalking.apache.org/
Download: https://skywalking.apache.org/downloads/
GitHub: https://github.com/apache/skywalking
Documentation: https://skywalking.apache.org/docs/main/v8.5.0/readme/
Chinese documentation: https://skyapm.github.io/document-cn-translation-of-skywalking/
1.2 Comparison of Link Tracing Frameworks
Zipkin is Twitter's open‑source tracing tool, lightweight and easy to deploy.
Pinpoint is a Korean open‑source tracing and monitoring tool based on bytecode injection, supporting many plugins and a powerful UI, with no code intrusion on the client side.
SkyWalking is an open-source tracing and monitoring tool that originated in China. It is based on bytecode injection, supports many plugins, has a strong UI, and requires no code intrusion. It graduated from the Apache Incubator in 2019 and is now a top-level Apache project.
CAT is Dianping's open‑source platform covering tracing, monitoring, log collection, and alerting.
1.3 Performance Comparison
The test used JMeter to simulate three concurrency levels (500, 750, and 1000 threads); each thread sent 30 requests with a 10 ms think time. The sampling rate was 1 (100%) for every tracer: Pinpoint's default of 20% was raised to 100%, and Zipkin already defaults to 100%. Three concurrency levels across the three tracers plus a run without any probe give 12 test scenarios.
Results show that among the three tracing components, SkyWalking's probe has the smallest impact on throughput, Zipkin is moderate, and Pinpoint significantly reduces throughput (e.g., at 500 concurrent users, service throughput drops from 1385 to 774). CPU and memory impact stays within about 10 %.
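The quoted figures can be put in proportion with a little arithmetic. A minimal sketch, using only the numbers above; the class and method names are hypothetical:

```java
// Illustrative arithmetic only; class and method names are hypothetical.
class ThroughputImpact {
    /** Percentage of throughput lost when going from baseline to withProbe. */
    static double dropPercent(double baseline, double withProbe) {
        return (baseline - withProbe) / baseline * 100.0;
    }

    /** Total requests issued by one JMeter run: threads x requests per thread. */
    static int totalRequests(int threads, int requestsPerThread) {
        return threads * requestsPerThread;
    }

    public static void main(String[] args) {
        // Figures quoted above: 500 concurrent users, throughput 1385 -> 774 with Pinpoint.
        System.out.printf("Pinpoint throughput drop: %.1f%%%n", dropPercent(1385, 774));
        // Each thread sends 30 requests at every concurrency level.
        for (int threads : new int[] {500, 750, 1000}) {
            System.out.println(threads + " threads -> " + totalRequests(threads, 30) + " requests");
        }
    }
}
```

The Pinpoint example works out to a drop of roughly 44%, which is what "significantly reduces throughput" means in concrete terms.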
1.4 Main Features of SkyWalking
1. Multiple monitoring methods via language probes and service mesh.
2. Supports automatic probes for Java, .NET Core, and Node.js.
3. Lightweight and efficient; no need for big‑data platforms or many servers.
4. Modular design – UI, storage, and cluster management have multiple selectable mechanisms.
5. Alerting support.
6. Excellent visualization solutions.
2. SkyWalking Environment Setup and Deployment
SkyWalking consists of four main components:
SkyWalking agent – binds with the business system to collect monitoring data.
SkyWalking OAP service – processes and stores data, provides APIs for the UI.
SkyWalking webapp – the front‑end UI for displaying data.
Database (MySQL, Elasticsearch, etc.) – stores the monitoring data.
2.1 Download SkyWalking
Download: http://skywalking.apache.org/downloads/
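Directory structure: the unpacked distribution contains, among others, the agent, bin, config, oap-libs, and webapp directories referenced in this guide.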
2.2 Deploy SkyWalking OAP Service
Start script: bin/startup.sh. Log files are stored in the logs directory.
After a successful startup, two services run: skywalking-oap-server (port 11800 for data collection and port 12800 for UI requests) and skywalking-webapp (default port 8080). The OAP ports can be changed in config/application.yml.
The webapp port (default 8080) can be changed in webapp/webapp.yml.
2.3 Three Core Concepts
Service: a set of workloads providing the same behavior; the name can be defined when using the agent.
Service Instance: each individual workload (a real process) within a service.
Endpoint: the request path of a specific service, such as an HTTP URI or a gRPC class‑method signature.
3. SkyWalking Integration with Microservices
3.1 Linux – Jar Deployment
Prepare a Spring Boot executable jar and start it with the SkyWalking agent via the -javaagent parameter.
<code>#!/bin/sh
# SkyWalking Agent configuration
export SW_AGENT_NAME=springboot-skywalking-demo
export SW_AGENT_COLLECTOR_BACKEND_SERVICES=127.0.0.1:11800
export SW_AGENT_SPAN_LIMIT=2000
export JAVA_AGENT=-javaagent:/usr/local/soft/apache-skywalking-apm-bin-es7/agent/skywalking-agent.jar
java $JAVA_AGENT -jar springboot-skywalking-demo-0.0.1-SNAPSHOT.jar
</code>
Equivalent command:
<code>java -javaagent:/usr/local/soft/apache-skywalking-apm-bin-es7/agent/skywalking-agent.jar \
-DSW_AGENT_COLLECTOR_BACKEND_SERVICES=127.0.0.1:11800 \
-DSW_AGENT_NAME=springboot-skywalking-demo -jar springboot-skywalking-demo-0.0.1-SNAPSHOT.jar
</code>
These parameters correspond to properties in agent/config/agent.config:
<code># The service name in UI
agent.service_name=${SW_AGENT_NAME:Your_ApplicationName}
# Backend service addresses.
collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:127.0.0.1:11800}
</code>
3.2 Windows – IDEA
Configure the JVM parameters in the IDE run configuration (the VM options field): use -DSW_AGENT_COLLECTOR_BACKEND_SERVICES to specify the remote collector address; -javaagent must point to the local path of skywalking-agent.jar.
3.3 Tracing Across Multiple Microservices
To trace across multiple services, add the -javaagent parameter to each microservice (e.g., mall-gateway, mall-order, mall-user), giving each its own SW_AGENT_NAME, and test via http://localhost:8888/user/findOrderByUserId/1.
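For the spans of mall-gateway, mall-order, and mall-user to join into one trace, each agent passes trace context to the next service in a request header. Below is a minimal sketch of reading that header. It assumes the v3 cross-process propagation format: a header named sw8 carrying eight dash-separated fields, with the string fields Base64-encoded. The class name Sw8Header, the sample IDs, and the sample address are hypothetical, and this is not the agent's own parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not part of the SkyWalking agent API.
class Sw8Header {
    final boolean sampled;        // field 1: "1" means the trace is sampled
    final String traceId;         // field 2
    final String parentSegmentId; // field 3
    final int parentSpanId;       // field 4: plain integer, not Base64
    final String parentService;   // field 5
    final String parentInstance;  // field 6
    final String parentEndpoint;  // field 7
    final String targetAddress;   // field 8: address the caller used for this request

    private Sw8Header(String[] f) {
        sampled = "1".equals(f[0]);
        traceId = decode(f[1]);
        parentSegmentId = decode(f[2]);
        parentSpanId = Integer.parseInt(f[3]);
        parentService = decode(f[4]);
        parentInstance = decode(f[5]);
        parentEndpoint = decode(f[6]);
        targetAddress = decode(f[7]);
    }

    // Splits the header value into its eight fields and decodes them.
    static Sw8Header parse(String headerValue) {
        String[] fields = headerValue.split("-", -1);
        if (fields.length != 8) {
            throw new IllegalArgumentException("expected 8 fields, got " + fields.length);
        }
        return new Sw8Header(fields);
    }

    // Standard Base64 never emits '-', so the dash separator stays unambiguous.
    static String encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    private static String decode(String s) {
        return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A sample value as mall-gateway might send it to mall-order (IDs are made up).
        String value = String.join("-",
                "1", encode("trace-0001"), encode("segment-0001"), "3",
                encode("mall-gateway"), encode("mall-gateway-instance-1"),
                encode("/user/findOrderByUserId/1"), encode("127.0.0.1:8020"));
        Sw8Header h = parse(value);
        System.out.println(h.traceId + " from " + h.parentService + ", parent span " + h.parentSpanId);
    }
}
```

Because every downstream service receives the same trace ID, the UI can stitch the segments reported by each agent into a single call chain.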
3.4 Copy Gateway Plugin
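Gateway support ships as an optional plugin, so it is not active by default. Copy the gateway plugin JAR from agent/optional-plugins to agent/plugins, then restart the gateway service so the agent loads it.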
4. SkyWalking Persistence of Trace Data
By default SkyWalking uses an in-memory H2 database (configured in config/application.yml), so trace data is lost when the OAP service restarts.
4.1 MySQL Persistence
Modify config/application.yml to use MySQL as the storage backend and update the connection settings. Add the MySQL driver JAR to the oap-libs directory because it is not included by default.
<code>storage:
  selector: ${SW_STORAGE:mysql}
  mysql:
    properties:
      jdbcUrl: ${SW_JDBC_URL:"jdbc:mysql://localhost:3306/swtest"}
      dataSource.user: ${SW_DATA_SOURCE_USER:root}
      dataSource.password: ${SW_DATA_SOURCE_PASSWORD:root}
</code>
After SkyWalking starts, it creates its tables in the swtest database.
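Values such as ${SW_STORAGE:mysql} in application.yml and ${SW_AGENT_NAME:Your_ApplicationName} in agent.config follow a ${NAME:default} pattern: an externally supplied value for NAME wins, otherwise the text after the first colon is used. Below is a simplified sketch of that resolution rule, assuming a plain map stands in for the environment. The class PlaceholderResolver is hypothetical and is not SkyWalking's own parser.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of ${NAME:default} resolution; not SkyWalking code.
class PlaceholderResolver {
    // Group 1 is the variable name, group 2 is everything after the first colon.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z0-9_]+):([^}]*)\\}");

    /** Replaces each placeholder with the env value if present, else its default. */
    static String resolve(String raw, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = env.getOrDefault(m.group(1), m.group(2));
            // quoteReplacement keeps '$' and '\' in values from being treated as group references
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        // Nothing set: the defaults from the file apply.
        System.out.println(resolve("${SW_STORAGE:mysql}", env));
        System.out.println(resolve("${SW_AGENT_COLLECTOR_BACKEND_SERVICES:127.0.0.1:11800}", env));
        // An external value overrides the default.
        env.put("SW_STORAGE", "elasticsearch7");
        System.out.println(resolve("${SW_STORAGE:mysql}", env));
    }
}
```

Note that only the first colon separates name from default, which is why defaults containing colons, such as 127.0.0.1:11800 or a JDBC URL, survive intact.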
5. Custom SkyWalking Tracing
Add the tracing toolkit dependency:
<code><dependency>
    <groupId>org.apache.skywalking</groupId>
    <artifactId>apm-toolkit-trace</artifactId>
    <version>8.4.0</version>
</dependency>
</code>
5.1 @Trace Annotation
Annotate business methods with @Trace to make them appear in the UI trace view.
5.2 @Tag / @Tags
Use @Tag or @Tags to add extra information such as parameters and return values.
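<code>// Imports needed by the enclosing class
import org.apache.skywalking.apm.toolkit.trace.Tag;
import org.apache.skywalking.apm.toolkit.trace.Tags;
import org.apache.skywalking.apm.toolkit.trace.Trace;

// Record the return value as a tag named "list"
@Trace
@Tag(key = "list", value = "returnedObj")
public List<User> list() {
    return userMapper.list();
}

// Record the first argument and the return value
@Trace
@Tags({@Tag(key = "param", value = "arg[0]"),
       @Tag(key = "user", value = "returnedObj")})
public User getById(Integer id) {
    return userMapper.getById(id);
}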
</code>
6. SkyWalking Log Integration
Add the logback toolkit dependency:
<code><dependency>
    <groupId>org.apache.skywalking</groupId>
    <artifactId>apm-toolkit-logback-1.x</artifactId>
    <version>8.5.0</version>
</dependency>
</code>
Configure logback-spring.xml to include the %tid placeholder:
<code><configuration>
    <include resource="org/springframework/boot/logging/logback/defaults.xml"/>
    <appender name="console" class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
            <layout class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.TraceIdPatternLogbackLayout">
                <!-- %tid prints the SkyWalking trace ID; Spring Boot's default CONSOLE_LOG_PATTERN does not contain it -->
                <Pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%tid] [%thread] %-5level %logger{36} -%msg%n</Pattern>
            </layout>
        </encoder>
    </appender>
    <root level="INFO">
        <appender-ref ref="console"/>
    </root>
</configuration>
</code>
Enable gRPC log reporting (available from v8.4.0):
<code><appender name="grpc-log" class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.log.GRPCLogClientAppender">
    <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
        <layout class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.mdc.TraceIdMDCPatternLogbackLayout">
            <Pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%X{tid}] [%thread] %-5level %logger{36} -%msg%n</Pattern>
        </layout>
    </encoder>
</appender>
<root level="info">
    <appender-ref ref="grpc-log"/>
</root>
</code>
Agent configuration for gRPC log reporting:
<code>plugin.toolkit.log.grpc.reporter.server_host=${SW_GRPC_LOG_SERVER_HOST:192.168.3.100}
plugin.toolkit.log.grpc.reporter.server_port=${SW_GRPC_LOG_SERVER_PORT:11800}
plugin.toolkit.log.grpc.reporter.max_message_size=${SW_GRPC_LOG_MAX_MESSAGE_SIZE:10485760}
plugin.toolkit.log.grpc.reporter.upstream_timeout=${SW_GRPC_LOG_GRPC_UPSTREAM_TIMEOUT:30}
</code>
7. SkyWalking Alerting
Alert rules are defined in config/alarm-settings.yml. Example rules include:
Service average response time > 1 s in the last 3 minutes.
Service success rate < 80 % in the last 2 minutes.
Service response time percentiles > 1 s in the last 3 minutes.
Instance average response time > 1 s with name matching a regex in the last 2 minutes.
Endpoint average response time > 1 s in the last 2 minutes.
Database access average response time > 1 s in the last 2 minutes.
Each rule contains fields such as rule name, metric name, include/exclude names, threshold, operator, period, count, silence period, and message.
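How the threshold, period, count, and silence period interact is easier to see in code. Below is a simplified sketch that assumes one metric value arrives per minute and that the operator is fixed to ">". The class AlarmRuleSketch is hypothetical and models only these four fields; it is not the OAP server's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified, hypothetical model of one alarm rule; not SkyWalking code.
class AlarmRuleSketch {
    private final double threshold;   // e.g. 1000 (ms)
    private final int period;         // window length in minutes
    private final int count;          // matches within the window needed to fire
    private final int silencePeriod;  // minutes to stay quiet after firing
    private final Deque<Double> window = new ArrayDeque<>();
    private int silenceLeft = 0;

    AlarmRuleSketch(double threshold, int period, int count, int silencePeriod) {
        this.threshold = threshold;
        this.period = period;
        this.count = count;
        this.silencePeriod = silencePeriod;
    }

    /** Feeds one per-minute metric value; returns true when the rule fires. */
    boolean onMinute(double value) {
        window.addLast(value);
        if (window.size() > period) {
            window.removeFirst();          // keep only the last `period` minutes
        }
        if (silenceLeft > 0) {
            silenceLeft--;                 // still silenced after a recent alarm
            return false;
        }
        int matches = 0;
        for (double v : window) {
            if (v > threshold) {
                matches++;
            }
        }
        if (matches >= count) {
            silenceLeft = silencePeriod;   // suppress repeats for a while
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Threshold 1000 ms, 3-minute window, 2 matches to fire, 2 minutes of silence.
        AlarmRuleSketch rule = new AlarmRuleSketch(1000, 3, 2, 2);
        double[] perMinuteAvgMs = {1200, 1500, 1600, 1600, 1600};
        for (double v : perMinuteAvgMs) {
            System.out.println(v + " ms -> fired=" + rule.onMinute(v));
        }
    }
}
```

In this model the silence period is what keeps a persistently slow service from producing one alarm per minute.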
Webhook
When an alarm triggers, SkyWalking sends a POST request with JSON payload to the configured webhook URL.
<code>[
  {
    "scopeId": 1,
    "scope": "SERVICE",
    "name": "serviceA",
    "id0": "12",
    "id1": "",
    "ruleName": "service_resp_time_rule",
    "alarmMessage": "alarmMessage xxxx",
    "startTime": 1560524171000
  },
  {
    "scopeId": 1,
    "scope": "SERVICE",
    "name": "serviceB",
    "id0": "23",
    "id1": "",
    "ruleName": "service_resp_time_rule",
    "alarmMessage": "alarmMessage yyy",
    "startTime": 1560524171000
  }
]
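</code>
Fields explained: scopeId and scope identify the type of entity that raised the alarm (for example SERVICE); name is the entity name; id0 and id1 are the entity IDs; ruleName is the rule configured in alarm-settings.yml; alarmMessage is the alarm text; startTime is the alarm time as a millisecond timestamp.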
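The payload fields map onto a plain Java class. Below is a minimal sketch of such a DTO, assuming the class name SwAlarmDTO used by the controller later in this guide and field types inferred from the sample JSON. In a real project it would typically be a public class in its own file.

```java
// Sketch of a DTO for one webhook alarm entry; field types are inferred from the sample JSON.
class SwAlarmDTO {
    private int scopeId;
    private String scope;
    private String name;
    private String id0;
    private String id1;
    private String ruleName;
    private String alarmMessage;
    private long startTime; // epoch milliseconds

    public int getScopeId() { return scopeId; }
    public void setScopeId(int scopeId) { this.scopeId = scopeId; }
    public String getScope() { return scope; }
    public void setScope(String scope) { this.scope = scope; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getId0() { return id0; }
    public void setId0(String id0) { this.id0 = id0; }
    public String getId1() { return id1; }
    public void setId1(String id1) { this.id1 = id1; }
    public String getRuleName() { return ruleName; }
    public void setRuleName(String ruleName) { this.ruleName = ruleName; }
    public String getAlarmMessage() { return alarmMessage; }
    public void setAlarmMessage(String alarmMessage) { this.alarmMessage = alarmMessage; }
    public long getStartTime() { return startTime; }
    public void setStartTime(long startTime) { this.startTime = startTime; }

    public static void main(String[] args) {
        // Populate by hand with values from the sample payload.
        SwAlarmDTO dto = new SwAlarmDTO();
        dto.setScope("SERVICE");
        dto.setName("serviceA");
        dto.setRuleName("service_resp_time_rule");
        dto.setStartTime(1560524171000L);
        System.out.println(dto.getScope() + " " + dto.getName() + " " + dto.getRuleName());
    }
}
```

startTime is declared as long because a millisecond timestamp such as 1560524171000 does not fit in an int.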
Email Alert Implementation
Add the spring-boot-starter-mail dependency and configure SMTP settings.
<code><dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-mail</artifactId>
</dependency>
</code>
<code>server:
  port: 9134
spring:
  mail:
    host: smtp.163.com
    username: your_account@163.com
    password: your_email_service_key
    default-encoding: utf-8
    port: 465
    protocol: smtp
    properties:
      mail:
        debug: false
        smtp:
          socketFactory:
            class: javax.net.ssl.SSLSocketFactory
</code>
Define a DTO and a controller to receive alarms and send email:
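<code>import java.util.List;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.mail.SimpleMailMessage;
import org.springframework.mail.javamail.JavaMailSender;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/alarm")
public class SwAlarmController {
    private final JavaMailSender sender;

    // Sender address, taken from spring.mail.username
    @Value("${spring.mail.username}")
    private String from;

    // Constructor injection initializes the final field
    public SwAlarmController(JavaMailSender sender) {
        this.sender = sender;
    }

    // Receives the alarm list posted by the SkyWalking webhook and mails it
    @PostMapping("/receive")
    public void receive(@RequestBody List<SwAlarmDTO> alarmList) {
        SimpleMailMessage message = new SimpleMailMessage();
        message.setFrom(from);
        message.setTo(from); // this example mails the alarm back to the sender account
        message.setSubject("Alarm Email");
        message.setText(getContent(alarmList));
        sender.send(message);
    }

    // Formats every alarm as a block of "label: value" lines
    private String getContent(List<SwAlarmDTO> alarmList) {
        StringBuilder sb = new StringBuilder();
        for (SwAlarmDTO dto : alarmList) {
            sb.append("scopeId: ").append(dto.getScopeId())
              .append("\nscope: ").append(dto.getScope())
              .append("\nName: ").append(dto.getName())
              .append("\nID: ").append(dto.getId0())
              .append("\nRule: ").append(dto.getRuleName())
              .append("\nMessage: ").append(dto.getAlarmMessage())
              .append("\nTime: ").append(dto.getStartTime())
              .append("\n\n----------\n\n");
        }
        return sb.toString();
    }
}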
</code>
Add the webhook URL to config/alarm-settings.yml:
<code>webhooks:
- http://127.0.0.1:9134/alarm/receive
</code>
Test by adding a 2-second sleep to a service method so that its response time exceeds the 1 s threshold, invoking the endpoint, and confirming that an email is received.
8. SkyWalking High Availability
In production, the backend should support high throughput and high availability. Deploy a SkyWalking OAP cluster registered with Nacos; as long as at least one OAP instance is running, tracing continues.
Requirements:
At least one Nacos instance (or Nacos cluster).
At least one Elasticsearch or MySQL instance (or a cluster).
At least two SkyWalking OAP services.
At least one UI service (UI can be clustered behind Nginx).
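The "at least one OAP instance" guarantee can be pictured as a client walking a comma-separated list of backend addresses until one responds. Below is a minimal sketch of that idea. The class BackendFailover and the health-check predicate are hypothetical, and the addresses are placeholders. The agent's real selection logic is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical illustration of failing over across OAP backends; not agent code.
class BackendFailover {
    /** Splits "host1:port1,host2:port2" into trimmed, non-empty addresses. */
    static List<String> parse(String backendServices) {
        List<String> result = new ArrayList<>();
        for (String part : backendServices.split(",")) {
            String address = part.trim();
            if (!address.isEmpty()) {
                result.add(address);
            }
        }
        return result;
    }

    /** Returns the first address accepted by the health check, if any. */
    static Optional<String> pick(List<String> addresses, Predicate<String> isUp) {
        return addresses.stream().filter(isUp).findFirst();
    }

    public static void main(String[] args) {
        List<String> backends = parse("192.168.3.10:11800,192.168.3.12:11800");
        // Pretend the first node is down; a real check would attempt a connection.
        Predicate<String> isUp = address -> !address.startsWith("192.168.3.10");
        System.out.println(pick(backends, isUp).orElse("no OAP backend reachable"));
    }
}
```

With two OAP nodes listed, losing either one still leaves a reachable address, which is why the requirements call for at least two OAP services.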
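Configure the cluster section of config/application.yml so that the OAP nodes coordinate through Nacos:
<code>cluster:
  selector: ${SW_CLUSTER:nacos}
  nacos:
    serviceName: ${SW_SERVICE_NAME:"SkyWalking_OAP_Cluster"}
    hostPort: ${SW_CLUSTER_NACOS_HOST_PORT:127.0.0.1:8848}
    namespace: ${SW_CLUSTER_NACOS_NAMESPACE:"skywalking"}
    username: ${SW_CLUSTER_NACOS_USERNAME:"nacos"}
    password: ${SW_CLUSTER_NACOS_PASSWORD:"nacos"}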
</code>
Set the storage selector to Elasticsearch 7:
<code>storage:
  selector: ${SW_STORAGE:elasticsearch7}
  elasticsearch7:
    nameSpace: skywalking
    clusterNodes: 127.0.0.1:9200
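</code>
Configure the UI's webapp.yml with the list of OAP servers. The UI queries the OAP over port 12800 (the port for UI requests), not the 11800 data collection port:
<code>collector:
  ribbon:
    listOfServers: 192.168.3.10:12800,192.168.3.12:12800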
</code>
Start services with the JVM parameter pointing to both OAP backends:
<code>-DSW_AGENT_COLLECTOR_BACKEND_SERVICES=192.168.3.10:11800,192.168.3.12:11800
</code>