
How Synthetic Monitoring Boosts Network Reliability and User Experience

This article explains the importance of network stability, outlines major real‑world outages, and introduces synthetic monitoring—its functions, advantages, disadvantages, and various types such as protocol, browser, and internal monitoring—while comparing probe point categories and guiding enterprises on selecting the right strategy to improve service reliability and performance.

Alibaba Cloud Observability

As the Internet has grown, the stability of networks and online services has become essential to daily life. When a network or service fails, the impact reaches society, enterprises, and individual users, and the resulting losses are hard to quantify. Several recent outages illustrate this:

July 2021: An Akamai DNS outage left the websites of many US companies, including banks and airlines, unreachable.

June 2021: Fastly CDN failure impacted The New York Times, Amazon, Twitch, and Reddit.

October 2021: Facebook experienced a large‑scale outage, taking down Facebook, Instagram, and WhatsApp globally for six hours.

December 2022: A service interruption in Alibaba Cloud's Hong Kong Zone C lasted more than twelve hours, severely affecting many enterprises.

Against this background, Synthetic Monitoring (also called Synthetic Testing) has become a core observability function for monitoring network performance and user experience. It uses globally distributed probe nodes to send simulated user requests to target services, domains, or IP addresses, and measures availability, performance, and experience across regions and ISPs. Synthetic monitoring shortens the time needed to detect faults and helps optimize network resources, which improves both business efficiency and user experience.

All screenshots and features described below are from Alibaba Cloud’s CloudMonitor Synthetic Monitoring product.

Network Synthetic Monitoring vs. Real‑User Monitoring (RUM)

Advantages

Non‑intrusive deployment: no SDK integration required on the front end.

Proactive fault detection: tests run before users encounter issues, allowing early identification of potential problems.

Pre‑release global verification: enables comprehensive network compatibility and stability testing before product launch or new region deployment.

Disadvantages

Simulation limitation: synthetic tests send simulated requests, so unlike RUM, which captures data from real users, they may not fully reflect actual user experience.

Coverage limitation: cannot capture all possible user behaviors or complex interaction scenarios, and cannot analyze issues at the single‑user level.

Categories and Typical Scenarios of Synthetic Monitoring

Protocol Monitoring (Availability Monitoring)

Protocol monitoring uses common network protocols and diagnostic tools (DNS, HTTP, TCP, UDP, ping, MTR, WebSocket, etc.) to simulate user requests and analyze service performance. Metrics such as latency and packet loss help teams keep services stable and improve user experience.
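To make the latency and packet-loss metrics concrete, here is a minimal sketch of how one probe run might be summarized. The function name `summarize_probe` and the convention that `None` marks a lost or timed-out packet are assumptions for illustration, not part of any product API.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_probe(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one probe run; None marks a lost or timed-out packet."""
    received = sorted(r for r in rtts_ms if r is not None)
    sent = len(rtts_ms)
    # Packet loss is the share of packets that never came back.
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0
    if not received:
        return {"sent": sent, "loss_pct": loss_pct, "avg_ms": None, "max_ms": None}
    return {
        "sent": sent,
        "loss_pct": loss_pct,
        "avg_ms": mean(received),
        "max_ms": received[-1],
    }


# Made-up sample: four packets sent, one lost.
summary = summarize_probe([20.0, 22.0, None, 18.0])
print(summary)
```

A real probe would collect the round-trip times itself (for example over ICMP or TCP). Only the aggregation step is shown here.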

1) Availability Monitoring: Periodically checks website accessibility from multiple global cities and ISP nodes, providing early warnings for business continuity risks. Supports custom assertions on response time, status code, headers, body content, and certificate expiration.

2) Network Quality Monitoring: Measures network conditions between regions and ISPs, aiding decisions on link optimization, CDN speed testing, overseas network architecture, and game‑related ISP performance analysis.

3) DNS Hijack Monitoring: Ensures critical domains resolve correctly and detects DNS configuration errors or ISP hijacking that could cause service interruptions. Important for regulated industries where regional DNS policies differ.

4) Competitor Analysis: Compares page access performance and experience of peer websites to improve service competitiveness.
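The custom assertions mentioned under Availability Monitoring (response time, status code, headers, body content, certificate expiration) can be pictured as a set of checks applied to each probe result. The sketch below is hypothetical: `ProbeResult`, `check_assertions`, and the default thresholds are illustrative names and values, not the product's configuration format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProbeResult:
    """Hypothetical record of a single HTTP probe."""
    status_code: int
    response_ms: float
    body: str
    cert_days_left: int
    headers: Dict[str, str] = field(default_factory=dict)


def check_assertions(
    result: ProbeResult,
    expected_status: int = 200,
    max_response_ms: float = 1000.0,
    required_text: str = "",
    required_header: str = "",
    min_cert_days: int = 14,
) -> List[str]:
    """Return a list of failed assertions; an empty list means the probe passed."""
    failures = []
    if result.status_code != expected_status:
        failures.append(f"status {result.status_code} != {expected_status}")
    if result.response_ms > max_response_ms:
        failures.append(f"response time {result.response_ms:.0f} ms > {max_response_ms:.0f} ms")
    if required_text and required_text not in result.body:
        failures.append(f"body does not contain {required_text!r}")
    # Header names are compared case-insensitively.
    header_names = {name.lower() for name in result.headers}
    if required_header and required_header.lower() not in header_names:
        failures.append(f"missing header {required_header!r}")
    if result.cert_days_left < min_cert_days:
        failures.append(f"certificate expires in {result.cert_days_left} days")
    return failures


# Made-up probe result: slow response and a certificate close to expiry.
slow = ProbeResult(200, 1500.0, "<h1>Welcome</h1>", 5, {"Content-Type": "text/html"})
failed = check_assertions(slow, required_text="Welcome", required_header="content-type")
print(failed)
```

Returning every failed assertion, rather than stopping at the first, lets an alert say exactly which conditions were violated.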

Browser Monitoring

Browser monitoring uses real browsers (Chrome, Firefox, Edge, Safari) on globally distributed nodes to open target pages. It measures page load time and the rendering cost of each element, and supports recording and replaying user journeys. Screenshots help pinpoint rendering issues.

1) User Experience Analysis: Tracks time to first paint, rendering of the main content, and the time until the page is fully interactive, providing data for speeding up visual feedback.

2) Page Element Optimization: Waterfall charts reveal factors slowing document load, helping developers locate bottlenecks.

3) Page Injection Detection: Detects unauthorized third‑party content injection (e.g., CDN or JS poisoning) to protect against malicious tampering.

4) User Behavior Integrity Verification: Records complete user journeys (login, browse, search, purchase, etc.) and replays them during testing to validate complex business flows.
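As a rough illustration of the page injection check described in this list, the sketch below parses HTML captured by a probe and reports script hosts that are not on an allowlist. The helper name `unexpected_script_hosts`, the sample page, and the host names are all made up. A production check would likely also inspect inline scripts, iframes, and content hashes.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ScriptCollector(HTMLParser):
    """Collects the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def unexpected_script_hosts(html: str, allowed_hosts: set) -> list:
    """Return external script hosts that are not in the allowlist."""
    parser = ScriptCollector()
    parser.feed(html)
    # Relative URLs have no hostname and are treated as first-party.
    hosts = {urlparse(src).hostname for src in parser.sources}
    return sorted(h for h in hosts if h and h not in allowed_hosts)


# Made-up page as a probe might have captured it.
page = (
    '<html><head>'
    '<script src="https://cdn.example.com/app.js"></script>'
    '<script src="//ads.example.net/x.js"></script>'
    '<script src="/local.js"></script>'
    '</head></html>'
)
suspicious = unexpected_script_hosts(page, {"cdn.example.com"})
print(suspicious)
```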

Internal Monitoring

Focuses on the health of services and instances within a cloud VPC, helping maintain availability and performance of internal cloud services.

1) VPC Connectivity Monitoring: Continuously monitors network connectivity between instances inside a VPC, quickly identifying and fixing connection issues.

2) Cloud Service Inspection: Audits cloud‑hosted internal services and product instances (e.g., RDS, Redis) to ensure they operate normally and meet expected standards.
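A VPC connectivity check of the kind described above can be approximated with a plain TCP connect test run from a node inside the VPC. This is a minimal sketch: the function names and the target map are hypothetical, and a managed product would add scheduling, retries, and alerting on top.

```python
import socket
import time
from typing import Dict, Optional, Tuple


def tcp_connect_ms(host: str, port: int, timeout_s: float = 2.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None if the connection fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return None


def check_targets(targets: Dict[str, Tuple[str, int]]) -> Dict[str, Optional[float]]:
    """Probe each named (host, port) target once and collect the results."""
    return {name: tcp_connect_ms(host, port) for name, (host, port) in targets.items()}
```

For example, `check_targets({"redis": ("10.0.0.12", 6379)})` would return the connect latency for that instance or `None` if it is unreachable. The address here is a placeholder, not a real endpoint.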

Probe Point Types

Probe nodes are classified into four main types; enterprises should choose based on business needs, cost, and desired coverage.

Cloud Host Probe Points: Nodes hosted on cloud providers such as Alibaba Cloud, Microsoft Azure, Google Cloud, or Amazon Web Services, with multi-line BGP exits. Results are stable with low noise, which makes them suitable for monitoring overall service availability.

ISP IDC Probe Points: Physical devices located in telecom carrier data centers with single‑line ISP exits. Provide stable results and help understand service performance across different carriers.

PC Last‑mile Users: Nodes deployed on home PCs, offering better simulation of end‑user experience despite higher noise and cost.

Mobile Users: Nodes on smartphones reflecting the growing mobile traffic; results have higher variance but are crucial for mobile app performance analysis.

Comparing availability and performance across different probe points reveals trade‑offs in stability, noise, and realism.
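One way to make this comparison concrete is to group probe results by probe-point type and compute availability, average latency, and latency spread (as a simple proxy for noise) for each group. The sketch below uses made-up sample data and a hypothetical function name. It is not taken from the product.

```python
from collections import defaultdict
from statistics import mean, pstdev


def compare_probe_types(samples):
    """samples: iterable of (probe_type, ok, latency_ms) tuples."""
    grouped = defaultdict(list)
    for probe_type, ok, latency_ms in samples:
        grouped[probe_type].append((ok, latency_ms))

    report = {}
    for probe_type, rows in grouped.items():
        # Only successful probes contribute latency values.
        latencies = [ms for ok, ms in rows if ok]
        report[probe_type] = {
            "availability_pct": 100.0 * len(latencies) / len(rows),
            "avg_ms": mean(latencies) if latencies else None,
            "jitter_ms": pstdev(latencies) if latencies else None,
        }
    return report


# Made-up samples: cloud-host probes are steady, mobile probes are noisier.
samples = [
    ("cloud", True, 50.0),
    ("cloud", True, 50.0),
    ("mobile", True, 80.0),
    ("mobile", False, None),
    ("mobile", True, 120.0),
]
report = compare_probe_types(samples)
print(report)
```

With data like this, a higher `jitter_ms` and lower `availability_pct` for last-mile or mobile nodes would reflect the noise the article describes. It would not necessarily indicate a fault in the monitored service.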

Conclusion

Network and service stability is crucial in today's society. With Synthetic Monitoring, enterprises can proactively monitor and optimize service availability, performance, and user experience. Each monitoring type and probe point suits particular scenarios. Choosing appropriate nodes and combining several layers of synthetic monitoring helps detect faults early, optimize resources, and improve business efficiency, which in turn supports stable, reliable services and a high-quality user experience.
