When Splitting a System into 200 Microservices Almost Ruined the Company

The article uses a night‑market analogy to explain practical microservice design, covering domain‑based service decomposition, service discovery, communication protocols, data consistency strategies, fault‑tolerance, rate limiting, and monitoring, while warning against over‑splitting and unnecessary complexity.

IT Services Circle

1. Service Partitioning: Split by Business Domain, Not by Process

A system must be divided into a reasonable number of services: too few and it remains a monolith, too many and it fragments. In one example, a user-management system was split into separate registration, login, password-change, and password-recovery services; a single registration request slowed from 200 ms to 2 s because the client had to call four APIs.

The recommended approach is to partition by "category", that is, by business domain. Each domain (order, product, payment) should be a single service that handles the complete lifecycle of its entity. If two pieces of functionality always change together (e.g., price and inventory), keep them in the same service to avoid coordination problems.
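As a rough illustration of domain-based partitioning applied to the user-management example, the sketch below keeps registration, login, and password change inside one service instead of four. The class name `UserAccountService` and the in-memory map are assumptions made for the example; a real service would sit behind an API and store salted password hashes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one service owns the whole lifecycle of a user account. */
final class UserAccountService {
    // username -> password; plain text only to keep the sketch short
    private final Map<String, String> passwords = new HashMap<>();

    /** Returns false when the username is already taken. */
    boolean register(String username, String password) {
        return passwords.putIfAbsent(username, password) == null;
    }

    boolean login(String username, String password) {
        return password.equals(passwords.get(username));
    }

    boolean changePassword(String username, String oldPassword, String newPassword) {
        // replace(key, old, new) succeeds only when the stored value equals oldPassword
        return passwords.replace(username, oldPassword, newPassword);
    }
}
```

Because all three operations share the same data and change together, a client needs a single call per action and no cross-service coordination.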

The split is considered successful when a failure in one service does not affect others; for instance, if the payment service crashes, users can still browse products and add items to the cart.

2. Service Communication: Service Registry and Clear Contracts

Microservices need two capabilities: locating the target service and expressing the request clearly. Service discovery is handled by a registry such as Nacos or Eureka, where each instance registers its IP and port at startup. Clients query the registry to obtain the current address, allowing transparent scaling and relocation.
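To make the register-then-look-up flow concrete, here is a toy in-memory registry. It is not the Nacos or Eureka API; `ToyServiceRegistry` and its method names are hypothetical, and round-robin selection is one assumed load-balancing choice among several.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy in-memory registry, for illustration only. */
final class ToyServiceRegistry {
    private final Map<String, List<String>> instances = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> cursors = new ConcurrentHashMap<>();

    /** Called by an instance at startup. */
    void register(String serviceName, String host, int port) {
        instances.computeIfAbsent(serviceName, k -> new CopyOnWriteArrayList<>())
                 .add(host + ":" + port);
    }

    /** Called when an instance shuts down or is considered unhealthy. */
    void deregister(String serviceName, String host, int port) {
        List<String> list = instances.get(serviceName);
        if (list != null) {
            list.remove(host + ":" + port);
        }
    }

    /** Returns the next registered address in round-robin order. */
    String resolve(String serviceName) {
        List<String> list = instances.get(serviceName);
        if (list == null || list.isEmpty()) {
            throw new IllegalStateException("no instance registered for " + serviceName);
        }
        int n = cursors.computeIfAbsent(serviceName, k -> new AtomicInteger()).getAndIncrement();
        return list.get(Math.floorMod(n, list.size()));
    }
}
```

Callers only ever know the logical name (for example `payment-service`), which is what lets instances be added, removed, or moved without client changes.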

For low-frequency calls, HTTP + JSON is sufficient: for example, a request body of {"orderId":123,"amount":99.9} answered by a response of {"code":200,"msg":"success"}. For high-frequency interactions, RPC frameworks like Dubbo or gRPC provide lower latency but require a predefined interface.

A contract (API documentation or Swagger) must be established to avoid mismatched fields, such as sending orderId=123 when the receiver expects a payment order number.
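One lightweight way to enforce such a contract in code is a typed request object that both sides share and that rejects malformed values before anything is sent. The sketch below is an assumption-laden example: `PayRequest` is a hypothetical class, the field names mirror the JSON shown earlier, and the hand-written `toJson` stands in for whatever JSON library a project actually uses.

```java
import java.math.BigDecimal;

/** Hypothetical request contract for a payment call. */
final class PayRequest {
    private final long orderId;       // the business order being paid, not a payment order number
    private final BigDecimal amount;  // BigDecimal avoids floating-point rounding on money

    PayRequest(long orderId, BigDecimal amount) {
        if (orderId <= 0) {
            throw new IllegalArgumentException("orderId must be positive");
        }
        if (amount == null || amount.signum() <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        this.orderId = orderId;
        this.amount = amount;
    }

    /** Serializes to the agreed wire format. */
    String toJson() {
        return "{\"orderId\":" + orderId + ",\"amount\":" + amount.toPlainString() + "}";
    }
}
```

Documenting the meaning of each field next to its declaration is what prevents the "which order number is this?" class of mismatch.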

3. Data Consistency: Prefer Eventual Consistency When Possible

Rather than reaching for distributed transactions, keep the workflow under the control of a single service that moves it through explicit states. The order service creates an order in a "pending" state, calls the inventory service to reserve stock, and only then marks the order as "locked". If the reservation fails, the order is marked "creation failed". A scheduled task can later cancel stale pending orders and restore any reserved inventory.
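The order flow described above can be sketched as a small state machine. Everything here is illustrative: `InventoryClient`, `OrderFlow`, and the `CANCELLED` state are assumed names, storage is an in-memory list, and timestamps are passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

enum OrderStatus { PENDING, LOCKED, CREATION_FAILED, CANCELLED }

/** Hypothetical client for the inventory service. */
interface InventoryClient {
    boolean reserve(long orderId, int quantity); // false when stock is insufficient
    void release(long orderId);                  // returns reserved stock; should be idempotent
}

final class Order {
    final long id;
    final long createdAtMillis;
    OrderStatus status = OrderStatus.PENDING;

    Order(long id, long createdAtMillis) {
        this.id = id;
        this.createdAtMillis = createdAtMillis;
    }
}

final class OrderFlow {
    private final InventoryClient inventory;
    private final List<Order> orders = new ArrayList<>();
    private long nextId = 1;

    OrderFlow(InventoryClient inventory) {
        this.inventory = inventory;
    }

    Order create(int quantity, long nowMillis) {
        Order order = new Order(nextId++, nowMillis); // step 1: record the order as PENDING
        orders.add(order);
        boolean reserved;
        try {
            reserved = inventory.reserve(order.id, quantity); // step 2: remote call
        } catch (RuntimeException e) {
            return order; // outcome unknown: leave PENDING for the scheduled task
        }
        order.status = reserved ? OrderStatus.LOCKED : OrderStatus.CREATION_FAILED;
        return order;
    }

    /** Body of the scheduled task: cancels PENDING orders older than maxAgeMillis. */
    int cancelStale(long nowMillis, long maxAgeMillis) {
        int cancelled = 0;
        for (Order order : orders) {
            if (order.status == OrderStatus.PENDING
                    && nowMillis - order.createdAtMillis > maxAgeMillis) {
                inventory.release(order.id); // the reservation may or may not have happened
                order.status = OrderStatus.CANCELLED;
                cancelled++;
            }
        }
        return cancelled;
    }
}
```

The system is briefly inconsistent while an order is pending, but every path ends in a state where order and inventory agree.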

This eventual consistency model satisfies most business scenarios. Stronger guarantees are needed only for cases like money transfers, where a local message table pattern can be used: the debit service records the debit and a pending message in the same local transaction, the credit service processes the message, and the debit service then marks the message as completed.
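A minimal sketch of the local message table idea follows, under several stated assumptions: `synchronized` stands in for a local database transaction, both "services" live in one process, and the class names (`DebitService`, `CreditService`, `TransferMessage`) are hypothetical. The point it tries to show is that the debit and its message are written together, and that the credit side tolerates repeated delivery.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** A row in the debit side's local message table. */
final class TransferMessage {
    final long id;
    final String toAccount;
    final long amountCents;
    boolean completed = false;

    TransferMessage(long id, String toAccount, long amountCents) {
        this.id = id;
        this.toAccount = toAccount;
        this.amountCents = amountCents;
    }
}

/** Credit side: applies each message at most once, keyed by message id. */
final class CreditService {
    private final Map<String, Long> balances = new HashMap<>();
    private final Set<Long> processed = new HashSet<>();

    synchronized void apply(TransferMessage m) {
        if (!processed.add(m.id)) {
            return; // duplicate delivery, already credited
        }
        balances.merge(m.toAccount, m.amountCents, Long::sum);
    }

    synchronized long balanceOf(String account) {
        return balances.getOrDefault(account, 0L);
    }
}

/** Debit side: the balance change and the message row are written together. */
final class DebitService {
    private final Map<String, Long> balances = new HashMap<>();
    private final List<TransferMessage> messageTable = new ArrayList<>();
    private long nextId = 1;

    synchronized void deposit(String account, long cents) {
        balances.merge(account, cents, Long::sum);
    }

    synchronized long balanceOf(String account) {
        return balances.getOrDefault(account, 0L);
    }

    synchronized void transfer(String from, String to, long cents) {
        long balance = balances.getOrDefault(from, 0L);
        if (balance < cents) {
            throw new IllegalStateException("insufficient funds");
        }
        balances.put(from, balance - cents);
        messageTable.add(new TransferMessage(nextId++, to, cents)); // pending message
    }

    /** Relay job: delivers every pending message, then marks it completed. */
    synchronized int relay(CreditService credit) {
        int delivered = 0;
        for (TransferMessage m : messageTable) {
            if (!m.completed) {
                credit.apply(m);    // if this throws, the message stays pending for the next run
                m.completed = true;
                delivered++;
            }
        }
        return delivered;
    }
}
```

If the relay crashes after `apply` but before marking the row completed, the next run redelivers the message, which is why the credit side deduplicates by message id.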

4. Fault Tolerance and Rate Limiting: Prevent Cascading Failures

Three techniques are recommended:

Timeout + Retry : Set a timeout (e.g., 3 seconds) for remote calls and retry only idempotent operations.

Circuit Breaker : Use Resilience4j or Sentinel to stop calls to a failing service after a failure‑rate threshold (e.g., 50%).

Degradation : Disable non‑essential features (e.g., order history) during overload to keep core functionality alive.
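The timeout-and-retry rule from the list above might look like the following sketch, which uses only the JDK. `RetryingCaller` and `callWithRetry` are hypothetical names; the helper assumes the supplied call is idempotent and, for brevity, retries immediately, whereas a production version would normally add backoff between attempts.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class RetryingCaller {
    /**
     * Runs an idempotent call with a per-attempt timeout and retries
     * on failure or timeout, up to maxAttempts attempts in total.
     */
    static <T> T callWithRetry(Supplier<T> idempotentCall, long timeoutMillis, int maxAttempts) {
        RuntimeException last = new IllegalArgumentException("maxAttempts must be at least 1");
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            CompletableFuture<T> future = CompletableFuture.supplyAsync(idempotentCall);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // stop waiting; the remote side may still complete the work
                last = new IllegalStateException("attempt " + attempt + " timed out", e);
            } catch (ExecutionException e) {
                last = new IllegalStateException("attempt " + attempt + " failed", e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        throw last;
    }
}
```

The idempotency requirement matters because a timed-out attempt may still have succeeded on the remote side; retrying a non-idempotent operation such as "charge the card" could apply it twice.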

Rate limiting protects the system from traffic spikes, such as a flash sale that attracts 10k req/s when the system can only handle 1k req/s. Token-bucket or leaky-bucket limiting can be implemented with Guava RateLimiter or Sentinel.
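To show what the token-bucket algorithm mentioned above does internally, here is a hand-written sketch. It is not how Guava RateLimiter or Sentinel are implemented; `TokenBucket` is a hypothetical class, and the clock is injected as an assumption so the behaviour can be exercised without real waiting.

```java
import java.util.function.LongSupplier;

/** Minimal token bucket: holds up to `capacity` tokens, refilled at a steady rate. */
final class TokenBucket {
    private final long capacity;
    private final double refillPerNano;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double tokensPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full so a short burst is allowed
        this.lastRefill = nanoClock.getAsLong();
    }

    /** Takes one token if available; returns false when the request should be rejected. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        // Add the tokens earned since the last call, never exceeding capacity
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

For the flash-sale figures above, something like `new TokenBucket(1000, 1000.0, System::nanoTime)` would admit roughly 1,000 requests per second and reject the rest, which can then be answered with a "busy, try again" response instead of overloading downstream services.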

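import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

// Client-side wrapper for calls to the payment service, guarded by a Resilience4j circuit breaker.
@Service
public class PaymentService {
    // Resolving the logical host "payment-service" assumes a load-balanced RestTemplate
    // (for example, a bean annotated with @LoadBalanced).
    private final RestTemplate restTemplate;

    public PaymentService(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    // "paymentService" names the breaker instance; its thresholds live in configuration.
    @CircuitBreaker(name = "paymentService", fallbackMethod = "fallback")
    public String callPayment() {
        return restTemplate.getForObject("http://payment-service/pay", String.class);
    }

    // Called when the request fails or the breaker is open.
    // The signature must match callPayment() plus a trailing Throwable parameter.
    public String fallback(Throwable t) {
        return "Payment service temporarily unavailable, please try again later";
    }
}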

5. Monitoring and Tracing: Know What Is Happening

Three monitoring dimensions are essential:

Service health : CPU, memory, disk via Prometheus and Grafana.

Request metrics : Request count, success rate, latency via SkyWalking or Pinpoint.

Business outcomes : Order volume, conversion rate.
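To make the request-level figures in the list above concrete (request count, success rate, latency), the sketch below computes them from raw samples. In practice these numbers come from the monitoring tools named above; `RequestStats` is a hypothetical class, and the nearest-rank percentile is one assumed definition among several in common use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Collects per-request samples and derives count, success rate, and latency percentiles. */
final class RequestStats {
    private final List<Long> latenciesMillis = new ArrayList<>();
    private long successes;

    synchronized void record(long latencyMillis, boolean success) {
        latenciesMillis.add(latencyMillis);
        if (success) {
            successes++;
        }
    }

    synchronized long count() {
        return latenciesMillis.size();
    }

    synchronized double successRate() {
        return latenciesMillis.isEmpty() ? 1.0 : successes / (double) latenciesMillis.size();
    }

    /** Nearest-rank percentile, e.g. percentile(95) for p95 latency. */
    synchronized long percentile(int p) {
        if (latenciesMillis.isEmpty()) {
            throw new IllegalStateException("no samples recorded");
        }
        List<Long> sorted = new ArrayList<>(latenciesMillis);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }
}
```

Percentiles are usually more informative than averages here, since a handful of very slow requests can hide behind a healthy-looking mean.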

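Distributed tracing (Spring Cloud Sleuth + Zipkin) records the full request path (e.g., gateway → product → order → payment) and helps pinpoint where latency or failures occur. Note that Sleuth targets Spring Boot 2.x; on Spring Boot 3 its role is taken over by Micrometer Tracing, so the configuration keys below apply to Sleuth-based setups.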

spring:
  zipkin:
    base-url: http://zipkin-server:9411
  sleuth:
    sampler:
      probability: 1.0  # sample 100% of requests; consider a lower value in production
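Behind that configuration, a tracer ties the hops of one request together by passing identifiers in request headers. The sketch below is a simplified model rather than Sleuth's implementation: `TraceContext` is a hypothetical class, the header names follow the B3 convention used by Zipkin, and the receiving side is assumed to always open a new child span.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy trace context: one trace id per request, one span id per hop. */
final class TraceContext {
    final String traceId;
    final String spanId;
    final String parentSpanId; // null for the first hop (e.g. the gateway)

    private TraceContext(String traceId, String spanId, String parentSpanId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }

    private static String newId() {
        // 16 hex characters
        return UUID.randomUUID().toString().replace("-", "").substring(0, 16);
    }

    /** Starts a new trace at the edge of the system. */
    static TraceContext newRoot() {
        return new TraceContext(newId(), newId(), null);
    }

    /** Headers the caller attaches to its outgoing request. */
    Map<String, String> toHeaders() {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-B3-TraceId", traceId);
        headers.put("X-B3-SpanId", spanId);
        if (parentSpanId != null) {
            headers.put("X-B3-ParentSpanId", parentSpanId);
        }
        return headers;
    }

    /** Receiving side: same trace, new span, with the caller's span as parent. */
    static TraceContext fromHeaders(Map<String, String> headers) {
        return new TraceContext(headers.get("X-B3-TraceId"), newId(), headers.get("X-B3-SpanId"));
    }
}
```

Because every hop reports its span with the shared trace id, the tracing backend can reassemble the path and show which hop contributed the latency.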

6. Summary

Split by domain, not by individual processing steps.

Use a service registry and define clear API contracts.

Prefer eventual consistency; reserve strict transactions for truly atomic scenarios.

Apply timeout-retry, circuit breaker, degradation, and rate limiting to avoid cascading failures.

Instrument services with health metrics, business KPIs, and distributed tracing.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
