How Splitting a System into 200 Microservices Almost Destroyed Our Company
The article uses a night‑market analogy to explain common microservice pitfalls—over‑splitting, poor service boundaries, fragile communication, data‑consistency challenges, fault‑tolerance, rate‑limiting, and monitoring—providing concrete examples, best‑practice rules, and Java code snippets to help teams avoid costly mistakes.
Imagine a company that decides to "embrace microservices" for the next project, turning a monolith into hundreds of services. The author compares this to turning a tidy apartment into a chaotic night‑market, illustrating why careless decomposition can cripple a system.
1. Service Partitioning: Split by Business Domain, Not by Process
Splitting a system into too many fine‑grained services leads to latency and complexity. One example shows a user‑management module broken into separate registration, login, password‑change, and password‑recovery services, so that a single registration invokes four APIs and response time grows from 200 ms to 2 s.
The correct approach is to split by "category" (business domain). Each domain—order, product, payment—should be a single service handling the full lifecycle of that entity. Additionally, if two functions always change together (e.g., price and inventory), they should stay in the same service to avoid synchronization bugs.
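To make the boundary concrete, here is a minimal sketch (names are illustrative) of one user service owning the whole account lifecycle, rather than four process‑level micro‑APIs:
// Domain-based boundary: everything about the user account lives in one service.
public interface UserService {
    long register(String email, String password);      // returns the new user id
    String login(String email, String password);       // returns a session token
    void changePassword(long userId, String oldPassword, String newPassword);
    void recoverPassword(String email);
}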
The guiding rule: a service failure must not affect other services; for instance, if the payment service crashes, browsing and cart operations should still work.
2. Service Communication: Service Registry and Clear Contracts
Two core problems are locating a service and defining the request format. Service discovery tools such as Nacos or Eureka act as a dynamic "stall sign" that records each service’s address at startup, allowing callers to query the current location without code changes.
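As a sketch of what "querying the stall sign" looks like in code, the example below uses Spring Cloud's DiscoveryClient, which works against registries such as Nacos or Eureka; the service name order-service is an assumption for illustration:
import java.util.List;
import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.client.discovery.DiscoveryClient;
import org.springframework.stereotype.Component;

@Component
public class OrderServiceLocator {

    private final DiscoveryClient discoveryClient;

    public OrderServiceLocator(DiscoveryClient discoveryClient) {
        this.discoveryClient = discoveryClient;
    }

    // Ask the registry where "order-service" currently lives,
    // instead of hard-coding an IP address in the caller.
    public String resolveOrderServiceUrl() {
        List<ServiceInstance> instances = discoveryClient.getInstances("order-service");
        if (instances.isEmpty()) {
            throw new IllegalStateException("order-service is not registered");
        }
        ServiceInstance instance = instances.get(0); // real callers would load-balance
        return instance.getUri().toString();
    }
}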
For low‑frequency calls, HTTP + JSON is sufficient: a request like {"orderId":123, "amount":99.9} returns {"code":200, "msg":"success"}. For high‑frequency interactions, use RPC frameworks such as Dubbo or gRPC, which require pre‑agreed interfaces. Always publish a "menu" (API contract) via documentation or Swagger to avoid mismatched fields, such as sending orderId=123 when the receiver expects a payment‑order number.
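One way to pin such a contract down in code is a shared interface. The sketch below uses Spring Cloud OpenFeign; the /pay endpoint is hypothetical, and the field names are taken from the JSON example above:
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;

// The contract fixes field names (orderId, amount), so a caller cannot
// accidentally send a payment-order number where an order id is expected.
@FeignClient(name = "payment-service")
public interface PaymentApi {

    @PostMapping("/pay")
    PayResponse pay(@RequestBody PayRequest request);

    record PayRequest(long orderId, double amount) {}
    record PayResponse(int code, String msg) {}
}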
3. Data Consistency: Prefer Eventual Consistency When Possible
Instead of distributed transactions, aim to perform related operations within a single service. A typical pattern is to create an order in a "pending" state, deduct inventory, then mark the order as "locked". If inventory deduction fails, set the order to "creation failed" and use a scheduled task to clean up stale orders. This "final consistency" model handles 90% of business scenarios with simpler code. Strict distributed transactions are only needed for truly atomic scenarios like money transfers, where a local‑message‑table approach can replace heavyweight protocols such as TCC.
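A minimal sketch of that pending → locked flow follows; OrderRepository and StockClient are hypothetical stand‑ins for your own persistence and RPC layers:
// Hypothetical collaborators, defined here so the sketch is self-contained.
interface OrderRepository {
    long insert(long productId, int quantity, String status);
    void updateStatus(long orderId, String status);
}

interface StockClient {
    void deduct(long productId, int quantity);
}

public class OrderCreationService {

    private final OrderRepository orders;
    private final StockClient stock;

    public OrderCreationService(OrderRepository orders, StockClient stock) {
        this.orders = orders;
        this.stock = stock;
    }

    public long createOrder(long productId, int quantity) {
        // 1. Local transaction: persist the order in PENDING state.
        long orderId = orders.insert(productId, quantity, "PENDING");
        try {
            // 2. Remote call: ask the inventory service to deduct stock.
            stock.deduct(productId, quantity);
            // 3a. Success: mark the order LOCKED.
            orders.updateStatus(orderId, "LOCKED");
        } catch (RuntimeException e) {
            // 3b. Failure: mark it CREATE_FAILED. A scheduled task (e.g. Spring
            // @Scheduled) sweeps stale PENDING orders past a deadline the same way.
            orders.updateStatus(orderId, "CREATE_FAILED");
        }
        return orderId;
    }
}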
4. Fault Tolerance and Rate Limiting: Prevent a Snowball Effect
Microservice cascades ("avalanche") occur when a downstream failure blocks upstream threads. Three techniques mitigate this:
Timeout & Retry : set a timeout (e.g., 3 s) for remote calls and retry only idempotent operations.
Circuit Breaker : temporarily stop calling a failing service once its error rate exceeds a threshold (e.g., 50%). Tools like Resilience4j or Sentinel implement this.
Degradation : disable non‑essential features (e.g., order‑history view) during overload to keep core functions alive.
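Here is a minimal timeout‑and‑retry sketch, assuming Resilience4j's Retry around a RestTemplate with 3‑second timeouts; the inventory-service endpoint is illustrative, and the retried call is an idempotent read:
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;
import java.time.Duration;
import java.util.function.Supplier;
import org.springframework.http.client.SimpleClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

public class StockQueryClient {

    private final RestTemplate restTemplate;
    private final Retry retry;

    public StockQueryClient() {
        // 3-second connect/read timeouts so a slow downstream cannot pin our threads.
        SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
        factory.setConnectTimeout(3000);
        factory.setReadTimeout(3000);
        this.restTemplate = new RestTemplate(factory);

        // Retry only this idempotent read: up to 3 attempts, 500 ms apart.
        RetryConfig config = RetryConfig.custom()
                .maxAttempts(3)
                .waitDuration(Duration.ofMillis(500))
                .build();
        this.retry = Retry.of("stockQuery", config);
    }

    public String queryStock(long productId) {
        Supplier<String> call = () -> restTemplate.getForObject(
                "http://inventory-service/stock/" + productId, String.class);
        return Retry.decorateSupplier(retry, call).get();
    }
}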
Rate limiting controls traffic spikes, e.g., with a token‑bucket algorithm or Guava's RateLimiter admitting only a fixed number of requests per second (a sketch follows the circuit‑breaker example below). Example of a circuit‑breaker implementation with Resilience4j:
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class PaymentService {

    private final RestTemplate restTemplate;

    public PaymentService(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    // Route the remote call through the "paymentService" circuit breaker;
    // once its error rate trips the threshold, calls short-circuit to the fallback.
    @CircuitBreaker(name = "paymentService", fallbackMethod = "fallback")
    public String callPayment() {
        return restTemplate.getForObject("http://payment-service/pay", String.class);
    }

    // The fallback must keep the same return type and accept the exception.
    public String fallback(Throwable t) {
        return "Payment service is temporarily unavailable, please try again later";
    }
}
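For rate limiting, a minimal token‑bucket sketch with Guava's RateLimiter might look like this (the 100‑requests‑per‑second budget and class name are illustrative):
import com.google.common.util.concurrent.RateLimiter;

public class PaymentEntryGuard {

    // Token bucket: refills 100 permits per second (an illustrative budget).
    private final RateLimiter limiter = RateLimiter.create(100.0);

    public String handle(Runnable businessCall) {
        // tryAcquire() takes a token if one is available, without blocking.
        if (limiter.tryAcquire()) {
            businessCall.run();
            return "OK";
        }
        // Shed the excess instead of letting it queue up and snowball.
        return "Too many requests, please retry later";
    }
}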
5. Monitoring and Tracing: See What’s Happening
Three key observability dimensions are required:
Service health: CPU, memory, and disk usage via Prometheus + Grafana.
Request metrics: request count, success rate, and latency via SkyWalking or Pinpoint.
Business outcomes: order volume and payment conversion rate (a metrics sketch follows this list).
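As a hedged sketch of recording those outcome metrics, the example below uses Micrometer counters, which Prometheus can scrape; the metric names are illustrative:
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;

@Component
public class OrderMetrics {

    private final Counter ordersCreated;
    private final Counter paymentsSucceeded;

    public OrderMetrics(MeterRegistry registry) {
        // Prometheus scrapes these; Grafana can plot payment conversion
        // as paymentsSucceeded / ordersCreated.
        this.ordersCreated = Counter.builder("orders.created").register(registry);
        this.paymentsSucceeded = Counter.builder("payments.succeeded").register(registry);
    }

    public void onOrderCreated()     { ordersCreated.increment(); }
    public void onPaymentSucceeded() { paymentsSucceeded.increment(); }
}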
Distributed tracing (e.g., Spring Cloud Sleuth + Zipkin) records the full request path (gateway → product → order → payment) and helps pinpoint latency or failures. Example configuration:
spring:
  zipkin:
    base-url: http://zipkin-server:9411
  sleuth:
    sampler:
      probability: 1.0  # 100% sampling
6. Takeaway
Microservices are not magic; follow these practical rules:
Split by domain, not by technical steps.
Use a service registry and publish clear API contracts.
Prefer eventual consistency; avoid heavyweight distributed transactions.
Implement timeout, circuit‑breaker, degradation, and rate‑limiting.
Instrument health, business, and outcome metrics; use tracing for root‑cause analysis.
Start with a monolith for small teams, then evolve to a microservice “night market” only when business demand justifies it.