Taming Microservice Chaos: Stability, Degradation & Data Consistency

This article shares practical guidance on microservice benefits, common pitfalls such as stability and data consistency issues, and detailed solutions including circuit breakers, service degradation tactics, TCC distributed transactions, transactional messaging with RocketMQ, seamless data migration, and full‑stack APM monitoring.

dbaplus Community

Aug 27, 2020

Why Microservices? Benefits and Drawbacks

Microservices modularize monolithic applications, reduce coupling, hide technical details from business logic, isolate data per service, clarify boundaries, cut code conflicts, and enable reuse.

Modularization lowers coupling: a change can be made inside a single service without touching the others.

Technical concerns like caching are encapsulated within the owning service.

Each service owns its database tables, preventing cross‑module data coupling.

Clear business and code boundaries improve maintainability.

Separate services reduce code merge conflicts.

Reusable components reduce code duplication.

However, microservices also introduce new problems: keeping the system stable, following complex call chains, maintaining data consistency, and handling a heavier operational load.

Ensuring Microservice System Stability

Snowball Effect and Prevention

When a service fails, dependent services may also fail, causing a cascade (snowball). To avoid this, add circuit breakers between services. If a downstream service is unhealthy, the upstream service fails fast and returns a fallback response, preventing thread pool exhaustion.

Example: Service A calls B, B calls C. If C fails, B’s thread pool can be exhausted, then A also fails, leading to a chain reaction.

Apply circuit breakers (e.g., Hystrix, Resilience4j) on upstream services.

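Give each downstream dependency its own thread pool (the bulkhead pattern), so that one slow dependency cannot exhaust the threads that calls to other services rely on.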
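To make the fail-fast behaviour concrete, here is a minimal circuit breaker sketch. It is not how Hystrix or Resilience4j are implemented; the class name, the thresholds, and the injectable clock are illustrative assumptions. After `failureThreshold` consecutive failures the breaker opens and returns the fallback without calling downstream; once `openMillis` has passed it lets a trial call through (half-open) and closes again if that call succeeds. For brevity, the half-open state does not limit how many concurrent trial calls get through, which production libraries do.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Minimal circuit breaker: CLOSED -> OPEN after N consecutive failures,
// OPEN -> HALF_OPEN once the cool-down has elapsed, HALF_OPEN -> CLOSED on success.
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures that trip the breaker
    private final long openMillis;        // how long to fail fast before allowing a trial call
    private final LongSupplier clock;     // current time in ms; injectable so the sketch is testable
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    // Calls downstream unless the breaker is open; any RuntimeException counts as a failure.
    <T> T call(Supplier<T> downstream, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get();        // fail fast: no thread waits on the unhealthy service
        }
        try {
            T result = downstream.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return false;
            }
            state = State.HALF_OPEN;      // cool-down over: let a trial request through
        }
        return true;
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;           // trip, or re-trip after a failed trial call
            openedAt = clock.getAsLong();
        }
    }
}
```

In service B from the example, each RPC to C would be wrapped in `call(...)` with a fallback such as a default or cached response, so a failing C costs B almost nothing once the breaker is open.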

Handling Sudden Traffic Spikes

During promotions or flash sales, traffic can exceed forecasts. Implement rate limiting at the gateway (Zuul, Spring Cloud Gateway, Nginx) to reject excess requests and protect backend services.

Limits can be applied globally, per IP address, or per user ID.

Two main goals: protect services from overload and prevent abuse.
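As an illustration of what such a limiter does, the sketch below implements a token bucket per key, where the key is "global", a client IP, or a user ID. The token-bucket algorithm is our choice for the example (the text does not name one), and the class and parameter names are hypothetical. In practice you would usually rely on the gateway's built-in limiter, for example Nginx's `limit_req` module or Spring Cloud Gateway's `RequestRateLimiter` filter.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Token bucket per key: each key holds up to `capacity` tokens, refilled continuously
// at `refillPerSecond`. A request that finds no token is rejected (e.g. with HTTP 429)
// instead of being forwarded to the backend.
class KeyedRateLimiter {
    private static final class Bucket {
        double tokens;
        long lastRefillNanos;

        Bucket(double tokens, long lastRefillNanos) {
            this.tokens = tokens;
            this.lastRefillNanos = lastRefillNanos;
        }
    }

    private final double capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock;   // System::nanoTime in production; injectable for tests
    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    KeyedRateLimiter(double capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
    }

    // key = "global", a client IP, or a user ID, depending on the limiting strategy
    boolean tryAcquire(String key) {
        Bucket bucket = buckets.computeIfAbsent(key, k -> new Bucket(capacity, nanoClock.getAsLong()));
        synchronized (bucket) {
            long now = nanoClock.getAsLong();
            double elapsedSeconds = Math.max(0, now - bucket.lastRefillNanos) / 1_000_000_000.0;
            bucket.tokens = Math.min(capacity, bucket.tokens + elapsedSeconds * refillPerSecond);
            bucket.lastRefillNanos = now;
            if (bucket.tokens >= 1.0) {
                bucket.tokens -= 1.0;
                return true;
            }
            return false;
        }
    }
}
```

This keeps state in a single process and never evicts idle keys; a gateway cluster would need shared state (commonly Redis) and key expiry.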

Data Redundancy

Keep redundant copies of critical data in the services that depend on it, so that when the owning service fails, callers can keep operating on their local copies. Example: the order service keeps a copy of price data so it can continue serving orders when the price service is down.
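A minimal sketch of the price example, assuming the remote price service is exposed as a plain `Function` that throws when unavailable (the class and method names are hypothetical). Every successful call refreshes the order service's local copy; on failure the last known price is returned, or nothing if the SKU was never seen.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Order-side redundant copy of price data.
class RedundantPriceLookup {
    private final Function<String, BigDecimal> priceService; // hypothetical remote call; throws when down
    private final Map<String, BigDecimal> localCopy = new ConcurrentHashMap<>();

    RedundantPriceLookup(Function<String, BigDecimal> priceService) {
        this.priceService = priceService;
    }

    Optional<BigDecimal> priceOf(String skuId) {
        try {
            BigDecimal price = priceService.apply(skuId);
            localCopy.put(skuId, price);                       // keep the redundant copy fresh
            return Optional.of(price);
        } catch (RuntimeException unavailable) {
            return Optional.ofNullable(localCopy.get(skuId));  // possibly stale, but orders keep flowing
        }
    }
}
```

The trade-off is staleness: a copy refreshed only on reads can lag behind price changes, so real systems often also push updates to the copy (for example via change messages from the price service) and decide how stale a price may be before an order must be rejected.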

Service Degradation

Degradation includes circuit breaking, rate limiting, and data redundancy. It can be manual (feature toggles) or automatic (circuit breaker, rate limiting). During high load, non‑core features (e.g., logistics query) are turned off to keep core functionality stable.

Manual degradation: switch off secondary features.

Read degradation: serve reads from cache only, without touching the database.

Write degradation: disable writes while keeping reads available.
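The three manual degradation modes can be sketched as switches checked by the service. In practice the switches would usually live in a configuration center so operators can flip them at runtime; here they are plain fields, the feature name is a hypothetical example, and the two maps stand in for the cache and the database.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

enum Feature { LOGISTICS_QUERY }

class DegradationSwitches {
    final Set<Feature> disabledFeatures = ConcurrentHashMap.newKeySet(); // manual degradation
    volatile boolean cacheOnlyReads = false;                             // read degradation
    volatile boolean writesDisabled = false;                             // write degradation
}

class OrderFacade {
    private final DegradationSwitches switches;
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> database = new ConcurrentHashMap<>(); // stands in for the real DB

    OrderFacade(DegradationSwitches switches) {
        this.switches = switches;
    }

    Optional<String> logistics(String orderId) {
        if (switches.disabledFeatures.contains(Feature.LOGISTICS_QUERY)) {
            return Optional.empty();            // non-core feature off: caller shows a friendly message
        }
        return Optional.of("in transit");       // placeholder for a real logistics lookup
    }

    Optional<String> readOrder(String orderId) {
        String cached = cache.get(orderId);
        if (cached != null || switches.cacheOnlyReads) {
            return Optional.ofNullable(cached); // under read degradation the DB is never queried
        }
        String fromDb = database.get(orderId);
        if (fromDb != null) {
            cache.put(orderId, fromDb);
        }
        return Optional.ofNullable(fromDb);
    }

    boolean writeOrder(String orderId, String data) {
        if (switches.writesDisabled) {
            return false;                       // write degradation: reject writes, reads still work
        }
        database.put(orderId, data);
        cache.remove(orderId);                  // invalidate so the next read loads the new value
        return true;
    }
}
```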

Cache Considerations

Cache penetration: store empty placeholders for non‑existent keys.

Cache avalanche: stagger TTLs to avoid simultaneous expiration.

Cache hot‑spot: use read‑write splitting, master‑slave replication, or sharding for high‑traffic keys.
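A cache-aside lookup can combine the first two protections. In this sketch an in-memory map stands in for Redis, and the TTL values and class name are assumptions. A key missing from the database is cached as an explicit "does not exist" entry with a short TTL (penetration), and real values get a random jitter added to their TTL so keys loaded together do not expire together (avalanche). Hot keys are not addressed here.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

class ProtectedCache<V> {
    private static final class Entry<V> {
        final V value;          // null means "known not to exist in the database"
        final long expiresAt;   // ms

        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry<V>> store = new ConcurrentHashMap<>();
    private final Function<String, V> database;   // returns null when the key does not exist
    private final LongSupplier clock;             // current time in ms
    private final Random random;
    private final long baseTtlMs;
    private final long jitterMs;
    private final long emptyTtlMs;

    ProtectedCache(Function<String, V> database, LongSupplier clock, Random random,
                   long baseTtlMs, long jitterMs, long emptyTtlMs) {
        this.database = database;
        this.clock = clock;
        this.random = random;
        this.baseTtlMs = baseTtlMs;
        this.jitterMs = jitterMs;
        this.emptyTtlMs = emptyTtlMs;
    }

    Optional<V> get(String key) {
        long now = clock.getAsLong();
        Entry<V> cached = store.get(key);
        if (cached != null && cached.expiresAt > now) {
            return Optional.ofNullable(cached.value);   // hit, possibly a cached "does not exist"
        }
        V loaded = database.apply(key);
        long ttl = (loaded == null)
                ? emptyTtlMs                                             // short-lived placeholder
                : baseTtlMs + (long) (random.nextDouble() * jitterMs);  // staggered expiry
        store.put(key, new Entry<>(loaded, now + ttl));
        return Optional.ofNullable(loaded);
    }
}
```

The placeholder TTL is kept short so that a key created in the database later becomes visible quickly.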

Isolation Strategies

Deployment isolation: separate critical and flash‑sale workloads into different clusters.

Data isolation: use separate databases for flash‑sale and regular orders, and keep them in sync through eventual‑consistency mechanisms.

Business isolation: pre‑register flash‑sale items, generate static pages, and pre‑warm Redis.

Data Consistency in a Microservice Architecture

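Splitting a monolith creates multiple databases, so a single local ACID transaction can no longer cover a business operation. Distributed transaction approaches are needed instead: Two‑Phase Commit (standardized for databases as XA), TCC, Saga, or frameworks such as Seata that implement several of these modes.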

TCC (Try‑Confirm‑Cancel) Example

Scenario: an e‑commerce order flow with four steps: update the order status, freeze inventory, reserve the coupon, and notify the warehouse management system (WMS).

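public void makePayment() {
    // Each call goes to a different service with its own database, so there is
    // no shared transaction: a failure after any step leaves the earlier steps applied.
    orderService.updateStatus(OrderStatus.Payed);
    inventoryService.decrStock();
    couponService.updateStatus(CouponStatus.Used);
}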

Without coordination, partial failures cause inconsistency. TCC solves this by splitting each operation into three phases:

Try: Reserve resources and set temporary states (e.g., order status = Paying, inventory locked, coupon = Inuse).

Confirm: Commit the reserved resources (order = Payed, locked inventory permanently deducted, coupon = Used).

Cancel: Release the reserved resources and restore the original state if any Try step fails.
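The three phases can be illustrated with a small in-process coordinator. This is a sketch under simplifying assumptions (participants are local objects, and nothing is persisted or retried); it is not how Hmily or Seata work internally, and `TccParticipant` and `TccCoordinator` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// One TCC participant, e.g. the order, inventory, or coupon service.
interface TccParticipant {
    boolean tryReserve(); // Try: reserve resources / set temporary state (Paying, locked, Inuse)
    void confirm();       // Confirm: make the reservation final; must be idempotent
    void cancel();        // Cancel: release the reservation; must be idempotent and must
                          // tolerate an "empty rollback" when Try reserved nothing
}

final class TccCoordinator {
    private TccCoordinator() {}

    // Runs every Try; confirms all participants if they all succeed,
    // otherwise cancels every participant whose Try was invoked, in reverse order.
    static boolean execute(List<TccParticipant> participants) {
        Deque<TccParticipant> invoked = new ArrayDeque<>();
        for (TccParticipant p : participants) {
            invoked.push(p); // pushed before Try: a failed Try may still have reserved something
            boolean ok;
            try {
                ok = p.tryReserve();
            } catch (RuntimeException e) {
                ok = false;
            }
            if (!ok) {
                while (!invoked.isEmpty()) {
                    invoked.pop().cancel();
                }
                return false;
            }
        }
        // Every Try succeeded. A real framework records this decision first and
        // retries Confirm until it succeeds, because rollback is no longer an option.
        for (TccParticipant p : participants) {
            p.confirm();
        }
        return true;
    }
}
```

With the order, inventory, and coupon services as participants, a failed coupon Try cancels the coupon, inventory, and order reservations in that order.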

Code snippets (simplified) using the Hmily framework:

// OrderService
@Hmily(confirmMethod="confirmOrderStatus", cancelMethod="cancelOrderStatus")
public void makePayment() {
    // update order to Paying
    // lock inventory via RPC
    // set coupon to Inuse via RPC
}

public void confirmOrderStatus() { /* set order to Payed */ }
public void cancelOrderStatus() { /* revert order to UnPayed */ }

Similar TCC annotations are applied to InventoryService and CouponService with corresponding confirm/cancel methods.

Message‑Based Final Consistency

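When asynchronous actions (e.g., notifying the WMS) are needed, use transactional messages. RocketMQ supports half messages: the producer first sends a message that consumers cannot see yet, then executes its local transaction, and finally commits the message (making it deliverable) if the transaction succeeded or rolls it back if it failed.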

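// Methods of a class implementing org.apache.rocketmq.client.producer.TransactionListener
public LocalTransactionState executeLocalTransaction(Message msg, Object arg) {
    // Run the local DB transaction (update the order and record orderId + status)
    return LocalTransactionState.COMMIT_MESSAGE; // return ROLLBACK_MESSAGE if it failed
}

public LocalTransactionState checkLocalTransaction(MessageExt msg) {
    // Called back by the broker: look up the recorded status for this orderId
    // and return COMMIT_MESSAGE, ROLLBACK_MESSAGE, or UNKNOW (ask again later)
    return LocalTransactionState.UNKNOW;
}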

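If the producer's commit or rollback decision never reaches the broker (for example, the producer crashes right after its local transaction commits), the broker periodically calls back checkLocalTransaction to decide whether to commit or roll back the half message, ensuring eventual consistency.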
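The protocol, including the check-back path, can be simulated in memory. This is not the RocketMQ client API: `HalfMessageBroker` and `TxState` are hypothetical names whose states mirror `LocalTransactionState`, and the producer's local table is modelled as a map from orderId to status.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

enum TxState { COMMIT, ROLLBACK, UNKNOWN }

class HalfMessageBroker {
    private final Map<String, String> halfMessages = new LinkedHashMap<>(); // not visible to consumers
    final List<String> deliverable = new ArrayList<>();                       // visible to consumers

    // Phase 1: store the message, but do not deliver it yet.
    void sendHalf(String msgId, String body) {
        halfMessages.put(msgId, body);
    }

    // Phase 2: the producer reports the outcome of its local transaction.
    // This report can be lost, e.g. if the producer crashes after committing locally.
    void endTransaction(String msgId, TxState state) {
        resolve(msgId, state);
    }

    // Periodic check-back: ask the producer about every half message still pending.
    void checkPending(Function<String, TxState> checkLocalTransaction) {
        for (String msgId : new ArrayList<>(halfMessages.keySet())) {
            resolve(msgId, checkLocalTransaction.apply(msgId));
        }
    }

    private void resolve(String msgId, TxState state) {
        if (state == TxState.COMMIT) {
            String body = halfMessages.remove(msgId);
            if (body != null) {
                deliverable.add(body);
            }
        } else if (state == TxState.ROLLBACK) {
            halfMessages.remove(msgId);
        }
        // UNKNOWN: leave the message pending and ask again on the next check
    }
}
```

The check callback must answer from durable state (here, whether the order row was written), which is why the producer records orderId and status inside its local transaction.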

Seamless Data Migration During Microservice Adoption

Key steps for zero‑downtime migration:

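Enable dual‑write: write to both the old and new databases. When an update targets a record that has not yet been migrated, read the full record from the old database and write it to the new one. Writes to the new database can optionally go through an async queue for performance.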

Migrate historical data written before a chosen timestamp (the dual‑write cutover). Skip records that already exist in the new database, because dual‑write has already written their latest version.

Validate migrated data with scripts.

Enable dual‑read: gradually shift read traffic to the new database, falling back to old when missing.

After full traffic migration, stop writes to the old database and clean up dual‑write/read code.
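The steps above can be sketched with two maps standing in for the old and new databases. This is an illustration under assumptions: records are plain strings keyed by ID, the backfill copies every old row instead of filtering by timestamp (rows already present in the new database are skipped), and reads switch to the new database with a single flag rather than the gradual shift a real rollout would use. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

class MigratingRepository {
    final Map<String, String> oldDb = new ConcurrentHashMap<>();
    final Map<String, String> newDb = new ConcurrentHashMap<>();
    volatile boolean dualWrite = false;   // step 1
    volatile boolean readFromNew = false; // step 4

    void insert(String id, String record) {
        oldDb.put(id, record);
        if (dualWrite) newDb.put(id, record);
    }

    // Step 1: the updated record is read from the old DB, so a row that has not
    // been migrated yet still arrives in the new DB complete.
    void update(String id, UnaryOperator<String> change) {
        String updated = oldDb.computeIfPresent(id, (k, v) -> change.apply(v));
        if (dualWrite && updated != null) newDb.put(id, updated);
    }

    // Step 2: backfill; rows already present were written by dual-write and are newer.
    int backfill() {
        int copied = 0;
        for (Map.Entry<String, String> e : oldDb.entrySet()) {
            if (newDb.putIfAbsent(e.getKey(), e.getValue()) == null) copied++;
        }
        return copied;
    }

    // Step 3: validation; IDs whose old and new copies differ or exist on one side only.
    List<String> mismatches() {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> e : oldDb.entrySet()) {
            if (!e.getValue().equals(newDb.get(e.getKey()))) ids.add(e.getKey());
        }
        for (String id : newDb.keySet()) {
            if (!oldDb.containsKey(id)) ids.add(id);
        }
        return ids;
    }

    // Step 4: dual-read, preferring the new DB and falling back to the old one.
    Optional<String> read(String id) {
        if (readFromNew) {
            String fromNew = newDb.get(id);
            if (fromNew != null) return Optional.of(fromNew);
        }
        return Optional.ofNullable(oldDb.get(id));
    }
}
```

Step 5 then amounts to deleting the `oldDb` writes and the fallback branch once all traffic is served from the new database.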

Full‑Stack APM Monitoring for Microservices

Microservice complexity demands observability. Open‑source tools like Pinpoint or SkyWalking provide:

Topology maps showing service‑to‑service, service‑to‑DB, and service‑to‑cache dependencies.

Call‑stack traces with per‑method latency, highlighting slow database queries.

Server‑map visualizing request counts per node.

JVM metrics: heap, GC, thread count, file descriptors, CPU usage.

These insights enable rapid root‑cause analysis without digging through logs.

Conclusion

The article consolidates practical experiences on microservice stability, degradation, distributed transactions, data migration, and observability. While there is no one‑size‑fits‑all solution, the presented patterns help improve system availability, maintain data integrity, and reduce operational pain during large‑scale microservice transformations.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
