From Monolith to Microservices: A Practical Journey of an Online Store
This article walks through the evolution of an online supermarket from a simple monolithic website to a fully decoupled microservice architecture. Along the way it covers common pitfalls, key design decisions, the supporting components (monitoring, tracing, log analysis, service discovery, circuit breaking, and testing), and the trade‑offs of each step.
Background and Initial Monolith
The story starts with a small online supermarket built as a single website and a separate admin backend. The initial feature list includes user registration, product browsing, ordering, and admin functions for managing users, products, and orders. Because the requirements were simple, the monolith was quickly assembled and deployed on a cloud VM.
Problems After Rapid Expansion
As the business grew, new marketing features, a mobile app, and data‑analysis capabilities were added without proper architectural planning. This resulted in duplicated code across web and mobile services, tangled API calls, blurred service boundaries, a shared database that became a performance bottleneck, and increasingly difficult deployment, testing, and team coordination.
First Refactor: Extracting Common Services
To reduce redundancy, the team abstracted shared business capabilities into five public services: User, Product, Promotion, Order, and Data‑Analysis. Each application now consumes these services, leaving only thin controllers and front‑ends. The architecture still used a shared database, so some monolithic drawbacks persisted, such as database contention and schema coupling.
Database Split and Asynchronous Messaging
The team then gave each service its own persistence layer, which allowed heterogeneous storage (e.g., a data warehouse for analytics and caches for high‑traffic services). A message queue was introduced so that services could exchange events asynchronously, improving real‑time processing. Splitting the data layer removed the shared database as a single point of failure and performance bottleneck, and let each service scale independently.
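To make the asynchronous hand-off concrete, here is a minimal sketch that uses Python's in-process `queue.Queue` as a stand-in for a real message broker (the article does not name one). The event shape and the `publish_order_placed` and `analytics_worker` names are hypothetical. The Order service publishes an event and returns immediately, while the Data-Analysis service consumes events at its own pace.

```python
import json
import queue
import threading

def publish_order_placed(events: queue.Queue, order_id: str, total_cents: int) -> None:
    """Order service: enqueue an event and return without waiting for consumers."""
    event = {"type": "OrderPlaced", "order_id": order_id, "total_cents": total_cents}
    events.put(json.dumps(event))  # serialized, as it would be on a real broker

def analytics_worker(events: queue.Queue, sink: list) -> None:
    """Data-Analysis service: consume events until it receives the stop sentinel."""
    while True:
        message = events.get()
        try:
            if message is None:  # sentinel used in this sketch to stop the worker
                return
            sink.append(json.loads(message))
        finally:
            events.task_done()
```

To use it, start `analytics_worker` in a `threading.Thread`, call `publish_order_placed` as orders arrive, and put `None` on the queue to stop the worker. With a networked broker, the two sides would run in different processes. The contract stays the same: the producer never waits for the consumer.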
Operational Challenges and Failure Scenario
During a shopping‑festival traffic spike, the Promotion service was overwhelmed, and the failure cascaded across the system. The incident highlighted two key issues: faults are hard to locate in a distributed system, and failing services need to be scaled out quickly.
Monitoring
Each component now exposes a uniform /metrics endpoint. Prometheus scrapes these endpoints, and Grafana visualizes the metrics, with alerts on thresholds such as CPU, memory, request latency, and error rate. For common dependencies, ready‑made exporters such as redis_exporter and mysqld_exporter provide metrics without any code changes.
Distributed Tracing
To pinpoint latency and see how failures propagate, the team passed tracing context (traceId, spanId, parentId) in the headers of every HTTP call, recorded each call's requestTime and responseTime, and shipped the resulting data to Zipkin. The trace view shows a tree of service calls, making it easy to identify the offending service.
Images are from the Istio documentation.
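Here is a minimal sketch of context propagation. It assumes Zipkin's B3 header names (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`). The `Span`, `server_span`, and `client_span` names are hypothetical, and reporting finished spans to Zipkin is omitted. Before each outgoing call, the caller creates a child span whose parent is its own span. In this sketch the callee simply reuses the span ID it receives for its side of the call. Request and response timestamps are recorded on the span itself.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class Span:
    trace_id: str              # shared by every span in one request tree
    span_id: str               # identifies this call
    parent_id: Optional[str]   # span that caused this call; None for the root
    name: str
    request_time: float = field(default_factory=time.time)
    response_time: Optional[float] = None

    def finish(self) -> None:
        self.response_time = time.time()

def server_span(name: str, headers: Dict[str, str]) -> Span:
    """Join the span described by incoming B3 headers, or start a new trace."""
    trace_id = headers.get("X-B3-TraceId")
    if trace_id is None:
        return Span(secrets.token_hex(16), secrets.token_hex(8), None, name)
    return Span(trace_id, headers["X-B3-SpanId"], headers.get("X-B3-ParentSpanId"), name)

def client_span(name: str, current: Span) -> Tuple[Span, Dict[str, str]]:
    """Create a child span for an outgoing call plus the headers to send with it."""
    child = Span(current.trace_id, secrets.token_hex(8), current.span_id, name)
    headers = {
        "X-B3-TraceId": child.trace_id,
        "X-B3-SpanId": child.span_id,
        "X-B3-ParentSpanId": current.span_id,
    }
    return child, headers
```

Because every span carries the same `trace_id` and points at its parent, a collector can rebuild the call tree shown in the trace view.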
Log Analysis
Log volume grew beyond manual inspection, so the ELK stack (Elasticsearch, Logstash, Kibana) was adopted. Services write logs to files; lightweight agents tail the files and forward entries to Logstash, which indexes them in Elasticsearch for fast search and visualization in Kibana.
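To keep parsing on the Logstash side simple, services can write one JSON object per line. The sketch below is a minimal `logging.Formatter` that does this. The field names and the optional `trace_id` (passed through `extra=`) are assumptions. The trace ID is included so that log entries can be matched against traces.

```python
import json
import logging
from datetime import datetime, timezone

class JsonLineFormatter(logging.Formatter):
    """Format each log record as a single JSON object on one line."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "trace_id": getattr(record, "trace_id", None),  # set via extra={...}
            "message": record.getMessage(),
        }
        if record.exc_info:
            # json.dumps escapes the traceback's newlines, so it stays on one line
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry, ensure_ascii=False)

def make_logger(service: str, path: str) -> logging.Logger:
    """Logger that appends JSON lines to `path` for a log agent to tail."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(JsonLineFormatter(service))
        logger.addHandler(handler)
    return logger
```

Usage: `make_logger("order", "order.log").info("order created", extra={"trace_id": "abc123"})` appends one JSON line to `order.log`.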
API Gateway and Service Governance
An API gateway sits at the edge, handling authentication and request routing and providing a unified API catalog. The team chose a coarse‑grained approach: a single gateway for the entire suite of services. This simplifies management, while internal services continue to call each other directly.
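The sketch below shows the gateway's two jobs named above: authenticating once at the edge and routing by path prefix. The route table, service host names, and the injected `is_valid_token` check are hypothetical. A production gateway would also proxy the request itself and handle timeouts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical route table: public path prefix -> internal service base URL.
ROUTES: Dict[str, str] = {
    "/api/users/": "http://user-service:8080",
    "/api/products/": "http://product-service:8080",
    "/api/orders/": "http://order-service:8080",
}

@dataclass
class GatewayDecision:
    status: int                        # 200 = forward; otherwise reject with this code
    upstream_url: Optional[str] = None

def route(path: str, headers: Dict[str, str],
          is_valid_token: Callable[[str], bool]) -> GatewayDecision:
    """Authenticate once at the edge, then map the public path to an internal URL."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer ") or not is_valid_token(auth[len("Bearer "):]):
        return GatewayDecision(status=401)
    # Check longer prefixes first so a specific route can override a broader one.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            # The upstream sees only the part of the path after the public prefix.
            return GatewayDecision(200, f"{ROUTES[prefix]}/{path[len(prefix):]}")
    return GatewayDecision(status=404)
```

Keeping authentication in one place means individual services can trust that requests arriving through the gateway have already been checked.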
Service Registration and Discovery
Dynamic scaling is supported by a service registry (e.g., Consul, Eureka, Etcd, or a custom Redis‑based solution). Services register themselves on startup and send periodic heartbeats to keep their health status current, and instances that stop reporting are dropped. Clients query the registry for an up‑to‑date list of endpoints, so instances can be added or removed without reconfiguring callers.
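The heartbeat mechanism can be sketched as an in-memory registry in which an instance stays listed only while its last heartbeat is younger than a TTL. This is a simplified, single-process illustration of the idea behind TTL-based health checks, not the API of Consul, Eureka, or Etcd. The class and method names are hypothetical, and a real registry would be replicated and thread-safe. The clock is injectable so that expiry can be tested.

```python
import time
from typing import Callable, Dict, List

class ServiceRegistry:
    """Instances stay listed only while they keep sending heartbeats."""

    def __init__(self, ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # service name -> {instance address -> time of last heartbeat}
        self._instances: Dict[str, Dict[str, float]] = {}

    def register(self, service: str, address: str) -> None:
        self._instances.setdefault(service, {})[address] = self.clock()

    def heartbeat(self, service: str, address: str) -> None:
        # In this sketch, refreshing is the same as re-registering.
        self.register(service, address)

    def deregister(self, service: str, address: str) -> None:
        self._instances.get(service, {}).pop(address, None)

    def lookup(self, service: str) -> List[str]:
        """Return live instances, dropping any whose heartbeat has expired."""
        now = self.clock()
        live = {addr: seen for addr, seen in self._instances.get(service, {}).items()
                if now - seen <= self.ttl}
        self._instances[service] = live
        return sorted(live)
```

A client would call `lookup("promotion")` before each request (or cache the result briefly) and spread calls across the returned addresses.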
Resilience Patterns
Circuit Breaker
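If a downstream service fails repeatedly, the circuit breaker opens and immediately returns errors to callers instead of sending further requests, which prevents threads and connections from piling up. After a timeout, it lets a trial request through and closes again once the service has recovered.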
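Here is a minimal, single-threaded sketch of the closed → open → half-open cycle. The threshold, timeout, and class names are assumptions rather than any particular library's API. The clock is injectable so that the timeout can be tested without sleeping.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class CircuitOpenError(Exception):
    """Raised instead of calling the downstream service while the circuit is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # timeout elapsed: allow a trial call
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip, the breaker
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result
```

A concurrent version would also need to limit how many trial calls pass through in the half-open state.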
Service Degradation
Non‑critical features (e.g., product recommendations) can be temporarily disabled when their dependent services are down, preserving core functionality.
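The sketch below shows graceful degradation for the recommendation example. If the hypothetical recommendation call fails, or if operators switch the feature off, the page gets a static fallback list instead of an error. The fallback contents and function names are assumptions.

```python
import logging
from typing import Callable, List

# Hypothetical fallback shown when personalized recommendations are unavailable.
DEFAULT_RECOMMENDATIONS: List[str] = ["bestseller-1", "bestseller-2", "bestseller-3"]

def recommendations_for(user_id: str,
                        fetch: Callable[[str], List[str]],
                        enabled: bool = True) -> List[str]:
    """Return personalized items, degrading to a static list instead of failing."""
    if not enabled:  # feature switched off, e.g. during an incident
        return list(DEFAULT_RECOMMENDATIONS)
    try:
        return fetch(user_id)
    except Exception:
        logging.getLogger(__name__).warning(
            "recommendation service unavailable; serving fallback")
        return list(DEFAULT_RECOMMENDATIONS)
```

The core flows (browsing, ordering) never depend on `fetch` succeeding, which is what keeps them available while the recommendation service is down.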
Rate Limiting
Per‑service or per‑client rate limits cap how many requests a service accepts, protecting downstream services from overload during sudden traffic surges.
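One common way to implement such limits is a token bucket. The sketch below keeps one bucket per client key. The rate and capacity values are illustrative, the class names are hypothetical, and a deployment with several gateway or service instances would need shared state (or per-instance limits).

```python
import time
from typing import Callable, Dict

class TokenBucket:
    """Allows bursts up to `capacity`, then a sustained `rate` requests per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class PerClientLimiter:
    """One bucket per client key, e.g. an API key or the calling service's name."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client: str) -> bool:
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = self.buckets[client] = TokenBucket(self.rate, self.capacity, self.clock)
        return bucket.allow()
```

Requests for which `allow` returns `False` would typically be rejected with HTTP 429 rather than queued, so that a surge cannot build up a backlog in front of the service.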
Testing Strategy
End‑to‑end tests covering user‑level flows, typically run on a staging environment.
Service‑level tests that validate each API contract, often using mock servers for dependent services.
Unit tests for individual code units, providing fast feedback and high coverage.
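Here is a service-level test in the spirit described above. It uses `unittest`, with an in-process `unittest.mock.Mock` standing in for a mock server of the Promotion service. The `OrderService` class and its `discount_cents` dependency call are hypothetical. The test checks both the result and the exact request sent to the dependency.

```python
import unittest
from unittest import mock

class OrderService:
    """Hypothetical service under test; the promotion client is injected."""

    def __init__(self, promotion_client):
        self.promotion_client = promotion_client

    def quote(self, sku: str, unit_price_cents: int, quantity: int) -> int:
        subtotal = unit_price_cents * quantity
        discount = self.promotion_client.discount_cents(sku, subtotal)
        return max(subtotal - discount, 0)  # never charge a negative amount

class OrderServiceContractTest(unittest.TestCase):
    def test_applies_discount_from_promotion_service(self):
        promo = mock.Mock()
        promo.discount_cents.return_value = 500
        service = OrderService(promo)
        self.assertEqual(service.quote("sku-1", 1200, 2), 1900)
        # The request sent must match what the Promotion API expects.
        promo.discount_cents.assert_called_once_with("sku-1", 2400)

    def test_total_never_negative(self):
        promo = mock.Mock()
        promo.discount_cents.return_value = 10_000
        self.assertEqual(OrderService(promo).quote("sku-1", 100, 1), 0)
```

Saved as a module, the tests run with `python -m unittest`. Replacing the mock with an HTTP mock server exercises the same contract over the network.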
Microservice Framework vs Service Mesh
The team built a lightweight in‑house framework that encapsulates the boilerplate for metrics, tracing, logging, registration, and routing. While convenient, the framework is linked into every service, so upgrading it means rebuilding and redeploying all of them. As an alternative, a service mesh (e.g., Istio) deploys a sidecar proxy alongside each service to handle traffic management, security, and observability without code changes. The mesh separates the data plane (the proxies) from the control plane (configuration). This reduces upgrade friction at the cost of some added latency and operational complexity.
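Each sidecar runs in the same network namespace as its service (the same pod, in Kubernetes). The extra hop therefore goes over the local loopback interface rather than the physical network, so the added network cost is mostly local memory copies, although the proxy still consumes its own CPU and memory.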
Conclusion
Microservice adoption is an iterative journey: start with service extraction, then address data isolation, resilience, observability, and governance. The architecture continues to evolve toward newer paradigms such as serverless and FaaS, but the core principles of clear boundaries, automated monitoring, and robust failure handling remain essential.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Full-Stack DevOps & Kubernetes
Focused on sharing DevOps, Kubernetes, Linux, Docker, Istio, microservices, Spring Cloud, Python, Go, databases, Nginx, Tomcat, cloud computing, and related technologies.
