Essential Backend Infrastructure for Scalable Internet Services: A Complete Guide
This article outlines the critical backend components and best‑practice architectures—including API gateways, load balancers, service frameworks, caching, databases, search engines, messaging, authentication, configuration, scheduling, logging, data pipelines, and monitoring—that together ensure stable, maintainable, and high‑availability services for modern internet companies.
API Gateway
Backend services for mobile apps need load balancing, API access control, and user authentication. Nginx can handle load balancing on its own, but folding access control and authentication into a unified API gateway (e.g., Kong or a custom solution) keeps these concerns in one place. The trade-off is that every request passes through the gateway, so it can become a performance bottleneck.
One alternative is to remove the gateway and let applications call a unified authentication center directly, caching authentication results to reduce load.
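The cached-authentication alternative can be sketched in Java. This assumes the authentication center is reachable through some remote call that maps a token to a user ID. The `CachingTokenValidator` class, the `authCenter` function, and the TTL are hypothetical, and the clock is injected so expiry can be tested.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch: caches results of a (remote) authentication-center lookup so that
// repeated requests with the same token do not each trigger a remote call.
class CachingTokenValidator {
    // Cached outcome (user ID, or empty for an invalid token) plus its expiry time.
    private record Entry(Optional<String> userId, long expiresAtMillis) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, Optional<String>> authCenter; // stand-in for the remote call
    private final long ttlMillis;
    private final LongSupplier clockMillis;

    CachingTokenValidator(Function<String, Optional<String>> authCenter,
                          long ttlMillis, LongSupplier clockMillis) {
        this.authCenter = authCenter;
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
    }

    Optional<String> validate(String token) {
        long now = clockMillis.getAsLong();
        Entry cached = cache.get(token);
        if (cached != null && now < cached.expiresAtMillis()) {
            return cached.userId(); // cache hit: no call to the authentication center
        }
        Optional<String> result = authCenter.apply(token);
        cache.put(token, new Entry(result, now + ttlMillis));
        return result;
    }

    // Drop a cached result early, e.g. when the user logs out.
    void invalidate(String token) {
        cache.remove(token);
    }
}
```

A short TTL bounds how long a revoked token keeps working, and `invalidate` covers explicit logouts. The map here is unbounded, so a real deployment would more likely use a size-bounded cache such as Guava's `CacheBuilder` with `maximumSize` and `expireAfterWrite`.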
Business Applications and Backend Framework
Business applications fall into two groups: online applications, which serve high traffic and must be highly available, and internal applications, which handle confidential data under lower load. A Java backend framework stack typically includes:
MVC frameworks such as Spring MVC, Jersey, JFinal, WebX.
IoC container (Spring).
Data access: MyBatis or Spring JDBC, plus sharding middleware when a single database is not enough (TDDL, Sharding-JDBC, Cobar, Atlas, DDB, MySQL Proxy, MyCat, Kingshard, OneProxy).
Cache clients and wrappers for Redis or Memcached (Jedis, Spring's RedisTemplate).
Java EE performance monitoring (e.g., jwebap or custom extensions).
Choosing the right combination and providing a Maven archetype or similar template speeds up new project setup.
Basic Service Software
Key backend services that affect overall performance include load balancers, web/application servers, databases, search engines, and message queues.
Load Balancer and Web/Application Server
Common web servers are Nginx, Apache HTTP Server, and IIS; servlet containers include Tomcat and Jetty; Java EE application servers include JBoss and WebLogic. To spread traffic across multiple nodes, use LVS (layer 4), Nginx (layer 7), HAProxy (layer 4 or 7), DNS round-robin, or hardware load balancers such as F5.
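Weighted balancing is one thing a layer-7 balancer adds over plain DNS round-robin. The sketch below shows smooth weighted round-robin, the scheme Nginx uses for weighted upstream servers, in standalone Java. The class and server names are illustrative, not any library's API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of smooth weighted round-robin: heavier servers are chosen
// proportionally more often, but their turns are interleaved instead of bunched together.
class SmoothWeightedRoundRobin {
    private static final class Server {
        final String address;
        final int weight;
        int currentWeight; // running score, adjusted on every pick

        Server(String address, int weight) {
            this.address = address;
            this.weight = weight;
        }
    }

    private final List<Server> servers = new ArrayList<>();
    private int totalWeight;

    synchronized void addServer(String address, int weight) {
        if (weight <= 0) throw new IllegalArgumentException("weight must be positive");
        servers.add(new Server(address, weight));
        totalWeight += weight;
    }

    synchronized String next() {
        if (servers.isEmpty()) throw new IllegalStateException("no servers registered");
        Server best = null;
        for (Server s : servers) {
            s.currentWeight += s.weight; // every server earns its weight each round
            if (best == null || s.currentWeight > best.currentWeight) best = s;
        }
        best.currentWeight -= totalWeight; // the chosen server pays back the total
        return best.address;
    }
}
```

With weights 5, 1, and 1 for servers a, b, and c, seven consecutive calls to `next()` return a, a, b, a, c, a, a. The heavy server gets five of seven picks, but the light servers are spread through the cycle rather than served back to back.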
Cache
A cache speeds up reads and writes of frequently accessed data. Options include local caches (Guava, ConcurrentHashMap) and distributed caches (Redis, Memcached), with proxies such as Codis (Redis) or Twemproxy (Redis and Memcached) for sharding them across a cluster. Pay attention to expiration, eviction policies (e.g., Redis's volatile-lru and allkeys-lru), update strategies (cache-aside, read-through, write-through, write-behind), and overload protection.
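Cache-aside and LRU eviction can be combined in a small local cache. In this sketch, the `loadFromDb` and `writeToDb` functions are hypothetical stand-ins for real database access. A `LinkedHashMap` in access order provides the LRU behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Sketch of cache-aside over a bounded local LRU cache; DB access functions are hypothetical.
class CacheAsideRepository<K, V> {
    private final Map<K, V> cache;
    private final Function<K, V> loadFromDb;   // hypothetical database read
    private final BiConsumer<K, V> writeToDb;  // hypothetical database write

    CacheAsideRepository(int capacity, Function<K, V> loadFromDb, BiConsumer<K, V> writeToDb) {
        this.loadFromDb = loadFromDb;
        this.writeToDb = writeToDb;
        // accessOrder=true keeps entries ordered from least to most recently used,
        // so removeEldestEntry evicts the LRU entry once capacity is exceeded.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    // Read path: try the cache, fall back to the database on a miss, then populate the cache.
    synchronized V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loadFromDb.apply(key);
            if (value != null) cache.put(key, value);
        }
        return value;
    }

    // Write path: update the database first, then invalidate (not update) the cached entry,
    // so the next read reloads the fresh value.
    synchronized void put(K key, V value) {
        writeToDb.accept(key, value);
        cache.remove(key);
    }

    synchronized boolean isCached(K key) {
        return cache.containsKey(key);
    }
}
```

Deleting the entry on writes, rather than updating it, means two concurrent writers cannot leave the cache holding the older of their two values. A small race between a reader's reload and a writer's delete remains, which is one reason to combine cache-aside with expiration.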
Database
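Select databases based on data characteristics: in-memory (Redis, H2) or disk-based (MySQL, Oracle, PostgreSQL, HBase, SQLite, SSDB). Understanding the index structure helps tune read and write performance: B-tree indexes (as in MySQL InnoDB, PostgreSQL, or SQLite) update pages in place and favor reads, while LSM-trees (as in HBase, or SSDB via LevelDB) buffer writes in memory and flush them as sorted files, favoring write throughput at some cost to reads.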
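The LSM-tree write path can be shown with a toy in-memory model in Java. Writes land in a sorted memtable, full memtables are frozen into immutable sorted segments (standing in for on-disk SSTables), and reads search from newest to oldest. Everything here, including the tombstone marker and `memtableLimit`, is a simplified assumption. Real engines add a write-ahead log, Bloom filters, and compaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.TreeMap;

// Toy LSM-tree (hypothetical, in-memory only): memtable + immutable sorted segments.
class ToyLsmTree {
    private static final String TOMBSTONE = "__tombstone__"; // toy delete marker; assumes no real value equals it
    private TreeMap<String, String> memtable = new TreeMap<>();
    private final List<TreeMap<String, String>> segments = new ArrayList<>(); // oldest first
    private final int memtableLimit;

    ToyLsmTree(int memtableLimit) {
        this.memtableLimit = memtableLimit;
    }

    // Writes only touch the in-memory memtable; existing segments are never rewritten in place.
    void put(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= memtableLimit) flush();
    }

    // A delete is just another write: a tombstone that shadows older values.
    void delete(String key) {
        put(key, TOMBSTONE);
    }

    // Reads may consult several structures, newest first: the read cost paid for cheap writes.
    Optional<String> get(String key) {
        String value = memtable.get(key);
        for (int i = segments.size() - 1; value == null && i >= 0; i--) {
            value = segments.get(i).get(key);
        }
        return (value == null || value.equals(TOMBSTONE)) ? Optional.empty() : Optional.of(value);
    }

    // Freeze the memtable into an immutable sorted segment (an SSTable on disk in a real engine).
    void flush() {
        if (memtable.isEmpty()) return;
        segments.add(memtable);
        memtable = new TreeMap<>();
    }

    int segmentCount() {
        return segments.size();
    }
}
```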
Search Engine
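For content-heavy applications, use Solr or Elasticsearch (both built on Lucene). Plan for how data is synced from the primary database into the index, how documents are analyzed and indexed, and how the cluster scales through sharding and replication.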
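The core structure behind Lucene, and hence Solr and Elasticsearch, is the inverted index: a map from each term to the documents that contain it. The following Java sketch uses a naive analyzer (lowercasing and splitting on non-alphanumeric characters) and an AND query that intersects posting lists. The class is hypothetical, and Lucene's real analyzers, scoring, and on-disk format are far more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical minimal inverted index: term -> sorted set of document IDs.
class InvertedIndex {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Naive analyzer: lowercase, then split on anything that is not a letter or digit.
    private static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    void addDocument(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // AND query: a document matches only if it appears in every query term's posting list.
    SortedSet<Integer> searchAll(String query) {
        List<String> terms = analyze(query);
        if (terms.isEmpty()) return new TreeSet<>();
        TreeSet<Integer> result = null;
        for (String term : terms) {
            TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result;
    }
}
```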
Message Queue
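Common MQs include ActiveMQ (JMS), RabbitMQ (AMQP), Kafka (log-based, high throughput), and ZeroMQ (a brokerless library of socket-level messaging patterns). Message queues decouple producers from consumers, support eventual consistency across services, broadcast events to multiple subscribers, and absorb traffic spikes by buffering requests (peak shaving).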
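Decoupling and peak shaving can be sketched in-process with a bounded `BlockingQueue` standing in for the broker. The producer returns as soon as a message is queued, the consumer drains at its own pace, and a full queue pushes back on producers. The `EventPipeline` class and its shutdown marker are hypothetical; a real MQ adds persistence, acknowledgements, and delivery across machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical in-process stand-in for a message queue between a producer and a consumer.
class EventPipeline {
    private static final String POISON_PILL = "__shutdown__"; // hypothetical end-of-stream marker
    private final BlockingQueue<String> queue;
    private final List<String> processed = new ArrayList<>(); // written only by the consumer thread
    private final Thread consumer;

    EventPipeline(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.consumer = new Thread(() -> {
            try {
                while (true) {
                    String event = queue.take();           // waits until a message arrives
                    if (event.equals(POISON_PILL)) return;
                    processed.add(event);                  // placeholder for business logic
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        this.consumer.start();
    }

    // Returns as soon as the event is queued; blocks only when the queue is full (back-pressure).
    void publish(String event) {
        try {
            queue.put(event);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while publishing", e);
        }
    }

    // Stops the consumer after it drains everything queued so far.
    List<String> shutdown() {
        publish(POISON_PILL);
        try {
            consumer.join(); // join() also makes the consumer's writes visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while stopping", e);
        }
        return new ArrayList<>(processed);
    }
}
```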
Unified Authentication Center
A unified authentication center provides registration, login, token validation, and app secret management for both user-facing and internal systems, and serves as the foundation for single sign-on across multiple apps.
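An authentication center can issue tokens that other services verify with a shared app secret by attaching an HMAC signature over the token's contents. The token layout `userId.expiry.signature` and the `TokenService` class are assumptions made for this sketch, and production systems often use a standard format such as JWT instead. It relies only on the JDK's `javax.crypto.Mac` with HMAC-SHA256.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Optional;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical token format: "<userId>.<expiryEpochSeconds>.<base64url(HMAC-SHA256(secret, payload))>".
class TokenService {
    private final byte[] secret;

    TokenService(byte[] secret) {
        this.secret = secret.clone();
    }

    String issue(String userId, long expiresAtEpochSeconds) {
        String payload = userId + "." + expiresAtEpochSeconds;
        return payload + "." + sign(payload);
    }

    // Returns the user ID if the signature matches and the token has not expired.
    Optional<String> validate(String token, long nowEpochSeconds) {
        int lastDot = token.lastIndexOf('.');
        if (lastDot < 0) return Optional.empty();
        String payload = token.substring(0, lastDot);
        byte[] expected = sign(payload).getBytes(StandardCharsets.US_ASCII);
        byte[] actual = token.substring(lastDot + 1).getBytes(StandardCharsets.US_ASCII);
        if (!MessageDigest.isEqual(expected, actual)) return Optional.empty(); // constant-time compare
        int sep = payload.lastIndexOf('.');
        if (sep < 0) return Optional.empty();
        long expiry;
        try {
            expiry = Long.parseLong(payload.substring(sep + 1));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
        return nowEpochSeconds < expiry ? Optional.of(payload.substring(0, sep)) : Optional.empty();
    }

    private String sign(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] sig = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(sig);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }
}
```

Any service holding the app secret can validate such a token locally, so the authentication center is only needed for issuing tokens and rotating secrets.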
Single Sign‑On System
Implement SSO using open‑source solutions like CAS or Kisso, allowing one login to access multiple applications.
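The ticket flow behind CAS-style SSO can be modeled in memory. Logging in once creates a ticket-granting ticket (TGT), and each application then receives a single-use service ticket (ST) that it validates with the SSO server. This is a simplified sketch of the idea, not the CAS protocol or API; the `SsoServer` class, method names, and URLs are hypothetical.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, in-memory model of a TGT/ST single sign-on flow.
class SsoServer {
    private record ServiceTicket(String user, String service) {}

    private final SecureRandom random = new SecureRandom();
    private final Map<String, String> tgtToUser = new ConcurrentHashMap<>();
    private final Map<String, ServiceTicket> serviceTickets = new ConcurrentHashMap<>();

    private String newId(String prefix) {
        byte[] bytes = new byte[16];
        random.nextBytes(bytes);
        return prefix + Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Called once, after the user's credentials have been checked (not shown).
    String login(String user) {
        String tgt = newId("TGT-");
        tgtToUser.put(tgt, user);
        return tgt; // typically kept in a cookie scoped to the SSO server
    }

    // An app redirects the browser here; an existing TGT means no second login prompt.
    Optional<String> grantServiceTicket(String tgt, String service) {
        String user = tgtToUser.get(tgt);
        if (user == null) return Optional.empty(); // not logged in: show the login page instead
        String st = newId("ST-");
        serviceTickets.put(st, new ServiceTicket(user, service));
        return Optional.of(st);
    }

    // Called server-to-server by the app; tickets are single-use and bound to one service.
    Optional<String> validate(String st, String service) {
        ServiceTicket ticket = serviceTickets.remove(st);
        if (ticket == null || !ticket.service().equals(service)) return Optional.empty();
        return Optional.of(ticket.user());
    }

    void logout(String tgt) {
        tgtToUser.remove(tgt);
    }
}
```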
Unified Configuration Center
Manage configuration files (properties, YAML) centrally; tools such as Disconf or ZooKeeper-based solutions push changes to running applications without redeploying code.
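On the client side, dynamic configuration usually comes down to a local snapshot of values plus a callback when the center reports a change (a ZooKeeper watch firing, for example). The sketch below assumes such a notification arrives through a hypothetical `onRemoteChange` call. It does not use the Disconf or ZooKeeper APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical client-side view of a configuration center.
class DynamicConfig {
    private final Map<String, String> values = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, String>> listeners = new CopyOnWriteArrayList<>();

    String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }

    int getInt(String key, int defaultValue) {
        String v = values.get(key);
        if (v == null) return defaultValue;
        try {
            return Integer.parseInt(v.trim());
        } catch (NumberFormatException e) {
            return defaultValue; // a malformed remote value should not crash the application
        }
    }

    void addListener(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }

    // Invoked by the (hypothetical) transport when the center reports a change.
    void onRemoteChange(String key, String newValue) {
        String old = values.put(key, newValue);
        if (!newValue.equals(old)) {
            for (BiConsumer<String, String> l : listeners) l.accept(key, newValue);
        }
    }
}
```

Listeners let components react without a restart, for instance resizing a connection pool when `db.pool.size` changes.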
Service Governance Framework
Internal services call each other over RPC (RMI, Hessian, Thrift, Dubbo). A service governance framework such as Dubbo, Dubbox, or Spring Cloud adds service registration and discovery, versioning, load balancing, flow control, fault tolerance, and circuit breaking.
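Flow control is often implemented with a token bucket: tokens refill at a steady rate up to a burst capacity, and each request takes one or is rejected. This Java sketch is not the limiter of Dubbo, Spring Cloud, or any other framework. The class and parameters are illustrative, and the clock is injected for testability.

```java
import java.util.function.LongSupplier;

// Illustrative token-bucket rate limiter (not any framework's implementation).
class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private final LongSupplier clockMillis;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(long capacity, double refillPerSecond, LongSupplier clockMillis) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.clockMillis = clockMillis;
        this.tokens = capacity;                 // start full, allowing an initial burst
        this.lastRefillMillis = clockMillis.getAsLong();
    }

    synchronized boolean tryAcquire() {
        long now = clockMillis.getAsLong();
        // Refill for the time elapsed since the last call, never beyond the burst capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefillMillis) * refillPerSecond / 1000.0);
        lastRefillMillis = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;                           // caller should reject or queue the request
    }
}
```

In a service it could be constructed as `new TokenBucket(100, 50.0, System::currentTimeMillis)`, allowing bursts of 100 calls and a sustained rate of 50 calls per second.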
Unified Scheduling Center
Manage periodic tasks across the cluster using cron, Quartz, Azkaban, Oozie, or custom solutions; support dynamic task modification, workflows, and logging.
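Workflow schedulers such as Azkaban and Oozie run jobs as a dependency graph. This minimal Java sketch derives a valid run order with Kahn's topological sort and rejects cyclic definitions. The `WorkflowPlanner` class and task names are hypothetical, and actual triggering (cron or Quartz) is out of scope.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical planner: orders workflow tasks so every task runs after its upstream tasks.
class WorkflowPlanner {
    private final Map<String, List<String>> dependsOn = new LinkedHashMap<>(); // task -> upstream tasks

    void addTask(String task, String... upstream) {
        dependsOn.computeIfAbsent(task, k -> new ArrayList<>()).addAll(List.of(upstream));
        for (String u : upstream) dependsOn.computeIfAbsent(u, k -> new ArrayList<>());
    }

    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>();       // unmet dependencies per task
        Map<String, List<String>> downstream = new LinkedHashMap<>(); // task -> tasks waiting on it
        for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
            remaining.put(e.getKey(), e.getValue().size());
            for (String u : e.getValue()) {
                downstream.computeIfAbsent(u, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((task, n) -> { if (n == 0) ready.add(task); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.poll();
            order.add(task);
            for (String d : downstream.getOrDefault(task, List.of())) {
                if (remaining.merge(d, -1, Integer::sum) == 0) ready.add(d); // last dependency done
            }
        }
        if (order.size() != dependsOn.size()) throw new IllegalStateException("dependency cycle detected");
        return order;
    }
}
```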
Unified Log Service
Each service ships its logs to a central log server through Log4j/Logback appenders that forward records over RPC, so logs from all services can be searched in one place during troubleshooting.
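The appender approach can be sketched with the JDK's own `java.util.logging.Handler`, since a Log4j or Logback appender follows the same shape: receive each record, buffer it, and ship batches to the log server. The `shipper` callback is a hypothetical stand-in for the RPC client, and the line format is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Handler;
import java.util.logging.LogRecord;

// Hypothetical batching handler; `shipper` stands in for the RPC call to the central log server.
class BatchingRemoteHandler extends Handler {
    private final String serviceName;
    private final int batchSize;
    private final Consumer<List<String>> shipper;
    private final List<String> buffer = new ArrayList<>();

    BatchingRemoteHandler(String serviceName, int batchSize, Consumer<List<String>> shipper) {
        this.serviceName = serviceName;
        this.batchSize = batchSize;
        this.shipper = shipper;
    }

    @Override
    public synchronized void publish(LogRecord record) {
        if (!isLoggable(record)) return;
        // Tag each line with the originating service so logs can be filtered centrally.
        buffer.add(serviceName + " " + record.getLevel() + " " + record.getLoggerName() + " " + record.getMessage());
        if (buffer.size() >= batchSize) flush();
    }

    @Override
    public synchronized void flush() {
        if (buffer.isEmpty()) return;
        shipper.accept(List.copyOf(buffer)); // one remote call per batch instead of per line
        buffer.clear();
    }

    @Override
    public void close() {
        flush();
    }
}
```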
Data Infrastructure
Data pipelines carry logs from the unified log service into storage, using collection and transport tools such as Scribe, Chukwa, Flume, and Kafka.
Data Highway
Synchronize database data into the data warehouse for downstream analytics, either in bulk with Sqoop or incrementally with MySQL binlog-based tools such as Canal.
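Binlog-based sync ultimately means applying an ordered stream of row-change events to a downstream copy. The event shape, the single increasing `position` (real binlog positions combine a file name and an offset), and the `ChangeApplier` class are assumptions for illustration, not Canal's API. Recording the last applied position makes replays after a restart harmless.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical consumer of parsed binlog row events, applying them to a downstream table.
class ChangeApplier {
    enum Op { INSERT, UPDATE, DELETE }

    // Hypothetical event shape: position, operation, primary key, row image after the change.
    record ChangeEvent(long position, Op op, long primaryKey, Map<String, Object> row) {}

    private final Map<Long, Map<String, Object>> target = new HashMap<>(); // stand-in for the warehouse table
    private long lastAppliedPosition = -1;

    void apply(ChangeEvent event) {
        if (event.position() <= lastAppliedPosition) return; // replayed event: already applied
        switch (event.op()) {
            case INSERT, UPDATE -> target.put(event.primaryKey(), new HashMap<>(event.row())); // upsert
            case DELETE -> target.remove(event.primaryKey());
        }
        lastAppliedPosition = event.position(); // checkpoint, ideally persisted with the data
    }

    Optional<Map<String, Object>> row(long primaryKey) {
        return Optional.ofNullable(target.get(primaryKey));
    }

    long checkpoint() {
        return lastAppliedPosition;
    }
}
```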
Offline Data Processing
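Batch processing with Hadoop or Spark (typically expressed as Hive SQL or Spark SQL) handles non-real-time analytics. Watch for data skew, where a few hot keys overload individual tasks, and use MPP query engines such as Presto or Impala when interactive query speed matters.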
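A common remedy for data skew is key salting with two-stage aggregation. A hot key is split into several salted sub-keys that different tasks can process, and a second pass strips the salt and merges the partial results. This plain-Java sketch shows the two stages on an in-memory list; in Hadoop or Spark each stage would be a separate shuffle, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of salted two-stage counting.
class SaltedCount {
    // Stage 1: append a salt to every key, so one hot key becomes up to `saltBuckets`
    // distinct keys that different tasks could count in parallel.
    static Map<String, Long> saltedPartialCounts(List<String> keys, int saltBuckets) {
        Map<String, Long> partial = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            // Position-based salt keeps the example deterministic; jobs often use a random salt.
            String saltedKey = keys.get(i) + "#" + (i % saltBuckets);
            partial.merge(saltedKey, 1L, Long::sum);
        }
        return partial;
    }

    // Stage 2: strip the salt and merge the partial counts into final per-key totals.
    static Map<String, Long> mergePartials(Map<String, Long> partial) {
        Map<String, Long> totals = new HashMap<>();
        partial.forEach((saltedKey, count) ->
                totals.merge(saltedKey.substring(0, saltedKey.lastIndexOf('#')), count, Long::sum));
        return totals;
    }
}
```

With eight records for a hot key and four salt buckets, no partial group holds more than two of them, while the merged total is still eight.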
Real‑time Data Processing
Streaming frameworks such as Storm or Spark Streaming process time-sensitive data. Storm handles events one at a time for low latency, while Spark Streaming processes small micro-batches, trading some latency for throughput.
Data Storage
Choose storage based on access patterns: HDFS for bulk offline data, HBase or Cassandra for random reads and writes, and Kudu for workloads that need both fast analytic scans and row-level updates.
Multi‑dimensional Data Analysis
Use ROLAP (Hive, Spark SQL, Presto, Impala) for ad‑hoc queries or MOLAP (Druid, Pinot, Kylin) for fast cube‑based analytics.
Fault Monitoring
Monitor system metrics (CPU, memory, disk, network) with tools such as Nagios, Cacti, or Open-Falcon, and business metrics through custom instrumentation. Send alerts by email, IM, or SMS with proper aggregation and severity levels; WeChat is a cost-effective notification channel.
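Alert aggregation and severity levels can be sketched as a small Java class. Repeated alerts for the same host and metric inside a suppression window are counted rather than re-sent, and severity picks the channel. The window, the severity-to-channel mapping (the IM channel could be WeChat), and the class itself are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical alert aggregator with a per-(host, metric) suppression window.
class AlertAggregator {
    enum Severity { INFO, WARNING, CRITICAL }

    private final long windowMillis;
    private final Map<String, Long> lastSentMillis = new HashMap<>();
    private final Map<String, Integer> suppressedCounts = new HashMap<>();
    private final List<String> outbox = new ArrayList<>(); // stand-in for real email/IM/SMS senders

    AlertAggregator(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    synchronized void onAlert(String host, String metric, Severity severity, long nowMillis) {
        String key = host + "/" + metric;
        Long last = lastSentMillis.get(key);
        if (last != null && nowMillis - last < windowMillis) {
            suppressedCounts.merge(key, 1, Integer::sum); // duplicate inside the window: count only
            return;
        }
        Integer suppressed = suppressedCounts.remove(key);
        lastSentMillis.put(key, nowMillis);
        // Hypothetical routing: the more severe the alert, the more intrusive the channel.
        String channel = switch (severity) {
            case CRITICAL -> "SMS";
            case WARNING -> "IM";
            case INFO -> "EMAIL";
        };
        String summary = suppressed == null ? "" : " (+" + suppressed + " suppressed)";
        outbox.add(channel + ": [" + severity + "] " + key + summary);
    }

    synchronized List<String> sent() {
        return new ArrayList<>(outbox);
    }
}
```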
Incident Response
Build an ELK stack (Elasticsearch, Logstash, Kibana) for centralized log analysis, and a distributed tracing system (Zipkin, SkyWalking, Spring Cloud Sleuth, or instrumentation based on the OpenTracing API) for end-to-end request tracking.
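End-to-end tracing rests on propagating context between services: all spans of a request share a trace ID, and each span records its parent. This sketch uses the B3 header names from Zipkin's propagation format (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`) but is not the Zipkin, SkyWalking, or Sleuth API. The `SpanContext` record is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical trace context carried across service calls in B3-style HTTP headers.
record SpanContext(String traceId, String spanId, String parentSpanId) {

    // 64-bit random ID rendered as 16 lowercase hex characters.
    private static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    // Entry point with no incoming context: start a new trace (root span has no parent).
    static SpanContext newTrace() {
        String id = newId();
        return new SpanContext(id, id, null);
    }

    // Outgoing call: same trace, new span whose parent is the current span.
    SpanContext child() {
        return new SpanContext(traceId, newId(), spanId);
    }

    Map<String, String> toHeaders() {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-B3-TraceId", traceId);
        headers.put("X-B3-SpanId", spanId);
        if (parentSpanId != null) headers.put("X-B3-ParentSpanId", parentSpanId);
        return headers;
    }

    // Incoming request: continue the caller's trace, or start a new one if headers are absent.
    static SpanContext fromHeaders(Map<String, String> headers) {
        String traceId = headers.get("X-B3-TraceId");
        String spanId = headers.get("X-B3-SpanId");
        if (traceId == null || spanId == null) return newTrace();
        return new SpanContext(traceId, spanId, headers.get("X-B3-ParentSpanId"));
    }
}
```

Because every service logs its spans with the shared trace ID, a tracing backend can reassemble the full call tree of one request from records emitted on different machines.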
Additional Open‑Source Solutions
Netflix OSS (Zuul, Eureka, Hystrix) and Spring Cloud provide a full suite of microservice components, including API gateway, service discovery, circuit breaking, configuration, security, and distributed tracing.
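Circuit breaking, as popularized by Hystrix, can be illustrated as a three-state machine. The circuit stays CLOSED while calls succeed, turns OPEN (failing fast with a fallback) after repeated failures, and moves to HALF_OPEN after a cool-down, when one trial call decides whether to close it again. This is a simplified Java sketch, not Hystrix's API; thresholds and names are illustrative, and the clock is injected for testing.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Simplified, hypothetical circuit breaker (not Hystrix's implementation).
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clockMillis;
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAtMillis;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clockMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clockMillis = clockMillis;
    }

    // Simplification: the whole call runs under the lock, so calls through one breaker are
    // serialized. A production breaker would only guard the state transitions.
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clockMillis.getAsLong() - openedAtMillis < openMillis) {
                return fallback.get();      // fail fast without touching the dependency
            }
            state = State.HALF_OPEN;        // cool-down over: let one trial call through
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;        // success closes the circuit
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;         // trip (or re-trip) the breaker
                openedAtMillis = clockMillis.getAsLong();
            }
            return fallback.get();
        }
    }

    synchronized State state() {
        return state;
    }
}
```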
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Architecture Talk
Rooted in the "Dao" of architecture, we provide pragmatic, implementation‑focused architecture content.
