Essential Backend Infrastructure for Scalable Internet Services: A Complete Guide

This article outlines the critical backend components and best‑practice architectures—including API gateways, load balancers, service frameworks, caching, databases, search engines, messaging, authentication, configuration, scheduling, logging, data pipelines, and monitoring—that together ensure stable, maintainable, and high‑availability services for modern internet companies.

Architecture Talk

API Gateway

Backend services for mobile apps need load balancing, API access control, and user authentication. Nginx can handle load balancing on its own, but moving access control and authentication into a unified API gateway (e.g., Kong or a custom solution) simplifies management. The trade‑off is that every request passes through the gateway, so it can become a performance bottleneck.

One alternative is to remove the gateway and let applications call a unified authentication center directly, caching authentication results to reduce load.
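The caching idea above can be sketched as follows. This is a minimal illustration, not a production client: the class name CachedAuthClient, the Predicate standing in for the remote call to the authentication center, and the injected clock are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

// Hypothetical helper: remembers the auth center's verdict for a token for a fixed TTL.
final class CachedAuthClient {
    private static final class Entry {
        final boolean valid;
        final long expiresAtMillis;

        Entry(boolean valid, long expiresAtMillis) {
            this.valid = valid;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Predicate<String> authCenter; // stands in for the remote call (assumption)
    private final long ttlMillis;
    private final LongSupplier clock;           // injected so expiry can be tested

    CachedAuthClient(Predicate<String> authCenter, long ttlMillis, LongSupplier clock) {
        this.authCenter = authCenter;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    boolean isValid(String token) {
        long now = clock.getAsLong();
        Entry e = cache.get(token);
        if (e == null || e.expiresAtMillis <= now) {
            // Miss or expired: ask the auth center and remember the answer.
            // Two threads may both call the auth center here; that is harmless.
            e = new Entry(authCenter.test(token), now + ttlMillis);
            cache.put(token, e);
        }
        return e.valid;
    }
}
```

A short TTL limits how long a revoked token keeps being accepted, so the value is a trade‑off between load on the authentication center and revocation delay.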

Business Applications and Backend Framework

Business applications are divided into online (high traffic, high availability) and internal (confidential, lower load). Java backend frameworks typically include:

Web frameworks such as Spring MVC, Jersey (JAX‑RS), JFinal, and WebX.

An IoC container (Spring).

Data access tools such as MyBatis or Spring JDBC, plus sharding middleware where needed (TDDL, Sharding‑JDBC, Cobar, Atlas, DDB, MySQL Proxy, MyCat, Kingshard, OneProxy).

Cache wrappers for Redis or Memcached (Spring RedisTemplate, Jedis).

Java EE performance monitoring (e.g., jwebap or custom extensions).

Choosing the right combination and providing a Maven archetype or similar template speeds up new project setup.

Basic Service Software

Key backend services that affect overall performance include load balancers, web/application servers, databases, search engines, and message queues.

Load Balancer and Web/Application Server

Common web servers are Nginx, Apache, and IIS; servlet containers include Tomcat and Jetty; Java EE servers include JBoss and WebLogic. For multi‑node deployments, use LVS (layer 4), Nginx (layer 7), HAProxy (both), DNS round‑robin, or hardware solutions like F5.
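Round‑robin is the simplest of the distribution strategies these balancers offer. The sketch below shows the selection logic only, in application code; the class name RoundRobinBalancer and the backend addresses in the usage are hypothetical, and real balancers add health checks and weights on top.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of round-robin backend selection.
final class RoundRobinBalancer {
    private final List<String> backends;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        this.backends = new ArrayList<>(backends); // defensive copy
    }

    String pick() {
        // floorMod keeps the index non-negative after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), backends.size());
        return backends.get(index);
    }
}
```

With two backends, successive calls to pick() alternate between them.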

Cache

Caching speeds up access to frequently used data. Use local caches (Guava, ConcurrentHashMap) or distributed caches (Redis or Memcached, optionally behind a proxy layer such as Codis or Twemproxy). Pay attention to expiration, eviction policy (in Redis, volatile‑lru, allkeys‑lru, etc.), update strategy (cache‑aside, read‑through, write‑through, write‑behind), and overload protection.
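As a rough illustration of two of the terms above, the following sketch combines the cache‑aside read path with LRU eviction in a local cache. The class name LruCacheAside and the loader function (which stands in for a database query) are assumptions for the example; it relies on the access‑order mode of java.util.LinkedHashMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical local cache: cache-aside reads with least-recently-used eviction.
final class LruCacheAside<K, V> {
    private final Map<K, V> cache;
    private final Function<K, V> loader; // stands in for the database read (assumption)

    LruCacheAside(int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    synchronized V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            // Cache miss: read from the source of truth, then populate the cache.
            value = loader.apply(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // On writes, cache-aside updates the database first and then drops the cached copy.
    synchronized void invalidate(K key) {
        cache.remove(key);
    }

    synchronized boolean isCached(K key) {
        return cache.containsKey(key);
    }
}
```

Invalidating on write rather than updating the cached value keeps the cache from holding data the database never accepted.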

Database

Select databases based on data characteristics: in‑memory (Redis, H2) or disk‑based (MySQL, Oracle, PostgreSQL, HBase, SQLite, SSDB). Understand the underlying index structure to optimize read/write performance: B‑tree indexes generally favor reads, while LSM trees favor write throughput.

Search Engine

For content‑heavy applications, use Solr or Elasticsearch (both built on Lucene), and plan for integration with the application, index construction, and scaling.

Message Queue

Common MQs include ActiveMQ (JMS), RabbitMQ (AMQP), Kafka (log‑based, high‑throughput), and ZeroMQ (socket‑level patterns). They enable decoupling, eventual consistency, broadcasting, and traffic shaping.
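Traffic shaping can be pictured with an in‑process stand‑in for a broker: producers publish into a bounded buffer during a burst, and the consumer drains it at its own pace. The class BurstBuffer below is a hypothetical illustration built on java.util.concurrent.ArrayBlockingQueue; it is not how any of the brokers above are implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical in-process stand-in for a message queue that absorbs bursts.
final class BurstBuffer {
    private final BlockingQueue<String> queue;

    BurstBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Returns false instead of blocking when the buffer is full,
    // so the producer can shed load or retry later.
    boolean publish(String message) {
        return queue.offer(message);
    }

    // The consumer pulls at most maxBatch messages per call, in arrival order.
    List<String> drain(int maxBatch) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        return batch;
    }
}
```

The producer never waits on the consumer, which is the decoupling the paragraph describes; the bounded capacity is what protects the downstream service.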

Unified Authentication Center

Provides registration, login, token validation, and app secret management for both user‑facing and internal systems, enabling single sign‑on across multiple apps.
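One way token validation with an app secret could work is an HMAC‑signed token. The sketch below is an assumption for illustration, not the scheme the article's authors use: the class name TokenSigner and the token layout (userId:expiry:signature) are invented here, and production systems usually rely on an established format and library instead.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical token scheme: "<userId>:<expiresAtEpochSec>:<HMAC-SHA256 signature>".
final class TokenSigner {
    private final byte[] secret;

    TokenSigner(String appSecret) {
        this.secret = appSecret.getBytes(StandardCharsets.UTF_8);
    }

    String issue(String userId, long expiresAtEpochSec) {
        String payload = userId + ":" + expiresAtEpochSec;
        return payload + ":" + sign(payload);
    }

    boolean validate(String token, long nowEpochSec) {
        int cut = token.lastIndexOf(':');
        if (cut < 0) {
            return false;
        }
        String payload = token.substring(0, cut);
        byte[] presented = token.substring(cut + 1).getBytes(StandardCharsets.UTF_8);
        byte[] expected = sign(payload).getBytes(StandardCharsets.UTF_8);
        // Constant-time comparison avoids leaking how many characters matched.
        if (!MessageDigest.isEqual(expected, presented)) {
            return false;
        }
        int expiryStart = payload.lastIndexOf(':');
        if (expiryStart < 0) {
            return false;
        }
        try {
            return Long.parseLong(payload.substring(expiryStart + 1)) > nowEpochSec;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    private String sign(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] raw = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            // URL-safe Base64 never contains ':', so the separator stays unambiguous.
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }
}
```

Because validation needs only the secret, any service holding it can check a token without a round trip to the authentication center.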

Single Sign‑On System

Implement SSO using open‑source solutions like CAS or Kisso, allowing one login to access multiple applications.

Unified Configuration Center

Manage configuration files (properties, YAML) centrally; tools like Disconf or Zookeeper‑based solutions allow dynamic updates without redeploying code.
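On the application side, dynamic updates usually come down to a local copy of the configuration plus change listeners. The sketch below assumes a hypothetical ConfigCenterClient whose applyUpdate method would be invoked by whatever push mechanism the configuration center provides (for example, a ZooKeeper watch callback); the wiring to a real server is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical client-side view of a configuration center.
final class ConfigCenterClient {
    private final Map<String, String> values = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, String>> listeners = new CopyOnWriteArrayList<>();

    // Components register here to react to changes without a redeploy.
    void onChange(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }

    String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }

    // Called when the configuration server pushes a new value (assumption).
    void applyUpdate(String key, String value) {
        String previous = values.put(key, value);
        if (!value.equals(previous)) {
            // Notify only on real changes so repeated pushes are harmless.
            for (BiConsumer<String, String> listener : listeners) {
                listener.accept(key, value);
            }
        }
    }
}
```

A component such as a connection pool could register a listener for its timeout key and resize itself when the value changes.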

Service Governance Framework

Use RPC protocols (RMI, Hessian, Thrift, Dubbo) for internal service calls. Service governance includes registration, versioning, load balancing, flow control, fault tolerance, and circuit breaking (e.g., Dubbo, Dubbox, or Spring Cloud).
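Circuit breaking is the least obvious item in that list, so here is a simplified sketch of the idea. The class CircuitBreaker, its thresholds, and the injected clock are assumptions for illustration; frameworks such as those named above provide far richer implementations (sliding windows, metrics, per‑method configuration).

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical circuit breaker: opens after N consecutive failures,
// then allows one trial call once the cool-down has elapsed.
final class CircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the circuit is closed

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    // Simplification: the lock is held during the remote call, which serializes callers.
    synchronized <T> T call(Supplier<T> remote, Supplier<T> fallback) {
        if (openedAt >= 0 && clock.getAsLong() - openedAt < openMillis) {
            return fallback.get(); // open: fail fast without touching the remote service
        }
        try {
            T result = remote.get(); // closed, or a half-open trial call
            consecutiveFailures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = clock.getAsLong(); // open (or re-open after a failed trial)
            }
            return fallback.get();
        }
    }
}
```

While the circuit is open, callers get the fallback immediately, which keeps a failing downstream service from tying up threads in its callers.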

Unified Scheduling Center

Manage periodic tasks across the cluster using cron, Quartz, Azkaban, Oozie, or custom solutions; support dynamic task modification, workflows, and logging.

Unified Log Service

Collect logs from all services on a central log server, using Log4j/Logback appenders that ship entries over RPC. Having all logs in one place makes troubleshooting far more efficient.

Data Infrastructure

Data pipelines move logs from the unified log service to storage using collectors (Scribe, Chukwa, Flume, Kafka) and transport mechanisms.

Data Highway

Synchronize database changes to data warehouses for downstream analytics using Sqoop, Canal, or other MySQL binlog‑based tools.

Offline Data Processing

Batch processing with Hadoop or Spark (with SQL via Hive or Spark SQL) handles non‑real‑time analytics. Watch for data skew, and use MPP‑style engines such as Presto or Impala when interactive query performance matters.
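One common mitigation for data skew is key salting: a hot key is spread over several sub‑keys for a first aggregation pass, then the partial results are merged. The article does not say which technique it has in mind, so the sketch below is only an illustration of the idea in plain Java; SaltedCount and its method names are hypothetical, and in Hive or Spark the two stages would be two aggregation steps rather than in‑memory maps.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-stage count that spreads hot keys across several buckets.
final class SaltedCount {
    private SaltedCount() {}

    // Stage 1: prefix each key with a salt so records of one hot key
    // land in several partial aggregates instead of a single one.
    static Map<String, Long> partialCounts(List<String> keys, int buckets) {
        Map<String, Long> partial = new HashMap<>();
        int i = 0;
        for (String key : keys) {
            String salted = (i++ % buckets) + "#" + key;
            partial.merge(salted, 1L, Long::sum);
        }
        return partial;
    }

    // Stage 2: strip the salt and combine the partial counts per original key.
    static Map<String, Long> mergeCounts(Map<String, Long> partial) {
        Map<String, Long> total = new HashMap<>();
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            String salted = e.getKey();
            String key = salted.substring(salted.indexOf('#') + 1);
            total.merge(key, e.getValue(), Long::sum);
        }
        return total;
    }
}
```

The second stage sees at most one record per bucket for each key, so the work for a hot key is bounded by the number of buckets rather than by its raw record count.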

Real‑time Data Processing

Streaming frameworks like Storm or Spark Streaming process time‑sensitive data; choose Storm for low latency, Spark Streaming for micro‑batch workloads.

Data Storage

Choose storage based on access patterns: HDFS for bulk offline data, HBase/Cassandra for random reads/writes, Kudu for hybrid OLAP workloads.

Multi‑dimensional Data Analysis

Use ROLAP (Hive, Spark SQL, Presto, Impala) for ad‑hoc queries or MOLAP (Druid, Pinot, Kylin) for fast cube‑based analytics.

Fault Monitoring

Implement system monitoring (CPU, memory, disk, network) with tools like Nagios, Cacti, or Open‑Falcon, and business monitoring via custom metrics. Set up alerting over email, IM, or SMS, with proper aggregation and severity levels so that one fault does not flood the on‑call engineer. WeChat can serve as a low‑cost notification channel.
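Alert aggregation can be as simple as suppressing repeats of the same alert within a time window. The sketch below assumes a hypothetical AlertAggregator keyed by a string such as host plus check name; it is an illustration, not a feature of the tools listed above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical deduplication: send an alert at most once per key per window.
final class AlertAggregator {
    private final long windowMillis;
    private final Map<String, Long> lastSent = new HashMap<>();

    AlertAggregator(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Returns true if the alert should be delivered, false if it is a repeat
    // of an alert already sent within the current window.
    synchronized boolean shouldSend(String alertKey, long nowMillis) {
        Long last = lastSent.get(alertKey);
        if (last != null && nowMillis - last < windowMillis) {
            return false;
        }
        lastSent.put(alertKey, nowMillis);
        return true;
    }
}
```

A severity level could be folded into the key so that an escalation from warning to critical is never suppressed by an earlier warning.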

Incident Response

Build an ELK stack (Elasticsearch, Logstash, Kibana) for centralized log analysis, and deploy distributed tracing (Zipkin, SkyWalking, or Spring Cloud Sleuth, with OpenTracing as the vendor‑neutral API) for end‑to‑end request tracking.

Additional Open‑Source Solutions

Netflix OSS (Zuul, Eureka, Hystrix) and Spring Cloud provide a full suite of microservice components, including API gateway, service discovery, circuit breaking, configuration, security, and distributed tracing.
