Essential Backend Infrastructure and Services for Internet Companies

This article outlines the essential backend infrastructure components and best‑practice patterns—such as API gateways, service frameworks, caching, databases, search engines, message queues, authentication, configuration, service governance, scheduling, logging, and monitoring—required to build stable, scalable, and maintainable internet applications.

Architecture Digest

Introduction

For an Internet company, backend services are indispensable. Beyond business logic, a reliable set of foundational services is needed to ensure stability, maintainability, and high availability. This article surveys the critical backend infrastructure components.

API Gateway

Backends that serve mobile apps usually need load balancing, API access control, and user authentication. Nginx can handle load balancing and per-service libraries can enforce access control, but a dedicated API gateway (e.g., Kong) combines these functions in one place, so permissions can be changed at runtime and each service has less to integrate. The trade-off is that every request passes through the gateway, which can become a performance bottleneck; some architectures therefore bypass it and let services call a unified authentication center directly.
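
As a rough illustration of what a gateway combines, the sketch below checks an API key against a permission table that can be changed at runtime and then picks a backend instance round-robin. It is not Kong's API; the class MiniGateway, its method names, and the route and address strings are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical minimal gateway: access check first, then round-robin routing. */
class MiniGateway {
    // apiKey -> routes the key may call; can be changed while the gateway runs
    private final Map<String, Set<String>> permissions = new ConcurrentHashMap<>();
    // route -> backend instances that serve it
    private final Map<String, List<String>> backends = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    void grant(String apiKey, Set<String> routes) { permissions.put(apiKey, routes); }

    void revoke(String apiKey) { permissions.remove(apiKey); }

    void register(String route, List<String> instances) { backends.put(route, instances); }

    /** Returns the backend that should serve the call, or throws if the key lacks access. */
    String route(String apiKey, String route) {
        Set<String> allowed = permissions.get(apiKey);
        if (allowed == null || !allowed.contains(route)) {
            throw new SecurityException("access denied for route " + route);
        }
        List<String> instances = backends.get(route);
        if (instances == null || instances.isEmpty()) {
            throw new IllegalStateException("no backend for route " + route);
        }
        // floorMod keeps the index valid even after the counter overflows
        int index = Math.floorMod(counter.getAndIncrement(), instances.size());
        return instances.get(index);
    }
}
```

Because the permission table lives in the gateway, revoking a key takes effect on the next request without touching any backend service.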

Business Applications and Backend Frameworks

Business applications fall into two types: online (high traffic, little tolerance for failure) and internal (confidential data, lower load). For Java backends, typical framework choices include MVC (Spring MVC, Jersey, JFinal, WebX), IoC (Spring), ORM and data access (MyBatis, Spring JDBC, sharding-jdbc, custom middleware), cache clients (Spring RedisTemplate, Jedis), and performance monitoring (jwebap). Selecting a combination that matches the team's expertise is crucial, and providing archetype templates speeds up new project setup.

Core Backend Services

Cache

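Caching follows the spirit of the "five-minute rule": data that is read frequently enough is cheaper to keep in memory than to fetch from disk on every access. Options include local caches (Guava, ConcurrentHashMap) and distributed caches (Redis, or Redis behind a proxy layer such as Codis or Twemproxy). Choosing an eviction policy (for example Redis's volatile-lru or allkeys-random), setting expirations, and picking an update pattern such as cache-aside all matter, because a badly behaved cache can push load onto the database instead of absorbing it.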
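
A minimal sketch of the cache-aside pattern with LRU eviction, using only the JDK. The class CacheAside and its loader function are hypothetical; the loader stands in for a database query, and a production system would more likely use Guava or Redis than a synchronized LinkedHashMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache-aside wrapper around a small LRU map. */
class CacheAside<K, V> {
    private final Map<K, V> cache;
    private final Function<K, V> loader; // stands in for a database query

    CacheAside(final int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder=true keeps the least recently used entry first
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Read path: try the cache, fall back to the source on a miss, then populate. */
    synchronized V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    /** Write path: the caller updates the source of truth first, then invalidates. */
    synchronized void invalidate(K key) { cache.remove(key); }

    synchronized boolean isCached(K key) { return cache.containsKey(key); }
}
```

Invalidating on write, rather than updating the cached value in place, is the usual cache-aside choice because the next read repopulates the entry from the source of truth.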

Database

Databases are categorized by storage medium (in-memory vs. disk) and data model (relational vs. NoSQL). Relational databases (MySQL, PostgreSQL, Oracle) rely mainly on B-tree family indexes. NoSQL stores vary: HBase is built on LSM trees, Redis keeps in-memory structures such as hash tables and skip lists, and MongoDB's default WiredTiger engine uses B-trees. Understanding indexing and sharding is key for performance.
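
Sharding can be sketched as a pure function from a sharding key to a physical location. The example below assumes a modulo scheme over a user ID; the class ShardRouter and the db_N.user_M naming are hypothetical and not how sharding-jdbc or any specific middleware is configured.

```java
/** Hypothetical router mapping a user ID to one of dbCount * tablesPerDb table shards. */
class ShardRouter {
    private final int dbCount;
    private final int tablesPerDb;

    ShardRouter(int dbCount, int tablesPerDb) {
        if (dbCount <= 0 || tablesPerDb <= 0) {
            throw new IllegalArgumentException("shard counts must be positive");
        }
        this.dbCount = dbCount;
        this.tablesPerDb = tablesPerDb;
    }

    /** Returns a name like "db_1.user_3" for the shard that owns the given user. */
    String locate(long userId) {
        long slots = (long) dbCount * tablesPerDb;
        // floorMod keeps the slot non-negative even for negative IDs
        int slot = (int) Math.floorMod(userId, slots);
        int db = slot / tablesPerDb;
        int table = slot % tablesPerDb;
        return "db_" + db + ".user_" + table;
    }
}
```

A plain modulo scheme is easy to reason about but forces data movement when the shard count changes, which is one reason the shard count is usually fixed generously up front.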

Search Engine

Search engines (Solr, Elasticsearch) are vital for content‑heavy applications. Integration requires careful data indexing and alignment with existing data pipelines.
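
To make "data indexing" concrete, here is a toy inverted index, the data structure that Lucene (which both Solr and Elasticsearch build on) is organized around. This is not either product's API; TinyInvertedIndex, its tokenizer, and the AND-only query semantics are simplifying assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical toy index: term -> sorted set of document IDs containing it. */
class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Splits text on non-word characters and records the document under each term. */
    void index(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token, t -> new TreeSet<Integer>()).add(docId);
        }
    }

    /** Returns IDs of documents containing every query term (AND semantics). */
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String token : query.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            Set<Integer> docs = postings.get(token);
            if (docs == null) {
                return Collections.<Integer>emptySet(); // one missing term empties an AND query
            }
            if (result == null) {
                result = new TreeSet<Integer>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }
}
```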

Message Queue

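Message queues let services communicate asynchronously. Typical use cases are decoupling producers from consumers, eventual consistency, broadcasting, and traffic shaping (absorbing bursts so that consumers can work at their own pace). Popular choices include ActiveMQ, RabbitMQ, and Kafka; ZeroMQ, a brokerless messaging library, is often listed alongside them.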
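
Decoupling and traffic shaping can be illustrated in-process with a bounded queue: the producer never calls the consumer directly, and the consumer drains at its own rate. The class OrderEventBuffer is a hypothetical stand-in for a broker topic; a real deployment would use one of the brokers named above, with persistence and acknowledgements that this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical in-process stand-in for a broker topic with a single consumer. */
class OrderEventBuffer {
    private final BlockingQueue<String> queue;

    OrderEventBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<String>(capacity);
    }

    /** Producer side: returns false instead of blocking when the buffer is full. */
    boolean publish(String event) {
        return queue.offer(event);
    }

    /** Consumer side: takes at most batchSize events, whenever the consumer is ready. */
    List<String> poll(int batchSize) {
        List<String> batch = new ArrayList<String>();
        queue.drainTo(batch, batchSize);
        return batch;
    }
}
```

The bounded capacity is the shaping mechanism: a burst fills the buffer, the producer learns immediately that it must retry or shed load, and the downstream service only ever sees batches of a size it chose.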

File Storage

All services ultimately rely on reliable file storage. Solutions range from traditional RAID, through network file shares (NFS, Samba), to distributed file systems such as HDFS. When storage I/O becomes a bottleneck, moving to SSDs is a straightforward performance boost.

Unified Authentication Center

A central authentication service handles user registration, login, token validation, internal system authentication, and app secret management, enabling single sign‑on across multiple applications.
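
One way to picture token validation is a signed token that any service can verify with a shared secret. The sketch below uses HMAC-SHA256 from the JDK; the class TokenService and the "userId.expiry.signature" layout are assumptions made for illustration, not a standard format such as JWT and not the scheme the original author used.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical token issuer/validator; token layout is "userId.expiryMillis.signature". */
class TokenService {
    private final byte[] secret;

    TokenService(String secret) {
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
    }

    /** Issues a signed token; userId must not contain a dot. */
    String issue(String userId, long expiresAtMillis) {
        String payload = userId + "." + expiresAtMillis;
        return payload + "." + sign(payload);
    }

    /** Returns the user ID if the signature matches and the token is unexpired, else null. */
    String validate(String token, long nowMillis) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            return null;
        }
        byte[] expected = sign(parts[0] + "." + parts[1]).getBytes(StandardCharsets.UTF_8);
        byte[] actual = parts[2].getBytes(StandardCharsets.UTF_8);
        if (!MessageDigest.isEqual(expected, actual)) { // constant-time comparison
            return null;
        }
        try {
            return nowMillis < Long.parseLong(parts[1]) ? parts[0] : null;
        } catch (NumberFormatException e) {
            return null;
        }
    }

    private String sign(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] signature = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }
}
```

Passing the current time in as a parameter keeps validation deterministic and easy to test; a real service would read a clock.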

Single Sign‑On (SSO) System

SSO systems (e.g., CAS, Kisso) allow one login to access multiple services, improving user experience for both web and mobile platforms.

Unified Configuration Center

Configuration files (properties, YAML, HOCON) can be managed centrally, allowing dynamic updates without redeployment. Tools like Disconf or Zookeeper‑backed solutions provide this capability.
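
The "dynamic updates without redeployment" part usually comes down to a client that holds the current values and notifies listeners when the store pushes a change. The sketch below shows that shape only; ConfigCenterClient and applyRemoteUpdate are hypothetical names, and the watch mechanism of Disconf or Zookeeper that would call it is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

/** Hypothetical client-side view of a configuration center. */
class ConfigCenterClient {
    private final Map<String, String> values = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, String>> listeners = new CopyOnWriteArrayList<>();

    /** Reads the latest known value, falling back to a default when the key is unset. */
    String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }

    /** Registers a callback that receives (key, newValue) on every effective change. */
    void onChange(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }

    /** Assumed to be invoked by the real store's watch or push mechanism. */
    void applyRemoteUpdate(String key, String newValue) {
        String old = values.put(key, newValue);
        if (!newValue.equals(old)) { // skip notifications for no-op updates
            for (BiConsumer<String, String> listener : listeners) {
                listener.accept(key, newValue);
            }
        }
    }
}
```

A component such as a connection pool would register a listener and resize itself when its key changes, which is what removes the need for a redeploy.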

Service Governance Framework

Internal service communication often uses RPC (RMI, Hessian, Thrift, Dubbo). A governance framework registers providers and consumers and manages versioning, load balancing, traffic control, fault tolerance, and circuit breaking. Dubbo (or its fork Dubbox) is a common implementation.
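
Of the governance features listed, circuit breaking is the easiest to show in isolation. The sketch below is a generic three-state breaker, not Dubbo's or Hystrix's implementation; the class SimpleCircuitBreaker, its thresholds, and the injected clock are assumptions chosen to keep the example deterministic.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical breaker: opens after N consecutive failures, retries after a cool-down. */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // injected so tests can control time
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** Runs the call unless the breaker is open; returns the fallback on failure or rejection. */
    synchronized <T> T call(Supplier<T> remoteCall, T fallback) {
        // Simplification: holding the lock during the call serializes all callers.
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return fallback; // fail fast without touching the provider
            }
            state = State.HALF_OPEN; // cool-down elapsed: allow one trial call
        }
        try {
            T result = remoteCall.get();
            state = State.CLOSED;
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clock.getAsLong();
            }
            return fallback;
        }
    }

    synchronized State state() { return state; }
}
```

While the breaker is open, callers get the fallback immediately, which protects both the struggling provider and the caller's own thread pool.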

Unified Scheduling Center

Scheduled tasks (cron entries, Quartz jobs) can be centralized to simplify management, scaling, and monitoring. Solutions include Azkaban, Oozie, or custom Quartz clusters backed by Zookeeper, with extensions like Elastic-Job for elasticity.
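
Elasticity in a scheduling cluster generally means a job is split into shards that are redistributed as instances join or leave. The sketch below shows one simple, deterministic way to do that split; it is not Elastic-Job's API or its actual allocation strategy, and the class ShardAssigner is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical planner that spreads job shards across the live scheduler instances. */
class ShardAssigner {
    /** Assigns shard indexes 0..shardCount-1 round-robin over the sorted instance names. */
    static Map<String, List<Integer>> assign(List<String> instances, int shardCount) {
        List<String> sorted = new ArrayList<String>(instances);
        Collections.sort(sorted); // same input order on every node gives the same plan
        Map<String, List<Integer>> plan = new LinkedHashMap<String, List<Integer>>();
        for (String instance : sorted) {
            plan.put(instance, new ArrayList<Integer>());
        }
        if (sorted.isEmpty()) {
            return plan;
        }
        for (int shard = 0; shard < shardCount; shard++) {
            plan.get(sorted.get(shard % sorted.size())).add(shard);
        }
        return plan;
    }
}
```

Recomputing the plan whenever the instance list changes (for example, when a coordination service reports a node as gone) lets the surviving nodes pick up the orphaned shards.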

Unified Logging Service

Centralized logging aggregates logs from all services via log4j/logback appenders and transports them to a dedicated log server, facilitating troubleshooting.

Data Infrastructure

Data Highway

Logs are collected (e.g., Scribe, Flume, Kafka) and transmitted to downstream processors. Data synchronization between databases and warehouses uses tools like Sqoop or Canal.

Offline Data Analysis

Batch processing uses Hadoop or Spark, with Hive or Spark SQL for SQL‑style jobs. Handling data skew is critical for performance.
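
A common remedy for data skew is two-stage aggregation with salted keys: a hot key is first split into several sub-keys so that no single reducer receives all of its records, and the partial results are merged afterwards. The plain-Java sketch below shows only the logic; in practice the two stages would be Hive or Spark jobs, and the class SaltedCount and the "#" separator are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical two-stage count that spreads a hot key over several salted groups. */
class SaltedCount {
    /** Stage 1: count by "key#salt" so one hot key becomes up to `salts` partial groups. */
    static Map<String, Long> partialCounts(List<String> keys, int salts) {
        Map<String, Long> partial = new HashMap<String, Long>();
        int i = 0;
        for (String key : keys) {
            // Round-robin salt keeps this sketch deterministic; real jobs often use a random salt.
            String salted = key + "#" + (i++ % salts);
            partial.merge(salted, 1L, Long::sum);
        }
        return partial;
    }

    /** Stage 2: strip the salt and merge the partial counts per original key. */
    static Map<String, Long> finalCounts(Map<String, Long> partial) {
        Map<String, Long> totals = new HashMap<String, Long>();
        for (Map.Entry<String, Long> entry : partial.entrySet()) {
            String salted = entry.getKey();
            String key = salted.substring(0, salted.lastIndexOf('#'));
            totals.merge(key, entry.getValue(), Long::sum);
        }
        return totals;
    }
}
```

The technique works for aggregations that can be merged from partial results, such as counts and sums; it does not apply directly to operations like exact medians.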

Real‑Time Data Analysis

Streaming frameworks (Storm, Spark Streaming) address low‑latency requirements, often employing windowed writes to storage.
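
"Windowed writes" can be read as: aggregate events per time window in memory, then write each finished window to storage once instead of writing per event. The sketch below implements a tumbling-window counter under that reading; it is not Storm or Spark Streaming code, and the class TumblingWindowCounter and its watermark parameter are assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical tumbling-window counter that flushes finished windows in one batch. */
class TumblingWindowCounter {
    private final long windowMillis;
    private final TreeMap<Long, Long> counts = new TreeMap<Long, Long>(); // window start -> count

    TumblingWindowCounter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Adds one event to the window that contains its timestamp. */
    void record(long eventTimeMillis) {
        long windowStart = eventTimeMillis - Math.floorMod(eventTimeMillis, windowMillis);
        counts.merge(windowStart, 1L, Long::sum);
    }

    /** Removes and returns every window that ended at or before the watermark. */
    Map<Long, Long> flushClosed(long watermarkMillis) {
        Map<Long, Long> closed = new TreeMap<Long, Long>();
        while (!counts.isEmpty() && counts.firstKey() + windowMillis <= watermarkMillis) {
            Map.Entry<Long, Long> entry = counts.pollFirstEntry();
            closed.put(entry.getKey(), entry.getValue());
        }
        return closed;
    }
}
```

The map returned by flushClosed is what would be written to storage, so the write rate depends on the number of windows rather than the number of events.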

Ad‑Hoc Data Analysis

SQL‑based query tools (Presto, Impala, Hive) and UI layers (Hue) empower analysts and product managers to explore data directly.

Fault Monitoring

Monitoring covers both system metrics (CPU, memory, network), collected with tools like Nagios, Cacti, or OpenFalcon, and business metrics (PV, UV, transaction failures). Alerting strategies include aggregating related alerts, assigning severity levels, and delivering notifications via email, IM, or WeChat. Incident response relies on centralized log analysis (ELK) and distributed tracing (Zipkin, SkyWalking, Spring Cloud Sleuth).
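
Alert aggregation and severity routing can be sketched as a small function: suppress repeats of the same alert within a quiet period, then choose a delivery channel by severity. The class AlertAggregator, the quiet-period rule, and the severity-to-channel mapping below are illustrative assumptions, not the behavior of any tool named above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical alert filter: de-duplicates repeats and routes by severity. */
class AlertAggregator {
    enum Severity { INFO, WARNING, CRITICAL }

    private final long suppressMillis;
    private final Map<String, Long> lastSent = new HashMap<String, Long>();

    AlertAggregator(long suppressMillis) {
        this.suppressMillis = suppressMillis;
    }

    /** Returns the channel to notify, or null when the alert is suppressed as a repeat. */
    synchronized String submit(String alertKey, Severity severity, long nowMillis) {
        Long previous = lastSent.get(alertKey);
        if (previous != null && nowMillis - previous < suppressMillis) {
            return null; // same alert fired recently: fold it into the earlier notification
        }
        lastSent.put(alertKey, nowMillis);
        switch (severity) {
            case CRITICAL:
                return "wechat"; // assumed mapping: most urgent channel
            case WARNING:
                return "im";
            default:
                return "email";
        }
    }
}
```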

Extensions

Netflix OSS

Components such as Zuul (API gateway), Eureka (service discovery), and Hystrix (circuit breaking) provide a full microservice stack, often complemented by Docker/Kubernetes.

Spring Cloud

Spring Cloud offers a comprehensive suite for building distributed systems, including Config, Security, Consul/Zookeeper integration, Bus, and Sleuth for tracing.

Author: 飒然Hang, Architect/Backend Engineer at 中华万年历
Source: http://www.rowkey.me/blog/2016/08/27/server-basic-tech-stack/