Essential Backend Infrastructure and Services for Internet Companies
This article outlines the essential backend infrastructure components and best‑practice patterns—such as API gateways, service frameworks, caching, databases, search engines, message queues, authentication, configuration, service governance, scheduling, logging, and monitoring—required to build stable, scalable, and maintainable internet applications.
Introduction
For an Internet company, backend services are indispensable. Beyond business logic, a reliable set of foundational services is needed to ensure stability, maintainability, and high availability. The article presents a comprehensive view of the critical backend infrastructure components.
API Gateway
Mobile apps often require load balancing, API access control, and user authentication. While Nginx can handle load balancing and per‑service libraries can provide access control, a dedicated API gateway (e.g., Kong) integrates these functions, allowing dynamic permission changes and reducing integration effort. However, the gateway can become a performance bottleneck, so some architectures bypass it and let services call a unified authentication center directly.
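The gateway responsibilities described above (access control plus load balancing, with permissions that can change at runtime) can be sketched in a few lines. This is a toy illustration, not how Kong or any real gateway is implemented; the `Gateway` class, the route table, the API key names, and the backend addresses are all hypothetical.

```python
import itertools


class Gateway:
    """Toy gateway: checks an API key, then picks a backend round-robin."""

    def __init__(self, routes, api_keys):
        # routes: path prefix -> list of backend addresses (made-up values)
        self._backends = {
            prefix: itertools.cycle(hosts) for prefix, hosts in routes.items()
        }
        # api_keys: key -> set of path prefixes that key may call
        self._api_keys = {key: set(prefixes) for key, prefixes in api_keys.items()}

    def grant(self, api_key, prefix):
        """Change permissions at runtime, without touching the services."""
        self._api_keys.setdefault(api_key, set()).add(prefix)

    def route(self, api_key, path):
        """Return (status, backend) for a request."""
        for prefix, backends in self._backends.items():
            if path.startswith(prefix):
                if prefix not in self._api_keys.get(api_key, set()):
                    return 403, None  # known route, caller not allowed
                return 200, next(backends)  # round-robin load balancing
        return 404, None  # no route matches
```

Because every request passes through `route`, this single component is also where the bottleneck risk mentioned above comes from.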
Business Applications and Backend Frameworks
Business applications are divided into online (high‑traffic, low‑tolerance) and internal (confidential, lower load) types. For Java backends, typical frameworks include MVC (Spring MVC, Jersey, JFinal, WebX), IoC (Spring), ORM (MyBatis, Spring JDBC, sharding‑jdbc, custom middleware), caching (Spring RedisTemplate, Jedis), and performance monitoring (jwebap). Selecting the right combination based on team expertise is crucial, and providing archetype templates accelerates new project setup.
Core Backend Services
Cache
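Caching is often justified by the "five‑minute rule": data that is accessed at least once every few minutes is cheaper to keep in memory than to re‑read from disk. Options include local caches (Guava, ConcurrentHashMap) and distributed caches (Redis, Codis, Twemproxy). Eviction policies (e.g., Redis's volatile‑lru or allkeys‑random), expiration settings, and update patterns (e.g., cache‑aside) must be chosen deliberately; otherwise a wave of cache misses or simultaneous expirations can push the full load onto the database.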
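As a minimal sketch of the cache‑aside pattern with expiration, assuming a plain dict stands in for the database and a small in‑process class stands in for Redis: reads try the cache first and fall back to the database, while writes update the database and invalidate the cached entry. `TTLCache` and `UserRepository` are hypothetical names chosen for illustration.

```python
import time


class TTLCache:
    """In-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired entries
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)

    def delete(self, key):
        self._data.pop(key, None)


class UserRepository:
    """Cache-aside: the application, not the cache, talks to the database."""

    def __init__(self, cache, db):
        self.cache = cache
        self.db = db  # a dict standing in for a real database
        self.db_reads = 0

    def get(self, user_id):
        cached = self.cache.get(user_id)
        if cached is not None:
            return cached
        self.db_reads += 1  # cache miss: read through to the database
        value = self.db.get(user_id)
        if value is not None:
            self.cache.set(user_id, value)
        return value

    def update(self, user_id, value):
        self.db[user_id] = value
        self.cache.delete(user_id)  # invalidate; the next read repopulates
```

Invalidating on write rather than updating the cached value is one common choice; it avoids writing a stale value into the cache when two updates race.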
Database
Databases are categorized by storage medium (in‑memory vs. disk) and data model (relational vs. NoSQL). Relational databases (MySQL, PostgreSQL, Oracle) typically rely on B‑tree‑family indexes, while NoSQL stores vary: HBase uses LSM trees, MongoDB's default storage engine uses B‑trees, and Redis keeps in‑memory structures such as hash tables and skip lists. Understanding indexing and sharding is key for performance.
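To make the sharding idea concrete, here is a minimal sketch of hash‑based shard routing. It assumes a fixed shard count and a hypothetical `orders_N` table naming scheme; neither comes from the original article.

```python
import zlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a hash that is stable across runs.

    Python's built-in hash() is randomized per process for strings,
    so a CRC32 checksum is used instead.
    """
    return zlib.crc32(key.encode("utf-8")) % shard_count


def table_for(user_id: str, shard_count: int = 4) -> str:
    """Return the (hypothetical) physical table holding this user's orders."""
    return f"orders_{shard_for(user_id, shard_count)}"
```

Plain modulo routing like this remaps most keys whenever `shard_count` changes, which is one reason production setups often use consistent hashing or a lookup table instead.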
Search Engine
Search engines (Solr, Elasticsearch) are vital for content‑heavy applications. Integration requires careful data indexing and alignment with existing data pipelines.
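The core data structure behind such engines is the inverted index, which maps each term to the documents containing it. The sketch below is a deliberately simplified illustration (lowercase alphanumeric tokenization, AND‑only queries, no ranking) and does not reflect how Solr or Elasticsearch are implemented internally; `InvertedIndex` is a hypothetical name.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Maps each term to the set of document ids that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokenize(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        for term in self._tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every term in the query."""
        terms = self._tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```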
Message Queue
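Message queues decouple services and enable asynchronous communication. Typical use cases include eventual consistency between services, broadcasting events to multiple consumers, and traffic shaping (buffering bursts so that consumers can process at their own pace). Popular choices include ActiveMQ, RabbitMQ, and Kafka; ZeroMQ, a brokerless messaging library, covers similar ground.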
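The decoupling and traffic‑shaping ideas can be illustrated with an in‑process bounded queue standing in for a real broker. This is only a sketch: the function name, the "emailed:" side effect, and the worker count are assumptions made for illustration.

```python
import queue
import threading


def run_pipeline(orders, workers=2, capacity=100):
    """Producer enqueues orders; worker threads consume them asynchronously."""
    q = queue.Queue(maxsize=capacity)  # bounded: put() blocks when full
    processed = []
    lock = threading.Lock()

    def consume():
        while True:
            order = q.get()
            if order is None:  # sentinel: no more work for this worker
                q.task_done()
                break
            with lock:
                processed.append(f"emailed:{order}")
            q.task_done()

    threads = [threading.Thread(target=consume) for _ in range(workers)]
    for t in threads:
        t.start()
    for order in orders:
        q.put(order)  # the producer does not know who consumes the message
    for _ in threads:
        q.put(None)  # one sentinel per worker
    q.join()
    for t in threads:
        t.join()
    return processed
```

The bounded queue is what provides traffic shaping here: a burst of producers is slowed down at `put` instead of overwhelming the consumers.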
File Storage
All services ultimately rely on reliable file storage. Solutions range from local disks in a RAID array to network file shares (NFS, Samba) and distributed file systems such as HDFS. When storage becomes a bottleneck, moving to SSDs is often the most direct performance gain.
Unified Authentication Center
A central authentication service handles user registration, login, token validation, internal system authentication, and app secret management, enabling single sign‑on across multiple applications.
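Token issuance and validation, two of the duties listed above, might look roughly like the following sketch. It assumes a simple HMAC‑signed token of the form `base64(user_id:expiry).signature`; the format, the `SECRET` value, and the function names are illustrative assumptions, and a production system would more likely use an established standard such as JWT or OAuth 2.0.

```python
import base64
import hashlib
import hmac
import time

SECRET = b"demo-secret-change-me"  # hypothetical; load from secure config in practice


def issue_token(user_id, ttl_seconds=3600, now=None):
    """Return a signed token encoding the user id and an expiry timestamp."""
    now = time.time() if now is None else now
    payload = f"{user_id}:{int(now + ttl_seconds)}".encode("utf-8")
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode("ascii") + "." + signature


def validate_token(token, now=None):
    """Return the user id if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        encoded, signature = token.split(".")
        payload = base64.urlsafe_b64decode(encoded.encode("ascii"))
    except ValueError:  # wrong shape, non-ASCII, or invalid base64
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    user_id, _, expires_at = payload.decode("utf-8").rpartition(":")
    if now >= int(expires_at):
        return None
    return user_id
```

Because any service holding the secret can validate a token locally, services do not need a round trip to the authentication center on every request.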
Single Sign‑On (SSO) System
SSO systems (e.g., CAS, Kisso) allow one login to access multiple services, improving user experience for both web and mobile platforms.
Unified Configuration Center
Configuration files (properties, YAML, HOCON) can be managed centrally, allowing dynamic updates without redeployment. Tools like Disconf or Zookeeper‑backed solutions provide this capability.
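The "dynamic updates without redeployment" behaviour comes down to clients subscribing to keys and being notified on change. The sketch below shows that mechanism in process; it is not the Disconf or Zookeeper API, and `ConfigCenter`, `RateLimiterSettings`, and the `rate.limit` key are hypothetical.

```python
class ConfigCenter:
    """Holds configuration values and notifies subscribers when they change."""

    def __init__(self, initial=None):
        self._values = dict(initial or {})
        self._listeners = {}  # key -> list of callbacks

    def get(self, key, default=None):
        return self._values.get(key, default)

    def subscribe(self, key, callback):
        self._listeners.setdefault(key, []).append(callback)

    def publish(self, key, value):
        changed = key not in self._values or self._values[key] != value
        self._values[key] = value
        if changed:  # skip notifications for no-op updates
            for callback in self._listeners.get(key, []):
                callback(value)


class RateLimiterSettings:
    """Example client that picks up a new limit without a restart."""

    def __init__(self, config):
        self.limit = config.get("rate.limit", 100)
        config.subscribe("rate.limit", self._on_change)

    def _on_change(self, value):
        self.limit = value
```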
Service Governance Framework
Internal service communication often uses RPC protocols (RMI, Hessian, Thrift, Dubbo). A governance framework registers providers and consumers, manages versions, load balancing, traffic control, fault tolerance, and circuit breaking. Dubbo (or its fork Dubbox) is a common implementation.
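Of the governance features listed above, circuit breaking is the easiest to show in isolation. The following is a minimal sketch of the general pattern, not Dubbo's implementation; the class name, the threshold of three consecutive failures, and the 30‑second reset timeout are assumptions.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock  # injectable so timeouts can be tested
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens it.
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Failing fast while the circuit is open keeps a struggling provider from being hit by retries and stops callers from tying up threads on calls that are likely to time out.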
Unified Scheduling Center
Scheduling tasks (cron, Quartz) can be centralized to simplify management, scaling, and monitoring. Solutions include Azkaban, Oozie, or custom Quartz clusters backed by Zookeeper, with extensions like Elastic‑Job for elasticity.
Unified Logging Service
Centralized logging aggregates logs from all services via log4j/logback appenders and transports them to a dedicated log server, facilitating troubleshooting.
Data Infrastructure
Data Highway
Logs are collected (e.g., Scribe, Flume, Kafka) and transmitted to downstream processors. Data synchronization between databases and warehouses uses tools like Sqoop or Canal.
Offline Data Analysis
Batch processing uses Hadoop or Spark, with Hive or Spark SQL for SQL‑style jobs. Handling data skew is critical for performance.
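One common way to handle data skew is key salting with two‑stage aggregation: a random suffix spreads a hot key across several partitions, and a second pass merges the partial results. The plain‑Python sketch below only illustrates the idea; in Hive or Spark the same two stages would be expressed as two group‑by steps. The function name and the bucket count are assumptions.

```python
import random
from collections import Counter


def salted_count(keys, salt_buckets=4, seed=0):
    """Count keys in two stages so that no single partition holds a hot key."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible

    # Stage 1: aggregate on (key, salt); a hot key is split across salt_buckets groups.
    partial = Counter()
    for key in keys:
        partial[(key, rng.randrange(salt_buckets))] += 1

    # Stage 2: drop the salt and merge the much smaller partial counts.
    total = Counter()
    for (key, _salt), count in partial.items():
        total[key] += count
    return partial, total
```

Stage 2 processes at most `salt_buckets` rows per key, so the expensive first stage is the one that benefits from the more even distribution.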
Real‑Time Data Analysis
Streaming frameworks (Storm, Spark Streaming) address low‑latency requirements, often aggregating results over time windows before writing them to storage.
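Windowing can be illustrated with a tumbling‑window count, where each event falls into exactly one fixed‑size window and only one row per window and key is written out. This is a framework‑independent sketch rather than Storm or Spark Streaming code; the event shape `(timestamp, key)` is an assumption.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) for fixed, non-overlapping windows.

    events: iterable of (timestamp_in_seconds, key) pairs.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)
```

Writing one aggregated row per window instead of one row per event is what keeps the load on the downstream store manageable.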
Ad‑Hoc Data Analysis
SQL‑based query tools (Presto, Impala, Hive) and UI layers (Hue) empower analysts and product managers to explore data directly.
Fault Monitoring
Monitoring spans system metrics (CPU, memory, network), collected with tools like Nagios, Cacti, or Open‑Falcon, and business metrics (PV, UV, transaction failures). Alerting strategies include aggregating repeated alerts, assigning severity levels, and delivering notifications via email, IM, or WeChat. Incident response relies on centralized log analysis (ELK) and distributed tracing (Zipkin, SkyWalking, Spring Cloud Sleuth).
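Alert aggregation and severity‑based delivery can be sketched as follows. The suppression window, the severity names, and the channel mapping are assumptions for illustration, not the behaviour of any of the tools named above.

```python
class AlertAggregator:
    """Suppresses repeats of the same alert and routes by severity."""

    CHANNELS = {"warning": "email", "critical": "im"}  # hypothetical mapping

    def __init__(self, suppress_seconds=300):
        self._suppress_seconds = suppress_seconds
        self._last_sent = {}   # (host, metric, severity) -> time of last notification
        self._suppressed = {}  # (host, metric, severity) -> repeats since then

    def report(self, host, metric, severity, now):
        """Return a notification dict, or None if the alert is suppressed."""
        key = (host, metric, severity)
        last = self._last_sent.get(key)
        if last is not None and now - last < self._suppress_seconds:
            self._suppressed[key] = self._suppressed.get(key, 0) + 1
            return None
        repeats = self._suppressed.pop(key, 0)
        self._last_sent[key] = now
        return {
            "channel": self.CHANNELS.get(severity, "email"),
            "host": host,
            "metric": metric,
            "severity": severity,
            "suppressed_since_last": repeats,
        }
```

Severity is part of the suppression key, so an alert that escalates from warning to critical is delivered immediately instead of being swallowed by the earlier warning's window.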
Extensions
Netflix OSS
Components such as Zuul (API gateway), Eureka (service discovery), and Hystrix (circuit breaking) provide a full microservice stack, often complemented by Docker/Kubernetes.
Spring Cloud
Spring Cloud offers a comprehensive suite for building distributed systems, including Config, Security, Consul/Zookeeper integration, Bus, and Sleuth for tracing.
Author: 飒然Hang, Architect/Backend Engineer at 中华万年历. Source: http://www.rowkey.me/blog/2016/08/27/server-basic-tech-stack/