How to Build a Scalable Backend Stack for Startups
This guide outlines the essential components of a startup’s backend architecture: language choices, middleware, databases, messaging, monitoring, CI/CD, and cloud services. For each, it gives practical selection criteria and best‑practice recommendations to help teams design a robust, scalable, and maintainable system.
When you think of a backend technology stack, you might picture a diagram of programming languages, but the stack involves much more than just languages. It includes frameworks, databases, services, operating systems, and other components that together form the entire backend ecosystem.
Four Layers of a Backend Stack
Language : the programming languages used (e.g., C++, Java, Go, PHP, Python, Ruby).
Component : middleware such as message queues and database components.
Process : development, project, release, monitoring, and coding standards.
System : systems that enforce the processes, like release management platforms and code repositories.
The following sections discuss the selection of each major system or component for a startup.
1. Project/Bug/Issue Management
Redmine : Ruby‑based, plugin‑rich, customizable fields, but many plugins are outdated.
Phabricator : PHP‑based, originally from Facebook, integrates code review, task and document management.
Jira : Java‑based, supports user stories, task breakdown, burndown charts, and cross‑department collaboration.
Wukong CRM : Customer‑relationship system, useful for B2B startups; open‑source version covers core CRM functions but is hard to maintain at larger scales.
2. DNS
Alibaba Wanwang : Integrated domain service after Alibaba’s 2014 acquisition of Wanwang.
Tencent DNSPod : Acquired by Tencent in 2012, provides domain resolution and basic protection.
For services inside mainland China, either provider works; for international coverage, Amazon Route 53 is recommended.
3. Load Balancer (LB)
Supports L4 (TCP/UDP) and L7 (HTTP/HTTPS) protocols.
Provides centralized certificate management and health checks.
Use cloud provider LB services (e.g., Alibaba SLB, Tencent CLB, Amazon ELB) when all machines are in the same cloud; otherwise consider LVS + Nginx for self‑hosted environments.
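To make the health-check behavior concrete, here is a minimal sketch of round-robin selection that skips unhealthy backends. It is an illustration only, not how SLB, CLB, ELB, LVS, or Nginx are implemented; the class name, the backend addresses, and the stubbed health table are all hypothetical.

```python
from typing import Callable, List


class RoundRobinBalancer:
    """Rotate across backends, skipping any that fail the health check."""

    def __init__(self, backends: List[str], is_healthy: Callable[[str], bool]):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._is_healthy = is_healthy
        self._cursor = 0  # index of the next backend to try

    def pick(self) -> str:
        # Try each backend at most once per call, starting at the cursor.
        for offset in range(len(self._backends)):
            index = (self._cursor + offset) % len(self._backends)
            backend = self._backends[index]
            if self._is_healthy(backend):
                self._cursor = (index + 1) % len(self._backends)
                return backend
        raise RuntimeError("no healthy backend available")


# Hypothetical addresses; a real health check would probe the backend
# (for example with a TCP connect or an HTTP request) instead of a dict.
status = {"10.0.0.1:8080": True, "10.0.0.2:8080": False, "10.0.0.3:8080": True}
balancer = RoundRobinBalancer(list(status), lambda backend: status[backend])
picks = [balancer.pick() for _ in range(4)]
```

In this sketch the unhealthy backend `10.0.0.2:8080` never appears in `picks`, and traffic alternates between the two healthy ones.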
4. CDN
In China, the CDN market is led by Wangsu, followed by Tencent and Alibaba. Internationally, Amazon and Akamai hold the majority share. For a startup, a single CDN from Tencent Cloud or Alibaba Cloud is usually sufficient; using several CDNs improves coverage and adds disaster‑recovery capacity.
5. RPC Frameworks
RPC (remote procedure call) lets a program invoke a function on another machine as if it were a local call. Two main families exist:
Cross‑language RPC : Thrift, gRPC, Hessian, Hprose – focus on language‑agnostic calls but lack built‑in service discovery.
Service‑governance RPC : Dubbo, DubboX, Motan, rpcx – provide high performance, service discovery, and governance, primarily for Java or Go ecosystems.
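The core idea shared by both families can be sketched in a few lines: the client serializes a method name and parameters, a transport carries the bytes, and the server dispatches to a registered function. The sketch below uses JSON and an in-process function call as the transport; the `RpcServer` and `RpcClient` names and the message fields are assumptions for illustration and do not correspond to the wire format of Thrift, gRPC, Dubbo, or any other framework listed above.

```python
import json
from typing import Any, Callable, Dict


class RpcServer:
    """Dispatch serialized requests to registered functions."""

    def __init__(self) -> None:
        self._methods: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, func: Callable[..., Any]) -> None:
        self._methods[name] = func

    def handle(self, payload: bytes) -> bytes:
        request = json.loads(payload.decode("utf-8"))
        func = self._methods.get(request["method"])
        if func is None:
            response = {"id": request["id"],
                        "error": "unknown method: " + request["method"]}
        else:
            response = {"id": request["id"],
                        "result": func(*request["params"])}
        return json.dumps(response).encode("utf-8")


class RpcClient:
    """Serialize calls and hand them to a transport callable."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self._transport = transport
        self._next_id = 0

    def call(self, method: str, *params: Any) -> Any:
        self._next_id += 1
        request = {"id": self._next_id, "method": method, "params": list(params)}
        reply = self._transport(json.dumps(request).encode("utf-8"))
        response = json.loads(reply.decode("utf-8"))
        if "error" in response:
            raise RuntimeError(response["error"])
        return response["result"]


server = RpcServer()
server.register("add", lambda a, b: a + b)
# The transport here is a direct call; in practice it would be a socket
# or HTTP connection to another machine.
client = RpcClient(server.handle)
total = client.call("add", 2, 3)
```

Because only bytes cross the boundary, the client and server could be written in different languages. What this sketch leaves out (finding the server's address, retries, load balancing) is what the service-governance frameworks add.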
6. Service Discovery
Commonly used registries:
etcd : Distributed key‑value store used by Kubernetes and Cloud Foundry.
Consul : Provides service discovery, health checking, and configuration.
Apache ZooKeeper : Coordination service that began as a Hadoop subproject.
Custom implementations or Redis can also be used, but require additional effort to ensure high availability.
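As a rough illustration of what a custom registry has to do, the sketch below keeps instances alive through heartbeats and drops them once their time-to-live expires. It is a single-process toy with hypothetical names (`ServiceRegistry`, `heartbeat`, `lookup`); it has none of the replication or consensus that etcd, Consul, or ZooKeeper provide, which is exactly the additional effort mentioned above.

```python
import time
from typing import Callable, Dict, List


class ServiceRegistry:
    """In-memory registry where each instance expires without heartbeats."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._expiry: Dict[str, Dict[str, float]] = {}

    def heartbeat(self, service: str, address: str) -> None:
        # Registering and renewing are the same operation.
        self._expiry.setdefault(service, {})[address] = self._clock() + self._ttl

    def lookup(self, service: str) -> List[str]:
        now = self._clock()
        instances = self._expiry.get(service, {})
        return sorted(addr for addr, deadline in instances.items()
                      if deadline > now)


# A fake clock stands in for real time; service name and addresses
# are hypothetical.
now = [0.0]
registry = ServiceRegistry(ttl_seconds=10, clock=lambda: now[0])
registry.heartbeat("orders", "10.0.0.1:9000")
registry.heartbeat("orders", "10.0.0.2:9000")
now[0] = 8.0
registry.heartbeat("orders", "10.0.0.1:9000")  # only the first instance renews
now[0] = 12.0
live = registry.lookup("orders")  # the second instance expired at t=10
```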
7. Relational Databases
Traditional RDBMS options include Oracle, MySQL, MariaDB, DB2, and PostgreSQL. MySQL is widely used; MariaDB is its community‑driven fork. NewSQL systems are expected to provide full SQL support, ACID transactions, elastic scaling, automatic failover, and basic analytics. Examples include CockroachDB and TiDB, which address the sharding and scaling challenges of traditional deployments.
8. NoSQL
NoSQL complements relational databases and comes in four major types:
Key‑Value : Redis, Memcached, BerkeleyDB – simple, fast, but lack structured queries.
Column‑Family : HBase, Cassandra – suited for write‑heavy workloads.
Document : MongoDB, CouchDB – store heterogeneous JSON‑like data.
Graph : Neo4j, InfoGrid – excel at relationship‑centric queries.
9. Message Middleware
Used for asynchronous processing, system decoupling, and traffic shaping. Selection criteria include maturity, community support, licensing, language bindings, performance, persistence, transaction support, clustering, load balancing, management UI, and deployment model.
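The three uses named above can be seen in miniature with a bounded in-process queue: the producer returns as soon as a message is enqueued (asynchronous processing), it knows nothing about the consumer (decoupling), and the capacity limit makes it wait when the consumer falls behind (traffic shaping). This is only a sketch using Python's standard library; a real broker adds persistence, clustering, and the other criteria listed above. The `run_demo` function and its parameters are hypothetical.

```python
import queue
import threading
from typing import List


def run_demo(messages: List[str], capacity: int = 2) -> List[str]:
    """Send messages through a bounded queue to a single consumer thread."""
    buffer = queue.Queue(maxsize=capacity)
    processed: List[str] = []
    sentinel = object()  # tells the consumer to stop

    def consumer() -> None:
        while True:
            item = buffer.get()
            if item is sentinel:
                break
            processed.append("handled:" + item)

    worker = threading.Thread(target=consumer)
    worker.start()
    for message in messages:
        # put() blocks while the queue is full, which throttles the producer.
        buffer.put(message)
    buffer.put(sentinel)
    worker.join()
    return processed


results = run_demo(["signup", "payment", "email"])
```

With one consumer and a FIFO queue, messages are handled in the order they were sent.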
10. Code Management
Security & Permissions : Keep code in an internal network and enforce strict access controls.
Tools : Git is the de‑facto standard. GitLab (open‑source) combined with Gerrit for code review offers a robust solution.
11. Continuous Integration (CI)
Jenkins : Extensible, open‑source, supports distributed builds.
TeamCity : User‑friendly but commercial for larger teams.
Strider : Node.js‑based, MongoDB storage.
GitLab CI : Integrated with GitLab, works well with Docker.
Travis CI : SaaS‑oriented, good for open‑source projects.
GoCD (formerly Go) : ThoughtWorks’ successor to CruiseControl, free and cross‑platform.
12. Logging System
A typical setup is the ELK stack (Elasticsearch, Logstash, Kibana) plus Filebeat for lightweight log collection. Access can be secured with an Nginx reverse proxy and basic authentication.
13. Monitoring System
Monitoring works at two layers: OS‑level metrics (CPU, memory, I/O) and service‑level metrics (availability, QPS, error rate). Popular solutions include Zabbix, Open‑Falcon, and Prometheus, the last of which is widely adopted internationally. Grafana provides visualization.
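As a worked example of the service-level metrics, the sketch below derives QPS and error rate from a list of request records over a fixed window. The `Request` and `ServiceMetrics` types are hypothetical, and counting only HTTP 5xx responses as errors is an assumption; teams define errors differently.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Request:
    timestamp: float  # seconds since the start of the window
    status: int       # HTTP status code


@dataclass(frozen=True)
class ServiceMetrics:
    qps: float
    error_rate: float


def summarize(requests: Iterable[Request], window_seconds: float) -> ServiceMetrics:
    """Compute queries per second and error rate for one window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    records = list(requests)
    total = len(records)
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for r in records if r.status >= 500)
    error_rate = errors / total if total else 0.0
    return ServiceMetrics(qps=total / window_seconds, error_rate=error_rate)


window = [
    Request(0.1, 200),
    Request(0.7, 200),
    Request(1.2, 503),
    Request(1.9, 404),
]
metrics = summarize(window, window_seconds=2.0)
```

Here four requests in a two-second window give a QPS of 2.0, and one 5xx response out of four gives an error rate of 0.25.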
14. Configuration Management
Centralized store : Built on ZooKeeper or etcd, with a UI and API, storing versioned configurations.
File push : Configuration files pushed to machines by automation tools such as Puppet or Ansible.
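What "versioned" buys you is cheap rollback. The sketch below keeps every value ever written for a key and treats a rollback as writing an old value again under a new version number. It is an in-memory illustration with hypothetical names (`VersionedConfig`, `db.pool_size`), not the data model of ZooKeeper or etcd.

```python
from typing import Dict, List, Optional


class VersionedConfig:
    """Keep the full history of each key; versions start at 1."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}

    def put(self, key: str, value: str) -> int:
        versions = self._history.setdefault(key, [])
        versions.append(value)
        return len(versions)  # the new version number

    def get(self, key: str, version: Optional[int] = None) -> str:
        versions = self._history[key]  # raises KeyError for unknown keys
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{key} has no version {version}")
        return versions[version - 1]

    def rollback(self, key: str, version: int) -> int:
        # Rolling back appends the old value, so history is never rewritten.
        return self.put(key, self.get(key, version))


config = VersionedConfig()
v1 = config.put("db.pool_size", "10")
v2 = config.put("db.pool_size", "20")
v3 = config.rollback("db.pool_size", v1)
current = config.get("db.pool_size")
```

Appending on rollback, rather than deleting newer versions, keeps an audit trail of who changed what and when, which a UI on top of the store can display.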
15. Release / Deployment System
Typical flow: code → artifact → deployable service → production. Open‑source options include Walle, Piplin, or a combination of Jenkins + GitLab + Walle for early stages.
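The flow above is a chain in which each stage consumes the previous stage's output. A minimal sketch, assuming hypothetical stage names and placeholder functions in place of real build and deploy steps (this is not how Walle, Piplin, or Jenkins model a release):

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], str]]


def run_pipeline(revision: str, stages: List[Stage]) -> List[str]:
    """Run stages in order, feeding each stage the previous output."""
    log: List[str] = []
    current = revision
    for name, step in stages:
        # An exception raised by a step stops the release at that stage.
        current = step(current)
        log.append(f"{name}: {current}")
    return log


# Placeholder stages that only build descriptive strings.
stages: List[Stage] = [
    ("build", lambda rev: f"app-{rev}.tar.gz"),            # code -> artifact
    ("package", lambda artifact: f"service({artifact})"),  # artifact -> deployable service
    ("deploy", lambda service: f"production<-{service}"),  # service -> production
]
release_log = run_pipeline("abc123", stages)
```

Keeping the stages as data makes it straightforward to insert a step, such as a test or canary stage, without touching the runner.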
16. Jump Server
Jumpserver (open‑source) offers role‑based access control, audit logging, and session recording, helping enforce compliance for privileged operations.
17. Machine Management
Tool selection criteria: simplicity, agent‑less operation, language ecosystem, and concurrency model. Ansible is often preferred for startups due to its agent‑less design and YAML‑based playbooks.
Startup‑Specific Considerations
Choose a language the team knows well, that has modern features and a rich ecosystem.
Select reliable cloud providers and mature open‑source components.
Establish clear development, release, and operational processes.
Balance cost, time‑to‑market, and future scalability when making technology decisions.
Cloud‑Based Backend Architecture for Startups
Combining the above selections, a cloud‑native backend architecture typically includes cloud compute, managed databases, message queues, CDN, monitoring (Prometheus + Grafana), logging (ELK), CI/CD pipelines, and configuration services (etcd/ZooKeeper).
Source: Article originally published on the "Intelligent Recommendation System" WeChat public account.