Handling 1 Million Requests per Minute with Go: A Scalable Backend Architecture

The article describes how a Go‑based backend, using a two‑layer job/worker pattern with buffered channels and configurable worker pools, can reliably ingest millions of POST requests per minute, serialize payloads to Amazon S3, and dramatically reduce server count through Elastic Beanstalk auto‑scaling.

Architecture Digest

The author, a veteran of anti‑ad and anti‑virus software, explains a real‑world challenge: processing POST requests from millions of endpoints, each containing a JSON payload that must be stored in Amazon S3 for downstream map‑reduce analysis.

Initially, a naive implementation launched a new goroutine for every payload (go payload.UploadToS3()). Under high traffic this quickly exhausted resources, because nothing bounded the number of goroutines running at once.
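As a rough illustration of why the naive approach is unbounded, the sketch below starts one goroutine per payload and measures how many are in flight at once. The function name peakConcurrency and the simulated slow upload are assumptions made for this example; the real UploadToS3 is not shown in the text.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// peakConcurrency starts one goroutine per payload, as the naive handler did,
// and reports the highest number of simulated uploads in flight at once.
// Each fake upload blocks until all n have started, standing in for slow I/O.
func peakConcurrency(n int) int64 {
	var inFlight, peak int64
	var started, done sync.WaitGroup
	release := make(chan struct{})

	started.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func() { // plays the role of: go payload.UploadToS3()
			defer done.Done()
			cur := atomic.AddInt64(&inFlight, 1)
			// Raise the recorded peak if this goroutine saw a higher count.
			for {
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}
			started.Done()
			<-release // simulated slow upload
			atomic.AddInt64(&inFlight, -1)
		}()
	}

	started.Wait() // every goroutine is now blocked in its "upload"
	close(release)
	done.Wait()
	return atomic.LoadInt64(&peak)
}
```

Because nothing limits the spawn rate, the peak equals the number of payloads submitted, so concurrency grows with traffic rather than with available capacity.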

The next attempt queued payloads on a single buffered channel:

var Queue chan Payload

func init() {
    Queue = make(chan Payload, MAX_QUEUE)
}

This still failed: the queue filled up faster than workers could upload, causing request handling to block.
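The blocking behavior can be seen with a small sketch. It assumes a hypothetical Payload type with a single ID field and uses a non-blocking send (select with a default case) only to detect the moment at which a plain channel send would stall the request handler.

```go
package main

// Payload stands in for the article's payload type; its field is hypothetical.
type Payload struct {
	ID int
}

// tryEnqueue attempts a non-blocking send. It returns false when the buffer
// is full, which is exactly when a plain `queue <- p` would block the caller.
func tryEnqueue(queue chan Payload, p Payload) bool {
	select {
	case queue <- p:
		return true
	default:
		return false
	}
}

// fillUntilFull pushes payloads until the buffer rejects one and reports how
// many were accepted. With no consumer running, that equals the capacity.
func fillUntilFull(queue chan Payload) int {
	accepted := 0
	for tryEnqueue(queue, Payload{ID: accepted}) {
		accepted++
	}
	return accepted
}
```

A buffered channel only absorbs a burst of up to its capacity. Once producers outpace consumers for long enough, every additional send waits for a receive, which is the stall described above.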

The final solution adopts a classic job/worker pattern: a Job struct wraps a payload, each worker registers its own job channel with a shared WorkerPool, and a dispatcher routes jobs from a JobQueue to idle workers. The limits MAX_WORKERS and MAX_QUEUE are configurable and read from environment variables.

type Job struct {
    Payload Payload
}

var JobQueue chan Job

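// Worker executes jobs handed to it by the dispatcher.
type Worker struct {
    WorkerPool chan chan Job // shared pool where idle workers register
    JobChannel chan Job      // this worker's private inbox
    quit       chan bool     // signals the worker to stop
}

// Start runs the worker loop in its own goroutine.
func (w Worker) Start() {
    go func() {
        for {
            // Register this worker's job channel as idle in the pool.
            w.WorkerPool <- w.JobChannel
            select {
            case job := <-w.JobChannel:
                // A job was dispatched to this worker; upload its payload.
                if err := job.Payload.UploadToS3(); err != nil {
                    log.Errorf("Error uploading to S3: %s", err.Error())
                }
            case <-w.quit:
                // Stop signal received.
                return
            }
        }
    }()
}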

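// dispatch hands each queued job to the next idle worker.
func (d *Dispatcher) dispatch() {
    // A select with a single case is just a receive, so ranging over the
    // channel is equivalent and also exits cleanly if JobQueue is closed.
    for job := range JobQueue {
        go func(job Job) {
            // Block until a worker is idle, then send it the job.
            jobChannel := <-d.WorkerPool
            jobChannel <- job
        }(job)
    }
}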
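The excerpt does not show how Dispatcher is defined or how the workers are started. The sketch below fills that gap under stated assumptions: NewDispatcher, Run, the handle callback (used in place of UploadToS3 so the example runs without S3), and the processAll helper are all hypothetical, and JobQueue is a struct field here rather than the global used in the text.

```go
package main

import "sync"

// Payload is a stand-in type; its field is hypothetical.
type Payload struct{ ID int }

type Job struct{ Payload Payload }

// Worker mirrors the struct in the text, plus a handle func (an assumption)
// that replaces the S3 upload.
type Worker struct {
	WorkerPool chan chan Job
	JobChannel chan Job
	quit       chan bool
	handle     func(Job)
}

func (w Worker) Start() {
	go func() {
		for {
			w.WorkerPool <- w.JobChannel // advertise this worker as idle
			select {
			case job := <-w.JobChannel:
				w.handle(job)
			case <-w.quit:
				return
			}
		}
	}()
}

type Dispatcher struct {
	WorkerPool chan chan Job
	JobQueue   chan Job
	maxWorkers int
	handle     func(Job)
}

func NewDispatcher(maxWorkers, maxQueue int, handle func(Job)) *Dispatcher {
	return &Dispatcher{
		WorkerPool: make(chan chan Job, maxWorkers),
		JobQueue:   make(chan Job, maxQueue),
		maxWorkers: maxWorkers,
		handle:     handle,
	}
}

// Run starts a fixed number of workers, then the dispatch loop.
func (d *Dispatcher) Run() {
	for i := 0; i < d.maxWorkers; i++ {
		Worker{
			WorkerPool: d.WorkerPool,
			JobChannel: make(chan Job),
			quit:       make(chan bool),
			handle:     d.handle,
		}.Start()
	}
	go d.dispatch()
}

func (d *Dispatcher) dispatch() {
	for job := range d.JobQueue {
		go func(job Job) {
			jobChannel := <-d.WorkerPool // wait for an idle worker
			jobChannel <- job
		}(job)
	}
}

// processAll submits n jobs and waits until every one has been handled,
// returning how many jobs the workers processed.
func processAll(n, maxWorkers int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		count int
	)
	d := NewDispatcher(maxWorkers, n, func(Job) {
		mu.Lock()
		count++
		mu.Unlock()
		wg.Done()
	})
	d.Run()
	wg.Add(n)
	for i := 0; i < n; i++ {
		d.JobQueue <- Job{Payload: Payload{ID: i}}
	}
	wg.Wait()
	mu.Lock()
	defer mu.Unlock()
	return count
}
```

However many jobs are queued, at most maxWorkers handlers run at the same time, because a job can only be delivered to a worker that has put its channel back into the pool.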

The HTTP handler now decodes the incoming JSON, creates a Job for each payload, and pushes it onto JobQueue, allowing the worker pool to process uploads concurrently while keeping request latency low.
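A handler along those lines might look like the sketch below. The JSON shape (a "data" array of objects with an "id"), the names payloadHandler and postOnce, and the 1 MiB body limit are assumptions for illustration; the text does not specify them.

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Payload and PayloadCollection use a hypothetical JSON layout.
type Payload struct {
	ID string `json:"id"`
}

type PayloadCollection struct {
	Payloads []Payload `json:"data"`
}

type Job struct{ Payload Payload }

const maxBodyBytes = 1 << 20 // assumed cap on request body size

// payloadHandler decodes the request body and queues one Job per payload.
func payloadHandler(queue chan<- Job) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			w.WriteHeader(http.StatusMethodNotAllowed)
			return
		}
		var content PayloadCollection
		// Limit how much of the body is read so one request cannot exhaust memory.
		body := http.MaxBytesReader(w, r.Body, maxBodyBytes)
		if err := json.NewDecoder(body).Decode(&content); err != nil {
			w.WriteHeader(http.StatusBadRequest)
			return
		}
		for _, p := range content.Payloads {
			queue <- Job{Payload: p} // blocks only if the queue is full
		}
		w.WriteHeader(http.StatusOK)
	}
}

// postOnce is a usage helper: it sends body to the handler through an
// in-memory recorder and returns the resulting HTTP status code.
func postOnce(h http.Handler, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/", strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}
```

The handler returns as soon as the jobs are queued, so response time depends on JSON decoding and a channel send rather than on the S3 upload itself.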

Deployed on Amazon Elastic Beanstalk with Docker, the system achieved near‑million‑requests‑per‑minute throughput, reduced the server fleet from 100 to as few as four instances, and leveraged auto‑scaling based on CPU utilization.

The author concludes that a simpler architecture, built on Go's lightweight concurrency and Elastic Beanstalk's scaling, delivers high performance without the complexity of multiple queues or heavyweight frameworks.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
