Master Payment Gateway Design: Multi‑Channel Aggregation, Smart Routing, and End‑to‑End Merchant Onboarding

This article explains how to build an enterprise‑grade payment gateway that aggregates more than 50 providers and sustains tens of millions of transactions per day. It covers millisecond‑level smart routing, failover, dynamic fee calculation, automated merchant onboarding, sharded storage, and monitoring.

LuTiao Programming

Why a full‑featured payment gateway is needed

Developers often start with a simple payment interface that receives parameters, calls a third‑party API, and returns a result. In real business scenarios this quickly leads to single‑channel failures that block all orders, TPS spikes that cause system collapse, complex merchant fee structures that are hard to maintain, and reconciliation mismatches. The root cause is that only a "payment interface" has been built, not a complete "payment gateway system".

Core capabilities of an enterprise payment gateway

Unified access to multiple payment methods (cards, UPI, wallets, online banking)

Aggregation of more than 50 payment providers

Intelligent routing decisions based on cost, success rate, and latency

Automatic failover handling

Merchant onboarding and KYC workflow

Dynamic fee calculation

Reconciliation and settlement

Scale expectations

Monthly transaction volume: 1 billion

Peak TPS: >20,000

Number of merchants: 100,000

Number of payment providers: 50+

Storage requirements:

Monthly transaction data: 2 TB

7‑year retention: 168 TB
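
The figures above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the 30-day month and decimal terabytes (10^12 bytes) are assumptions, not values from the article.

```python
# Back-of-the-envelope check of the scale figures listed above.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month

monthly_transactions = 1_000_000_000
peak_tps = 20_000
monthly_data_tb = 2
retention_years = 7

# Average load if traffic were spread evenly over the month.
average_tps = monthly_transactions / SECONDS_PER_MONTH

# How far the stated peak sits above that average.
peak_to_average = peak_tps / average_tps

# Total retained data: 2 TB per month for 7 years.
retained_tb = monthly_data_tb * 12 * retention_years

# Implied storage per transaction (assumption: 1 TB = 10**12 bytes).
bytes_per_transaction = monthly_data_tb * 10**12 / monthly_transactions

print(f"average TPS: {average_tps:.0f}")
print(f"peak / average: {peak_to_average:.0f}x")
print(f"retained data: {retained_tb} TB")
print(f"bytes per transaction: {bytes_per_transaction:.0f}")
```

The stated peak is roughly fifty times the monthly average, which is why capacity has to be planned around the peak rather than the mean.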

Consequences:

Single‑node solutions are infeasible

Distributed architecture with sharding is mandatory

All core logic must complete within milliseconds

Overall architecture

Architecture diagram

Core service breakdown

PaymentService

Unified entry point

Manages transaction lifecycle

Coordinates routing, fee calculation, and channel invocation

RoutingService

Selects the optimal payment channel

Real‑time decision making (<50 ms)

MerchantService

Merchant registration

KYC verification

API key management

ReconciliationService

Matches gateway records with provider data

SettlementService

Handles T+1 / T+2 fund settlement

Smart routing as the competitive edge

Routing scoring model

ProviderScore =
    costWeight × costScore +
    latencyWeight × latencyScore +
    successRateWeight × successRateScore +
    healthWeight × healthScore +
    loadWeight × loadScore

Default weight example:

Cost: 30%

Success rate: 35%

Latency: 20%

Health: 10%

Load: 5%
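
The scoring model and default weights above could be sketched as follows. The `ProviderMetrics` type, the function names, and the convention that every sub-score is already normalized to the range 0 to 1 (with 1 being best) are assumptions made for illustration; the article does not say how the sub-scores are derived.

```python
from dataclasses import dataclass

# Default weights from the article; they sum to 1.0.
DEFAULT_WEIGHTS = {
    "cost": 0.30,
    "success_rate": 0.35,
    "latency": 0.20,
    "health": 0.10,
    "load": 0.05,
}


@dataclass(frozen=True)
class ProviderMetrics:
    """Hypothetical per-provider sub-scores, each normalized to 0..1 (1 is best)."""
    name: str
    cost: float
    success_rate: float
    latency: float
    health: float
    load: float


def provider_score(metrics, weights=DEFAULT_WEIGHTS):
    """Weighted sum of the sub-scores, following the ProviderScore formula."""
    return sum(weight * getattr(metrics, key) for key, weight in weights.items())


def pick_provider(candidates, weights=DEFAULT_WEIGHTS):
    """Return the candidate with the highest score."""
    if not candidates:
        raise ValueError("no candidate providers")
    return max(candidates, key=lambda m: provider_score(m, weights))


# Example with made-up sub-scores: the cheaper provider loses because
# success rate carries the largest weight.
cheap = ProviderMetrics("cheap_but_flaky", cost=0.9, success_rate=0.6,
                        latency=0.7, health=0.8, load=0.9)
reliable = ProviderMetrics("reliable", cost=0.6, success_rate=0.98,
                           latency=0.8, health=1.0, load=0.7)
best = pick_provider([cheap, reliable])
print(best.name)
```

Because the function only does a handful of multiplications per provider, scoring 50 or more candidates stays well inside the routing latency budget, provided the sub-scores are read from a cache rather than computed on the request path.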

Routing flow

Routing flow diagram

Failover and circuit breaker

Circuit breaker states

Closed – normal operation

Open – requests fail immediately without calling the provider

Half‑Open – trial requests test whether the provider has recovered

Strategy:

5 consecutive failures → Open

After 60 s → Half‑Open

Successful trial request → Closed; failed trial request → Open again
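
A minimal sketch of this state machine, using the thresholds listed above, might look like the following. The class and method names are hypothetical, the injectable `clock` exists only to make the example testable, and the breaker is simplified: it lets every request through while Half-Open, whereas a production breaker would usually limit the number of trial requests and would need to be thread-safe.

```python
import time


class CircuitBreaker:
    """Per-provider circuit breaker: Closed -> Open -> Half-Open -> Closed."""

    def __init__(self, failure_threshold=5, recovery_timeout=60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0        # consecutive failures while Closed
        self._opened_at = None    # time at which the breaker last opened
        self._state = "closed"

    @property
    def state(self):
        # Move from Open to Half-Open once the recovery timeout has elapsed.
        if (self._state == "open"
                and self._clock() - self._opened_at >= self.recovery_timeout):
            self._state = "half_open"
        return self._state

    def allow_request(self):
        """Requests are rejected only while the breaker is Open."""
        return self.state != "open"

    def record_success(self):
        # Any success resets the consecutive-failure count and closes the breaker.
        self._failures = 0
        self._state = "closed"

    def record_failure(self):
        if self.state == "half_open":
            self._trip()  # a failed trial request reopens the breaker
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self):
        self._state = "open"
        self._opened_at = self._clock()
        self._failures = 0
```

The routing layer would typically call `allow_request()` before considering a provider and skip providers whose breaker is Open, which is what turns a provider outage into automatic failover.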

Dynamic fee calculation

Supported modes:

1. Percentage

Fee = amount × rate

2. Fixed amount

Fee = fixed

3. Tiered pricing

0‑1000: 2%
1000‑10000: 1.5%

4. Hybrid

Fee = fixed + (amount × rate)
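
The four modes can be sketched with `Decimal` arithmetic so that fees never suffer from binary floating-point rounding. Several details below are assumptions rather than rules stated in the article: fees are rounded half-up to two decimal places, a tiered fee charges the whole amount at the rate of the tier it falls into, and an amount of exactly 1000 belongs to the lower tier.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def _round(value):
    """Round to two decimal places (assumed rounding rule: half-up)."""
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


def percentage_fee(amount, rate):
    """Mode 1: Fee = amount x rate."""
    return _round(amount * rate)


def fixed_fee(fixed):
    """Mode 2: Fee = fixed."""
    return _round(fixed)


# Mode 3 tiers as (inclusive upper bound, rate), taken from the example above.
TIERS = [
    (Decimal("1000"), Decimal("0.02")),
    (Decimal("10000"), Decimal("0.015")),
]


def tiered_fee(amount, tiers=TIERS):
    """Mode 3: charge the whole amount at the rate of its tier (assumption)."""
    for upper_bound, rate in tiers:
        if amount <= upper_bound:
            return _round(amount * rate)
    raise ValueError("amount exceeds the highest configured tier")


def hybrid_fee(amount, fixed, rate):
    """Mode 4: Fee = fixed + (amount x rate)."""
    return _round(fixed + amount * rate)


print(tiered_fee(Decimal("500")))    # first tier, 2%
print(tiered_fee(Decimal("5000")))   # second tier, 1.5%
```

A marginal scheme, where only the portion above 1000 is charged at 1.5%, is an equally plausible reading of the tier table; the merchant contract decides which one applies.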

Automated merchant onboarding

Onboarding flow diagram

API key generation rule

sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Implementation highlights:

32‑character random string

Store only SHA‑256 hash

Return the key only once
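
The three highlights above could be implemented roughly as follows, using only the Python standard library. The function names are hypothetical, and the alphanumeric alphabet is an assumption, since the article only specifies the `sk_live_` prefix and a 32-character random part.

```python
import hashlib
import hmac
import secrets
import string

KEY_PREFIX = "sk_live_"
ALPHABET = string.ascii_letters + string.digits  # assumed character set


def hash_api_key(api_key):
    """SHA-256 hex digest; this is the only value that gets stored."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def generate_api_key():
    """Return (plaintext_key, stored_hash).

    The plaintext key is shown to the merchant once and then discarded.
    """
    random_part = "".join(secrets.choice(ALPHABET) for _ in range(32))
    api_key = KEY_PREFIX + random_part
    return api_key, hash_api_key(api_key)


def verify_api_key(presented_key, stored_hash):
    """Hash the presented key and compare in constant time."""
    return hmac.compare_digest(hash_api_key(presented_key), stored_hash)


key, stored = generate_api_key()
print(key[:8] + "...", len(key), verify_api_key(key, stored))
```

A plain, unsalted SHA-256 hash is reasonable here only because the key is long and random; it would not be an appropriate way to store user-chosen passwords.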

Scalable database design

Table schemas (simplified)

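-- Merchant master data: one row per onboarded merchant.
CREATE TABLE merchants (
    merchant_id VARCHAR(50) PRIMARY KEY,
    business_name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE,
    kyc_status VARCHAR(20),
    status VARCHAR(20),
    created_at TIMESTAMP
);

-- Transactions, range-partitioned by creation time.
-- The partition key is included in the primary key because PostgreSQL,
-- for example, requires this on partitioned tables.
CREATE TABLE transactions (
    transaction_id VARCHAR(50),
    merchant_id VARCHAR(50),
    amount DECIMAL(15,2),
    status VARCHAR(20),
    provider_id VARCHAR(50),
    created_at TIMESTAMP NOT NULL,
    PRIMARY KEY (transaction_id, created_at)
) PARTITION BY RANGE (created_at);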

Sharding strategy:

Hash‑based split by merchant_id

Time‑based monthly partitions
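
To make the two-level strategy concrete, the sketch below maps a transaction to a shard and a monthly partition. The shard count of 16, the `transactions_YYYY_MM` naming scheme, and the function names are all hypothetical. A stable hash (SHA-256) is used instead of Python's built-in `hash()`, which is randomized per process for strings.

```python
import hashlib
from datetime import datetime

NUM_SHARDS = 16  # hypothetical shard count


def shard_for_merchant(merchant_id, num_shards=NUM_SHARDS):
    """Map a merchant to a shard using a hash that is stable across processes."""
    digest = hashlib.sha256(merchant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition_for(created_at):
    """Name of the monthly partition holding a transaction (assumed naming)."""
    return f"transactions_{created_at:%Y_%m}"


def locate(merchant_id, created_at):
    """Return (shard index, partition name) for a transaction."""
    return shard_for_merchant(merchant_id), partition_for(created_at)


print(locate("m_10001", datetime(2024, 3, 15, 9, 30)))
```

Plain modulo hashing keeps all of a merchant's transactions on one shard, which suits per-merchant queries, but changing `NUM_SHARDS` later moves most merchants; consistent hashing or a lookup table are common ways to soften that.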

Reconciliation process

Matching rules:

Same transaction_id

Exact amount match

Timestamp difference < 5 minutes
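
The three matching rules can be sketched as a single pass over the gateway records. The `Record` type, the mismatch labels, and the in-memory lists are assumptions for illustration; a real reconciliation job would read from the transaction store and the provider's settlement file.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from decimal import Decimal

MAX_TIME_SKEW = timedelta(minutes=5)


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal view of a transaction on either side."""
    transaction_id: str
    amount: Decimal
    timestamp: datetime


def reconcile(gateway_records, provider_records):
    """Return (matched_ids, mismatches) according to the three rules above."""
    provider_by_id = {r.transaction_id: r for r in provider_records}
    matched, mismatches = [], []
    for g in gateway_records:
        p = provider_by_id.pop(g.transaction_id, None)
        if p is None:
            mismatches.append((g.transaction_id, "missing_at_provider"))
        elif p.amount != g.amount:
            mismatches.append((g.transaction_id, "amount_mismatch"))
        elif abs(p.timestamp - g.timestamp) >= MAX_TIME_SKEW:
            mismatches.append((g.transaction_id, "timestamp_skew"))
        else:
            matched.append(g.transaction_id)
    # Whatever is left exists only on the provider side.
    mismatches.extend((tid, "missing_at_gateway") for tid in provider_by_id)
    return matched, mismatches
```

Amounts are compared as `Decimal` values so that an "exact amount match" is not distorted by floating-point representation.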

Settlement workflow

T+1 – standard

T+2 – high‑risk merchants

T+0 – value‑added services
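
As a rough illustration, the settlement date for each cycle could be computed like this. The merchant-type labels are hypothetical, and counting T+N in business days that skip weekends (with no holiday calendar) is an assumption; the article does not say how the days are counted.

```python
from datetime import date, timedelta

# Settlement delay per merchant type, following the list above.
SETTLEMENT_DELAY_DAYS = {
    "value_added": 0,  # T+0
    "standard": 1,     # T+1
    "high_risk": 2,    # T+2
}


def settlement_date(transaction_date, merchant_type):
    """Return the settlement date, counting only Monday to Friday (assumption)."""
    remaining = SETTLEMENT_DELAY_DAYS[merchant_type]
    day = transaction_date
    while remaining > 0:
        day += timedelta(days=1)
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            remaining -= 1
    return day


# A transaction on Friday 2024-03-01 for a standard merchant settles on Monday.
print(settlement_date(date(2024, 3, 1), "standard"))
```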

Idempotency design

Idempotency key format: merchant_id + order_id

Strategy:

Store keys in Redis for 24 hours

Duplicate requests return the original result
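
The strategy can be sketched with a small in-memory store standing in for Redis, so that the example runs without external services. The class, the function names, and the `:` separator in the key are hypothetical. Against a real Redis, the check-and-store step would normally be a single `SET` with the `NX` and `EX` options so that two concurrent duplicates cannot both proceed; this sketch is not atomic.

```python
import time

IDEMPOTENCY_TTL_SECONDS = 24 * 60 * 60  # 24 hours, as stated above


class IdempotencyStore:
    """In-memory stand-in for Redis: key -> (expiry time, stored result)."""

    def __init__(self, ttl=IDEMPOTENCY_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, result = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave as if never stored
            return None
        return result

    def put(self, key, result):
        self._entries[key] = (self._clock() + self._ttl, result)


def idempotency_key(merchant_id, order_id):
    """Combine merchant_id and order_id (the separator is an assumption)."""
    return f"{merchant_id}:{order_id}"


def process_payment(store, merchant_id, order_id, charge):
    """Run `charge` at most once per key; duplicates get the original result."""
    key = idempotency_key(merchant_id, order_id)
    cached = store.get(key)
    if cached is not None:
        return cached
    result = charge()
    store.put(key, result)
    return result
```

Here `charge` is any zero-argument callable that performs the actual provider call and returns a result that is safe to replay to the client.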

System expansion strategies

Service layer

Stateless design

Horizontal scaling

Data layer

Sharding and partitioning

Read‑write separation

Cache layer

Local cache + Redis

Channel layer

Connection pooling

Multi‑instance load balancing

Monitoring and alerting

Key metrics:

Success rate

TPS

Latency (P95 / P99)

Channel health

Error rate

Alert thresholds:

Success rate < 99%

Latency > 500 ms

Error rate > 0.1%
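
The thresholds above translate directly into a small rule check. The `MetricsSnapshot` type and the alert names are hypothetical, and applying the 500 ms limit to P99 latency is an assumption, since the article does not say which percentile the threshold refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricsSnapshot:
    """Hypothetical aggregate over one evaluation window."""
    success_rate: float     # fraction, e.g. 0.995 for 99.5%
    p99_latency_ms: float   # assumed percentile for the latency threshold
    error_rate: float       # fraction, e.g. 0.0005 for 0.05%


def evaluate_alerts(metrics):
    """Return the names of all alert rules that the snapshot violates."""
    alerts = []
    if metrics.success_rate < 0.99:
        alerts.append("success_rate_below_99_percent")
    if metrics.p99_latency_ms > 500:
        alerts.append("latency_above_500_ms")
    if metrics.error_rate > 0.001:
        alerts.append("error_rate_above_0.1_percent")
    return alerts


print(evaluate_alerts(MetricsSnapshot(0.985, 620.0, 0.0004)))
```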

Cost considerations

Rough cost components:

Compute resources

Database

Cache

Network

Storage

Optimization directions:

Auto‑scaling

Spot instances

Multi‑region deployment

Future evolution

Machine‑learning‑driven routing decisions

Real‑time full‑volume reconciliation

Multi‑region disaster recovery

A/B testing of routing strategies

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
