How to Merge Go Microservices into a Single Pod and Cut CPU Usage by 60%

This article explains how the team transformed a Go‑based microservice recommendation system into a single‑pod monolithic application using tRPC‑Go, detailing performance bottlenecks, code‑level mock‑proxy techniques, deployment adjustments, and the resulting dramatic reduction in CPU consumption.

dbaplus Community

Microservice Advantages and Disadvantages

Microservices lower coupling, enable smooth updates, align with DDD, and simplify troubleshooting, but they increase system complexity, add RPC latency and network traffic, raise governance costs, and make multi‑tenant isolation harder.

Problem Encountered

During load testing, the recommendation system's feed‑rerank call chain showed excessive CPU usage. The cause was serialization: the traffic‑splitting (split‑flow) service had to deserialize every incoming request and serialize every response, and repeating this on every RPC hop produced heavy encoding and GC overhead.

[Figures: system architecture and observed CPU hotspots]

Solution Overview

By recognizing that the data structures dominating traffic are identical across the call chain, the team decided to bypass network transmission and perform in‑memory calls instead.
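To make the cost concrete, here is a minimal sketch contrasting one RPC‑style hop with an in‑process call. It assumes hypothetical FeedItem and FeedReply types in place of the protobuf messages, and uses encoding/json as a stand‑in for the protobuf wire format; the rerank function is illustrative only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical stand-ins for the protobuf messages used on the call chain.
type FeedItem struct {
	ID    string  `json:"id"`
	Score float64 `json:"score"`
}

type FeedReply struct {
	Items []FeedItem `json:"items"`
}

// rerank is illustrative downstream logic; it copies its input before changing it.
func rerank(in *FeedReply) *FeedReply {
	out := &FeedReply{Items: make([]FeedItem, len(in.Items))}
	copy(out.Items, in.Items)
	for i := range out.Items {
		out.Items[i].Score *= 2
	}
	return out
}

// overTheWire mimics an RPC hop: the request and the response are each
// encoded and decoded, allocating fresh objects every time (JSON stands in
// for protobuf here).
func overTheWire(in *FeedReply) (*FeedReply, error) {
	reqBytes, err := json.Marshal(in)
	if err != nil {
		return nil, err
	}
	var req FeedReply
	if err := json.Unmarshal(reqBytes, &req); err != nil {
		return nil, err
	}
	rspBytes, err := json.Marshal(rerank(&req))
	if err != nil {
		return nil, err
	}
	var rsp FeedReply
	if err := json.Unmarshal(rspBytes, &rsp); err != nil {
		return nil, err
	}
	return &rsp, nil
}

// inProcess is the same hop as a plain function call: the four
// encode/decode steps disappear and the payload is passed by pointer.
func inProcess(in *FeedReply) *FeedReply {
	return rerank(in)
}

func main() {
	in := &FeedReply{Items: []FeedItem{{ID: "a", Score: 1.5}, {ID: "b", Score: 0.25}}}
	wire, err := overTheWire(in)
	if err != nil {
		panic(err)
	}
	local := inProcess(in)
	fmt.Println(wire.Items[0].Score, local.Items[0].Score)
}
```

Both paths return the same result; only the RPC‑style path pays for marshaling, unmarshaling, and the garbage those steps leave behind.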

They built a proxy API that returns local implementations of service interfaces, effectively mocking RPC calls as ordinary function calls.

Code Refactoring

1. RPC Background

tRPC generates a service interface and a client proxy from a protobuf definition:

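// Protobuf service definition (message definitions omitted)
service FeedsRerank {
  rpc GetFeedList (GetFeedRequest) returns (GetFeedReply) {}
}

// Generated Go server interface, implemented by the service
type FeedsRerankService interface {
  GetFeedList(ctx context.Context, req *GetFeedRequest) (*GetFeedReply, error)
}

// Generated Go client proxy interface, used by callers; it adds per-call client options
type FeedsRerankClientProxy interface {
  GetFeedList(ctx context.Context, req *GetFeedRequest, opts ...client.Option) (*GetFeedReply, error)
}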

Typical client usage:

rerank := pb.NewFeedsRerankClientProxy() // a new RPC proxy for the downstream service
resp, err := rerank.GetFeedList(ctx, req)
// ...

2. Client‑Side Changes

The client now obtains a proxy from a shared api package instead of creating a new downstream proxy:

rerank := api.FeedsRerank()
resp, err := rerank.GetFeedList(ctx, req)

3. Server‑Side Changes

The service implementation is extracted from main into a dedicated package and registered with tRPC as usual, then also registered with the proxy API:

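impl := &rerankImpl{}
pb.RegisterFeedsRerankService(s, impl) // s is the tRPC server; serve FeedsRerank over RPC as usual
impl.mockProxy()                        // also expose impl to in-process callers, before s.Serve()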

A mock proxy implements the client interface by delegating to the server implementation:

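// rerankProxy satisfies pb.FeedsRerankClientProxy without any network hop.
type rerankProxy struct{ impl *rerankImpl }

// Compile-time check that rerankProxy implements the client proxy interface.
var _ pb.FeedsRerankClientProxy = (*rerankProxy)(nil)

// GetFeedList calls the server implementation directly. Client options such as
// timeouts or target selection are ignored because no RPC is made.
func (r *rerankProxy) GetFeedList(ctx context.Context, req *pb.GetFeedRequest, opts ...client.Option) (*pb.GetFeedReply, error) {
    return r.impl.GetFeedList(ctx, req)
}

// mockProxy registers the in-process proxy with the shared proxy API.
func (impl *rerankImpl) mockProxy() {
    proxyapi.DefaultAPI().RegisterFeedsRerank(&rerankProxy{impl: impl})
}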

4. Proxy API Implementation

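The proxy API starts every entry with the default tRPC client proxy and lets in‑process implementations replace it. A default proxy is not exercised until it is called, so a proxy that is never called does not cause a panic even if its configuration is missing.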

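package proxyapi

// pb is the generated tRPC stub package for FeedsRerank (import path omitted).

// API hands out one client proxy per downstream service and lets in-process
// implementations replace the default RPC proxies.
type API interface {
    FeedsRerank() pb.FeedsRerankClientProxy
    RegisterFeedsRerank(p pb.FeedsRerankClientProxy)
}

// DefaultAPI returns the process-wide API instance.
func DefaultAPI() API { return defaultAPIImpl }

type apiImpl struct {
    internalFeedsRerankClientProxy pb.FeedsRerankClientProxy
    // ... other proxies
}

var defaultAPIImpl = newAPIImpl()

// newAPIImpl starts every entry with the generated RPC proxy.
func newAPIImpl() *apiImpl {
    return &apiImpl{internalFeedsRerankClientProxy: pb.NewFeedsRerankClientProxy()}
}

func (a *apiImpl) FeedsRerank() pb.FeedsRerankClientProxy { return a.internalFeedsRerankClientProxy }

// RegisterFeedsRerank swaps in an in-process proxy. Call it during startup,
// before requests are served, because the field is not guarded by a lock.
func (a *apiImpl) RegisterFeedsRerank(p pb.FeedsRerankClientProxy) {
    if p != nil {
        a.internalFeedsRerankClientProxy = p
    }
}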

Deployment Adjustments

All five services are packaged into a single binary and deployed as one pod. The trpc_go.yaml file lists only the external service endpoints; internal services register without additional configuration, and missing entries are ignored safely.

Benefits

Before monolithization, the five services required roughly 18,000 CPU cores at the target capacity. After the transformation, CPU demand dropped to about 7,000 cores (a 61% reduction). Subsequent algorithmic and caching optimizations further reduced demand to ~1,000 cores.
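The percentages follow directly from the core counts quoted above; this small helper just makes the arithmetic explicit (the ~1,000 figure is approximate, so the second result is too).

```go
package main

import "fmt"

// reduction returns the percentage drop from before to after.
func reduction(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	fmt.Printf("monolith: %.1f%%\n", reduction(18000, 7000))
	fmt.Printf("with later optimizations: %.1f%%\n", reduction(18000, 1000))
}
```

(18,000 - 7,000) / 18,000 is about 61.1%, matching the reported 61%; against the ~1,000‑core figure the overall drop is roughly 94%.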

The approach preserves the ability to run each service independently for other tenants, allowing a hybrid microservice/monolith deployment model with minimal code changes.
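One way to picture the hybrid model is a per‑deployment setting that decides which dependencies are reached in process and which over RPC. This is a minimal sketch under assumptions: the colocated setting (for example, read from a flag) and the service names other than feeds‑rerank are hypothetical, and the function only computes the decision rather than wiring real proxies.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Mode records how a dependency is reached.
type Mode string

const (
	Local  Mode = "in-process"
	Remote Mode = "rpc"
)

// wire decides, per service, whether callers get the in-process proxy or the
// default RPC proxy. colocated is a hypothetical comma-separated list of the
// services compiled into and started by this binary.
func wire(all []string, colocated string) map[string]Mode {
	local := map[string]bool{}
	for _, s := range strings.Split(colocated, ",") {
		if s = strings.TrimSpace(s); s != "" {
			local[s] = true
		}
	}
	modes := make(map[string]Mode, len(all))
	for _, s := range all {
		if local[s] {
			modes[s] = Local // register the mock proxy for this service
		} else {
			modes[s] = Remote // keep the default RPC proxy
		}
	}
	return modes
}

func main() {
	services := []string{"feeds-rerank", "recall", "rank"} // hypothetical names
	modes := wire(services, "feeds-rerank,rank")
	sort.Strings(services)
	for _, s := range services {
		fmt.Println(s, modes[s])
	}
}
```

A single‑pod deployment would list every service as colocated, while a tenant that still runs services independently would list none and keep pure RPC.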

General Recommendations

Expose functionality through Go interfaces to hide implementation details and enable seamless switching between RPC and in‑process calls.

Prefer dependency injection over heavy init logic to keep packages lightweight.

Keep each package focused on a single responsibility to avoid hidden coupling.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
