Turning tRPC‑Go Microservices into a High‑Performance Monolith
This article explains how a large‑scale recommendation system built with tRPC‑Go microservices was refactored into a single‑process monolith to cut network overhead, reduce CPU usage by over 60%, and retain the benefits of microservice development while minimizing code changes.
Background
Microservices are the default deployment model for cloud‑native applications, typically running each service in its own Kubernetes pod. While this approach offers low coupling, independent iteration, and clear DDD boundaries, it also introduces significant network and serialization overhead, especially for high‑throughput services such as a recommendation system.
Advantages and Drawbacks of Microservices
Reduced coupling between modules; changes affect only the targeted service.
Smooth updates without downtime, suitable for large teams.
Clear input/output contracts that align with DDD.
Easy troubleshooting by isolating failing modules.
However, as the number of services grows, the system becomes harder to understand, RPC adds latency and traffic, service‑mesh governance incurs deployment cost, and multi‑tenant isolation becomes more complex.
Motivation for a Monolith Transformation
In performance tests of the Feeds Rerank pipeline, the upstream flow services (business → split → rerank → rank → recall) generated massive network I/O: business traffic reached 38 Gbps, split 81 Gbps, and rerank 44 Gbps. The split service, acting as a transparent pass‑through, spent most of its CPU on GC, serialization, and RPC rather than core logic.
Flame graphs confirmed that less than 10% of CPU time was spent on actual business computation.
Proposed Solution: Mock RPC Calls as In‑Process Function Calls
tRPC‑Go generates Go interfaces for both server and client sides. By providing a proxy API that returns a client implementation backed by an in‑process mock, the same interface can be used both for real RPC calls and for direct function calls when the services run in the same process.
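The core idea can be shown without tRPC at all. This minimal sketch assumes a hypothetical Reranker interface. The caller always asks a "slot" for its dependency. The slot starts out holding an RPC-style client and is overridden once an in-process implementation is registered. The generic slot type and its RWMutex are illustrative additions, not part of tRPC-Go.

```go
package main

import (
	"fmt"
	"sync"
)

// Reranker is a hypothetical client contract shared by both implementations.
type Reranker interface {
	Rerank(items []string) []string
}

// remoteReranker stands in for a generated RPC client; it tags its output
// so the example shows which implementation answered.
type remoteReranker struct{}

func (remoteReranker) Rerank(items []string) []string {
	return append([]string{"via-rpc"}, items...)
}

// localReranker is the in-process implementation (it simply reverses the
// input as a placeholder for real ranking logic).
type localReranker struct{}

func (localReranker) Rerank(items []string) []string {
	out := make([]string, len(items))
	for i, it := range items {
		out[len(items)-1-i] = it
	}
	return out
}

// slot holds the current implementation of one service: the RPC client by
// default, or an in-process implementation once one is registered.
type slot[T any] struct {
	mu  sync.RWMutex
	cur T
}

func newSlot[T any](def T) *slot[T] { return &slot[T]{cur: def} }

// Get returns whichever implementation is currently registered.
func (s *slot[T]) Get() T {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.cur
}

// Register overrides the default; a nil interface value is ignored.
func (s *slot[T]) Register(impl T) {
	if any(impl) == nil {
		return
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cur = impl
}

func main() {
	rerank := newSlot[Reranker](remoteReranker{})
	fmt.Println(rerank.Get().Rerank([]string{"a", "b"})) // [via-rpc a b]

	rerank.Register(localReranker{}) // the service now lives in this process
	fmt.Println(rerank.Get().Rerank([]string{"a", "b"})) // [b a]

	rerank.Register(nil) // ignored
	fmt.Println(rerank.Get().Rerank([]string{"a", "b"})) // [b a]
}
```

If every registration completes during startup, before requests are served, a plain field without a lock is enough. The lock only matters if implementations can be swapped while traffic is flowing.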
Code Generation
```protobuf
service FeedsRerank {
  rpc GetFeedList (GetFeedRequest) returns (GetFeedReply) {}
}
```
Running trpc build produces xxx.trpc.go, which contains the server‑side interface:
```go
type FeedsRerankService interface {
	GetFeedList(ctx context.Context, req *GetFeedRequest) (*GetFeedReply, error)
}
```
The client side gets:
```go
type FeedsRerankClientProxy interface {
	GetFeedList(ctx context.Context, req *GetFeedRequest, opts ...client.Option) (*GetFeedReply, error)
}
```
Typical usage:
```go
client := pb.NewFeedsRerankClientProxy()
resp, err := client.GetFeedList(ctx, req)
```
Client‑Side Mock
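We create a proxy struct that implements the client interface by forwarding each call directly to the in‑process service implementation:
```go
// rerankProxy satisfies pb.FeedsRerankClientProxy by calling the local
// service implementation instead of issuing an RPC.
type rerankProxy struct{ impl *rerankImpl }

func (r *rerankProxy) GetFeedList(ctx context.Context, req *pb.GetFeedRequest, opts ...client.Option) (*pb.GetFeedReply, error) {
	// Client options (timeouts, targets, etc.) have no effect on a local call.
	return r.impl.GetFeedList(ctx, req)
}

// mockProxy registers the in-process proxy with the proxy API, so any
// caller that obtains the FeedsRerank client from it skips the network.
func (impl *rerankImpl) mockProxy() {
	proxyapi.DefaultAPI().RegisterFeedsRerank(&rerankProxy{impl: impl})
}
```
Server‑Side Registration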
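The service implementation is registered with the server as usual, and the same instance is also handed to the proxy API:
```go
impl := &rerankImpl{}
pb.RegisterFeedsRerankService(server, impl) // still reachable over RPC
impl.mockProxy()                            // in-process callers bypass the network
```
Because the mock is registered in the same process, a client obtained through the proxy API resolves to the in‑process implementation instead of performing an RPC. This only works for callers that fetch the client via proxyapi.DefaultAPI().FeedsRerank() rather than constructing it directly with pb.NewFeedsRerankClientProxy().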
Proxy API Implementation
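```go
package proxyapi

// pb is the package generated by trpc build (import omitted; path is project-specific).

// API exposes one client getter and one registration hook per service.
type API interface {
	FeedsRerank() pb.FeedsRerankClientProxy
	RegisterFeedsRerank(p pb.FeedsRerankClientProxy)
}

// DefaultAPI returns the process-wide proxy API instance.
func DefaultAPI() API { return defaultAPIImpl }

type apiImpl struct {
	internalFeedsRerankClientProxy pb.FeedsRerankClientProxy
	internalXxxxClientProxy        pb.XxxxClientProxy // other services
}

var defaultAPIImpl = newAPIImpl()

// newAPIImpl defaults every entry to a real RPC client; in-process
// implementations replace them through the Register* methods.
func newAPIImpl() *apiImpl {
	return &apiImpl{
		internalFeedsRerankClientProxy: pb.NewFeedsRerankClientProxy(),
		internalXxxxClientProxy:        pb.NewXxxxClientProxy(),
	}
}

func (a *apiImpl) FeedsRerank() pb.FeedsRerankClientProxy { return a.internalFeedsRerankClientProxy }

// RegisterFeedsRerank swaps in an in-process implementation; nil is ignored.
// It is unsynchronized, so it should be called during startup, before serving.
func (a *apiImpl) RegisterFeedsRerank(p pb.FeedsRerankClientProxy) {
	if p != nil {
		a.internalFeedsRerankClientProxy = p
	}
}
```
A go generate script produces this boilerplate automatically, reducing manual duplication.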
Deployment Configuration
The monolith can still expose the original microservice endpoints through the trpc_go.yaml configuration, but only the business‑layer service actually needs to be registered externally. The other internal services can safely be registered with empty configurations, because tRPC silently ignores missing entries.
Cost Reduction Results
Before refactoring, the five services of the recommendation system required an estimated 18 000 CPU cores. After consolidating them into a single process without changing business logic, the requirement dropped to ~7 000 cores—a 61 % reduction. Further algorithmic and caching optimizations later brought the total down to ~1 000 cores.
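For reference, the percentages follow directly from the core counts. The second ratio, the overall saving after the later optimizations, is computed here from the quoted figures and is not stated explicitly in the text:

$$
\frac{18\,000 - 7\,000}{18\,000} = \frac{11\,000}{18\,000} \approx 61.1\%,
\qquad
\frac{18\,000 - 1\,000}{18\,000} = \frac{17\,000}{18\,000} \approx 94.4\%
$$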
Practical Takeaways
Expose functionality through Go interfaces so that the same contract works for RPC or in‑process calls.
Prefer dependency injection over heavy init logic to keep components interchangeable.
Keep packages small and focused to simplify mocking and future refactoring.
The approach works with other Go frameworks (Gin, gRPC) and demonstrates that a low‑overhead monolith can coexist with a microservice development workflow, achieving “the best of both worlds”.
This article has been distilled and summarized from source material, then republished for learning and reference.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
