
Optimization of Serialization in Search Recommendation Service

This report analyzes performance bottlenecks caused by serialization in a search‑recommendation system, presents detailed measurements of request latency, evaluates multiple optimization strategies—including Redis caching, lazy metric handling, and custom byte‑array serialization—and documents the resulting latency reductions and implementation considerations.


1 Optimization Background

To improve the overall engineering efficiency and service quality of the search‑recommendation system, the architecture was split into dedicated modules: central control, recall, and ranking services. However, the new ranking service added 10‑20 ms of latency in the micro detail-page and search scenarios, most of it spent in the request‑response phase.

2 Problem Analysis

The large gap between caller wait time and service execution time points to remote‑call overhead, including serialization/deserialization, network I/O, and local calls. Measurements using the Skynet tracing tool show that response serialization accounts for most of the extra 4 ms latency.

Key metrics from Skynet:

    Metric                           Cost        Description
    scf.request.serialize.cost       0.242 ms    Request serialization
    scf.request.deserialize.cost     0.261 ms    Request deserialization
    scf.response.deserialize.cost    0.624 ms    Response deserialization
    scf.response.serialize.cost      ≈0.624 ms   Response serialization (estimated)

The response phase consumes roughly half of the total latency, with serialization and network I/O each contributing significantly. Network conditions (gigabit NICs) and large object sizes (≈1 MB) further exacerbate the delay.

3 Design Solutions

3.1 Optimization Option 1

Eliminate log transmission from the response by either writing logs directly to Kafka or caching them in Redis. Writing directly to Kafka would generate roughly 15 TB of data per day, which is impractical. Caching in Redis with a 1‑second TTL is estimated to need about 1 GB of memory, so this approach was adopted.
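The 1‑second‑TTL idea can be illustrated with a small in‑memory stand‑in (class and method names here are hypothetical; production code would call Redis SETEX rather than a local map):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for the Redis log cache: entries expire after a TTL,
// mirroring the 1-second EXPIRE used in the adopted design.
class TtlLogCache {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> store = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TtlLogCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void put(String requestId, String log) {
        store.put(requestId, new Entry(log, System.currentTimeMillis() + ttlMillis));
    }

    // Returns null for missing or expired entries; expired entries are evicted lazily.
    String get(String requestId) {
        Entry e = store.get(requestId);
        if (e == null || System.currentTimeMillis() > e.expiresAtMillis) {
            store.remove(requestId);
            return null;
        }
        return e.value;
    }
}
```

Because the logs only need to survive long enough for the asynchronous consumer to pick them up, a very short TTL keeps the memory footprint small.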

The implementation introduces an IFutureConsumer<T> interface to decouple the prediction framework (producer) from the ranking framework (consumer), enabling asynchronous log handling.

interface IFutureConsumer<T> extends Consumer<T> {
    // void accept(T t) is inherited from Consumer<T>
    Future<?> getFuture();
}

The ranking framework provides a HandleLogConsumer that processes logs asynchronously.

public class HandleLogConsumer<T> implements IFutureConsumer<T> {
    private ExecutorService executorService; // injected/configured elsewhere
    private Future<?> future;

    @Override
    public void accept(T t) {
        this.future = executorService.submit(() -> {
            // process log logic
        });
    }

    @Override
    public Future<?> getFuture() {
        return this.future;
    }
}
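Put together, the handoff might look like the following self-contained sketch. The single-thread pool, the counter standing in for real log processing, and the awaitLastLog helper are illustrative assumptions, not the production configuration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Interface restated so the sketch compiles on its own.
interface IFutureConsumer<T> extends Consumer<T> {
    Future<?> getFuture();
}

class HandleLogConsumer<T> implements IFutureConsumer<T> {
    // Single worker thread is an illustrative choice.
    private final ExecutorService executorService = Executors.newSingleThreadExecutor();
    private final AtomicInteger handled = new AtomicInteger();
    private volatile Future<?> future;

    @Override
    public void accept(T logRecord) {
        // The producer returns immediately; log handling runs off-thread.
        this.future = executorService.submit(() -> {
            // stand-in for real log processing: just count the record
            handled.incrementAndGet();
        });
    }

    @Override
    public Future<?> getFuture() {
        return this.future;
    }

    // Convenience join for callers that need a completion guarantee.
    void awaitLastLog() {
        Future<?> f = this.future;
        if (f == null) return;
        try {
            f.get(); // block until the async log handling finishes
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    int handledCount() {
        return handled.get();
    }

    void shutdown() {
        executorService.shutdown();
    }
}
```

The key property is that accept() never blocks the RPC response path; only callers that explicitly need the log result wait on the returned future.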

3.2 Optimization Option 2

Introduce a lazy metric type that stores log data as compressed byte arrays and only converts to String for the top‑N items, reducing decode operations from ~500 to ~10.

class RankResultItem {
    Map<String, LazyMetric> lazyMetricMap;
}

class LazyMetric {
    byte[] data;             // encoded string
    byte compressMethodCode; // which compression algorithm was used

    LazyMetric(String str) {
        // string2bytes: compress and store
    }

    @Override
    public String toString() {
        // bytes2string: decompress on demand
    }
}
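One way to flesh out this sketch, assuming GZIP as the compression codec (the article does not name the algorithm, so GZIP and the code value 1 are assumptions):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Metric stored as compressed bytes; the string form is materialized only
// when toString() is called, so top-N-only rendering skips most decodes.
class LazyMetric {
    private final byte[] data;                 // compressed UTF-8 bytes
    private final byte compressMethodCode = 1; // 1 = GZIP (assumed coding)

    LazyMetric(String str) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(str.getBytes(StandardCharsets.UTF_8));
            gz.finish();
            this.data = bos.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("compression failed", e);
        }
    }

    int compressedSize() {
        return data.length;
    }

    // Decompression happens only here, so items that are never rendered
    // never pay the decode cost.
    @Override
    public String toString() {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data));
             ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return new String(bos.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new IllegalStateException("decompression failed", e);
        }
    }
}
```

For the highly repetitive key-value metric strings described here, compression also shrinks the serialized payload, which is where the size reduction in the tests comes from.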

Performance tests show serialization-time reductions of up to 83 %, with the serialized payload shrinking to roughly 36 % of its original size.

3.3 Optimization Option 3

Develop a custom byte‑array serialization framework, defining interfaces such as IRankObjSerializer<T> and implementing specialized serializers for primitives, maps, lists, sets, and ranking objects. Shared key‑sets across items further reduce overhead.

public interface IRankObjSerializer<T> {
    int estimateUsage(T obj, RankObjSerializeContext context);
    void serialize(T obj, RankObjSerializeContext context) throws Exception;
    T deserialize(RankObjDeserializeContext context) throws Exception;
}

Custom serializers achieve serialization times of 0.32 ms (vs. 1.86 ms, an ~83 % reduction) and a data size of 392 KB (vs. 1.19 MB, a ~67 % reduction).
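A minimal sketch of what one such specialized serializer might look like for strings, assuming the contexts wrap a ByteBuffer (the real RankObjSerializeContext is not shown in the article, so everything beyond the interface itself is a hypothetical reconstruction):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical contexts wrapping a ByteBuffer.
class RankObjSerializeContext {
    final ByteBuffer buffer;

    RankObjSerializeContext(int capacity) {
        this.buffer = ByteBuffer.allocate(capacity);
    }
}

class RankObjDeserializeContext {
    final ByteBuffer buffer;

    RankObjDeserializeContext(byte[] bytes) {
        this.buffer = ByteBuffer.wrap(bytes);
    }
}

interface IRankObjSerializer<T> {
    int estimateUsage(T obj, RankObjSerializeContext context);
    void serialize(T obj, RankObjSerializeContext context) throws Exception;
    T deserialize(RankObjDeserializeContext context) throws Exception;
}

// Length-prefixed UTF-8 string serializer: 4-byte length, then the bytes.
class StringSerializer implements IRankObjSerializer<String> {
    @Override
    public int estimateUsage(String obj, RankObjSerializeContext context) {
        return 4 + obj.getBytes(StandardCharsets.UTF_8).length;
    }

    @Override
    public void serialize(String obj, RankObjSerializeContext context) {
        byte[] bytes = obj.getBytes(StandardCharsets.UTF_8);
        context.buffer.putInt(bytes.length);
        context.buffer.put(bytes);
    }

    @Override
    public String deserialize(RankObjDeserializeContext context) {
        int len = context.buffer.getInt();
        byte[] bytes = new byte[len];
        context.buffer.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Writing straight into a pre-sized byte buffer is what avoids the reflection and intermediate-object overhead of a general-purpose serializer; estimateUsage lets the framework size the buffer once up front.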

4 Summary

The project successfully mitigated serialization‑induced latency in the search‑recommendation pipeline through three iterative solutions, ultimately adopting a custom serialization approach that cuts serialization overhead by ~83 % and network payload by ~67 %.

Key takeaways include the importance of thorough bottleneck analysis, avoiding premature solutions, and designing optimizations with a holistic view of the entire serialization process.

Tags: Java, performance, RPC, Redis, serialization, ranking, custom serialization
Written by Zhuanzhuan Tech

A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.
