How Chunked Transfer and Streaming Boost Backend Performance in ICBU

This article details how ICBU's core communication pages were accelerated by applying HTTP chunked transfer, streaming HTML rendering, Nginx tweaks, ThreadLocal handling, and caching strategies, resulting in FCP dropping from 2.6 s to 1.9 s and LCP from 2.8 s to 2 s.

Alibaba Cloud Developer

Background and Results

ICBU's core communication scenario had accumulated ten years of performance debt, which prompted an overhaul. After coordinated front-end and back-end work, the 90th-percentile First Contentful Paint (FCP) fell from 2.6 s to 1.9 s and Largest Contentful Paint (LCP) from 2.8 s to 2 s.

Measure 1: Streaming Chunked Transfer (Core)

HTTP Chunked Transfer Encoding

Chunked transfer encoding, defined in HTTP/1.1, lets a server send dynamically generated content without knowing the total size up front. The body is split into chunks: each chunk starts with its size in hexadecimal followed by CRLF, then the chunk data followed by another CRLF. A zero-length chunk signals the end of the response.
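To make the wire format concrete, here is a minimal sketch of the framing in plain Java. `ChunkFraming` and its methods are hypothetical names used only for illustration; in practice the servlet container or proxy does this framing for you, and trailers are not handled here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that frames payloads the way HTTP/1.1 chunked encoding does. */
final class ChunkFraming {
    private static final byte[] CRLF = {'\r', '\n'};

    /** Appends one chunk: size in hex, CRLF, payload bytes, CRLF. */
    static void writeChunk(ByteArrayOutputStream out, String payload) {
        byte[] data = payload.getBytes(StandardCharsets.UTF_8);
        if (data.length == 0) {
            return; // a zero-length chunk would end the response early
        }
        byte[] size = Integer.toHexString(data.length).getBytes(StandardCharsets.US_ASCII);
        out.write(size, 0, size.length);
        out.write(CRLF, 0, CRLF.length);
        out.write(data, 0, data.length);
        out.write(CRLF, 0, CRLF.length);
    }

    /** Frames every payload, then appends the terminating zero-length chunk. */
    static String encode(String... payloads) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String payload : payloads) {
            writeChunk(out, payload);
        }
        byte[] last = "0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        out.write(last, 0, last.length);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Note that the size counts bytes, not characters, so multi-byte UTF-8 content yields a larger size than its character count.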

Implementation

```java
public void chunked(HttpServletRequest request, HttpServletResponse response) {
    // Set the content type and headers before getWriter(): the Servlet API
    // ignores charset changes once the writer has been obtained.
    response.setContentType(MediaType.TEXT_HTML_VALUE + ";charset=UTF-8");
    response.setHeader("Transfer-Encoding", "chunked");
    response.addHeader("X-Accel-Buffering", "no");

    // aliId and user are resolved elsewhere; the source does not show how.
    try (PrintWriter writer = response.getWriter()) {
        // First part
        Context modelMain = getmessengerMainContext(request, response, aliId);
        flushVm("/velocity/layout/Main.vm", modelMain, writer);

        // Second part
        Context modelSec = getmessengerSecondContext(request, response, aliId, user);
        flushVm("/velocity/layout/Second.vm", modelSec, writer);

        // Third part
        Context modelThird = getmessengerThirdContext(request, response, user);
        flushVm("/velocity/layout/Third.vm", modelThird, writer);
    } catch (Exception e) {
        // logger
    }
}

// Renders one template into memory, then writes and flushes it as one block.
private void flushVm(String templateName, Context model, PrintWriter writer) throws Exception {
    StringWriter tmpWri = new StringWriter();
    engine.mergeTemplate(templateName, "UTF-8", model, tmpWri);
    writer.write(tmpWri.toString());
    writer.flush();
}
```

Streaming HTML Rendering

Our applications run on Spring MVC. The request flow is: browser → server (data preparation + Velocity rendering) → HTML response. By splitting the HTML into logical blocks and pushing them incrementally, the server reduces initial preparation time and the browser can start rendering as soon as the first chunk arrives.

Benefits of this approach:

Server prepares data in batches, shortening the first‑batch preparation time.

The browser receives data earlier and can start fetching and executing JavaScript and other resources right away, improving resource utilization.

Key tips for splitting Velocity templates:

Respect resource dependencies; earlier sections must not rely on later variables.

Prioritize static and core resources in the first chunk so the server can quickly return the initial HTML.
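The staging idea can be sketched without Spring or Velocity. In this simplified illustration, `StagedRenderer` and the `Supplier`-based stages are hypothetical stand-ins for the per-template data preparation and rendering; the point is only that each stage's data is prepared when its turn comes and flushed before the next stage starts.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch: render HTML in ordered stages, flushing after each one. */
final class StagedRenderer {

    /**
     * Runs each stage in order. A stage prepares its data and returns its HTML
     * only when called, so slow data for later stages does not delay the first flush.
     * Returns the number of flushes performed.
     */
    static int renderInStages(List<Supplier<String>> stages, Writer out) {
        int flushed = 0;
        try {
            for (Supplier<String> stage : stages) {
                out.write(stage.get()); // data preparation happens inside the stage
                out.flush();            // push this block to the client immediately
                flushed++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return flushed;
    }
}
```

Because stages run strictly in order, the first stage should hold the static head and core resources, and no stage may read a variable that a later stage produces.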

Precautions

Filters or third‑party libraries that rewrite response headers can break chunked transfer. Retrieve the original HttpServletResponse to avoid interference:

```java
private static HttpServletResponse getResponse(HttpServletResponse response) {
    ServletResponse resp = response;
    while (resp instanceof ServletResponseWrapper) {
        resp = ((ServletResponseWrapper) resp).getResponse();
    }
    return (HttpServletResponse) resp;
}
```

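Since Chrome 80, cookies without a SameSite attribute default to SameSite=Lax and are no longer sent or stored in cross-site iframes. Cookies that must work inside an iframe need SameSite=None together with the Secure attribute.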
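One way to emit such a cookie is to write the Set-Cookie header by hand. The sketch below is only an illustration: `SameSiteCookie` is a hypothetical helper, the `Path=/` attribute is an assumption, and the value is assumed to be already cookie-safe.

```java
/** Hypothetical helper that builds a Set-Cookie value usable inside a cross-site iframe. */
final class SameSiteCookie {

    static String crossSiteHeader(String name, String value) {
        // Chrome rejects SameSite=None cookies that are not also marked Secure.
        return name + "=" + value + "; Path=/; Secure; SameSite=None";
    }
}
```

In a servlet this value could be sent with `response.addHeader("Set-Cookie", SameSiteCookie.crossSiteHeader("sid", "abc"))`, before the first chunk is flushed, because headers cannot be changed once the response is committed.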

VelocityEngine custom tools must be initialized manually when rendering VM files in a streaming context.

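```java
import org.apache.velocity.tools.config.DefaultKey;
import org.apache.velocity.tools.generic.SafeConfig;

// Exposed to templates under the key "assetsVersion".
@DefaultKey("assetsVersion")
public class AssertsVersionTool extends SafeConfig {
    public String get(String key) {
        return AssetsVersionUtil.get(key);
    }
}
```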

Nginx Configuration

Enable chunked transfer and disable buffering for the /chunked location:

```nginx
server {
    location ~ ^/chunked {
        add_header X-Accel-Buffering no;
        proxy_http_version 1.1;
        proxy_cache off;
        proxy_buffering off;
        chunked_transfer_encoding on;
        proxy_pass http://backends;
    }
}
```

Additional Nginx settings that may conflict with streaming output (e.g., SC_Enabled) need to be disabled, and buffer sizes should be tuned to avoid “upstream sent too big header” errors:

```nginx
proxy_buffers 128 32k;
proxy_buffer_size 64k;
proxy_busy_buffers_size 128k;
client_header_buffer_size 32k;
large_client_header_buffers 4 16k;
```

Measure 2: Non‑Traffic Middleware Optimization

During peak traffic, configuration center calls were throttled. Switching from on‑demand getConfig calls to a push‑based model with local caching reduced latency.

```java
public static void registerDynamicConfig(final String dataIdKey, final String groupName) {
    IOException initError = null;
    try {
        // One-off pull at registration time to seed the local cache.
        String initValue = Diamond.getConfig(dataIdKey, groupName, DEFAULT_TIME_OUT);
        if (initValue != null) {
            getGroup(groupName).put(dataIdKey, initValue);
        }
        logger.info("Diamond config init: dataId=" + dataIdKey + ", groupName=" + groupName + "; initValue=" + initValue);
    } catch (IOException e) {
        logger.error("Diamond config init error: dataId=" + dataIdKey, e);
        initError = e;
    }
    // Register the listener even if the initial pull failed, so later pushes still update the cache.
    Diamond.addListener(dataIdKey, groupName, new ManagerListener() {
        @Override
        public Executor getExecutor() { return null; }

        @Override
        public void receiveConfigInfo(String s) {
            String oldValue = (String) DynamicConfig.getGroup(groupName).get(dataIdKey);
            DynamicConfig.getGroup(groupName).put(dataIdKey, s);
            DynamicConfig.logger.warn("Receive config update: dataId=" + dataIdKey + ", newValue=" + s + ", oldValue=" + oldValue);
        }
    });
    if (initError != null) {
        throw new RuntimeException("Diamond config init error: dataId=" + dataIdKey, initError);
    }
}
```
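Stripped of the Diamond client, the pattern is a local map that is seeded once and then updated only by pushed values. The following is a minimal sketch under that assumption; `PushedConfigCache` and its methods are hypothetical names, and the returned `Consumer` stands in for the listener callback a real config client would invoke.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical sketch of a push-updated local config cache. */
final class PushedConfigCache {
    private final Map<String, String> local = new ConcurrentHashMap<>();

    /** Seeds the cache and returns the callback to invoke whenever a new value is pushed. */
    Consumer<String> register(String key, String initialValue) {
        if (initialValue != null) {
            local.put(key, initialValue);
        }
        return pushedValue -> {
            // ConcurrentHashMap rejects null values, so ignore empty pushes.
            if (pushedValue != null) {
                local.put(key, pushedValue);
            }
        };
    }

    /** Request-path read: memory only, never a remote call. */
    String get(String key, String defaultValue) {
        return local.getOrDefault(key, defaultValue);
    }
}
```

Reads on the request path then never touch the configuration center, so throttling there can no longer add latency to page rendering.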

Measure 3: Direct Data Output

Static images can be inlined as Base64 to avoid extra HTTP requests.

Heavy data required by JavaScript can be pre‑loaded on the server and sent together with the HTML, reducing client‑side processing.

Combining direct data output with chunked streaming keeps each response chunk manageable while mitigating server‑side response‑time growth.
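As an illustration of the image-inlining point, a data URI can be built with the JDK's Base64 encoder. `InlineImage` is a hypothetical helper; Base64 output is roughly a third larger than the raw bytes, so this is best kept to small images.

```java
import java.util.Base64;

/** Hypothetical helper that turns image bytes into a data URI for an img src attribute. */
final class InlineImage {

    static String toDataUri(String mimeType, byte[] imageBytes) {
        // Standard (not URL-safe) Base64 is what data URIs expect.
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(imageBytes);
    }
}
```

The result can be placed in the template as the `src` of an `img` tag, which removes the separate HTTP request for that image.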

Measure 4: Local Cache

For frequently accessed metadata (e.g., file owner nicknames), a simple HashMap cache of RPC results eliminates redundant remote calls.
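A minimal sketch of such a cache follows. It swaps the plain HashMap for a ConcurrentHashMap, on the assumption that the cache is shared across request threads; `NicknameCache` is a hypothetical name, and the injected function stands in for the real RPC call.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical sketch: cache RPC results per owner ID so each ID is looked up once. */
final class NicknameCache {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> remoteLookup; // stand-in for the RPC

    NicknameCache(Function<String, String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    String nicknameOf(String ownerId) {
        // Calls remoteLookup only when the ID is not cached yet.
        return cache.computeIfAbsent(ownerId, remoteLookup);
    }
}
```

This sketch has no expiry or size limit, so it only suits data that rarely changes and has a bounded key set; otherwise an eviction policy would be needed.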

Measure 5: Decommission Historical Debt

Legacy code such as abandoned experiment endpoints or unused Velocity variables adds tens to hundreds of milliseconds. Systematic cleanup (removing dead code, consolidating duplicated variables, and confirming that nothing in production still depends on the removed code) yields measurable latency gains.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
