
How to Slash Cold-Start Delays for Spring Boot on Serverless Platforms

This article, part of a series analyzing Serverless platforms for Spring Boot, explains how to diagnose and reduce cold‑start latency, covering tracing, reserved instances, lazy initialization, JVM options, and instance concurrency tuning, using the high‑traffic Mall demo application as a concrete example.

Alibaba Cloud Developer

Spring Boot is a Java‑based framework that packages many Spring components, allowing developers to create standalone applications with minimal configuration. In cloud‑native environments, Serverless platforms can run Spring Boot apps, but cold‑start latency remains a challenge.

This is the fourth article in a series that evaluates Serverless platforms for Spring Boot from architecture, deployment, monitoring, performance, and security perspectives, using the popular open‑source e‑commerce project mall (over 50k GitHub stars) as a case study. The focus here is performance tuning for Serverless deployments.

Instance Startup Speed Optimization

Cold‑start latency can reach about 30 seconds for the Mall application, which is noticeable to users. Cold start consists of several stages:

Code Preparation (PrepareCode): downloading the code package or image. With image acceleration enabled, this step is very fast.

Runtime Initialization (RuntimeInitialization): from function start until the platform detects that the application port is ready, including the Spring Boot startup time.

Application Initialization (Initialization): custom initialization logic executed via the Function Compute Initializer interface.

Invocation (Invocation): processing the request itself, which typically adds only a short delay.
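
To see which stage dominates a cold start, the per‑stage durations from a trace can be summed and ranked. The sketch below is illustrative only: the stage names follow the list above, but the millisecond values are invented placeholders, not measurements of the Mall application.

```python
# Hypothetical per-stage durations (ms) for one cold start.
# Replace with values read from your own trace; these are NOT measurements.
stages_ms = {
    "PrepareCode": 500,
    "RuntimeInitialization": 27000,
    "Initialization": 2000,
    "Invocation": 500,
}


def breakdown(stages):
    """Return (total_ms, [(stage, ms, share_of_total)]) sorted by duration, longest first."""
    total = sum(stages.values())
    ranked = sorted(stages.items(), key=lambda kv: kv[1], reverse=True)
    return total, [(name, ms, ms / total) for name, ms in ranked]


total_ms, ranked = breakdown(stages_ms)
for name, ms, share in ranked:
    print(f"{name:<24}{ms:>7} ms  {share:6.1%}")
print(f"{'Total':<24}{total_ms:>7} ms")
```

Ranking the stages this way shows where tuning effort is likely to pay off; the optimizations that follow mostly target the runtime initialization stage.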

1. Use Reserved Instances

Java applications start slowly, and many initialization steps involve external services, making it hard to reduce latency. Reserved instances keep a minimum number of instances warm, eliminating cold starts at the cost of continuous payment.

In the Function Compute console, configure reserved instances on the “Elastic Scaling” page: set the minimum and maximum instance counts and, optionally, configure scheduled or metric‑based reservations.

After creating a reservation, the system provisions the reserved instances, and subsequent function invocations avoid cold start.

2. Optimize Instance Startup Speed

Lazy Initialization

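Spring Boot 2.2+ supports a global lazy‑initialization flag (the spring.main.lazy-initialization property). With it enabled, beans are created when they are first used rather than at startup, which speeds up startup at the expense of a slower first request.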

SPRING_MAIN_LAZY_INITIALIZATION=true
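
The trade‑off behind this flag can be illustrated with a toy container in plain Python. This is only an analogy, not how Spring implements lazy initialization; the Bean and Container classes and the bean names are hypothetical.

```python
class Bean:
    """Stands in for a Spring bean whose construction is expensive."""
    created = 0  # counts how many beans have been constructed so far

    def __init__(self, name):
        Bean.created += 1
        self.name = name


class Container:
    """Toy container: eager mode builds every bean at startup,
    lazy mode builds a bean the first time it is requested."""

    def __init__(self, names, lazy):
        self._names = set(names)
        self._beans = {}
        if not lazy:
            for name in names:
                self._beans[name] = Bean(name)

    def get(self, name):
        if name not in self._names:
            raise KeyError(name)
        if name not in self._beans:  # lazy path: construct on first use
            self._beans[name] = Bean(name)
        return self._beans[name]


names = ["orderService", "productService", "couponService"]  # hypothetical bean names

eager = Container(names, lazy=False)
created_at_eager_startup = Bean.created  # every bean is built before any request

Bean.created = 0
lazy = Container(names, lazy=True)
created_at_lazy_startup = Bean.created  # nothing is built at startup
lazy.get("orderService")  # the first request pays the construction cost
created_after_first_request = Bean.created
```

In lazy mode the startup work shrinks to nothing, but the cost moves to whichever request first touches each bean, which is why the first request becomes slower.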

Disable Optimized Compiler

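The JVM’s tiered JIT compilation combines a fast client compiler (C1) with a slower optimizing compiler (C2). C2 improves long‑running performance but adds overhead while the application warms up. For short‑lived Serverless instances, stop compilation at the C1 level so that the optimizing compiler never runs, which reduces startup time.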

JAVA_TOOL_OPTIONS="-XX:+TieredCompilation -XX:TieredStopAtLevel=1"

Set these environment variables in s.yaml and redeploy the function:

sudo -E s mall-admin deploy

Login to Verify Environment Variables

In the console’s request list, click “Instance Details” → “Login Instance”. Then run echo $SPRING_MAIN_LAZY_INITIALIZATION (or echo $JAVA_TOOL_OPTIONS) to confirm the settings.

Note: For non‑reserved instances, the platform may reclaim the instance after a period of inactivity, at which point the “Login Instance” button becomes unavailable. If that happens, invoke the function again to start a new instance, then log in before it is reclaimed.

Configure Reasonable Instance Parameters

When selecting an instance size (e.g., 2C4G or 4C8G), determine how many concurrent requests an instance can handle while fully utilizing resources and maintaining performance. Function Compute uses Instance Concurrency as the scaling metric.

Instance Concurrency defines the maximum number of simultaneous requests an instance can process (e.g., a setting of 20 means up to 20 concurrent requests).

The platform can quickly scale based on concurrency, avoiding the delay of aggregating CPU/Memory/Network metrics.

Concurrency reliably reflects load across various conditions, unlike QPS or latency, which can be affected by downstream bottlenecks.
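
As a rough illustration of how the concurrency setting relates load to instance count, the sketch below applies simple arithmetic. It is a back‑of‑the‑envelope estimate that ignores reserved instances, scaling limits, and how the platform actually schedules requests; the function names and the traffic figures are hypothetical, and only the concurrency value of 20 comes from the example above.

```python
import math


def instances_needed(in_flight_requests, instance_concurrency):
    """Minimum number of instances so that none exceeds its concurrency limit."""
    if instance_concurrency < 1:
        raise ValueError("instance_concurrency must be at least 1")
    return math.ceil(in_flight_requests / instance_concurrency)


def estimated_in_flight(tps, avg_latency_s):
    """Little's law: average requests in flight = throughput x average latency."""
    return tps * avg_latency_s


# Hypothetical load: 400 requests/s at 0.25 s average latency.
in_flight = estimated_in_flight(400, 0.25)
print(in_flight)                        # average number of requests in flight
print(instances_needed(in_flight, 20))  # instances needed at concurrency 20
```

The second function also shows why a downstream slowdown raises concurrency even when QPS stays flat: higher latency means more requests in flight at the same throughput.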

To determine an appropriate concurrency value:

Set the function’s maximum instance count to 1 and benchmark a single instance.

Run load‑testing tools to observe TPS and latency.

Gradually increase the concurrency setting; if performance remains good, continue increasing, otherwise decrease.
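
The procedure above can be sketched as a small selection routine over benchmark results. What counts as "good" performance is an assumption here: the sketch requires p99 latency to stay within a budget and throughput to keep growing by at least 5% per step. The Run type, the thresholds, and the sample results are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    concurrency: int  # instance concurrency setting used for this benchmark run
    tps: float        # measured throughput on the single instance
    p99_ms: float     # measured tail latency


def pick_concurrency(runs, max_p99_ms, min_tps_gain=0.05):
    """Walk the runs in order of increasing concurrency. Keep raising the setting
    while latency stays within budget and throughput still grows by min_tps_gain;
    stop at the first run that fails either check. Returns None if no run passes."""
    best = None
    for run in sorted(runs, key=lambda r: r.concurrency):
        if run.p99_ms > max_p99_ms:
            break
        if best is not None and run.tps < best.tps * (1 + min_tps_gain):
            break
        best = run
    return best.concurrency if best else None


# Hypothetical single-instance benchmark results (not measured on Mall).
runs = [
    Run(concurrency=5, tps=210.0, p99_ms=40.0),
    Run(concurrency=10, tps=400.0, p99_ms=55.0),
    Run(concurrency=20, tps=690.0, p99_ms=90.0),
    Run(concurrency=40, tps=700.0, p99_ms=260.0),
]
chosen = pick_concurrency(runs, max_p99_ms=200.0)
print(chosen)
```

With these sample numbers the run at concurrency 40 exceeds the latency budget while adding almost no throughput, so the routine settles on the previous setting.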

Reference URLs

Spring Boot: https://spring.io/projects/spring-boot
Mall: https://github.com/macrozheng/mall
Serverless Devs installation: http://serverlessdevs.com/zhcn/docs/installed/cliinstall.html
Function Compute: https://www.aliyun.com/product/fc