How I Boosted FunTester QPS by 14% and Halved Memory Usage

After a weekend of code refactoring, asynchronous processing, and removing unnecessary statistics, the author increased FunTester's QPS from 104,375 to 118,904 (≈13.9% gain), reduced memory consumption by over 57%, and documented detailed performance impacts of various optimizations with code samples and benchmark tables.

FunTester

Jul 27, 2021

Goal

The aim was to close the performance gap of the FunTester load‑testing framework after an initial benchmark showed it lagging behind K6 and Gatling in both memory consumption and queries per second (QPS).

Initial Benchmark

Baseline measurements (CPU usage, memory usage, QPS, response time):

K6 – CPU 718.74, Memory 370 MB, QPS 75 980, RT 1 ms

Gatling – CPU 585.97, Memory 350 MB, QPS 113 355, RT 1 ms

FunTester – CPU 528.03, Memory 1 770 MB, QPS 104 375, RT 1 ms

Three improvement actions were planned:

Convert non‑essential processing to asynchronous execution.

Replace the test‑metadata storage mechanism.

Gradually remove business‑related compatibility code.
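
The first item of the plan can be sketched roughly as follows. This is an illustration only, not FunTester's actual code: the class name AsyncRecorder and its methods are hypothetical. The load thread only queues the bookkeeping and returns, while a single background thread does the work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: moves non-essential bookkeeping off the load thread. */
public class AsyncRecorder {
    // A single background thread processes queued work in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final AtomicLong totalMillis = new AtomicLong();
    private final AtomicLong count = new AtomicLong();

    /** Called from a load thread; returns as soon as the task is queued. */
    public void record(long elapsedMillis) {
        worker.execute(() -> {
            totalMillis.addAndGet(elapsedMillis);
            count.incrementAndGet();
        });
    }

    /** Stops accepting work and waits for queued tasks; false on timeout. */
    public boolean close(long timeoutSeconds) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public long totalMillis() { return totalMillis.get(); }

    public long count() { return count.get(); }
}
```

The executor's queue is unbounded in this sketch, so under sustained load the queued tasks themselves occupy heap. A real implementation would probably batch samples or bound the queue.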

Optimizations Applied

Removed all heavy statistics and calculation code, keeping only the total run count and total execution time.

Disabled response body parsing (which previously kept the full HTTP response as a String) while preserving the method stub for future verification.

Changed the data type used for response‑time statistics from int to short to reduce memory footprint.
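
The int to short change in the last item trades range for space: a short holds at most 32 767, and a plain cast of a larger value wraps around (for example, (short) 40000 is -25536). A minimal sketch of a safe narrowing, with the hypothetical helper name ShortTiming and assuming response times are measured in milliseconds:

```java
/** Hypothetical helper showing the int -> short narrowing described above. */
public class ShortTiming {
    /** Clamps an elapsed time in ms into the non-negative range of a short. */
    public static short toShortMillis(long elapsedMillis) {
        if (elapsedMillis < 0) return 0;
        if (elapsedMillis > Short.MAX_VALUE) return Short.MAX_VALUE;
        return (short) elapsedMillis;
    }

    /** Raw payload size of n samples, ignoring array headers and boxing. */
    public static long payloadBytes(int samples, int bytesPerSample) {
        return (long) samples * bytesPerSample;
    }

    public static void main(String[] args) {
        System.out.println(toShortMillis(1));       // 1
        System.out.println(toShortMillis(40_000));  // 32767 (clamped)
        System.out.println(payloadBytes(1_000_000, Integer.BYTES)); // 4000000
        System.out.println(payloadBytes(1_000_000, Short.BYTES));   // 2000000
    }
}
```

Clamping means any response slower than about 32.7 s is recorded as 32 767 ms, which is acceptable only if such outliers are not needed for the report.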

Test Code

The simplified test, written in Groovy, uses the com.funtester.base.constaint.FixedThread template instead of the older ThreadLimitTimesCount implementation:

import com.funtester.base.constaint.FixedThread;
import com.funtester.base.constaint.ThreadBase;
import com.funtester.config.Constant;
import com.funtester.frame.execute.Concurrent;
import com.funtester.httpclient.ClientManage;
import com.funtester.httpclient.FunLibrary;
import org.apache.http.client.methods.HttpRequestBase;

class Share2 extends FunLibrary {
    public static void main(String[] args) {
        ClientManage.init(10, 5, 0, "", 0);
        String url = "http://localhost:12345/tps";
        Constant.RUNUP_TIME = 0;
        def get = getHttpGet(url);
        def task = new FunTester(get);
        new Concurrent(task, 30, "Local fixed‑QPS test").start();
        testOver();
    }
    private static class FunTester extends FixedThread<HttpRequestBase> {
        FunTester(HttpRequestBase request) {
            super(request, 50000, true);
        }
        @Override
        protected void doing() throws Exception {
            FunLibrary.executeOnly(request);
        }
        @Override
        ThreadBase clone() {
            return new FunTester(request);
        }
    }
}

Modified Load‑Test Template

All statistics code was removed from the run method of FixedThread. The essential loop now looks like this:

public abstract class FixedThread<F> extends ThreadBase<F> {
    @Override
    public void run() {
        try {
            before();
            long start = Time.getTimeStamp();
            while (true) {
                executeNum++;
                doing();
                // Statistics collection is omitted here for the experiment
                if (executeNum >= limit) break;
            }
            long end = Time.getTimeStamp();
            // Log one summary line per thread after the loop, skipping very short runs
            if ((end - start) / 1000 > RUNUP_TIME + 3) {
                logger.info("Thread:{}, exec:{}, errors:{}, elapsed:{} s",
                    threadName, executeNum, errorNum, (end - start) / 1000.0);
            }
        } catch (Exception e) {
            logger.warn("Task execution failed!", e);
        } finally {
            after();
        }
    }
    // other methods omitted for brevity
}

Real‑World Measurements

Monitoring Overhead

Using macOS Activity Monitor and jvisualvm, the monitoring tools themselves consumed >1 GB of heap, while the actual test kept memory below 300 MB. After the optimizations, FunTester measured:

CPU 558.71

Memory 741.9 MB

QPS 117 123

Exception Handling Impact

Re‑enabling the per‑request try‑catch block inside the loop of FixedThread#run produced the following results (no exceptions were thrown during the run):

CPU 532.26

Memory 684.8 MB

QPS 118 400

Conclusion: exception handling overhead is negligible when the error rate is low.
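
For illustration, a per-request guard of the kind measured here might look like the sketch below. GuardedLoop and its Task interface are hypothetical stand-ins, not FunTester types. On the JVM a try block is implemented through an exception table, so the path on which nothing is thrown executes no extra instructions, which fits the near-identical QPS above.

```java
/** Hypothetical sketch of a per-request try-catch inside a load loop. */
public class GuardedLoop {
    /** Stand-in for the request being executed; not a FunTester type. */
    public interface Task {
        void run() throws Exception;
    }

    /** Runs the task `limit` times and returns how many runs threw. */
    public static int runCounted(Task task, int limit) {
        int errorNum = 0;
        for (int executeNum = 0; executeNum < limit; executeNum++) {
            try {
                task.run();
            } catch (Exception e) {
                errorNum++; // a failed request is counted and the loop continues
            }
        }
        return errorNum;
    }
}
```

The cost appears only when exceptions are actually thrown, since creating an exception captures a stack trace. A test with a high error rate would need to be measured separately.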

Statistics Collection Overhead

When full per‑request timing statistics were restored, memory rose sharply while QPS remained similar:

CPU 558.71

Memory 961.6 MB (≈+120 MB)

QPS 117 054

The additional heap usage is caused by the objects that store each request’s response time.
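
One way to keep that cost down, sketched here under the assumption that samples would otherwise be held as boxed values in a collection, is a growable array of primitives. The class name CostBuffer is hypothetical and this is not how FunTester stores its data.

```java
import java.util.Arrays;

/** Hypothetical growable buffer of primitive short samples (2 bytes each). */
public class CostBuffer {
    private short[] data;
    private int size;

    public CostBuffer(int initialCapacity) {
        data = new short[Math.max(1, initialCapacity)];
    }

    /** Appends one sample, doubling the backing array when it is full. */
    public void add(short costMillis) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = costMillis;
    }

    public int size() { return size; }

    public short get(int index) {
        if (index >= size) throw new IndexOutOfBoundsException("index " + index);
        return data[index];
    }

    /** Mean of all samples, or 0 when the buffer is empty. */
    public double average() {
        long sum = 0;
        for (int i = 0; i < size; i++) sum += data[i];
        return size == 0 ? 0.0 : (double) sum / size;
    }
}
```

The buffer is not thread-safe, so the natural use is one instance per load thread, merged after the run. A boxed collection adds at least a reference per sample on top of the 2-byte payload, with exact figures depending on the JVM and its settings.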

Response Parsing Overhead

Re‑introducing the original response‑parsing method, which converts the HTTP entity to a String, yielded:

CPU 562.79

Memory 1.25 GB

QPS 110 501

Keeping the full response in memory significantly increases heap consumption, while the impact on QPS is modest.
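
The difference between the two approaches can be sketched with plain java.io streams. The class BodyHandling is hypothetical and deliberately avoids the HTTP client so that it stays self-contained. In Apache HttpClient 4.x the rough counterparts are EntityUtils.consume(entity) and EntityUtils.toString(entity).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: discard a response body versus keeping it as a String. */
public class BodyHandling {
    /** Reads and discards the stream through one small reusable buffer. */
    public static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n; // bytes are counted, never retained
        }
        return total;
    }

    /** Buffers the whole body and decodes it, so memory grows with body size. */
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Draining still reads every byte, so the connection can be reused, but it allocates nothing that scales with the response size.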

Summary of Results

QPS improved from 104 375 to 118 904, a gain of about 13.9 %.

Peak memory usage dropped from 1 770 MB to the 700–800 MB range (≈57 % reduction).

Exception handling adds negligible overhead when errors are rare.

Full statistics collection adds ~120 MB of heap usage without affecting QPS.

Parsing the full HTTP response inflates memory to >1 GB and slightly reduces QPS.
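
The headline percentages can be checked with a few lines of arithmetic. The helper name SummaryMath is hypothetical, and the inputs are the figures quoted in this article. Taking the 741.9 MB measurement as the "after" value gives a reduction near 58 %, in the same range as the figure quoted above.

```java
import java.util.Locale;

/** Worked arithmetic for the headline figures; inputs are taken from the text. */
public class SummaryMath {
    /** Percentage change from `before` to `after` (positive means an increase). */
    public static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        double qpsChange = percentChange(104_375, 118_904);
        double memChange = percentChange(1_770, 741.9);
        // Locale.ROOT keeps the decimal separator a dot on every machine
        System.out.println(String.format(Locale.ROOT, "QPS change: %.1f %%", qpsChange));    // QPS change: 13.9 %
        System.out.println(String.format(Locale.ROOT, "Memory change: %.1f %%", memChange)); // Memory change: -58.1 %
    }
}
```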

[Figure: memory consumption of 1 million integers]
