
Transform a Gene Comparison Script into a Scalable Serverless Workflow

This article demonstrates how to convert a Python gene-comparison program into a serverless workload on UCloud General Compute (UGC): redesign its I/O around stdin/stdout, package it as a Docker image, and submit highly concurrent HTTP requests to cut processing time from hours to minutes.

UCloud Tech

Serverless is an emerging architecture that lets developers focus on code without managing operations, resources, or deployment. This guide rewrites a Python gene‑comparison script to illustrate Serverless benefits.

Existing Resources

1. A mature gene-comparison algorithm (Python, 2 seconds per run).
2. 2,020 gene sample files (2 MB each).
3. An 8-core cloud host.

Sample Structure

├── relation.py
└── samples
    ├── one.sample
    └── two.sample

Original relation.py

import sys

def relationship_algorithm(human_sample_one, human_sample_two):
    # it's a secret
    return result

if __name__ == "__main__":
    length = len(sys.argv)
    if length != 3:
        sys.stderr.write("Need two samples")
    else:
        with open(sys.argv[1], "r") as sample_one:
            sample_one_list = sample_one.readlines()
        with open(sys.argv[2], "r") as sample_two:
            sample_two_list = sample_two.readlines()
        print(relationship_algorithm(sample_one_list, sample_two_list))

Running the script locally with two samples yields a probability of relationship (e.g., 0.054).

Business Requirement

The task is to match 2,000 people searching for children against 20 people searching for fathers, which gives 2,000 × 20 = 40,000 comparisons. At 2 seconds each, that is 80,000 seconds of compute: about 22 hours when run serially, and close to 3 hours when spread across the 8 cores, which is still too slow.

Introducing UGC (UCloud General Compute)

UGC allows packaging compute-intensive algorithms as Docker images. The image is pushed to a repository and pre-pulled to many compute nodes. It is then invoked over HTTP: the image name and token are passed as query-string parameters, and the input data is sent in the request body.

When a specially crafted HTTP request reaches UGC, a scheduler selects a node with the image, runs the container, feeds the request body to stdin, and returns the algorithm’s stdout and stderr as a tar archive in the response body.
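
As a rough illustration of the request shape described above, the sketch below builds (but does not send) such a request with the standard library. The endpoint URL and the parameter names `ImageName` and `AccessToken` are placeholders invented for this example; the real names come from the UGC documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameter names, for illustration only.
UGC_ENDPOINT = "http://ugc.example.invalid/submit"


def build_request(image_name: str, token: str, payload: bytes) -> Request:
    """Put the image name and token in the query string, the input in the body."""
    query = urlencode({"ImageName": image_name, "AccessToken": token})
    # Supplying `data` makes urllib issue a POST with that body.
    return Request(f"{UGC_ENDPOINT}?{query}", data=payload)


req = build_request("relation:v1", "my-token", b"sample one\nsample two\n")
print(req.get_method(), req.full_url)
```

Keeping the routing information in the URL and the payload in the body means the body can be streamed to the container's stdin unchanged.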

Adapting the Algorithm for Serverless

Change input to read from stdin instead of files.

Write output to stdout instead of returning a value.

import sys

mystdin = sys.stdin.read()
# split mystdin into two samples as needed, then run the algorithm on them
# after processing, write the result as text (stdout.write needs a str)
sys.stdout.write(str(result))
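
To make the two changes concrete, here is a minimal sketch of a stdin/stdout entry point. The delimiter separating the two samples and the stand-in scoring function are assumptions made for this example: the real algorithm is not public, and the article does not say how the two samples are combined in the request body.

```python
import io

# Assumed marker between the two samples in the request body (hypothetical).
DELIMITER = "\n====\n"


def relationship_algorithm(sample_one, sample_two):
    # Stand-in only: the real algorithm is secret. This returns the fraction
    # of positions at which the two samples have identical lines.
    if not sample_one or not sample_two:
        return 0.0
    matches = sum(a == b for a, b in zip(sample_one, sample_two))
    return matches / max(len(sample_one), len(sample_two))


def run(stdin, stdout):
    """Read both samples from stdin and write the score to stdout."""
    first, second = stdin.read().split(DELIMITER, 1)
    result = relationship_algorithm(first.splitlines(), second.splitlines())
    stdout.write(str(result))


# Inside the container this would be run(sys.stdin, sys.stdout);
# in-memory streams are used here so the sketch runs anywhere.
out = io.StringIO()
run(io.StringIO("A\nC\nG" + DELIMITER + "A\nT\nG"), out)
print(out.getvalue())
```

Passing the streams in as arguments keeps the function testable locally while the container entry point stays a one-line call.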

Client Development

The client builds an HTTP request, sends the combined sample data, and extracts the tar‑packed result.

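import io
import tarfile

def untar(data):
    # data is the raw response body: a tar archive holding stdout and stderr
    with tarfile.open(fileobj=io.BytesIO(data)) as tar, \
            open('result.txt', 'ab') as resultf:  # binary mode: members are bytes
        for member in tar.getmembers():
            f = tar.extractfile(member)
            if f is not None:  # extractfile returns None for non-file members
                resultf.write(f.read())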
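
The unpacking step can be tried without a live service. The sketch below builds a small tar archive in memory and reads its members back into a dictionary instead of appending them to a file. The member names `stdout` and `stderr` are assumed for illustration; the article does not specify how UGC names the entries.

```python
import io
import tarfile


def unpack_result(data: bytes) -> dict:
    """Map each regular file in a tar archive to its raw bytes."""
    contents = {}
    with tarfile.open(fileobj=io.BytesIO(data)) as tar:
        for member in tar.getmembers():
            f = tar.extractfile(member)
            if f is not None:  # None for directories and similar entries
                contents[member.name] = f.read()
    return contents


def make_fake_response(files: dict) -> bytes:
    """Build an in-memory tar archive, standing in for a response body."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


body = make_fake_response({"stdout": b"0.054", "stderr": b""})
result = unpack_result(body)
print(result["stdout"].decode())
```

Returning the members by name lets the caller treat a non-empty error stream differently from a successful result.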

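The client uses itertools.product to generate the 40,000 sample pairs and a gevent pool of 200 coroutines to submit them. Because each submission is I/O-bound, the coroutines spend most of their time waiting on the network, so 200 requests can be in flight at once from a single process.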

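import gevent.monkey
gevent.monkey.patch_all()  # patch the standard library before anything else uses sockets

import gevent.pool

pool = gevent.pool.Pool(200)  # at most 200 requests in flight at once
# worker(pair) submits one comparison; all_pairs is the list of 40,000 sample pairs
results = pool.map(worker, all_pairs)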
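
A sketch of how the pair generation and the bounded pool fit together is shown below. To keep it runnable without third-party packages, it swaps gevent for the standard library's `ThreadPoolExecutor`, which gives the same bounded-concurrency shape for I/O-bound work. The file names and the `worker` body are placeholders; a real worker would send the HTTP request and unpack the response.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

# Placeholder sample names; the real run has 2,000 and 20 of them.
child_seekers = [f"parent_{i}.sample" for i in range(3)]
father_seekers = [f"child_{i}.sample" for i in range(2)]


def worker(pair):
    # Stand-in for: read both files, POST them, unpack the tar response.
    one, two = pair
    return (one, two)


# Every seeker on one side is compared with every seeker on the other.
all_pairs = list(itertools.product(child_seekers, father_seekers))

# max_workers caps the number of requests in flight (200 in the article).
with ThreadPoolExecutor(max_workers=200) as pool:
    results = list(pool.map(worker, all_pairs))

print(len(results))
```

`map` returns results in the same order as the input pairs, so each score can be matched back to the two samples that produced it.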

Performance Gains

Each comparison still costs 2 seconds of CPU time, so the 40,000 tasks still add up to 80,000 CPU-seconds. With 200 requests in flight at once, however, the ideal wall-clock time drops to about 80,000 / 200 = 400 seconds (≈ 7 minutes), compared with roughly 2.8 hours on the 8-core host.
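
The estimate follows from simple division. The helper below is a back-of-the-envelope model that assumes every slot stays busy and ignores network, scheduling, and container start-up overhead, so real runs will take somewhat longer.

```python
import math


def ideal_wall_clock(tasks: int, seconds_per_task: float, concurrency: int) -> float:
    """Lower bound on wall-clock seconds when `concurrency` tasks run at once."""
    waves = math.ceil(tasks / concurrency)  # rounds of fully parallel work
    return waves * seconds_per_task


total_cpu = 40_000 * 2                      # 80,000 CPU-seconds in total
at_200 = ideal_wall_clock(40_000, 2, 200)   # 200 waves of 2 seconds each
print(total_cpu, at_200)
```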

Serverless Benefits Retained

Zero Operations – no servers to manage.

High Availability – tasks run on many nodes; failure of one node does not affect results.

Pay‑as‑You‑Go – cost is based on actual CPU seconds (≈ 5 CNY for the whole run).

Easy Deployment – Docker images are language‑agnostic and can be versioned via the image name.
