Big Data 12 min read

How Serverless Functions Can Replace Traditional Kafka Data Pipelines for Lower Cost and Easier Scaling

This article explains how Tencent Cloud CKafka works, describes the challenges of traditional open‑source data‑flow solutions, and demonstrates a Serverless Function approach—complete with architecture diagrams and code examples—to achieve low‑cost, auto‑scaling Kafka‑to‑Elasticsearch pipelines.

Tencent Cloud Middleware

Aug 12, 2020

How Serverless Functions Can Replace Traditional Kafka Data Pipelines for Lower Cost and Easier Scaling

Introduction to Tencent Cloud CKafka

Tencent Cloud CKafka is a cloud‑native, fully compatible implementation of the open‑source Kafka engine, supporting versions 0.9, 0.10, 1.1.1, and 2.4.2. It runs on thousands of nodes, handles petabyte‑scale data, and provides tenant isolation, rate limiting, authentication, monitoring, fault‑tolerant failover, and multi‑AZ disaster recovery.

What Is Data Flow (Data Transfer)

In a CKafka deployment, massive inbound and outbound data streams constitute "data flow". Existing open‑source tools such as Logstash, Filebeat, Spark, and Flink can handle data ingestion, cleaning, filtering, aggregation, and persistence, but they often incur high learning, tuning, and scaling costs.

Kafka upstream‑downstream ecosystem diagram

Serverless Function as a New Data‑Flow Solution

Serverless Functions run as custom code snippets positioned between data ingestion and egress, replacing traditional open‑source components. They offer low learning curves, zero maintenance, automatic scaling, and pay‑per‑use pricing.

Implementing Kafka → Elasticsearch with Serverless Function

The example shows an event‑driven Serverless Function that receives CKafka messages, cleans, filters, formats them, and writes the result to Elasticsearch.

Source data format

{
    "version": 1,
    "componentName": "trade",
    "timestamp": 1595944295,
    "eventId": 9128499,
    "returnValue": -1,
    "returnCode": 101103,
    "returnMessage": "return has no deal return error[错误：缺少**c参数][seqId:u3Becr8iz*]",
    "data": [],
    "seqId": "@kibana-highlighted-field@u3Becr8iz@/kibana-highlighted-field@*"
}

Target data format

{
    "timestamp": "2020-07-28 21:51:35",
    "returnCode": 101103,
    "returnError": "return has no deal return error",
    "returnMessage": "错误：缺少**c参数",
    "requestId": "u3Becr8iz*"
}

The Function logic parses the source JSON, filters out records with returnCode == 0, converts the Unix timestamp, extracts error information, and builds the target object before bulk‑indexing it into Elasticsearch.

#!/usr/bin/python
# -*- coding: UTF-8 -*-
from datetime import datetime
from elasticsearch import Elasticsearch, helpers
import json

esServer = "http://172.16.16.53:9200"
esUsr = "elastic"
esPw = "PW123"
esIndex = "pre1"

es = Elasticsearch([esServer], http_auth=(esUsr, esPw), sniff_on_start=False,
                 sniff_on_connection_fail=False, sniffer_timeout=None)

def convertAndFilter(sourceStr):
    target = {}
    source = json.loads(sourceStr)
    if source["returnCode"] == 0:
        return
    dateObj = datetime.fromtimestamp(source["timestamp"])
    target["timestamp"] = dateObj.strftime("%Y-%m-%d %H:%M:%S")
    target["returnCode"] = source["returnCode"]
    message = source["returnMessage"].split("][")
    errorInfo = message[0].split("[")
    target["returnError"] = errorInfo[0]
    target["returnMessage"] = errorInfo[1]
    target["requestId"] = message[1].replace("]", "").replace("seqId:", "")
    return target

def main_handler(event, context):
    for record in event["Records"]:
        target = convertAndFilter(record)
        action = {"_index": esIndex, "_source": {"msgBody": target}}
        helpers.bulk(es, action)
    return "successful!"

This concise script mirrors the core transformation performed by larger distributed frameworks; the difference lies in the platform handling scheduling, fault tolerance, and scaling automatically.

Advantages Over Open‑Source Solutions

Learning cost: minimal, as developers write familiar languages (Python, Java, Go, Node.js, etc.).

Operational cost: no need to maintain clusters, containers, or tuning parameters.

Scaling: automatic, fine‑grained concurrency driven by Kafka traffic, with pay‑per‑execution billing.

Comparison of Serverless Function vs. open‑source solutions

Underlying Execution Model

Serverless Functions are triggered by events (e.g., a Kafka message), scheduled, or invoked manually. The platform launches containers on demand, automatically determines concurrency based on incoming message volume, and charges only for execution time. Developers focus solely on bug‑free function code.

Future Outlook for Batch Computing

As streaming and batch processing converge, Serverless Functions provide a flexible, near‑infinite horizontal scaling option for batch workloads, fitting naturally into Lambda‑style architectures and supporting future batch‑stream integration.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Serverless Big Data data pipeline Elasticsearch kafka CKafka Function as a Service

Written by

Tencent Cloud Middleware

Official account of Tencent Cloud Middleware. Focuses on microservices, messaging middleware and other cloud‑native technology trends, publishing product updates, case studies, and technical insights. Regularly hosts tech salons to share effective solutions.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

Introduction to Tencent Cloud CKafka

What Is Data Flow (Data Transfer)

Serverless Function as a New Data‑Flow Solution

Implementing Kafka → Elasticsearch with Serverless Function

Advantages Over Open‑Source Solutions

Underlying Execution Model

Future Outlook for Batch Computing

Tencent Cloud Middleware

How this landed with the community

Was this worth your time?

0 Comments

Implementing Kafka → Elasticsearch with Serverless Function