
Berserker Big Data Platform: Architecture, Development Practices, and Operational Enhancements

This article gives a comprehensive overview of the Berserker big-data platform. It covers the platform's overall design and data-development components; the key architectural challenges the team addressed, including state management, release workflows, two-phase commit, RPC deduplication, task routing, message handling, execution isolation, and a dependency-model redesign; and planned future work on stateless execution nodes, Kubernetes integration, and unified stream-batch processing.

DataFunTalk

Platform Overview

The Berserker platform is a one‑stop data development and governance system built on big‑data ecosystem components, supporting data collection, transmission, storage, querying, development, analysis, mining, testing, execution, and operations for various internal roles.

Data Development

Core functionalities include offline batch scheduling, real-time stream computing, ETL development, ad-hoc queries, user APIs, and an operations center. The platform comprises more than 40 microservices, with the Kratos framework at the microservice layer.

Architecture and Core Components

The scheduling system (project code-named Archer) consists of a Control Node (CN) for scheduling control, Execute Nodes (EN) for task execution, API services, SqlScan for SQL parsing, DataManager for IDC management, Blackhole for Kerberos authentication, and an Admin console for configuration.

Key Challenges and Solutions

State Issues: Transitioned from Zookeeper and Redis to Raft for strong consistency, eliminating split‑brain and state loss problems.

EN Release Problems: Implemented a smooth release workflow that pauses task submission, waits for running tasks to finish, then resumes submission.
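
The pause/drain/resume cycle described above can be sketched in a few lines. This is an illustrative model only; the class and method names here (`ExecuteNode`, `drain`, etc.) are assumptions, not the platform's real API:

```python
import threading

class ExecuteNode:
    """Sketch of a smooth-release workflow: pause new submissions,
    wait for in-flight tasks to finish, then resume."""

    def __init__(self):
        self._paused = False
        self._running = 0
        self._cond = threading.Condition()

    def submit(self, task):
        with self._cond:
            if self._paused:
                return False  # the CN should dispatch this task elsewhere
            self._running += 1
        try:
            task()  # simplified: run synchronously
        finally:
            with self._cond:
                self._running -= 1
                self._cond.notify_all()
        return True

    def drain(self):
        """Pause submission and block until running tasks complete."""
        with self._cond:
            self._paused = True
            while self._running:
                self._cond.wait()
        # safe point: the EN process can now be replaced and restarted

    def resume(self):
        with self._cond:
            self._paused = False
```

The key property is that `drain()` returns only at a quiescent point, so a release never kills a task mid-flight.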

Two‑Phase Commit: Added explicit START_DISPATCH and END_DISPATCH states in Raft to ensure atomicity between state changes and RPC calls.
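
A minimal sketch of that two-phase pattern, with the replicated Raft state machine stood in for by a plain dict (the function names and the `probe_en` recovery hook are illustrative assumptions):

```python
from enum import Enum

class DispatchState(Enum):
    PENDING = "PENDING"
    START_DISPATCH = "START_DISPATCH"  # logged before the RPC
    END_DISPATCH = "END_DISPATCH"      # logged after the EN accepts

def dispatch(task_id, log, send_rpc):
    """Replicate the state change first, perform the side effect second."""
    log[task_id] = DispatchState.START_DISPATCH
    ok = send_rpc(task_id)
    if ok:
        log[task_id] = DispatchState.END_DISPATCH
    return ok

def recover(log, probe_en):
    """After a CN failover, a task stuck in START_DISPATCH is ambiguous:
    the RPC may or may not have landed. Probe the EN before resubmitting
    so recovery never creates a duplicate task."""
    resubmit = []
    for task_id, state in log.items():
        if state is DispatchState.START_DISPATCH and not probe_en(task_id):
            resubmit.append(task_id)
    return resubmit
```

Because `START_DISPATCH` is committed before the RPC is sent, a crash between the two steps leaves an explicit marker to drive recovery, rather than a silent inconsistency.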

RPC Duplication: Introduced EN ACK mechanism and timeout‑based re‑checks to avoid duplicate task submissions.
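
The ACK-plus-recheck idea can be sketched from the CN's side as follows; the helper names (`send`, `check_status`) and the status strings are hypothetical placeholders:

```python
def submit_with_ack(task_id, send, check_status, retries=3):
    """Submit a task and wait for the EN's ACK. If the ACK is lost
    (timeout), re-check whether the EN actually accepted the task
    before retrying, so a dropped reply never causes a duplicate
    submission."""
    for _ in range(retries):
        if send(task_id) == "ACK":
            return True
        # No ACK within the timeout, but the RPC may still have landed:
        if check_status(task_id) == "RUNNING":
            return True  # the EN has it; do not resubmit
    return False
```

The re-check step is what breaks the "retry on timeout therefore duplicate" failure mode: a retry happens only once the CN has confirmed the EN does not hold the task.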

Task Routing & Gray‑Release: Developed a rule‑engine‑driven routing system supporting 50+ attribute combinations, tag‑based machine/cluster selection, and gray‑release fallback.
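
The first-match-wins shape of such a rule engine, with the gray-release fallback as the default pool, can be sketched briefly (the rule schema and pool names are invented for illustration):

```python
def route(task, rules, default_pool):
    """Pick an execution pool for a task: each rule lists required
    attribute values and a target pool; the first rule whose attributes
    all match wins, otherwise fall back to the default pool."""
    for rule in rules:
        if all(task.get(k) == v for k, v in rule["match"].items()):
            return rule["pool"]
    return default_pool
```

Ordering the rules from most to least specific lets a narrow gray-release rule capture only the intended slice of traffic while everything else keeps flowing to stable pools.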

Message Overload: Designed a SmartQueue with high/low watermarks and merge‑able messages to handle massive task‑status streams.
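
A toy version of the watermark-and-merge idea, assuming (as the text suggests) that status messages for the same task supersede one another so only the latest needs to survive; the class name and thresholds are illustrative:

```python
from collections import OrderedDict

class SmartQueue:
    """Status messages for the same task are merged (latest wins), and
    the queue signals backpressure once the high watermark is reached,
    clearing it after draining below the low watermark."""

    def __init__(self, high=1000, low=200):
        self._buf = OrderedDict()  # task_id -> latest status
        self._high, self._low = high, low
        self.backpressure = False

    def put(self, task_id, status):
        self._buf.pop(task_id, None)  # merge: drop the stale status
        self._buf[task_id] = status
        if len(self._buf) >= self._high:
            self.backpressure = True  # ask producers to slow down

    def get(self):
        task_id, status = self._buf.popitem(last=False)  # FIFO
        if self.backpressure and len(self._buf) <= self._low:
            self.backpressure = False
        return task_id, status

    def __len__(self):
        return len(self._buf)
```

Using separate high and low watermarks gives hysteresis, so the backpressure signal does not flap at the boundary.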

Execution Management: Adopted Docker containers with Dockerd and LogAgent for isolation, resource control, and unified logging.

Dependency Model Refactor: Replaced project‑level dependencies with root/end node task dependencies, enabling zero‑risk migration.
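
Under the refactored model, a project's root tasks depend directly on the upstream project's end tasks, so readiness reduces to ordinary task-level dependency resolution. A minimal sketch of that check (the task names are invented):

```python
def ready_tasks(deps, finished):
    """`deps` maps each task to the upstream tasks it waits on.
    A task is ready once every one of its upstream tasks has finished."""
    return sorted(t for t, ups in deps.items()
                  if t not in finished and all(u in finished for u in ups))
```

For example, with project B's root task depending on project A's end task, B becomes schedulable the moment A's end task finishes, without any project-level coupling.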

Big‑Data Operations: Built lightweight, fast‑response tools for incident handling, data repair, and batch re‑runs, supporting both real‑time and post‑mortem scenarios.

Future Work

Stateless EN: Move EN state into the Dockerd cache, simplifying releases.

Kubernetes Support: Migrate CN/EN functionalities to K8s for better resource utilization.

Unified Stream‑Batch Platform: Consolidate offline and real‑time processing under a common scheduling and execution framework.

Tags: Docker, Big Data, distributed scheduling, task management, data platform, Raft
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
