How WeDataSphere Builds a One‑Stop, Open‑Source Big Data Platform

This article outlines the motivations for building a comprehensive data platform, describes the measurement and tailoring approach, details WeDataSphere’s architecture—including DataSphere Studio and Apache Linkis middleware—and shares the open‑source roadmap and future vision for the platform.

ITPUB
Motivation for Building a Data Platform

Enterprises need a unified infrastructure that can ingest, process, catalog, and visualize large‑scale, heterogeneous data. Such a platform supports dashboards, cockpit views, and predictive analytics for decision‑making, while coping with rapid growth in data volume and variety and keeping data‑related costs under control.

Measure‑and‑Tailor Methodology

The construction follows a two‑step “measure‑and‑tailor” approach.

1. Measure – Capability Assessment

Current data‑management maturity is evaluated against industry standards such as DCMM, the Financial Industry Data Capability Guide, and DAMA. The assessment produces a radar‑chart score that reflects strengths and gaps in areas such as data cataloging, governance, and developer productivity.
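
As a rough illustration of the "measure" step, the sketch below averages per-dimension ratings into the scores a radar chart would plot, then lists the dimensions that fall short of a target level. The dimension names, the 1-5 scale, and the target of 3.0 are illustrative assumptions; they are not taken from DCMM, DAMA, or the WeDataSphere team.

```python
from statistics import mean

# Hypothetical assessor ratings on an assumed 1-5 maturity scale.
ratings = {
    "data_cataloging": [2, 3, 2],
    "data_governance": [3, 4, 3],
    "developer_productivity": [4, 4, 5],
}

def radar_scores(ratings):
    """Average each dimension's ratings; each average is one radar-chart axis."""
    return {dim: round(mean(vals), 2) for dim, vals in ratings.items()}

def gaps(scores, target=3.0):
    """Return (dimension, shortfall) pairs below the target, largest shortfall first."""
    short = {dim: round(target - s, 2) for dim, s in scores.items() if s < target}
    return sorted(short.items(), key=lambda kv: kv[1], reverse=True)

scores = radar_scores(ratings)
for dim, shortfall in gaps(scores):
    print(f"{dim}: {scores[dim]} (short of target by {shortfall})")
```

The gap list, rather than the raw scores, is what would feed the "tailor" step, since it ranks where investment is most needed.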

2. Tailor – Defining Core Capability Modules

Based on the assessment, business requirements, and budget, the platform is scoped into six foundational modules:

Data Analysis

Data Governance

Machine Learning

Operations & Monitoring

Compute‑Storage

Data‑Platform Middleware

Each module can be satisfied by:

Adopting existing open‑source components.

Purchasing commercial solutions.

Developing custom in‑house tools.

The resulting product matrix guides incremental rollout to departmental users.
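
One way to picture the product matrix is as a small table of module, sourcing choice, and rollout phase. The sketch below is a minimal version of that idea; the sourcing choices and phase numbers are placeholders chosen for illustration, not decisions reported by the WeDataSphere team.

```python
from dataclasses import dataclass
from enum import Enum

class Sourcing(Enum):
    OPEN_SOURCE = "adopt open source"
    COMMERCIAL = "purchase commercial"
    IN_HOUSE = "build in-house"

@dataclass(frozen=True)
class ModulePlan:
    module: str
    sourcing: Sourcing
    phase: int  # hypothetical rollout phase; lower numbers roll out first

# Placeholder assignments for the six modules named in the text.
matrix = [
    ModulePlan("Data Analysis", Sourcing.OPEN_SOURCE, 2),
    ModulePlan("Data Governance", Sourcing.IN_HOUSE, 3),
    ModulePlan("Machine Learning", Sourcing.COMMERCIAL, 3),
    ModulePlan("Operations & Monitoring", Sourcing.OPEN_SOURCE, 2),
    ModulePlan("Compute-Storage", Sourcing.OPEN_SOURCE, 1),
    ModulePlan("Data-Platform Middleware", Sourcing.IN_HOUSE, 1),
]

def rollout_order(plans):
    """Group module names by phase so each increment can be planned on its own."""
    phases = {}
    for plan in sorted(plans, key=lambda p: (p.phase, p.module)):
        phases.setdefault(plan.phase, []).append(plan.module)
    return phases

for phase, modules in rollout_order(matrix).items():
    print(f"phase {phase}: {', '.join(modules)}")
```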

WeDataSphere Architecture and Open‑Source Strategy

WeDataSphere (WDS) is designed as a one‑stop, fully connected, financial‑grade data platform. Its architecture emphasizes:

Extreme connectivity and decoupling through a unified middleware layer.

Extensibility and high reuse of components.

Open‑source collaboration to attract community contributions and co‑development.

DataSphere Studio – Application Development & Management Framework

DataSphere Studio provides a graphical workflow UI that integrates data exchange, cleansing, analysis, quality checks, visualization, scheduling, and output. It is built on an AppConn plug‑in architecture that defines a three‑level protocol:

Connection Layer: Standardized REST, WebSocket, and JDBC adapters for engine access.

Integration Layer: Uniform metadata and lifecycle management for external data applications.

Execution Layer: Orchestration of tasks across heterogeneous engines.

This design enables rapid, simple integration of third‑party data applications into the platform.
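
To make the layering concrete, here is a minimal sketch of what a three-level plug-in contract could look like. The class and method names (`AppConn`, `connect`, `register`, `execute`, `EchoQualityApp`, `run_node`) are hypothetical and chosen for illustration; they are not the interfaces defined by DataSphere Studio.

```python
from abc import ABC, abstractmethod

class AppConn(ABC):
    """Hypothetical plug-in contract mirroring the three layers named above."""

    name: str

    @abstractmethod
    def connect(self, user: str) -> dict:
        """Connection layer: open a session with the external application."""

    @abstractmethod
    def register(self, project: str) -> str:
        """Integration layer: create matching metadata in the application."""

    @abstractmethod
    def execute(self, project_ref: str, task: dict) -> str:
        """Execution layer: run one workflow node and return its status."""

class EchoQualityApp(AppConn):
    """Toy data-quality application used only to exercise the contract."""

    name = "echo-quality"

    def connect(self, user):
        return {"user": user, "app": self.name}

    def register(self, project):
        return f"{self.name}/{project}"

    def execute(self, project_ref, task):
        return f"{project_ref}:{task['node']}:succeeded"

def run_node(app: AppConn, user: str, project: str, task: dict) -> str:
    """Walk the three layers in order for a single workflow node."""
    app.connect(user)
    project_ref = app.register(project)
    return app.execute(project_ref, task)

print(run_node(EchoQualityApp(), "alice", "demo", {"node": "check_nulls"}))
```

The point of the sketch is that the framework only calls the three methods, so a new application is integrated by supplying one class rather than by changing the workflow code.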

Apache Linkis – Computing Middleware

Apache Linkis (incubating) serves as the generic middleware that resolves tight client‑server coupling and duplicated effort across front‑end tools and back‑end engines. Key features include:

Standard interfaces: REST, WebSocket, and JDBC for accessing engines such as Spark, Presto, and Flink.

Unified task orchestration and engine context sharing.

Governance capabilities: multi‑tenant isolation, resource control, high availability, and failure handling.

Extensibility: new engines can be plugged in without modifying existing applications.
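
The decoupling described above can be sketched with a toy middleware that sits between callers and engines. This is an assumption-laden illustration, not the Linkis API: `ComputeMiddleware`, `register_engine`, `set_quota`, and `submit` are hypothetical names, the engines are stand-in functions, and the per-tenant job quota is a simplified substitute for real resource control.

```python
from typing import Callable, Dict

class ComputeMiddleware:
    """Toy stand-in for a computing middleware; not the Linkis API."""

    def __init__(self) -> None:
        self._engines: Dict[str, Callable[[str], str]] = {}
        self._quota: Dict[str, int] = {}

    def register_engine(self, engine_type: str, runner: Callable[[str], str]) -> None:
        """Plug in an engine under a name that callers can request."""
        self._engines[engine_type] = runner

    def set_quota(self, tenant: str, max_jobs: int) -> None:
        """Set how many jobs a tenant may still submit (simplified resource control)."""
        self._quota[tenant] = max_jobs

    def submit(self, tenant: str, engine_type: str, code: str) -> str:
        """Check the tenant's quota, look up the engine, then run the code."""
        if self._quota.get(tenant, 0) <= 0:
            raise PermissionError(f"tenant {tenant!r} has no remaining quota")
        if engine_type not in self._engines:
            raise LookupError(f"no engine registered for {engine_type!r}")
        self._quota[tenant] -= 1
        return self._engines[engine_type](code)

mw = ComputeMiddleware()
mw.register_engine("spark", lambda code: f"spark ran: {code}")
mw.set_quota("analytics", 1)
print(mw.submit("analytics", "spark", "SELECT 1"))

# A new engine is plugged in later; the submit() call shape is unchanged.
mw.register_engine("presto", lambda code: f"presto ran: {code}")
```

Because callers only ever see `submit`, adding the second engine requires no change on the client side, which is the property the extensibility bullet describes.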

Open‑Source Development and Community Model

The WDS project follows the Apache Way, encouraging “Community Over Code”. All components—including DataSphere Studio, Apache Linkis, and the surrounding toolset—are being released under open‑source licenses to enable joint development with other big‑data platform teams. The goal is to lower participation barriers, foster a collaborative ecosystem, and continuously evolve the platform through community feedback and contributions.
