How WeDataSphere Builds a One‑Stop, Open‑Source Big Data Platform
This article outlines the motivations for building a comprehensive data platform, describes the measure‑and‑tailor approach used to scope it, details WeDataSphere’s architecture (including DataSphere Studio and the Apache Linkis middleware), and shares the platform’s open‑source roadmap and future vision.
Motivation for Building a Data Platform
Enterprises need a unified infrastructure that can ingest, process, catalog, and visualize large‑scale, heterogeneous data. Such a platform enables dashboards, cockpit views, and predictive analytics for decision‑making. It must also cope with rapid growth in data volume and variety while keeping data‑related costs under control.
Measure‑and‑Tailor Methodology
The construction follows a two‑step “measure‑and‑tailor” approach.
1. Measure – Capability Assessment
Current data‑management maturity is evaluated against industry standards such as DCMM, the Financial Industry Data Capability Guide, and DAMA. The assessment produces a radar‑chart score that reflects strengths and gaps in areas such as data cataloging, governance, and developer productivity.
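The scoring step can be sketched in a few lines of Python. This is only an illustration: the 1-to-5 scale, the ratings, and the target threshold are assumptions made for the example, not values from any of the standards named above.

```python
from statistics import mean

# Hypothetical ratings on an assumed 1-5 maturity scale. The dimension names
# follow the article; the numbers are made up for illustration.
ratings = {
    "data cataloging": [2, 3, 2],
    "governance": [3, 4, 3],
    "developer productivity": [4, 4, 5],
}


def radar_scores(ratings):
    """Average the individual ratings into one score per radar axis."""
    return {dim: round(mean(vals), 2) for dim, vals in ratings.items()}


def gaps(scores, target):
    """Return the axes that score below the target, weakest first."""
    below = {dim: s for dim, s in scores.items() if s < target}
    return sorted(below, key=below.get)


scores = radar_scores(ratings)
print(scores)
print(gaps(scores, target=3.5))
```

The per-axis scores are what a radar chart would plot, and the list of gaps is the input to the tailoring step that follows.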
2. Tailor – Defining Core Capability Modules
Based on the assessment, business requirements, and budget, the platform is scoped into six foundational modules:
Data Analysis
Data Governance
Machine Learning
Operations & Monitoring
Compute‑Storage
Data‑Platform Middleware
Each module can be satisfied by:
Adopting existing open‑source components.
Purchasing commercial solutions.
Developing custom in‑house tools.
The resulting product matrix guides incremental rollout to departmental users.
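One way to picture the product matrix is as a mapping from each module to a sourcing decision. The sketch below is hypothetical: the module names come from the list above, but the sourcing assignments are invented for illustration and do not describe WeDataSphere's actual choices.

```python
from enum import Enum


class Sourcing(Enum):
    OPEN_SOURCE = "adopt open source"
    COMMERCIAL = "purchase commercial"
    IN_HOUSE = "develop in-house"


# Illustrative assignments only; a real matrix would come out of the
# assessment, the business requirements, and the budget.
product_matrix = {
    "Data Analysis": Sourcing.OPEN_SOURCE,
    "Data Governance": Sourcing.IN_HOUSE,
    "Machine Learning": Sourcing.OPEN_SOURCE,
    "Operations & Monitoring": Sourcing.COMMERCIAL,
    "Compute-Storage": Sourcing.OPEN_SOURCE,
    "Data-Platform Middleware": Sourcing.IN_HOUSE,
}


def group_by_sourcing(matrix):
    """Group modules by sourcing decision so each group can be planned together."""
    plan = {option: [] for option in Sourcing}
    for module, option in matrix.items():
        plan[option].append(module)
    return plan


plan = group_by_sourcing(product_matrix)
for option, modules in plan.items():
    print(f"{option.value}: {', '.join(modules)}")
```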
WeDataSphere Architecture and Open‑Source Strategy
WeDataSphere (WDS) is designed as a one‑stop, fully connected, financial‑grade data platform. Its architecture emphasizes:
Extreme connectivity and decoupling through a unified middleware layer.
Extensibility and high reuse of components.
Open‑source collaboration to attract community contributions and co‑development.
DataSphere Studio – Application Development & Management Framework
DataSphere Studio provides a graphical workflow UI that integrates data exchange, cleansing, analysis, quality checks, visualization, scheduling, and output. It is built on an AppConn plug‑in architecture that defines a three‑level protocol:
Connection Layer: Standardized REST, WebSocket, and JDBC adapters for engine access.
Integration Layer: Uniform metadata and lifecycle management for external data applications.
Execution Layer: Orchestration of tasks across heterogeneous engines.
This design enables rapid, simple integration of third‑party data applications into the platform.
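A plug-in contract along these lines can be sketched as an abstract base class with one method per layer. Everything here is an assumption made for illustration: the class and method names, the registry, and the demo application are hypothetical and are not the real DataSphere Studio AppConn interface.

```python
from abc import ABC, abstractmethod


class AppConn(ABC):
    """Hypothetical plug-in contract; not the real DataSphere Studio interface."""

    name: str

    @abstractmethod
    def connect(self) -> str:
        """Connection layer: return the endpoint the platform should call."""

    @abstractmethod
    def register_project(self, project: str) -> dict:
        """Integration layer: create matching metadata in the external app."""

    @abstractmethod
    def execute(self, endpoint: str, project_meta: dict, task: dict) -> str:
        """Execution layer: run one workflow node and return its status."""


class DemoQualityApp(AppConn):
    """Toy data-quality application used only to exercise the contract."""

    name = "demo-quality"

    def connect(self) -> str:
        return "http://quality.example.invalid/api"

    def register_project(self, project: str) -> dict:
        return {"project": project, "external_id": f"{self.name}:{project}"}

    def execute(self, endpoint: str, project_meta: dict, task: dict) -> str:
        return f"{task['node']} ran in {project_meta['external_id']} via {endpoint}"


registry = {}


def install(app: AppConn) -> None:
    """Make a plug-in available to workflows under its own name."""
    registry[app.name] = app


def run_node(app_name: str, project: str, task: dict) -> str:
    """Walk the three layers in order for a single workflow node."""
    app = registry[app_name]
    endpoint = app.connect()
    meta = app.register_project(project)
    return app.execute(endpoint, meta, task)


install(DemoQualityApp())
result = run_node("demo-quality", "sales", {"node": "null_check"})
print(result)
```

Because the platform only calls the three abstract methods, adding another application means writing one more subclass and calling `install`; `run_node` stays unchanged.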
Apache Linkis – Computing Middleware
Apache Linkis (incubating) serves as the generic middleware that resolves tight client‑server coupling and duplicated effort across front‑end tools and back‑end engines. Key features include:
Standard interfaces: REST, WebSocket, and JDBC for accessing engines such as Spark, Presto, and Flink.
Unified task orchestration and engine context sharing.
Governance capabilities: multi‑tenant isolation, resource control, high availability, and failure handling.
Extensibility: new engines can be plugged in without modifying existing applications.
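The decoupling idea behind these features can be illustrated with a toy middleware that routes jobs by engine type and enforces a per-user quota. This is a minimal sketch under stated assumptions: `ComputeMiddleware`, its methods, and the quota rule are hypothetical and do not reflect the actual Linkis API; the lambdas stand in for real engine connectors.

```python
from typing import Callable, Dict


class ComputeMiddleware:
    """Toy stand-in for a computing middleware; not the Linkis API."""

    def __init__(self, jobs_per_user: int):
        self._engines: Dict[str, Callable[[str], str]] = {}
        self._used: Dict[str, int] = {}
        self._quota = jobs_per_user

    def register_engine(self, engine_type: str, runner: Callable[[str], str]) -> None:
        """Plug in a new engine without touching any client code."""
        self._engines[engine_type] = runner

    def submit(self, user: str, engine_type: str, code: str) -> str:
        """Single entry point: check the engine and quota, then run the job."""
        if engine_type not in self._engines:
            raise KeyError(f"no engine registered for {engine_type!r}")
        if self._used.get(user, 0) >= self._quota:
            raise RuntimeError(f"job quota exceeded for {user!r}")
        self._used[user] = self._used.get(user, 0) + 1
        return self._engines[engine_type](code)


mw = ComputeMiddleware(jobs_per_user=2)
mw.register_engine("spark", lambda code: f"[spark] {code}")
first = mw.submit("alice", "spark", "SELECT 1")

# A second engine is added later; the calling code above is unaffected.
mw.register_engine("presto", lambda code: f"[presto] {code}")
second = mw.submit("alice", "presto", "SELECT 2")
print(first, second)
```

Clients depend only on `submit`, so the set of engines can grow independently of the front-end tools, which is the coupling problem the middleware layer is meant to remove.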
Open‑Source Development and Community Model
The WDS project follows the Apache Way, encouraging “Community Over Code”. All components—including DataSphere Studio, Apache Linkis, and the surrounding toolset—are being released under open‑source licenses to enable joint development with other big‑data platform teams. The goal is to lower participation barriers, foster a collaborative ecosystem, and continuously evolve the platform through community feedback and contributions.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
