How Multi‑Tenant Big Data Cloud Solves Data Silos and Low‑Speed Transfers
This article examines how a cloud‑native big data platform with multi‑tenant architecture addresses data silos, manual data distribution, and slow transfer speeds, using a real‑world banking case to illustrate functional requirements, design patterns, and optimization strategies.
Introduction
The convergence of cloud computing, big data, and artificial intelligence has become a market hotspot. Cloud computing provides the underlying support for large‑scale data processing, accelerating application development and service innovation, while the growth of data volume and AI adoption drives new industry trends.
Traditional big data platforms suffer from three major issues:
Data silos: Independent databases built by different departments lead to inconsistent data, lack of unified standards, and difficulty tracing quality problems.
"Chimney" development: Separate teams develop services and applications independently, causing duplicated effort in security, operations, upgrades, and deployment.
High technical threshold: Deploying and operating big data and AI solutions is costly, making it hard for customers, developers, and data scientists to use them efficiently.
Many enterprises, especially large multi‑level organizations, seek an integrated data management and application development platform to enable high‑quality data exchange and sharing. At ArchSummit 2018, senior R&D engineer Li Guangyue from StarRing Technology shared his experience building a data sharing platform on StarRing's Transwarp Data Cloud (TDC).
TDC (Transwarp Data Cloud) is StarRing's next‑generation intelligent big data cloud platform that offers various cloud‑based data service solutions tailored to different industry needs.
Data Exchange Sharing Platform Requirements
A real‑world case from a large provincial bank service institution illustrates the needs:
Provide a unified infrastructure for over a hundred subsidiary legal entities to host data.
Support data distribution for an organization that only has data management rights, not ownership.
Integrate with the existing StarRing data warehouse product TDH, which already runs dozens of systems with heavy daily workloads.
Key pain points include:
Manual, high‑latency data distribution that lacks flexibility and self‑service customization.
Strict isolation requirements between subsidiaries to prevent data leakage.
Insufficient permission control and audit mechanisms.
Subsidiaries still at an early stage of data utilization, lacking big data analytics support.
From these, the platform must provide the following functions:
Multi‑tenant support with complete isolation between tenants.
A unified data middle platform offering a data catalog.
Self‑service request submission, admin approval, and automated data exchange.
Bidirectional connectivity between the data middle platform and tenants while ensuring permission control.
Comprehensive audit capabilities.
Big Data Cloud Response to Multi‑User Scenarios
TDC combines cloud application features with massive data processing capabilities and convenient container‑based deployment, making it well suited to implementing data sharing services with only a small amount of additional logic design and component development.
(1) Multi‑Tenant Model Construction
The design emphasizes two key properties: isolation and sharing. TDC's native multi‑tenancy maps each subsidiary to a separate tenant on the cloud platform. Tenants are fully isolated by default, with access limited to their own data. Permission control and data management are handled by unified services, while data sharing occurs through tenant‑initiated requests and subscription mechanisms.
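The isolation‑plus‑sharing model can be sketched as a toy access check (the `Tenant` class and all dataset names here are hypothetical illustrations, not TDC APIs): by default a tenant reads only its own data, and visibility into another tenant's data appears only after a sharing request has been approved.

```python
from dataclasses import dataclass, field

@dataclass
class Tenant:
    """One subsidiary mapped to an isolated tenant (illustrative model)."""
    name: str
    # Data the tenant owns outright; visible only to itself by default.
    own_datasets: set = field(default_factory=set)
    # Datasets made visible through an approved sharing request.
    granted_datasets: set = field(default_factory=set)

    def can_read(self, dataset: str) -> bool:
        # Full isolation: only own or explicitly granted data is readable.
        return dataset in self.own_datasets or dataset in self.granted_datasets

a = Tenant("subsidiary-a", own_datasets={"a.loans"})
b = Tenant("subsidiary-b", own_datasets={"b.deposits"})

assert a.can_read("a.loans")         # own data: allowed
assert not a.can_read("b.deposits")  # another tenant's data: denied
b.granted_datasets.add("a.loans")    # granted after an approved request
assert b.can_read("a.loans")
```

The point of the sketch is the default‑deny posture: sharing is an explicit, auditable grant rather than an open channel between tenants.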
(2) Data Sharing Exchange Architecture – Initial Exploration
Based on the multi‑tenant model, the data sharing exchange architecture consists of three parts (see the diagram below): the original TDH cluster storing all subsidiary data, the cloud platform tenant providing data services and task scheduling, and the individual subsidiary tenants.
Each component includes a security module for permission control, and trust relationships can be configured between security modules for cross‑tenant authentication.
To enable self‑service data requests, a metadata management component is deployed in the platform tenant, pulling data asset information from the TDH cluster’s message queue. Each tenant also runs a data catalog component that connects to the platform metadata service, exposing available data for request.
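A minimal sketch of the metadata pull described above, assuming the TDH cluster's message queue delivers JSON asset events (the payload shape and field names are invented for illustration):

```python
import json
from queue import Queue

# Stand-in for the TDH cluster's message queue (hypothetical payload shape).
events = Queue()
events.put(json.dumps({"table": "tdh.loans", "columns": ["id", "amount"], "op": "upsert"}))
events.put(json.dumps({"table": "tdh.old_tmp", "op": "delete"}))

catalog = {}  # platform-tenant metadata store, keyed by table name

def sync_catalog(q, store):
    """Drain asset events from the queue into the data catalog."""
    while not q.empty():
        event = json.loads(q.get())
        if event["op"] == "delete":
            store.pop(event["table"], None)
        else:
            store[event["table"]] = event.get("columns", [])

sync_catalog(events, catalog)
print(sorted(catalog))  # prints ['tdh.loans']
```

Each tenant's data catalog component would then query this store to show which tables are available for request.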
The workflow proceeds as follows: a regular user logs into TCC, searches the data catalog, selects desired data and sync frequency, and creates a data request ticket. The tenant admin reviews and approves the ticket, which is then forwarded to the platform admin. After platform approval, a data sharing task parses the ticket, extracts data from the TDH cluster, and writes it into the tenant’s distributed database, completing the data flow from the central TDH cluster to the subsidiary tenant.
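The approval chain in this workflow (user, then tenant admin, then platform admin, then execution) can be sketched as a small state machine; the stage names and the `Ticket` class are hypothetical, not TDC's actual API:

```python
# Approval chain: user submits -> tenant admin -> platform admin -> data task runs.
STAGES = ["submitted", "tenant_approved", "platform_approved", "executing", "done"]

class Ticket:
    def __init__(self, dataset, frequency):
        self.dataset = dataset
        self.frequency = frequency  # e.g. "daily" sync
        self.stage = "submitted"

    def advance(self):
        """Move to the next stage; a real system would verify approver roles here."""
        i = STAGES.index(self.stage)
        if i < len(STAGES) - 1:
            self.stage = STAGES[i + 1]
        return self.stage

t = Ticket("tdh.loans", "daily")
t.advance()  # tenant admin approves
t.advance()  # platform admin approves
assert t.stage == "platform_approved"
```

Only after the final approval does the sharing task parse the ticket and move data from the TDH cluster into the tenant's database.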
How to Optimize
While the current architecture satisfies basic data sharing requirements, data transfer speed remains a bottleneck. The JDBC‑based data flow achieves roughly 5,000 rows per second, which is inadequate for billions‑row workloads.
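Some back‑of‑the‑envelope arithmetic shows why roughly 5,000 rows per second is a bottleneck: at that rate, a single 2‑billion‑row table (a hypothetical but representative size) takes days to transfer.

```python
rows = 2_000_000_000   # a representative billions-row table (assumed size)
jdbc_rate = 5_000      # rows/second observed over the JDBC-based flow
seconds = rows / jdbc_rate
print(f"{seconds / 86400:.1f} days")  # prints 4.6 days for one full sync
```

A multi‑day window for one table makes frequent or self‑service syncs impractical, which motivates the lower‑layer, distributed transfer approach described next.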
Future optimization will focus on moving the data flow logic from the upper layer to the lower layer, fully leveraging the platform’s distributed architecture to increase throughput while maintaining secure isolation. The next article will detail the solutions to these challenges.
StarRing Big Data Open Lab
Focused on big data technology research, exploring the Big Data era | [email protected]