Implementing a Multi‑Tenant Lakehouse Data Platform for Real‑Time Analytics at a SaaS CRM Company

This article details how a SaaS CRM provider built a cloud‑native Lakehouse platform to support multi‑tenant real‑time analytics, describing data challenges, metadata‑driven architecture, virtual database design, query optimization, BI integration, AI readiness, migration steps, and the resulting performance and scalability gains.

DataFunSummit

Compared with traditional data warehouses and data lakes, a Lakehouse combines the flexibility and scalability of a data lake with the query performance and governance of a warehouse. The SaaS CRM vendor SalesEasy uses a Lakehouse to power real‑time analytics across its product line and improve its users' data experience.

Data Characteristics – The platform must support massive multi‑tenant workloads (more than 200,000 tenants, including Fortune 500 companies), diverse industry‑specific requirements, dynamic schemas that change without DDL, high performance and stability, and complex permission management.

Solution Overview – SalesEasy built a shared‑resource, multi‑tenant data platform based on metadata. Key components include:

Entity metadata that unifies business objects and allows tenant‑specific extensions.

Multi‑modal storage: primary relational databases (PostgreSQL), caching (Redis), search indexes (Elasticsearch), and a Greenplum data warehouse for analytics.

Virtual database tables that map tenant‑level logical tables to shared physical tables.

Sharding and partitioning (database‑level and table‑level) to handle high concurrency.

Dynamic query planning and index creation per tenant using PostgreSQL partitioning.

Together, these designs allow platform upgrades that are transparent to tenants and decouple metadata from physical data.
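The idea of tenant‑level virtual tables backed by shared physical tables can be illustrated with a minimal sketch. It assumes a hypothetical shared table named object_data whose generic columns (value0, value1, ...) are assigned to each tenant's logical fields by entity metadata; the EntityMetadata class, the to_physical_sql function, and all table and column names are illustrative, not SalesEasy's actual schema. What it shows is that a logical query is rewritten into a parameterized physical query that always carries the tenant predicate.

```python
from dataclasses import dataclass, field


@dataclass
class EntityMetadata:
    """Hypothetical entity metadata: logical field name -> shared physical column."""
    tenant_id: int
    object_name: str                  # logical table name the tenant sees
    field_to_column: dict[str, str] = field(default_factory=dict)


def to_physical_sql(meta: EntityMetadata, fields: list[str],
                    filters: dict[str, object]) -> tuple[str, list[object]]:
    """Rewrite a logical SELECT into parameterized SQL on the shared table."""
    # Only field names registered in metadata may appear as identifiers.
    unknown = [f for f in [*fields, *filters] if f not in meta.field_to_column]
    if unknown:
        raise KeyError(f"unknown fields for {meta.object_name}: {unknown}")
    select_list = ", ".join(f"{meta.field_to_column[f]} AS {f}" for f in fields)
    # Tenant and object predicates are always injected, so one tenant's logical
    # table can never expose another tenant's rows.
    where = ["tenant_id = %s", "object_name = %s"]
    params: list[object] = [meta.tenant_id, meta.object_name]
    for name, value in filters.items():
        where.append(f"{meta.field_to_column[name]} = %s")
        params.append(value)
    sql = f"SELECT {select_list} FROM object_data WHERE {' AND '.join(where)}"
    return sql, params
```

For example, with metadata for tenant 42 mapping company to value0 and region to value1 on the logical table lead, requesting company filtered by region = 'EMEA' produces SELECT value0 AS company FROM object_data WHERE tenant_id = %s AND object_name = %s AND value1 = %s with parameters [42, 'lead', 'EMEA']. In this sketch, adding a tenant field only means adding a metadata entry, not running DDL.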

Data Architecture – The architecture consists of a unified data service layer that exposes entity metadata‑driven APIs for CRUD, validation, calculations, logging, and integration. Underlying storage spans PostgreSQL, MySQL (for some metadata), Redis, Elasticsearch, and multi‑cloud object storage (S3/OSS) across Tencent Cloud, AWS, and Huawei Cloud. Middleware such as MyCAT supports sharding, and messaging systems handle asynchronous processing.
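The database‑level sharding and per‑tenant PostgreSQL partitioning described earlier can be sketched as two routing steps. This is a hypothetical illustration: the shard count, the crm_db_NN naming, and CRC32 as the routing hash are assumptions (MyCAT supports several sharding rules, and the article does not say which one SalesEasy uses), and the DDL assumes a parent table declared with PARTITION BY LIST (tenant_id).

```python
import zlib
from typing import Optional

NUM_SHARDS = 16  # hypothetical fixed number of database shards


def shard_for(tenant_id: int, num_shards: int = NUM_SHARDS) -> str:
    """Database-level routing: map a tenant to one shard by a stable hash."""
    # crc32 yields the same bucket in every process and service, so all
    # writers and readers agree on where a tenant's data lives.
    bucket = zlib.crc32(str(tenant_id).encode()) % num_shards
    return f"crm_db_{bucket:02d}"


def partition_ddl(parent: str, tenant_id: int,
                  indexed_column: Optional[str] = None) -> list[str]:
    """Table-level layout: DDL for a tenant's own LIST partition of `parent`.

    Assumes `parent` was created with PARTITION BY LIST (tenant_id).
    """
    tenant_id = int(tenant_id)  # keep the partition bound a numeric literal
    child = f"{parent}_t{tenant_id}"
    stmts = [f"CREATE TABLE IF NOT EXISTS {child} "
             f"PARTITION OF {parent} FOR VALUES IN ({tenant_id})"]
    if indexed_column:
        # Index only this tenant's partition, e.g. on a field it filters often.
        stmts.append(f"CREATE INDEX IF NOT EXISTS {child}_{indexed_column}_idx "
                     f"ON {child} ({indexed_column})")
    return stmts
```

A fixed modulus means that changing NUM_SHARDS later would move tenants between shards; production systems typically add a tenant‑to‑shard lookup table or consistent hashing to avoid that, which this sketch omits. Giving each of 200,000 tenants its own partition would also be impractical, so dedicated partitions are more plausibly reserved for large tenants; that policy is likewise an assumption.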

Data‑Intelligent Applications – The flagship product is an integrated BI system offering real‑time data sync, agile report creation, fine‑grained permissions aligned with the CRM, and embedded analytics. Recent AI features include machine‑learning‑based lead scoring and action recommendations, and large‑model (LLM) capabilities such as a commercial chatbot are planned.
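One way to picture BI permissions aligned with the CRM is to derive a BI row filter from the same rule the CRM enforces. This sketch assumes a single hypothetical rule, that a user may see records owned by themselves or by anyone in their reporting subtree, plus hypothetical tenant_id and owner_id columns on BI rows; real CRM permission models (roles, sharing rules, territories) are considerably richer.

```python
from collections import deque


def visible_owners(user_id: int, reports: dict[int, list[int]]) -> set[int]:
    """Owners whose records user_id may see: the user plus their reporting subtree."""
    seen = {user_id}
    queue = deque([user_id])
    while queue:  # breadth-first walk down the hypothetical reporting tree
        manager = queue.popleft()
        for subordinate in reports.get(manager, []):
            if subordinate not in seen:
                seen.add(subordinate)
                queue.append(subordinate)
    return seen


def row_filter(tenant_id: int, user_id: int,
               reports: dict[int, list[int]]) -> tuple[str, list[int]]:
    """Parameterized WHERE clause appended to every BI query the user runs."""
    owners = sorted(visible_owners(user_id, reports))
    placeholders = ", ".join(["%s"] * len(owners))
    # The tenant predicate is always present, so rows from other tenants are
    # excluded regardless of which owner IDs match.
    return (f"tenant_id = %s AND owner_id IN ({placeholders})",
            [tenant_id, *owners])
```

With reports = {1: [2, 3], 2: [4]}, user 2 in tenant 9 gets the clause tenant_id = %s AND owner_id IN (%s, %s) with parameters [9, 2, 4].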

Problems & Bottlenecks – Challenges include frequent schema changes across roughly 8,000 tables, complex permission matrices, and the need for sub‑second interactive queries. Scaling Greenplum for large enterprise customers proved costly, prompting a move toward a more flexible solution.

Lakehouse Practice – After selecting a cloud‑native Lakehouse as the foundation, SalesEasy validated its interactive query performance, elastic scaling, and data consistency using synthetic workloads. The migration combined dual writes, schema verification, and a gradual tenant‑by‑tenant cut‑over, and has now been completed on Tencent Cloud and AWS.
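The dual‑write and tenant‑by‑tenant cut‑over steps can be sketched as a small router. This is a hypothetical, self‑contained illustration: both systems are in‑memory Store objects, and Store, MigrationRouter, fingerprint, and try_cut_over are invented names. Reads for a tenant move to the Lakehouse only after that tenant's two copies produce the same checksum; the article does not say how SalesEasy verified data, so the checksum comparison is an assumption.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Store:
    """In-memory stand-in for one storage system (hypothetical)."""
    name: str
    rows: dict[tuple[int, str], dict] = field(default_factory=dict)

    def upsert(self, tenant_id: int, key: str, row: dict) -> None:
        self.rows[(tenant_id, key)] = dict(row)

    def tenant_rows(self, tenant_id: int) -> dict[str, dict]:
        return {k: v for (t, k), v in self.rows.items() if t == tenant_id}


def fingerprint(rows: dict[str, dict]) -> str:
    """Checksum of a tenant's rows that ignores insertion order."""
    h = hashlib.sha256()
    for key in sorted(rows):
        h.update(repr((key, sorted(rows[key].items()))).encode())
    return h.hexdigest()


@dataclass
class MigrationRouter:
    legacy: Store       # e.g. the Greenplum copy
    lakehouse: Store
    cut_over: set[int] = field(default_factory=set)

    def write(self, tenant_id: int, key: str, row: dict) -> None:
        # Dual write: every change reaches both systems during migration.
        self.legacy.upsert(tenant_id, key, row)
        self.lakehouse.upsert(tenant_id, key, row)

    def read_store(self, tenant_id: int) -> Store:
        return self.lakehouse if tenant_id in self.cut_over else self.legacy

    def try_cut_over(self, tenant_id: int) -> bool:
        """Switch this tenant's reads only if both copies match."""
        same = (fingerprint(self.legacy.tenant_rows(tenant_id))
                == fingerprint(self.lakehouse.tenant_rows(tenant_id)))
        if same:
            self.cut_over.add(tenant_id)
        return same
```

In a real migration, historical rows would be backfilled into the Lakehouse while new writes go to both systems, and try_cut_over would be retried per tenant until the copies match; a tenant that fails verification simply keeps reading from the legacy store.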

Value Delivered – Real‑time sync latency dropped from 15 minutes to under five minutes, query performance improved by more than 30%, and compute resources can now scale elastically. The platform also supports AI workloads on structured, semi‑structured, and document data, expanding analytical possibilities.

Q&A Highlights – Tenant isolation is achieved by adding a tenant ID column to wide tables; dynamic schemas are handled through metadata‑driven mappings; data sync uses CDC streams; large‑scale drill‑down can rely on cube‑based solutions; and compatibility between Greenplum and the Lakehouse is largely seamless, requiring only minor function adjustments.
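The Q&A points about tenant IDs in wide tables and CDC‑based sync fit together roughly as follows. This sketch assumes change events loosely modeled on Debezium's op codes ('c' create, 'u' update, 'd' delete) and a wide table represented as a dict keyed by (tenant_id, record_id); ChangeEvent, apply_events, and the per‑key log position used to skip replays are illustrative assumptions, not the pipeline SalesEasy describes.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ChangeEvent:
    op: str                  # 'c' create, 'u' update, 'd' delete
    tenant_id: int
    record_id: str
    columns: dict[str, Any]  # changed column values only
    lsn: int                 # source log position, used to skip replays


def apply_events(table: dict, positions: dict, events: list[ChangeEvent]) -> None:
    """Apply CDC events to a wide table keyed by (tenant_id, record_id)."""
    for ev in sorted(events, key=lambda e: e.lsn):  # apply in log order
        key = (ev.tenant_id, ev.record_id)          # tenant ID is part of the key
        if positions.get(key, -1) >= ev.lsn:
            continue  # already applied; makes replays idempotent
        if ev.op == "d":
            table.pop(key, None)
        elif ev.op in ("c", "u"):
            # Merge changed columns into the existing wide row (partial update).
            table[key] = {**table.get(key, {}), **ev.columns}
        else:
            raise ValueError(f"unknown op {ev.op!r}")
        positions[key] = ev.lsn
```

Because changed columns are merged into the existing row, a new tenant‑defined field appears as a new key without any DDL, and replaying an event whose log position is not newer than the last applied one leaves the row unchanged.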

Overall, the Lakehouse‑based data platform provides a scalable, real‑time, AI‑ready foundation for multi‑tenant SaaS CRM services.

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
