Data Fabric Architecture: Three Patterns, Core Technical Components, and Inherent Limitations
The article explains data fabric architecture as a promising approach for enabling data exchange across distributed systems, outlines its three design patterns, describes key technical components such as data virtualization, data catalog, and knowledge graphs, and discusses the trade‑offs, costs, and limitations that organizations must consider.
Data fabric architecture has emerged as a way to facilitate data exchange among the many different systems that support a business.
Digital transformation is not only about digitizing workflows and processes; it also involves modernizing legacy and proprietary systems as well as isolated data sources to participate in an ecosystem that connects systems, applications, and services, essentially enabling data exchange among all resources that underpin core business workflows.
Data fabric has emerged as a promising solution to this problem. Data fabric is used to bind distributed resources together—whether they reside in the cloud or on‑premises—by exposing them through APIs for data exchange.
Like any technology, data fabric architecture has its own advantages, disadvantages, costs, and benefits, which this article explores.
Data Fabric Architecture's Three Patterns
Broadly speaking, at least three popular concepts of data fabric architecture exist.
The first approach treats data fabric as a strictly decentralized architecture: a method of accessing originally distributed data without first consolidating it into a central repository such as a data lake or warehouse. In its mildest form, this approach de-emphasizes centralized access; in its most radical form, it rejects the need for centralized access entirely.
The second, more inclusive view of data fabric regards centralized repositories as non-privileged participants in a distributed data architecture: data in lakes or warehouses is exposed through the fabric just like any other source, allowing centralized resources and distributed access to coexist.
The third view sees data fabric as the foundation of a hybrid data architecture. This model relies on data lakes and/or warehouses, leans toward centralized access, and gives architects a way to connect dispersed data resources while satisfying the unpredictable access needs of specialized consumers such as data scientists or ML/AI engineers.
Technical Components of Data Fabric Architecture
Consequently, the term “data fabric” can be vague: it can serve as a generic label or denote a very specific distributed data architecture. To make the term concrete, the following sections explore the core technologies that underpin a data fabric.
Data Virtualization
Data virtualization (DV) performs several useful functions.
First, it simplifies access to data resources regardless of their physical location, providing a virtual abstraction layer that makes cloud‑based resources appear as if they were in a local data center.
Second, DV supports API‑based universal access to distributed data resources, offering SQL as well as SOA, RESTful, and GraphQL endpoints, and allowing tools such as Java/JDBC, Python/ODBC, etc., to retrieve data.
Third, DV enables the creation and exposure of pre‑built data views, useful for common queries across dispersed resources, supporting both end‑user BI/reporting and expert practices like data science or ML engineering, effectively combining data integration, ETL, and data cleansing functions.
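The idea of a pre-built view over dispersed sources can be sketched in miniature. In this hypothetical illustration, two in-memory sqlite3 tables stand in for separate systems (a CRM and an orders application); a real DV platform would federate live sources over JDBC/ODBC or REST instead, but the consumer's experience is the same: one virtual table, one SQL query.

```python
import sqlite3

# Stand-ins for two dispersed sources; in a real fabric these would be
# separate live systems reached through the DV platform's connectors.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crm_customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE erp_orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm_customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO erp_orders VALUES (?, ?)",
                 [(1, 120.0), (1, 80.0), (2, 50.0)])

# A pre-built view: consumers query one virtual table, unaware of the
# underlying sources or of the join that integrates them.
conn.execute("""
    CREATE VIEW customer_revenue AS
    SELECT c.name, SUM(o.amount) AS revenue
    FROM crm_customers c JOIN erp_orders o ON o.customer_id = c.id
    GROUP BY c.name
""")
rows = conn.execute(
    "SELECT name, revenue FROM customer_revenue ORDER BY name").fetchall()
print(rows)  # [('Acme', 200.0), ('Globex', 50.0)]
```

The view name and schema here are invented for the example; the point is that the integration logic lives behind the view, not in each consumer's query.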
Data Catalog
Data cataloging uses metadata (data about data) to discover, identify, and classify useful data. When data lacks useful metadata, cataloging tools can generate it through techniques such as data profiling, which determines, for example, whether a dataset contains customer, product, or sales data.
Advanced cataloging can also discover or generate additional metadata like data lineage (origin, transformations, timestamps, owners). Catalogs become essential for data discovery, allowing business analysts to query the catalog—often via natural language—to find sources ranging from applications and databases to CSV files, PDFs, PowerPoint files, or objects stored in services like Amazon S3.
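As a rough sketch of what profiling means, the toy function below guesses a classification from a column's name and value patterns. Real catalog tools use much richer statistics, ML classifiers, and curated business glossaries; the names and rules here are illustrative assumptions only.

```python
import re

def profile_column(name, values):
    """Very simplified data-profiling heuristic: guess what a column
    holds from its name and value shape. Real catalog tools go far
    beyond name matching and a single regex."""
    name = name.lower()
    # Value-pattern check: a formatted SSN like 123-45-6789.
    if any(re.fullmatch(r"\d{3}-\d{2}-\d{4}", str(v)) for v in values):
        return "ssn"
    # Name-based checks for common business entities.
    if "cust" in name or "cstmr" in name:
        return "customer"
    if "prod" in name or "sku" in name:
        return "product"
    return "unclassified"

print(profile_column("CSTMR_ID", [101, 102]))     # customer
print(profile_column("tax_id", ["123-45-6789"]))  # ssn
```

In a real catalog, the output of such profiling becomes searchable metadata, which is what lets analysts find relevant sources in the first place.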
Knowledge Graph
This is where the “magic” happens. Knowledge graphs identify and establish relationships between entities discovered across different data models, fitting them into evolving ontologies and generating interconnected entity patterns that can span domains.
For example, a knowledge graph can recognize that “CSTMR” and “CUST” refer to the same “CUSTOMER”, or link a formatted SSN to the entity “SSN” and associate it with a specific customer, even across SaaS sales/marketing apps, on‑prem data marts, and HR databases.
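The CSTMR/CUST example can be made concrete with a minimal sketch: map source-specific field names onto canonical entities, then record relationships as triples. Production knowledge graphs derive these mappings probabilistically against an ontology rather than from a hard-coded table, so treat the synonym dictionary below as a placeholder assumption.

```python
# Hard-coded stand-in for what a knowledge graph would discover
# probabilistically: source-specific names resolved to canonical entities.
SYNONYMS = {"CSTMR": "CUSTOMER", "CUST": "CUSTOMER", "SOC_SEC_NO": "SSN"}

def canonical(entity):
    return SYNONYMS.get(entity, entity)

graph = set()  # relationships stored as (subject, predicate, object) triples

def link(subject, predicate, obj):
    graph.add((canonical(subject), predicate, canonical(obj)))

# The same fact discovered in two different sources, e.g. a SaaS sales
# app and an on-prem HR database, under different field names:
link("CUST", "has_identifier", "SOC_SEC_NO")
link("CSTMR", "has_identifier", "SSN")

# Both collapse into a single canonical relationship.
print(graph)  # {('CUSTOMER', 'has_identifier', 'SSN')}
```

The payoff is deduplication across domains: once both sources resolve to the same canonical entities, the graph holds one relationship instead of two conflicting ones.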
Inherent Limitations of Data Fabric Architecture
Proponents often present the best‑case view of data fabric, emphasizing simplified data access via abstraction, regardless of interface or location, and highlighting benefits of federated versus centralized access. However, the underlying technologies carry their own costs and trade‑offs.
Below are some of those limitations.
No Data History
Data fabric typically connects directly to operational (OLTP) systems that do not retain transaction history: new records overwrite existing ones. Therefore, a DV platform must incorporate persistent storage to capture and manage historical transaction data, often resembling a data‑warehouse core. Yet warehouses also store only derived subsets, not raw detailed data, which may be needed by analysts, data scientists, or ML engineers, prompting the need for a data‑lake‑like repository within the fabric.
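The history problem can be shown in a few lines. In this sketch (the `current`/`history` names are illustrative, not a standard schema), the operational system keeps only the latest state, so a warehouse-like layer inside the fabric must append each version separately.

```python
from datetime import datetime, timezone

current = {}   # what the OLTP system keeps: latest state only
history = []   # what a persistence layer inside the fabric must add

def upsert(key, value):
    # Retain every version with a timestamp before the OLTP overwrite.
    history.append((key, value, datetime.now(timezone.utc)))
    current[key] = value  # the operational system overwrites in place

upsert("order-1", {"status": "placed"})
upsert("order-1", {"status": "shipped"})

print(current["order-1"])  # {'status': 'shipped'} — prior state lost in OLTP
print(len(history))        # 2 — both versions retained by the fabric
```

Real implementations use change-data-capture or slowly-changing-dimension techniques rather than an in-memory list, but the division of labor is the same: the fabric, not the source, owns history.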
Labor‑Intensive Aspects
In DV models, IT staff and expert users must configure pre‑built connections for non‑expert users, exposing individual sources (e.g., SaaS finance, HR, and sales apps) and maintaining pre‑built views for reports and dashboards. Maintaining those views involves complex data‑engineering pipelines for extraction, cleaning, and transformation.
Data catalog technologies also require manual effort for search and discovery, as well as tools that let experts annotate, classify, and share data; despite automated dictionary generation, catalogs often depend on human‑managed metadata.
Knowledge‑graph technology similarly needs human expert review and approval of discovered entities and relationships, especially for sensitive applications, because the discovery process is probabilistic.
Location Matters
Data fabric masks the physical location of distributed sources, but value emerges when data is integrated into useful combinations, a core function of SQL queries. Data‑warehouse architectures move integrated data to a central location for indexing, pre‑aggregation, and caching to accelerate queries.
In a data fabric, data is accessed from dispersed locations and physically moved into the DV platform for integration. Consequently, the DV platform must assume at least part of the warehouse’s role—caching, pre‑aggregating, and indexing—to support common queries. For ad‑hoc queries or analytics/ML models that require edge‑sensor data, caching is not possible, leading to latency, especially when high‑latency connections to edge sources are involved.
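A toy cache illustrates why the DV platform ends up absorbing warehouse duties. The 50 ms delay below is an assumed stand-in for a slow WAN or edge connection, not a measured figure; common queries hit the cache, while ad-hoc ones pay the full remote latency every time.

```python
import time

REMOTE_LATENCY = 0.05  # assumed round-trip to a remote/edge source
cache = {}

def fetch_remote(query):
    time.sleep(REMOTE_LATENCY)  # simulate the high-latency round trip
    return f"result-of:{query}"

def run_query(query):
    if query in cache:              # common, repeated queries are fast
        return cache[query]
    result = fetch_remote(query)    # ad-hoc queries pay full latency
    cache[query] = result
    return result

t0 = time.perf_counter()
run_query("SELECT revenue")         # cold: goes to the remote source
cold = time.perf_counter() - t0

t0 = time.perf_counter()
run_query("SELECT revenue")         # warm: served from the DV cache
warm = time.perf_counter() - t0
print(cold > warm)  # True — the cached run skips the remote fetch
```

The limitation in the text follows directly: queries that cannot be cached, such as novel ad-hoc analytics or reads of fresh edge-sensor data, always take the cold path.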
Key Points
These trade‑offs offset the benefits of data fabric. While they are not insurmountable obstacles, they are real considerations for potential adopters.
Another issue is the inherent bias of data fabric toward data access rather than data management. Gartner describes data fabric as an infrastructure for accessing and moving data, which is useful for distributed applications and digital transformation, but it does not replace data‑management practices.
Managing data goes beyond mere control; it involves preserving data history, optimizing performance for various workloads, and establishing reusable data flows, cleaning routines, and version‑control mechanisms. Viewing data fabric as a complementary rather than a replacement solution for data‑management tools and practices is more helpful.
Architects Research Society
A daily treasure trove for architects, expanding your view and depth. We share enterprise, business, application, data, technology, and security architecture, discuss frameworks, planning, governance, standards, and implementation, and explore emerging styles such as microservices, event‑driven, micro‑frontend, big data, data warehousing, IoT, and AI architecture.