
Concern-Driven Data Architecture and Full-Scale Schema Modeling

The article explores a concern‑driven approach to data architecture, introducing a full‑scale schema modeling framework that balances data control, flexibility, push/pull delivery, and governance, and discusses multi‑level concerns, property‑graph standards, and practical scenarios for evolving schemas in complex business contexts.

Architects Research Society
Using Concern-Driven Data Architecture

This moment presents a unique opportunity to rebuild patterns, data models, and data architecture with a more thoughtful approach.

The real world presents contradictory concerns about whether schemas should be prioritized, deferred, or omitted: some business requirements demand strict data design (e.g., financial compliance), while others benefit from agile, incremental schema evolution.

In 2019 a standard graph query language for property graphs emerged (see https://www.gqlstandards.org/home), raising the challenge of building a schema architecture suitable for most contexts, business situations, and development styles.

We first explore the foundations of modern data and information architecture as we set sail into the stormy seas of 2019.

Full-Scale Data Architecture

An informal European group called Full Scale Data Architects has made significant progress in integrating data architecture into today’s reality. One of the founders, Martijn Evers, states:

Ronald Damhof and I are trying to empower data architects with a new data reality and regain control. We have launched a full‑scale data architect movement to combat the growing data tsunami, defining ten commandments for aspiring full‑scale data architects.

Their meta‑architecture overview is illustrated below:

The four‑quadrant model combines two competing dimensions:

Data control vs. data flexibility

Data push (delivery) vs. data pull (consumption)

My personal “artistic impression” of the core elements is shown here:

The visual metaphor for the control‑flexibility dilemma is illustrated below:

To control the “bull” of data, you must hold on to both horns; shifting your grip toward either side sacrifices quality or flexibility.

Having identified the meta‑features of full‑scale data architecture, we ask what determines which concerns belong in which quadrant.

We trace this back to Edsger Dijkstra’s 1974 reminder about the importance of “separation of concerns.”

Multi‑Level Concern Architecture

Dijkstra, together with Peter Naur, worked on the European Algol‑60 project. Their emphasis on separating concerns informs today’s data architecture thinking.

Driven by technical and business needs, the number of new data/information modeling ideas, methods, and designs is exploding. Architects must master this dynamic. Rather than designing another Data Vault, Anchor Modeling, or fact‑based model, we should focus on the underlying concerns that drive these techniques.

We must make these issues public, understand their inter‑dependencies, and provide several “road‑maps” to address specific data delivery challenges.

In my 2016 book on NoSQL and SQL graph data modeling, I defined a comprehensive set of data‑modeling requirements, which I now merge with Martijn Evers’ concerns into three levels:

Business‑level concerns

Solution‑level (logical) concerns

Implementation concerns

In the context of schema design, note that property graphs (the upcoming GQL standard) align closely with the business‑concept level, meaning all three levels relate to graph schema design.

Most concerns naturally belong to one quadrant of the data‑quadrant matrix, though some span multiple quadrants.

All three levels also share certain “inherited traits.”

General Concerns

Business‑oriented terminology should dominate all representation levels.

Algebraic support (such as set algebra) across levels.

Schema‑first approach where high‑quality, governed, business‑approved definitions are required (Q1).

No‑schema approach for domains needing large amounts of meaningful, structured data yet to be discovered (Q3, Q4).

Refinement: models may have different abstraction levels, requiring a three‑dimensional matrix to represent semantic and abstraction differences (Q1, Q4).

Business‑Level Concerns

Enable business terminology (including definitions) (Q2).

Basic dependencies (conceptual, functional, structural) (Q1).

Business‑friendly heuristics and visualizations built on concept‑model examples (Q1, Q2).

Minimal‑effort business concept model – a short path from whiteboard to first cut schema (Q3).

Solution independence – concept‑level details should be derivable from solution‑level details (Q1).

Standard visual examples (Q1‑4).

Standard concept types (business objects) with example attribute values (Q1).

Standard relationship types with simple cardinality (Q1).

General data types (numeric, string, date, amount, etc.) (Q1).

Simple business rules expressed as text comments in the schema (Q1).
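The business-level concerns above can be sketched in code. The following is a minimal sketch, assuming a simple dict-based notation; the concept names (`Customer`, `Order`, `places`) and attribute types are illustrative, not prescribed by the article:

```python
# A minimal business-level concept model: standard concept types with example
# attributes, relationship types with simple cardinality, general data types,
# and business rules kept as plain text comments in the schema.
concept_model = {
    "concepts": {
        # concept type -> example attributes with general data types
        "Customer": {"CustomerNo": "string", "Name": "string"},
        "Order":    {"OrderDate": "date", "Total": "amount"},
    },
    "relationships": [
        # (from, name, to, cardinality) with simple one/many cardinality
        ("Customer", "places", "Order", "1:M"),
    ],
    "rules": [
        # simple business rules expressed as text
        "An Order must always belong to exactly one Customer.",
    ],
}
```

Note the short path from whiteboard to first-cut schema: the whole model is a handful of names, one relationship, and one rule.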

Solution‑Level Concerns

Platform independence – solution‑level details must be independent of storage platforms (Q1).

Solution derivation – solution architecture should be derived from business concept architecture, turning concepts into logical business objects and their attributes (Q1, Q4).

Incremental solution optimization – schemas should evolve iteratively with design decisions (Q1, Q3).

Graphs and sub‑graphs (including sets) (Q1‑4).

Uniqueness constraints such as business keys (Q1).

Identity support, closely tied to uniqueness, defined for identifiers and surrogate items (Q1).

Updatability – all functional dependencies must be semantically resolved, with identifiers in place (Q1).

Audit trail and lineage for schema control (Q1).

Temporal integrity (Q1, Q2).

Time‑series aspects (Q1, Q2).

Property‑graph types: generic nodes, business‑object nodes, multi‑type nodes, attributes, and named, directed relationships with precise cardinality (Q1).

Mandatory attributes (Q1).
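The solution-level property-graph types can likewise be sketched. This is an illustrative sketch, not the GQL standard's own type notation; the class names, the `Customer` example, and the cardinality string format are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class NodeType:
    """A business-object node type with mandatory attributes and a business key."""
    label: str
    attributes: dict                      # attribute name -> general data type
    mandatory: set = field(default_factory=set)
    business_key: tuple = ()              # uniqueness constraint (Q1)

@dataclass
class RelationshipType:
    """A named, directed relationship type with precise cardinality."""
    name: str
    source: str                           # directed: source -> target
    target: str
    cardinality: str                      # e.g. "1..1 -> 0..*"

customer = NodeType(
    label="Customer",
    attributes={"CustomerNo": "string", "Name": "string"},
    mandatory={"CustomerNo", "Name"},     # mandatory attributes
    business_key=("CustomerNo",),         # identity / uniqueness support
)
places = RelationshipType("PLACES", "Customer", "Order", "1..1 -> 0..*")
```

Because these types carry no storage detail, the same definitions stay platform independent and can be derived from the business concept model above them.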

Physical‑Level Concerns

Smart ingestion – implicit typing, loading generic node types without explicit pre‑defined schema (Q3, Q4).

Simple mapping of transformations – physical schema details should map back to solution‑level details easily (Q1).

Complete lineage – straightforward back‑trace from physical to solution to business concept models (Q1).

Constraint facilities to support solution‑architecture details (Q1).

Identity, uniqueness, and ordering index tools (Q1).

Temporal integrity support (Q1).
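The “smart ingestion” concern can be illustrated with a short sketch: raw records are loaded as generic nodes with no pre-defined schema, while attribute types are inferred implicitly. The inference rules here are assumptions for illustration, not a prescribed algorithm:

```python
def infer_type(value):
    """Implicit typing: guess a general data type from a raw value."""
    if isinstance(value, bool):           # check bool before int/float
        return "boolean"
    if isinstance(value, (int, float)):
        return "numeric"
    return "string"

def ingest(records):
    """Load records as generic nodes and accumulate a discovered schema."""
    nodes, discovered = [], {}
    for record in records:
        nodes.append({"type": "GenericNode", "properties": record})
        for key, value in record.items():
            discovered.setdefault(key, infer_type(value))
    return nodes, discovered

nodes, schema = ingest([{"name": "Alice", "age": 42}])
# schema now maps each discovered attribute name to an inferred general type
```

The discovered schema is what makes the no-schema approach workable: structure is recovered from the data instead of being declared up front.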

All of the above are considered preliminary bets that merit discussion.

Paths to the Final Schema

Integration Involves Many Concerns

Strict governance (Q1) encompasses many concerns—about two‑thirds of them. Dependencies among concerns are inevitable.

Only two concerns do not appear in Q1: “no‑schema” and “smart ingestion.” That they sit outside Q1 says something about their fit with the strict‑governance philosophy.

Some concerns are “global”: set algebra, visualization paradigms, incremental refinement, graphs/sub‑graphs, and time‑series.

Concern Dependencies

A quick first‑round dependency map is shown below:

The following items are not tied to any prerequisite:

Business‑oriented terminology

Business terms

Simple mapping of transformations

Graphs and sub‑graphs

Platform independence

Refinement

Set algebra

Solution independence

Incremental refinement

Temporal integrity

Time‑series

A “prerequisite” here means a concern the schema designer must resolve before the concerns that depend on it can be handled.

I have likely omitted some items; time will tell.

Possible Scenarios for Using Schemas

We can now answer questions about how to use the upcoming property‑graph schema tool, as illustrated by the dependency diagram.

Can we do without a pre‑defined schema? Yes, provided “smart ingestion” is in place.

Can we start with the schema? Yes.

The minimal requirements for a workable schema are the ability to specify schema details as property‑graph types; other areas can be covered by the schema language as needed.

Do we need to define a business‑term dictionary? Not necessarily; other questions do not require it.

How can we create a business‑concept model simply? By mapping to standard concept and relationship types, naming basic dependencies that become discriminators for attributes and relationships, and optionally providing business‑friendly heuristics such as visualizations.

Can we use a “schema‑last” approach? Yes—the design focuses on lifting schema details from the physical solution to the logical solution and then to the business‑level.
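The schema-last direction can be sketched as a lifting step: deriving a logical node type from already-ingested generic nodes. The lifting rule used here (an attribute is mandatory if every node carries it) is an illustrative assumption, not a prescribed method:

```python
def lift_node_type(label, nodes):
    """Lift a solution-level node type out of ingested generic nodes."""
    attributes, mandatory = {}, None
    for node in nodes:
        props = node["properties"]
        for key, value in props.items():
            # implicit typing, mirroring the ingestion side
            attributes.setdefault(
                key, "numeric" if isinstance(value, (int, float)) else "string")
        keys = set(props)
        # an attribute is considered mandatory if present in every node
        mandatory = keys if mandatory is None else mandatory & keys
    return {"label": label, "attributes": attributes, "mandatory": mandatory or set()}

node_type = lift_node_type("Customer", [
    {"properties": {"name": "Alice", "age": 42}},
    {"properties": {"name": "Bob"}},
])
# "name" appears in every node, so it is lifted as mandatory; "age" is optional
```

Repeating the same lift from solution level to business level completes the physical-to-logical-to-conceptual trace that the lineage concern asks for.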

Handling Complexity and Contradiction

The forthcoming property‑graph schema standard is complex and must reconcile many contradictory concerns; it serves as a useful test case for demonstrating the most important parts of a comprehensive architecture. Starting from a four‑quadrant, full‑scale data‑architecture meta‑framework provides a solid foundation for a schema architecture that works across many contexts, business situations, and development styles.

I thank Ronald Damhof, Martijn Evers, and other members of the full‑scale data‑architecture community for sharing their ideas and experiences.

Tags: Data Modeling, Governance, Property Graph, Concern-Driven, Schema Architecture
Written by

Architects Research Society

A daily treasure trove for architects, expanding your view and depth. We share enterprise, business, application, data, technology, and security architecture, discuss frameworks, planning, governance, standards, and implementation, and explore emerging styles such as microservices, event‑driven, micro‑frontend, big data, data warehousing, IoT, and AI architecture.
