Operations 28 min read

Mastering Data Warehouse Naming: A Complete Guide to Standards and Processes

This article provides a comprehensive, step‑by‑step guide to data‑warehouse development, covering the full R&D workflow, data modeling layers, data dictionary creation, naming conventions for tables, columns, indexes and ETL jobs, metric standardization, and governance processes to ensure consistent, maintainable data assets across the organization.

Big Data Tech Team
Big Data Tech Team
Big Data Tech Team
Mastering Data Warehouse Naming: A Complete Guide to Standards and Processes

Data Warehouse Layering

The data warehouse is organized into logical layers to separate concerns and support scalable analytics:

ODS (Operational Data Store) : Ingests raw source data with minimal transformation to preserve lineage.

DWD (Data Warehouse Detail) : Mirrors ODS granularity while applying cleansing, standardisation and light enrichment.

DWM (Data Warehouse Middle) : Performs lightweight aggregations on DWD data to create reusable intermediate tables for common dimensions.

DWS (Data Warehouse Service) : Provides a slightly coarser‑grained, wide‑table layer that serves the majority of analytical use‑cases.

APP (Application) : Stores data required by downstream applications (e.g., Elasticsearch, PostgreSQL, Redis) for reporting or machine‑learning.

DIM (Dimension) : Contains high‑cardinality master data (e.g., user profiles, product catalogs) and low‑cardinality reference tables (e.g., enums, date dimensions).

Naming Conventions

All identifiers use lower‑case letters, numbers and underscores (snake_case). Names should be concise, meaningful, and no longer than 32 characters where possible.

Table Naming Patterns

ODS layer: ods_<source_type>_<business_table>_<load_strategy>_<load_cycle>
Example: ods.ods_db_logs_gold_logs_i_d
DWD layer: dwd_<primary_domain>_<secondary_domain>[_process]_<description>_<strategy>_<cycle>
Example: dwd.dwd_log_app_click_info_i_d
DWS layer: dws_<primary_domain>_<secondary_domain>_<granularity>_<description>_<period>
Example: dws.dws_log_mbr_event_info_1d

Column Naming

Columns start with the entity name and use underscores to separate words, e.g., channel_id, amount_usd, created_at.

Index Naming

Use a short prefix (e.g., idx_) followed by the table and column names, such as idx_user_created_at.

ETL Job Naming

Job names should describe the operation and may include a prefix ( load_, extract_, transform_) and an optional timestamp for versioning, e.g., load_user_daily_20230401.

Data Dictionary

A data dictionary records every data item, its type, enumeration set, source, and processing logic. It supports:

Accurate data collection and analysis.

Database design and lineage tracking.

Shared understanding across business and technical teams.

Typical contents include data items, structures, flow, storage, and processing steps.

Metric (Indicator) Standardisation

Metrics are organised into three levels:

Atomic metrics : Indivisible business measures (e.g., payment_amount).

Derived metrics : An atomic metric combined with qualifiers, a time window and an aggregation (e.g., payment_amount_7d_sum).

Composite metrics : Calculated from atomic and derived metrics using ratios, percentages, changes, etc.

Recommended naming pattern (all lower‑case, underscore‑separated):

business_segment_entity_timewindow_action_type_frequency_aggregation_metric
Example: ecommerce_user_7d_new_suborder_daily_avg_payment_amount

Metric Governance

Governance is typically handled by a small team (a data product manager and up to two engineers) responsible for:

Maintaining the metric dictionary.

Reviewing new metric requests.

Ensuring alignment across business lines.

Metrics are classified as:

Primary (core) metrics : Company‑wide, cross‑department indicators with strict review and delivery processes.

Secondary (business‑specific) metrics : Domain‑level indicators managed by the owning business unit.

Additional Technical Guidelines

Layered call order: ODS → DWD → DWS → APP. Reverse calls are discouraged to avoid circular dependencies.

Index design should include the table and column names, use a clear prefix (e.g., idx_), and avoid special characters.

ETL scripts must be idempotent, include comments for complex logic, and respect partition pruning and join optimisation hints.

Code should avoid DDL statements (DROP, CREATE, RENAME) within production jobs and enforce quality checks (DQC) before release.

R&D workflow diagram
R&D workflow diagram
metadataETLdata dictionarymetric governance
Big Data Tech Team
Written by

Big Data Tech Team

Focuses on big data, data analysis, data warehousing, data middle platform, data science, Flink, AI and interview experience, side‑hustle earning and career planning.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.