
How MySQL 8.0’s Data Dictionary Eliminates Metadata Redundancy and Boosts Performance

MySQL 8.0 replaces duplicated server-level and engine-level metadata with a unified data dictionary stored in InnoDB tables, introduces a two-level cache (client-local and shared) built on templated multi-key maps, and supports atomic DDL, improving metadata consistency, performance, and ease of management.

Alibaba Cloud Developer

Background

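Before MySQL 8.0, the server layer and the storage engines each kept their own metadata, which was redundant and could become inconsistent. The server stored metadata in assorted files (.frm for table definitions, .par for partitioning, .opt for database options, .TRN and .TRG for triggers), while InnoDB maintained its own internal dictionary in system tables. Because these file-based definitions lived outside any transaction, metadata could not be managed uniformly and DDL could not be atomic.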

Overall Architecture

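MySQL 8.0 replaces this scattered metadata with a unified data dictionary stored in transactional InnoDB tables (hidden tables in the mysql schema, kept in the mysql.ibd tablespace). The server layer and the storage engines share this single source of truth, and because dictionary updates are written inside an InnoDB transaction, DDL can be made atomic.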

Metadata Representation in Memory and Engine

The data dictionary models metadata as C++ class hierarchies. Each object type is declared as an interface made of pure virtual functions (for example Table, Column, Partition), and its concrete implementation carries the _impl suffix (Table_impl, Column_impl, Partition_impl). Table_impl holds fields such as the storage engine name, the table comment, and partitioning information, plus collections of the table's indexes, foreign keys, partitions, triggers, and check constraints.

class Table_impl : public Abstract_table_impl, virtual public Table {
  // Fields.
  Object_id m_se_private_id;
  String_type m_engine;
  String_type m_comment;
  // - Partitioning related fields.
  enum_partition_type m_partition_type;
  String_type m_partition_expression;
  String_type m_partition_expression_utf8;
  enum_default_partitioning m_default_partitioning;
  // References to tightly-coupled objects.
  Index_collection m_indexes;
  Foreign_key_collection m_foreign_keys;
  Foreign_key_parent_collection m_foreign_key_parents;
  Partition_collection m_partitions;
  Partition_leaf_vector m_leaf_partitions;
  Trigger_collection m_triggers;
  Check_constraint_collection m_check_constraints;
};
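To make the interface/implementation split concrete, here is a minimal, self-contained sketch. Only the naming pattern (an abstract Table with a Table_impl behind it) comes from the article; the Index/Index_impl pair, every member function, and the use of std::unique_ptr for owned children are simplifying assumptions, not MySQL's actual declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified interfaces: callers see only pure virtual
// accessors, never the data layout.
class Index {
 public:
  virtual ~Index() = default;
  virtual const std::string &name() const = 0;
};

class Table {
 public:
  virtual ~Table() = default;
  virtual const std::string &name() const = 0;
  virtual const std::string &engine() const = 0;
  virtual Index *add_index(const std::string &name) = 0;
  virtual std::size_t index_count() const = 0;
};

// Concrete implementations carry the _impl suffix and own the data.
class Index_impl : public Index {
 public:
  explicit Index_impl(std::string name) : m_name(std::move(name)) {}
  const std::string &name() const override { return m_name; }

 private:
  std::string m_name;
};

class Table_impl : public Table {
 public:
  Table_impl(std::string name, std::string engine)
      : m_name(std::move(name)), m_engine(std::move(engine)) {}
  const std::string &name() const override { return m_name; }
  const std::string &engine() const override { return m_engine; }
  Index *add_index(const std::string &name) override {
    assert(!name.empty());
    m_indexes.push_back(std::make_unique<Index_impl>(name));
    return m_indexes.back().get();
  }
  std::size_t index_count() const override { return m_indexes.size(); }

 private:
  std::string m_name;
  std::string m_engine;
  // Tightly coupled children are owned by the table (cf. m_indexes).
  std::vector<std::unique_ptr<Index_impl>> m_indexes;
};
```

Because callers hold only a Table reference, the data layout inside Table_impl can change without touching the code that reads it; that decoupling is what the sketch is meant to illustrate.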

Column_impl and Partition_impl are defined similarly, exposing column attributes and partition details.

class Column_impl : public Entity_object_impl, public Column {
  // Fields.
  enum_column_types m_type;
  bool m_is_nullable;
  bool m_is_zerofill;
  bool m_is_unsigned;
  bool m_is_auto_increment;
  bool m_is_virtual;
  bool m_default_value_null;
  String_type m_default_value;
  // References to tightly-coupled objects.
  Abstract_table_impl *m_table;
};
class Partition_impl : public Entity_object_impl, public Partition {
  // Fields.
  Object_id m_parent_partition_id;
  uint m_number;
  Object_id m_se_private_id;
  String_type m_description_utf8;
  String_type m_engine;
  String_type m_comment;
  Properties_impl m_options;
  Properties_impl m_se_private_data;
  // References to tightly-coupled objects.
  Table_impl *m_table;
  const Partition *m_parent;
  Partition_values m_values;
  Partition_indexes m_indexes;
  Table::Partition_collection m_sub_partitions;
};
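The partition fields above describe a small tree: a table owns its partitions, a partition may own sub-partitions, and each node points back to its table and parent. The following sketch models that shape with hypothetical Table_sketch and Partition_sketch types; ownership via std::unique_ptr and the leaf_partitions() helper are assumptions for illustration, loosely mirroring m_partitions, m_sub_partitions, m_parent, and m_leaf_partitions.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Table_sketch;  // forward declaration for the back-pointer

// Hypothetical partition node: it owns its sub-partitions and keeps
// non-owning back-pointers to the table and parent (cf. m_table, m_parent).
struct Partition_sketch {
  std::string name;
  Table_sketch *table = nullptr;             // non-owning back-pointer
  const Partition_sketch *parent = nullptr;  // nullptr for top-level partitions
  std::vector<std::unique_ptr<Partition_sketch>> sub_partitions;

  Partition_sketch *add_sub_partition(const std::string &sub_name) {
    assert(parent == nullptr);  // MySQL allows at most two levels
    auto p = std::make_unique<Partition_sketch>();
    p->name = sub_name;
    p->table = table;
    p->parent = this;
    sub_partitions.push_back(std::move(p));
    return sub_partitions.back().get();
  }
};

// Hypothetical table: owns the top-level partitions (cf. m_partitions).
struct Table_sketch {
  std::string name;
  std::vector<std::unique_ptr<Partition_sketch>> partitions;

  Partition_sketch *add_partition(const std::string &part_name) {
    auto p = std::make_unique<Partition_sketch>();
    p->name = part_name;
    p->table = this;
    partitions.push_back(std::move(p));
    return partitions.back().get();
  }

  // Leaf partitions (those without sub-partitions) are where rows live,
  // comparable in spirit to m_leaf_partitions.
  std::vector<const Partition_sketch *> leaf_partitions() const {
    std::vector<const Partition_sketch *> leaves;
    for (const auto &p : partitions) {
      if (p->sub_partitions.empty()) {
        leaves.push_back(p.get());
      } else {
        for (const auto &sp : p->sub_partitions) leaves.push_back(sp.get());
      }
    }
    return leaves;
  }
};
```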

Multi‑Level Cache

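To avoid repeatedly reading metadata from the dictionary tables, MySQL uses a two-level cache. The first level is a client-local cache (Local_multi_map) owned by each dictionary client, so a thread can re-access objects it has already acquired quickly and without locking. The second level is the global shared cache (Shared_multi_map), which all threads access under a mutex, with a condition variable to coordinate threads that miss on the same object. Both are built on Multi_map_base, which indexes every cached element by several keys: the object pointer (reverse map), the object ID, the name, and an auxiliary key.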

template <typename T>
class Multi_map_base {
 private:
  Element_map<const T *, Cache_element<T>> m_rev_map;
  Element_map<typename T::Id_key, Cache_element<T>> m_id_map;
  Element_map<typename T::Name_key, Cache_element<T>> m_name_map;
  Element_map<typename T::Aux_key, Cache_element<T>> m_aux_map;
};
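Multi_map_base keeps one map per key type, all pointing at the same Cache_element, so an object can be found by pointer, ID, name, or auxiliary key. A minimal sketch of that idea, assuming a hypothetical Schema_obj with only an ID and a name (no auxiliary key) and assuming the reverse map owns the elements, which may not match how MySQL manages element memory:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical cached object with two lookup keys (cf. T::Id_key, T::Name_key).
struct Schema_obj {
  std::uint64_t id;
  std::string name;
};

// A cache element wraps exactly one object.
struct Element {
  std::unique_ptr<const Schema_obj> object;
};

// Sketch of a multi-key map: each element is registered under every key, so a
// lookup by object pointer, ID, or name reaches the same element.
class Multi_map_sketch {
 public:
  Element *put(Schema_obj obj) {
    assert(m_id_map.count(obj.id) == 0 && m_name_map.count(obj.name) == 0);
    auto elem = std::make_unique<Element>();
    elem->object = std::make_unique<Schema_obj>(std::move(obj));
    const Schema_obj *raw = elem->object.get();
    Element *e = elem.get();
    m_id_map[raw->id] = e;
    m_name_map[raw->name] = e;
    m_rev_map[raw] = std::move(elem);  // the reverse map owns the element
    return e;
  }

  Element *get_by_id(std::uint64_t id) const { return find(m_id_map, id); }
  Element *get_by_name(const std::string &name) const {
    return find(m_name_map, name);
  }
  Element *get_by_object(const Schema_obj *obj) const {
    auto it = m_rev_map.find(obj);
    return it == m_rev_map.end() ? nullptr : it->second.get();
  }

  // Removing an element must unregister it from every map.
  void remove(const Schema_obj *obj) {
    auto it = m_rev_map.find(obj);
    if (it == m_rev_map.end()) return;
    m_id_map.erase(obj->id);
    m_name_map.erase(obj->name);
    m_rev_map.erase(it);  // destroys the element and its object
  }

  std::size_t size() const { return m_rev_map.size(); }

 private:
  template <typename Map, typename Key>
  static Element *find(const Map &map, const Key &key) {
    auto it = map.find(key);
    return it == map.end() ? nullptr : it->second;
  }

  std::map<const Schema_obj *, std::unique_ptr<Element>> m_rev_map;
  std::map<std::uint64_t, Element *> m_id_map;
  std::map<std::string, Element *> m_name_map;
};
```

The key invariant is that every insert and removal touches all maps together; a stale entry in any one of them would let a lookup by that key return a destroyed element.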

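Shared_multi_map adds a mutex for concurrent access, a condition variable for coordinating cache misses, and a free list of unused elements that can be evicted once the cache grows beyond its capacity.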

template <typename T>
class Shared_multi_map : public Multi_map_base<T> {
 private:
  static const size_t initial_capacity = 256;
  mysql_mutex_t m_lock;
  mysql_cond_t m_miss_handled;
  Free_list<Cache_element<T>> m_free_list;
  std::vector<Cache_element<T> *> m_element_pool;
  size_t m_capacity;
};
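The locking and miss handling added by Shared_multi_map can be sketched with standard C++ primitives. In this hypothetical Shared_cache_sketch, std::mutex and std::condition_variable stand in for mysql_mutex_t and mysql_cond_t, std::shared_ptr replaces the explicit reference counts of Cache_element, and the Loader callback stands in for reading from the DD tables; the free list, element pool, and capacity-based eviction are omitted.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Hypothetical shared cache keyed by name.
class Shared_cache_sketch {
 public:
  using Object = std::shared_ptr<const std::string>;
  // Assumed loader contract: returns nullptr if the object does not exist
  // and does not throw.
  using Loader = std::function<Object(const std::string &)>;

  explicit Shared_cache_sketch(Loader loader) : m_loader(std::move(loader)) {}

  Object get(const std::string &key) {
    std::unique_lock<std::mutex> lock(m_lock);
    for (;;) {
      auto it = m_map.find(key);
      if (it != m_map.end()) return it->second;  // cache hit
      if (m_missed.count(key) == 0) break;        // nobody is loading it yet
      m_miss_handled.wait(lock);                  // another thread is loading it
    }
    m_missed.insert(key);  // register the miss so later requesters wait
    lock.unlock();         // read from storage without holding the mutex
    Object obj = m_loader(key);
    lock.lock();
    assert(m_missed.count(key) == 1);
    if (obj != nullptr) m_map.emplace(key, obj);
    m_missed.erase(key);
    m_miss_handled.notify_all();  // waiters wake up and re-check the map
    return obj;
  }

 private:
  Loader m_loader;
  std::mutex m_lock;
  std::condition_variable m_miss_handled;
  std::map<std::string, Object> m_map;
  std::set<std::string> m_missed;  // keys currently being loaded
};
```

In this sketch, the first thread to miss registers the key and loads it with the mutex released; other threads that miss on the same key wait on the condition variable instead of issuing duplicate reads, then re-check the map when woken.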

Cache Retrieval Process

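Clients request objects by key (for example, by name). Dictionary_client::acquire() looks the key up first in the client's registries of uncommitted and dropped objects, then in its registry of committed objects, and finally in the shared cache. On a shared-cache miss, the first thread to miss registers the key as being loaded and reads the object from the DD tables through the storage adapter; other threads requesting the same key wait on the condition variable until the object has been added to the shared cache. The result is then registered in the requesting client's local cache.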

// Get a dictionary object.
template <typename K, typename T>
bool Dictionary_client::acquire(const K &key, const T **object,
                                bool *local_committed,
                                bool *local_uncommitted) {
  // Lookup in registry of uncommitted objects
  T *uncommitted_object = nullptr;
  bool dropped = false;
  acquire_uncommitted(key, &uncommitted_object, &dropped);
  // ...
  // Lookup in the registry of committed objects.
  Cache_element<T> *element = nullptr;
  m_registry_committed.get(key, &element);
  // ...
  // Get the object from the shared cache; get() returns true on error.
  if (Shared_dictionary_cache::instance()->get(m_thd, key, &element)) {
    DBUG_ASSERT(m_thd->is_system_thread() || m_thd->killed ||
                m_thd->is_error());
    return true;
  }
  // ... (remainder of the function elided)
}
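The lookup order in acquire() can be modelled without any MySQL types. The sketch below is an assumption-laden simplification: objects are plain strings keyed by name, Shared_cache_stub replaces the shared cache (with no miss handling), and the store()/drop() helpers exist only to populate the client's uncommitted and dropped registries.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the shared cache: objects are strings keyed by name.
struct Shared_cache_stub {
  std::map<std::string, std::string> objects;
  const std::string *get(const std::string &key) const {
    auto it = objects.find(key);
    return it == objects.end() ? nullptr : &it->second;
  }
};

// Hypothetical client modelling only the lookup order of acquire().
class Client_sketch {
 public:
  explicit Client_sketch(const Shared_cache_stub &shared) : m_shared(shared) {}

  void store(const std::string &key, std::string value) {
    assert(!key.empty());
    m_uncommitted[key] = std::move(value);
  }
  void drop(const std::string &key) {
    m_uncommitted.erase(key);
    m_dropped.insert(key);
  }

  // Returns nullptr if the object does not exist or was dropped by this client.
  const std::string *acquire(const std::string &key) {
    // 1. This client's uncommitted objects take precedence.
    auto unc = m_uncommitted.find(key);
    if (unc != m_uncommitted.end()) return &unc->second;
    // 2. Objects dropped by this client are invisible to it.
    if (m_dropped.count(key) != 0) return nullptr;
    // 3. Committed objects this client has already acquired.
    auto com = m_committed.find(key);
    if (com != m_committed.end()) return com->second;
    // 4. Fall back to the shared cache and remember the hit locally.
    const std::string *obj = m_shared.get(key);
    if (obj != nullptr) m_committed.emplace(key, obj);
    return obj;
  }

 private:
  const Shared_cache_stub &m_shared;
  std::map<std::string, std::string> m_uncommitted;
  std::set<std::string> m_dropped;
  std::map<std::string, const std::string *> m_committed;
};
```

In the sketch, a client that has dropped a table and stored a new definition under the same name sees its own uncommitted version, while the shared cache still holds the committed one.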

Auto_releaser

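Auto_releaser is an RAII helper that keeps DD objects acquired through a dictionary client alive in the local cache for the lifetime of its scope. When it goes out of scope, it releases those objects, decrementing their reference counts. Nested scopes are supported because each Auto_releaser records the previously active releaser, forming a linked list (effectively a stack) that unwinds as scopes exit.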
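The nesting behaviour of Auto_releaser can be illustrated with a small RAII sketch. Dd_client_sketch and Releaser_sketch are hypothetical: a plain map of reference counts stands in for the cache's per-element counters, and the releaser simply records the keys acquired while it is the innermost active releaser.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

class Releaser_sketch;

// Hypothetical client: tracks per-object reference counts and the innermost
// active releaser.
class Dd_client_sketch {
 public:
  explicit Dd_client_sketch(std::map<std::string, int> &ref_counts)
      : m_ref_counts(ref_counts) {}
  void acquire(const std::string &key);  // defined after Releaser_sketch

 private:
  friend class Releaser_sketch;
  std::map<std::string, int> &m_ref_counts;
  Releaser_sketch *m_current = nullptr;  // head of the releaser list
};

// RAII releaser: registers itself as the client's current releaser and, on
// destruction, releases everything acquired in its scope and restores the
// previous (outer) releaser.
class Releaser_sketch {
 public:
  explicit Releaser_sketch(Dd_client_sketch &client)
      : m_client(client), m_prev(client.m_current) {
    client.m_current = this;
  }
  ~Releaser_sketch() {
    for (const auto &key : m_acquired) --m_client.m_ref_counts[key];
    assert(m_client.m_current == this);  // scopes must nest strictly
    m_client.m_current = m_prev;
  }
  Releaser_sketch(const Releaser_sketch &) = delete;
  Releaser_sketch &operator=(const Releaser_sketch &) = delete;

 private:
  friend class Dd_client_sketch;
  Dd_client_sketch &m_client;
  Releaser_sketch *m_prev;
  std::vector<std::string> m_acquired;
};

inline void Dd_client_sketch::acquire(const std::string &key) {
  assert(m_current != nullptr && "acquire() needs an active releaser");
  ++m_ref_counts[key];
  m_current->m_acquired.push_back(key);
}
```

Each releaser restores its m_prev on exit, so the client's current releaser always points at the innermost live scope.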

In‑place DDL Example

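During an in-place ALTER TABLE, the server obtains the current table definition (a Table_impl) through the dictionary client, builds a new DD object for the altered table, drops the old definition, and stores the new one. In the excerpt below, the new definition is stored at this point only when the storage engine supports atomic DDL (HTON_SUPPORTS_ATOMIC_DDL). At the end of the statement, commit_modified_objects() or rollback_modified_objects() applies or discards the changes tracked in the client's uncommitted and dropped registries and then clears them, while the storage engine makes the underlying dictionary-table changes durable as part of its transaction.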

{
  if (thd->dd_client()->drop(table_def)) goto cleanup2;
  table_def = nullptr;
  DEBUG_SYNC_C("alter_table_after_dd_client_drop");
  reset_check_constraints_alter_mode(altered_table_def);
  if (db_type->flags & HTON_SUPPORTS_ATOMIC_DDL) {
    if (thd->dd_client()->store(altered_table_def)) goto cleanup2;
  }
}
// ...
if (res)
  thd->dd_client()->rollback_modified_objects();
else
  thd->dd_client()->commit_modified_objects();
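The end-of-statement bookkeeping can be sketched as follows. The method names commit_modified_objects() and rollback_modified_objects() come from the excerpt above, but their bodies here are hypothetical: a std::map stands in for the shared cache, and persistence, which MySQL leaves to the storage engine's transaction, is not modelled at all.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical model of the client's bookkeeping for modified objects.
class Ddl_client_sketch {
 public:
  explicit Ddl_client_sketch(std::map<std::string, std::string> &shared)
      : m_shared(shared) {}

  void drop(const std::string &key) {
    m_uncommitted.erase(key);
    m_dropped.insert(key);
  }
  void store(const std::string &key, std::string definition) {
    assert(!key.empty());
    m_uncommitted[key] = std::move(definition);
  }

  // Publish: remove dropped objects first, then add new definitions, so a
  // drop-then-store of the same name leaves the new definition visible.
  void commit_modified_objects() {
    for (const auto &key : m_dropped) m_shared.erase(key);
    for (auto &entry : m_uncommitted) m_shared[entry.first] = std::move(entry.second);
    clear();
  }
  // Discard: in this sketch the shared map was never touched, so forgetting
  // the pending changes is enough.
  void rollback_modified_objects() { clear(); }

  bool has_pending() const { return !m_uncommitted.empty() || !m_dropped.empty(); }

 private:
  void clear() {
    m_uncommitted.clear();
    m_dropped.clear();
  }
  std::map<std::string, std::string> &m_shared;
  std::map<std::string, std::string> m_uncommitted;
  std::set<std::string> m_dropped;
};
```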

Conclusion

MySQL’s data dictionary eliminates the redundancy and inconsistency of the previous architecture, provides atomic DDL, and reduces storage and management costs. Its templated, multi‑level cache design offers high performance while exposing a simple client API for unified metadata access.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
