Databases 19 min read

Unlocking MySQL 8.0 Optimizer: Cost Model Configuration and Histogram Usage

This article explains how MySQL 8.0 improves the optimizer by introducing configurable cost model constants and a histogram feature, showing how to query and update system tables, use ANALYZE TABLE to create and manage histograms, and explore the underlying code structures and future tuning possibilities.

dbaplus Community
dbaplus Community
dbaplus Community
Unlocking MySQL 8.0 Optimizer: Cost Model Configuration and Histogram Usage

Cost Model

MySQL historically used hard‑coded cost constants. Since MySQL 5.7 the constants are stored in the system tables mysql.server_cost and mysql.engine_cost, which can be inspected and modified at runtime.

SELECT * FROM mysql.server_cost;
UPDATE mysql.server_cost SET cost_value = 40 WHERE cost_name='disk_temptable_create_cost';
FLUSH OPTIMIZER_COSTS;

After FLUSH OPTIMIZER_COSTS the new values are used by newly created sessions; existing sessions keep the previous values until their reference count drops to zero.

The optimizer keeps a global cache Cost_constant_cache. During thread start‑up ( lex_start()) each thread copies a pointer to the cached Cost_model_constants (which contains Server_cost_constants and SE_cost_constants) into THD::m_cost_model. The cache is refreshed by reading the system tables or by FLUSH OPTIMIZER_COSTS. Reference counting guarantees that a cached version is not freed while still used by any thread.

Each TABLE object holds a TABLE::m_cost_model of type Cost_model_table. The optimizer stores the result of a cost estimation in a Cost_estimate structure with four dimensions:

double io_cost;    // I/O operations
double cpu_cost;   // CPU operations
double import_cost; // remote operations
double mem_cost;   // memory usage (bytes)

Histogram

MySQL 8.0 introduced column histograms, which allow the optimizer to estimate predicate selectivity more accurately. Histograms are created with the ANALYZE TABLE … UPDATE HISTOGRAM statement.

ANALYZE TABLE sbtest1 UPDATE HISTOGRAM ON k WITH 10 BUCKETS;

The metadata is stored in the hidden table mysql.column_statistics and exposed through the view INFORMATION_SCHEMA.COLUMN_STATISTICS. Example query:

SELECT JSON_PRETTY(HISTOGRAM)
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS
WHERE SCHEMA_NAME='sb1' AND TABLE_NAME='sbtest1';

A bucket contains the lower and upper bounds, the cumulative frequency, and the distinct‑value count. For singleton (equally‑wide) histograms each bucket stores a single value and its cumulative frequency. The default number of buckets is 100; the WITH N BUCKETS clause overrides it. The variable histogram_generation_max_mem_size limits the memory used during histogram construction.

Histograms are not refreshed automatically. After heavy DML you must run ANALYZE TABLE … UPDATE HISTOGRAM again.

The optimizer uses histogram statistics for predicates such as =, <, >, BETWEEN, IN, IS NULL, etc., improving plan choice without requiring indexes.

Implementation details (MySQL 8.0.12)

Histogram code resides in sql/histograms/ (e.g., equi_height.cc, singleton.cc, value_map.cc). The class hierarchy is: histograms::Histogram – abstract base histograms::Equi_height – equi‑height histogram (template per data type) histograms::Singleton – singleton (equally‑wide) histogram

Command handling flow:

Sql_cmd_analyze_table::handle_histogram_command
  └─ update_histogram
       └─ histograms::update_histogram
            ├─ field type validation
            ├─ sampling (controlled by histogram_generation_max_mem_size)
            │   └─ ha_sample_init / ha_sample_next / ha_sample_end
            ├─ build_histogram
            │   ├─ Singleton::build_histogram  (if distinct values ≤ buckets)
            │   └─ Equi_height::build_histogram (otherwise)
            └─ Histogram::store_histogram → dd::Column_statistics

During query optimization the selectivity is obtained via:

get_histogram_selectivity
   └─ Histogram::get_selectivity
        ├─ get_equal_to_selectivity_dispatcher
        ├─ get_greater_than_selectivity_dispatcher
        └─ get_less_than_selectivity_dispatcher

Supported predicate types include equality, inequality, range, IS NULL/NOT NULL, BETWEEN, IN, and their negations.

References (URLs)

MySQL 5.7 Cost Model documentation: https://dev.mysql.com/doc/refman/5.7/en/cost-model.html

MySQL 8.0 ANALYZE TABLE syntax: https://dev.mysql.com/doc/refman/8.0/en/analyze-table.html

Histogram system variable: https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_histogram_generation_max_mem_size

Histogram statistics overview: https://mysqlserverteam.com/histogram-statistics-in-mysql/

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

databasemysqloptimizerCost ModelHistogram
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.