Databases 13 min read

When Sharding Fails: Debugging Pagination Bugs and Optimizing MySQL for Massive Tables

The article recounts a real‑world case where a massive device table (over 500 million rows) suffered from sharding and Elasticsearch integration issues, describes a pagination bug caused by Sharding‑JDBC, and presents a comprehensive set of hardware, software, and SQL optimization techniques—including MySQL configuration examples—to improve performance.

Java Architect Handbook
Java Architect Handbook
Java Architect Handbook
When Sharding Fails: Debugging Pagination Bugs and Optimizing MySQL for Massive Tables

Story Background

Traditional database optimization advice often recommends sharding when a single table exceeds tens of millions of rows. The author’s colleague asked why a table with hundreds of millions of rows was not sharded, prompting a discussion about the pitfalls of sharding.

System Complexity

The system stores device data in a table that has grown to 500‑600 million rows. Four years ago the team introduced sharding with four databases and five tables, and later added Elasticsearch (ES) for queries. Data synchronization between MySQL and ES is handled by Logstash, which runs every 20 seconds, further increasing system complexity.

Pagination Issue

A bizarre bug appeared: pagination on a single device table always returned only the first page. The root cause was identified as Sharding‑JDBC not supporting the specific pagination SQL, leading to a failure to retrieve subsequent pages. Two possible solutions were suggested: upgrade Sharding‑JDBC to a version (≥ 4.1) that fixes the bug, or separate the data source for sharded tables from the single‑table data source.

Database Optimization Recommendations

1. Hardware Optimization

Use SSDs to replace HDDs for faster random read/write.

Increase storage bandwidth, e.g., RAID 10.

Expand memory to allow larger buffer pools and reduce I/O.

2. Software Optimization

Upgrade CPU to multi‑core high‑frequency processors and monitor CPU utilization.

Improve network latency and bandwidth, especially for distributed databases (e.g., 10 GbE NICs).

3. SQL Optimization

Configuration : Adjust InnoDB buffer pool size, innodb_log_buffer_size, and other MySQL parameters.

Distributed Architecture : Consider distributed databases (TiDB, MongoDB) for read/write separation or sharding.

Indexing : Choose appropriate index types, avoid redundant indexes, and index columns frequently used in WHERE, JOIN, and GROUP BY.

SQL Tuning : Avoid SELECT *, create indexes on ORDER BY/GROUP BY columns, split complex queries, replace large OFFSET pagination with WHERE id > last_seen_id.

Lock Reduction : Use short transactions, appropriate isolation levels.

Tools : Use EXPLAIN (MySQL) or equivalent tools to analyze query plans.

Connection Pooling : Deploy HikariCP or similar to reuse connections.

MySQL Configuration Example

[universe]
bakupdir = /data/mysql/backup/7360
iops = 0
mem_limit_mb = 0
cpu_quota_percentage = 0
quota_limit_mb = 0
scsi_pr_level = 0
mycnf = /opt/mysql/etc/7360/my.cnf
run_user = actiontech-mysql
umask_dir = 0750
umask = 0640
id = mysql-mt1cbg
group_id = mysql-test

[mysql]
no-auto-rehash
prompt = ''
#default-character-set = utf8mb4
#tee = /data/mysql_tmp/mysql_operation.log

[mysqld]
super_read_only = 1
port = 7360
server_id = 123
basedir = /opt/mysql/base/5.7.40
datadir = /data/mysql/data/7360
log_bin = /opt/mysql/log/binlog/7360/mysql-bin
tmpdir = /opt/mysql/tmp/7360
relay_log = /opt/mysql/log/relaylog/7360/mysql-relay
innodb_log_group_home_dir = /opt/mysql/log/redolog/7360
log_error = /data/mysql/data/7360/mysql-error.log
report_host = xxx

# BINLOG
binlog_error_action = ABORT_SERVER
binlog_format = row
binlog_rows_query_log_events = 1
log_slave_updates = 1
master_info_repository = TABLE
max_binlog_size = 250M
relay_log_info_repository = TABLE
relay_log_recovery = 1
sync_binlog = 1

gtid_mode = ON
enforce_gtid_consistency = 1
binlog_gtid_simple_recovery = 1

default_storage_engine = InnoDB
innodb_buffer_pool_size = 64G
innodb_data_file_path = ibdata1:1G:autoextend
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 1
innodb_flush_method = O_DIRECT
innodb_io_capacity = 1000
innodb_log_buffer_size = 64M
innodb_log_file_size = 2G
innodb_log_files_in_group = 2
innodb_max_dirty_pages_pct = 60
innodb_print_all_deadlocks = 1
innodb_strict_mode = 1
innodb_max_undo_log_size = 4G
innodb_undo_log_truncate = 1
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_purge_threads = 4
innodb_buffer_pool_load_at_startup = 1
innodb_buffer_pool_dump_at_shutdown = 1
innodb_buffer_pool_dump_pct = 25
innodb_sort_buffer_size = 8M
innodb_buffer_pool_instances = 8
innodb_lock_wait_timeout = 10
innodb_io_capacity_max = 2000
innodb_flush_neighbors = 1
innodb_thread_concurrency = 64
innodb_stats_persistent_sample_pages = 64
innodb_autoinc_lock_mode = 2
innodb_online_alter_log_max_size = 1G
innodb_open_files = 4096
innodb_temp_data_file_path = ibtmp1:12M:autoextend:max:50G
innodb_rollback_segments = 128

# CACHE
key_buffer_size = 16M
tmp_table_size = 64M
max_heap_table_size = 64M
table_open_cache = 2000
query_cache_type = 0
query_cache_size = 0
max_connections = 3000
thread_cache_size = 200
open_files_limit = 65535
binlog_cache_size = 1M
join_buffer_size = 8M
sort_buffer_size = 2M
read_buffer_size = 8M
read_rnd_buffer_size = 8M
table_definition_cache = 2000
table_open_cache_instances = 8

# SLOW LOG
slow_query_log = 1
slow_query_log_file = /data/mysql/data/7360/mysql-slow.log
log_slow_admin_statements = 1
log_slow_slave_statements = 1
long_query_time = 1

# SEMISYNC
plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
rpl_semi_sync_master_enabled = 1
rpl_semi_sync_slave_enabled = 0
rpl_semi_sync_master_wait_for_slave_count = 1
rpl_semi_sync_master_wait_no_slave = 0
rpl_semi_sync_master_timeout = 30000

# CLIENT_DEPRECATE_EOF
session_track_schema = 1
session_track_state_change = 1
session_track_system_variables = '*'

# MISC
log_timestamps = SYSTEM
lower_case_table_names = 1
max_allowed_packet = 64M
read_only = 1
skip_external_locking = 1
skip_name_resolve = 1
skip_slave_start = 1
socket = /data/mysql/data/7360/mysqld.sock
pid_file = /data/mysql/data/7360/mysqld.pid
disabled_storage_engines = ARCHIVE,BLACKHOLE,EXAMPLE,FEDERATED,MEMORY,MERGE,NDB
log-output = TABLE,FILE
character_set_server = utf8mb4
secure_file_priv = ""
performance-schema-instrument = 'wait/lock/metadata/sql/mdl=ON'
performance-schema-instrument = 'memory/% = COUNTED'
expire_logs_days = 7
max_connect_errors = 1000000
interactive_timeout = 1800
wait_timeout = 1800
log_bin_trust_function_creators = 1

# MTS
slave-parallel-type = LOGICAL_CLOCK
slave_parallel_workers = 16
slave_preserve_commit_order = ON
slave_rows_search_algorithms = 'INDEX_SCAN,HASH_SCAN'

collation_server = utf8mb4_bin
explicit_defaults_for_timestamp = 1
transaction_isolation = READ-COMMITTED

Conclusion

Sharding and Elasticsearch can introduce hidden complexity and bugs, such as pagination failures caused by unsupported SQL in Sharding‑JDBC. A balanced approach—careful hardware sizing, proper MySQL configuration, judicious indexing, and avoiding over‑engineered architectures—yields better performance and maintainability.

ElasticsearchshardingPerformance TuningMySQLdatabase optimizationPaginationSharding-JDBC
Java Architect Handbook
Written by

Java Architect Handbook

Focused on Java interview questions and practical article sharing, covering algorithms, databases, Spring Boot, microservices, high concurrency, JVM, Docker containers, and ELK-related knowledge. Looking forward to progressing together with you.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.