Why ShardingJDBC Can Be a Trap: Rethinking Sharding‑Based Database Optimization
This article recounts a real-world case in which ShardingJDBC, Elasticsearch, and Logstash dramatically increased system complexity and caused a pagination bug. It then walks through hardware-, software-, and SQL-level optimization techniques and suggests more balanced alternatives to blind sharding.
Background
A device table grew to 500-600 million rows. Four years ago the team introduced sharding with ShardingJDBC, splitting the data across four databases with five tables each. For query convenience, reads were routed to Elasticsearch.
Complexity introduced
Without a sharding key, a query fans out to every database and table, generating up to 20 SQL statements (4 databases × 5 tables). Adding Elasticsearch simplified the query path but required data synchronization: Logstash synced data every 20 seconds, which is not real-time and adds latency.
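The fan-out above can be sketched in a few lines. The database and table names (db_0..db_3, device_0..device_4) are illustrative placeholders, not the project's actual schema:

```python
# Sketch of the fan-out a sharding layer performs when a query carries no
# sharding key: the one logical query must be rewritten against every
# physical database/table combination.
DATABASES = [f"db_{i}" for i in range(4)]   # 4 physical databases
TABLES = [f"device_{j}" for j in range(5)]  # 5 physical tables per database

def fan_out(sql_template: str) -> list[str]:
    """Rewrite one logical query into one statement per physical table."""
    return [
        sql_template.format(target=f"{db}.{table}")
        for db in DATABASES
        for table in TABLES
    ]

statements = fan_out("SELECT * FROM {target} WHERE status = 1")
print(len(statements))  # 4 databases x 5 tables = 20 statements
```

The result set then has to be merged and re-sorted in memory, which is where much of the hidden cost lives.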
Pagination bug
A paginated query on the single device table always returned only the first page, while other tables paginated correctly. Investigation showed that this table's datasource used the ShardingJDBC configuration, and the ShardingJDBC version in use does not support certain SQL constructs, causing pagination to fail.
Two possible solutions were considered:
Upgrade ShardingJDBC to a version newer than 4.1, which reportedly fixes the issue (not yet tried).
Separate the sharded datasource from the single‑table datasource, using a regular datasource for the latter, which restores correct pagination.
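The second fix amounts to routing by table, so that unsharded tables never pass through ShardingJDBC's SQL parser. A minimal routing sketch follows; the table names and datasource labels are assumptions for illustration, not the original project's configuration:

```python
# Route queries per table: sharded tables go through the ShardingJDBC
# datasource, everything else uses a plain datasource whose SQL is never
# rewritten (so LIMIT/OFFSET pagination behaves normally).
SHARDED_TABLES = {"device"}  # hypothetical: tables managed by ShardingJDBC

def pick_datasource(table: str) -> str:
    """Return which datasource a query against `table` should use."""
    return "sharding" if table in SHARDED_TABLES else "plain"

print(pick_datasource("device"))      # goes through the sharding layer
print(pick_datasource("device_log"))  # plain datasource, pagination intact
```

In a Spring application this routing is typically done by declaring two DataSource beans and pointing the affected mappers at the plain one.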
Hardware‑level optimization
Replace HDDs with SSDs for faster random reads/writes.
Use RAID 10 to increase storage bandwidth and redundancy.
Expand memory so the database can cache a larger portion of hot data.
Deploy multi‑core high‑frequency CPUs to support higher concurrency.
Analyze CPU utilization and eliminate CPU bottlenecks.
Optimize network latency and bandwidth; use 10 GbE NICs for distributed scenarios.
Software‑layer optimization
Adjust the InnoDB buffer pool size (e.g., innodb_buffer_pool_size = 64G) to cache most hot data.
Increase innodb_log_buffer_size to improve log write performance.
Deploy in‑memory caches such as Redis or Memcached.
For high‑concurrency workloads consider distributed databases (TiDB, MongoDB) that support read/write separation or data sharding.
Select appropriate index types (B+‑tree, hash) based on query patterns and regularly clean up redundant indexes.
Keep the database on the latest stable version to benefit from new optimization features and bug fixes.
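The index-selection advice above can be verified with any database's plan inspector. This sketch uses Python's built-in sqlite3 for portability (MySQL users would run EXPLAIN instead); the table and column names are illustrative:

```python
import sqlite3

# Choose an index to match the query pattern, then confirm with the plan
# inspector (EXPLAIN QUERY PLAN in SQLite, EXPLAIN in MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE device (id INTEGER PRIMARY KEY, status INTEGER, updated_at TEXT)"
)

# Query pattern: filter by status, sort by updated_at -> composite index.
conn.execute("CREATE INDEX idx_status_updated ON device (status, updated_at)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM device WHERE status = 1 ORDER BY updated_at"
).fetchall()
detail = " ".join(row[-1] for row in plan)
print(detail)  # the plan should mention idx_status_updated, with no separate sort step
```

If the plan shows a full scan or a temporary sort instead, the index does not match the query pattern and is a candidate for redesign or removal.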
SQL‑layer optimization
Avoid SELECT *; query only required columns.
Create indexes on columns used in ORDER BY and GROUP BY to accelerate sorting and grouping.
Split complex SQL statements into multiple simpler queries or views.
Replace large OFFSET pagination with index‑based conditions, e.g., WHERE id > last_seen_id.
Build indexes on fields frequently used in WHERE, JOIN, and GROUP BY clauses; avoid functions or implicit conversions on indexed columns.
Prefer small transactions to limit lock scope and choose appropriate transaction isolation levels to reduce waiting.
Use built‑in analysis tools such as MySQL EXPLAIN or SQL Server performance monitors to examine query plans.
Comprehensive measures
Regularly monitor the slow‑query log and optimize identified slow queries.
Archive rarely accessed historical data to cold storage to reduce load on the primary database.
Employ a connection pool (e.g., HikariCP) to reuse connections and lower overhead.
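The pooling pattern behind HikarICP-style libraries can be sketched in a few lines. This is a toy illustration using only the Python stdlib, not a production pool; real pools add health checks, timeouts, and leak detection:

```python
import queue
import sqlite3

# Toy connection pool: pre-open N connections and hand them out via a
# queue, so each request reuses an existing connection instead of paying
# the TCP/auth/open cost every time.
class SimplePool:
    def __init__(self, size: int, dsn: str = ":memory:"):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(sqlite3.connect(dsn, check_same_thread=False))

    def acquire(self) -> sqlite3.Connection:
        return self._q.get()   # blocks if the pool is exhausted

    def release(self, conn: sqlite3.Connection) -> None:
        self._q.put(conn)      # return the connection for reuse

pool = SimplePool(size=1)      # single connection makes reuse observable
c1 = pool.acquire()
result = c1.execute("SELECT 1").fetchone()[0]
pool.release(c1)
c2 = pool.acquire()            # the same underlying connection comes back
print(result, c1 is c2)
```

Sizing matters: the max_connections = 3000 in the reference configuration below is a server-side ceiling, while the pool caps how many of those one application actually opens.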
Reference MySQL configuration (MySQL 5.7.40)
[mysql]
no-auto-rehash
prompt ''
[mysqld]
super_read_only = 1
port = 7360
server_id = 123
basedir = /opt/mysql/base/5.7.40
datadir = /data/mysql/data/7360
log_bin = /opt/mysql/log/binlog/7360/mysql-bin
tmpdir = /opt/mysql/tmp/7360
relay_log = /opt/mysql/log/relaylog/7360/mysql-relay
innodb_log_group_home_dir = /opt/mysql/log/redolog/7360
log_error = /data/mysql/data/7360/mysql-error.log
report_host = xxx
binlog_error_action = ABORT_SERVER
binlog_format = row
binlog_rows_query_log_events = 1
log_slave_updates = 1
master_info_repository = TABLE
max_binlog_size = 250M
relay_log_info_repository = TABLE
relay_log_recovery = 1
sync_binlog = 1
gtid_mode = ON
enforce_gtid_consistency = 1
binlog_gtid_simple_recovery = 1
default_storage_engine = InnoDB
innodb_buffer_pool_size = 64G
innodb_data_file_path = ibdata1:1G:autoextend
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 1
innodb_flush_method = O_DIRECT
innodb_io_capacity = 1000
innodb_log_buffer_size = 64M
innodb_log_file_size = 2G
innodb_log_files_in_group = 2
innodb_max_dirty_pages_pct = 60
innodb_print_all_deadlocks = 1
innodb_strict_mode = 1
innodb_max_undo_log_size = 4G
innodb_undo_log_truncate = 1
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_purge_threads = 4
innodb_buffer_pool_load_at_startup = 1
innodb_buffer_pool_dump_at_shutdown = 1
innodb_buffer_pool_dump_pct = 25
innodb_sort_buffer_size = 8M
innodb_buffer_pool_instances = 8
innodb_lock_wait_timeout = 10
innodb_io_capacity_max = 2000
innodb_flush_neighbors = 1
innodb_thread_concurrency = 64
innodb_stats_persistent_sample_pages = 64
innodb_autoinc_lock_mode = 2
innodb_online_alter_log_max_size = 1G
innodb_open_files = 4096
innodb_temp_data_file_path = ibtmp1:12M:autoextend:max:50G
innodb_rollback_segments = 128
key_buffer_size = 16M
tmp_table_size = 64M
max_heap_table_size = 64M
table_open_cache = 2000
query_cache_type = 0
query_cache_size = 0
max_connections = 3000
thread_cache_size = 200
open_files_limit = 65535
binlog_cache_size = 1M
join_buffer_size = 8M
sort_buffer_size = 2M
read_buffer_size = 8M
read_rnd_buffer_size = 8M
table_definition_cache = 2000
table_open_cache_instances = 8
slow_query_log = 1
slow_query_log_file = /data/mysql/data/7360/mysql-slow.log
log_slow_admin_statements = 1
log_slow_slave_statements = 1
long_query_time = 1
plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
rpl_semi_sync_master_enabled = 1
rpl_semi_sync_slave_enabled = 0
rpl_semi_sync_master_wait_for_slave_count = 1
rpl_semi_sync_master_wait_no_slave = 0
rpl_semi_sync_master_timeout = 30000
session_track_schema = 1
session_track_state_change = 1
session_track_system_variables = '*'
log_timestamps = SYSTEM
lower_case_table_names = 1
max_allowed_packet = 64M
read_only = 1
skip_external_locking = 1
skip_name_resolve = 1
skip_slave_start = 1
socket = /data/mysql/data/7360/mysqld.sock
pid_file = /data/mysql/data/7360/mysqld.pid
disabled_storage_engines = ARCHIVE,BLACKHOLE,EXAMPLE,FEDERATED,MEMORY,MERGE,NDB
log-output = TABLE,FILE
character_set_server = utf8mb4
secure_file_priv = ""
performance-schema-instrument = 'wait/lock/metadata/sql/mdl=ON'
performance-schema-instrument = 'memory/% = COUNTED'
expire_logs_days = 7
max_connect_errors = 1000000
interactive_timeout = 1800
wait_timeout = 1800
log_bin_trust_function_creators = 1
slave-parallel-type = LOGICAL_CLOCK
slave_parallel_workers = 16
slave_preserve_commit_order = ON
slave_rows_search_algorithms = 'INDEX_SCAN,HASH_SCAN'
collation_server = utf8mb4_bin
explicit_defaults_for_timestamp = 1
transaction_isolation = READ-COMMITTED
Conclusions
Sharding can add hidden complexity (additional services, synchronization latency, and SQL limitations). Alternative approaches include:
Splitting tables by business domain (e.g., user tables in a user database, order tables in an order database) to reduce pressure on a single database.
When data exceeds 10 billion rows, consider a distributed database that combines OLAP and OLTP capabilities, as MySQL is not ideal for massive analytical queries.
Archive historical data by time (daily or monthly tables) for workloads with large volumes but infrequent historical queries.
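Time-based archiving usually comes down to routing writes to a per-period table so whole months can later be moved to cold storage. A minimal naming sketch, with an illustrative table name:

```python
from datetime import date

# Route rows to a per-month table (e.g. device_history_202403) so entire
# months can be archived or dropped wholesale. Naming is illustrative.
def monthly_table(base: str, day: date) -> str:
    return f"{base}_{day:%Y%m}"

print(monthly_table("device_history", date(2024, 3, 15)))  # device_history_202403
```

Queries over recent data hit only the current table; historical queries name the archived tables explicitly.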
Each technique has trade‑offs; select the one that fits the specific scenario and avoid over‑design.