Choosing Between Driver‑Level and Proxy‑Level Database Middleware for Sharding
This article walks through the five layers at which database sharding can be introduced (encoding, framework, driver, proxy, and implementation), compares driver-level and proxy-level middleware, lists the constraints both share, and gives a step-by-step process for planning, preparing, and executing a sharding project.
Sharding Entry Levels
The scope is limited to Java and MySQL. The layers at which sharding can be introduced are outlined first.
1. Encoding Layer
Multiple data sources are created in a single project and selected with if/else statements in application code. Spring provides AbstractRoutingDataSource for dynamic routing. This approach works for small projects, but it requires a lot of hand-written routing code and becomes unmanageable once queries span databases.
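To make the idea concrete, here is a minimal sketch of hand-written routing, assuming two order databases split by user id and one user database. The class name ManualRouter, the "ds_*" labels, and the JDBC URLs are hypothetical and only stand in for real DataSource objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: names, labels, and URLs are illustrative only.
final class ManualRouter {
    // Stand-ins for real javax.sql.DataSource instances, keyed by a label.
    private static final Map<String, String> DATA_SOURCES = new HashMap<>();
    static {
        DATA_SOURCES.put("ds_orders_0", "jdbc:mysql://db0/orders");
        DATA_SOURCES.put("ds_orders_1", "jdbc:mysql://db1/orders");
        DATA_SOURCES.put("ds_users", "jdbc:mysql://db2/users");
    }

    // The routing decision is spelled out by hand with if/else branches.
    static String chooseDataSource(String table, long userId) {
        if ("orders".equals(table)) {
            if (Math.floorMod(userId, 2L) == 0) {
                return "ds_orders_0";
            } else {
                return "ds_orders_1";
            }
        } else if ("users".equals(table)) {
            return "ds_users";
        }
        throw new IllegalArgumentException("No data source configured for table: " + table);
    }

    // Resolves the label to the (placeholder) connection target.
    static String jdbcUrlFor(String table, long userId) {
        return DATA_SOURCES.get(chooseDataSource(table, userId));
    }
}
```

Every new table or split rule adds another branch, which is why this style stops scaling. With Spring, the same decision would usually live in an override of AbstractRoutingDataSource.determineCurrentLookupKey(), but the routing logic is still written and maintained by the application team.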
2. Framework Layer
This layer is suitable when every project can standardize on one ORM framework, which is often unrealistic. It involves extending the ORM or adding custom SQL hints. Interceptors (e.g., the MyBatis Interceptor) can control data flow, but this changes existing programming habits and may require modifying framework source code, which is not recommended.
3. Driver Layer
This layer addresses the drawbacks of the previous two by implementing a custom JDBC driver that keeps a routing table in memory and forwards requests to the actual database connections. Typical implementations include TDDL and ShardingJDBC. The multi-host connection features of MySQL Connector/J (failover, load balancing, replication, and sharding) also belong to this layer.
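The in-memory routing table can be pictured roughly as follows. This is a sketch under assumptions, not the design of TDDL or ShardingJDBC: the RoutingTable class, the "database.table" target strings, and the modulo rule are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a routing table held in application memory.
final class RoutingTable {
    // Logical table name -> ordered list of physical targets ("database.table").
    private final Map<String, List<String>> targets = new ConcurrentHashMap<>();

    void register(String logicalTable, List<String> physicalTargets) {
        // Defensive copy so later changes to the caller's list do not alter routing.
        targets.put(logicalTable, Collections.unmodifiableList(new ArrayList<>(physicalTargets)));
    }

    // Picks the physical target for one sharding key value (assumed rule: key modulo target count).
    String route(String logicalTable, long shardingKey) {
        List<String> nodes = targets.get(logicalTable);
        if (nodes == null || nodes.isEmpty()) {
            throw new IllegalStateException("No route registered for table: " + logicalTable);
        }
        int index = (int) Math.floorMod(shardingKey, (long) nodes.size());
        return nodes.get(index);
    }
}
```

A driver built this way would consult route(...) for each statement and hand the SQL to a connection for the returned target, so the application keeps using plain JDBC calls.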
4. Proxy Layer
The middleware presents itself as a database server, accepts client connections, and routes or forwards requests to the real backends. Examples are MySQL Router and MyCat. This layer supports only one backend database type but can serve clients written in multiple programming languages.
5. Implementation Layer
Specialized database clusters such as MySQL Cluster, MariaDB Galera, and Greenplum build clustering or sharding into the database itself. Changes at the storage level are beyond the scope of this article.
Driver vs Proxy Comparison
Driver Layer Characteristics
This layer supports only Java but works with many relational databases. It requires many connections: with ten databases, each application instance needs at least ten Connection objects, which can lead to a connection explosion.
Data aggregation (e.g., count, sum) is performed in the application memory after multiple queries.
Routing tables reside in the application memory and are updated via polling or notifications.
Centralized configuration simplifies operations and DBA management.
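The in-memory aggregation mentioned in the characteristics above can be sketched as follows, assuming each shard answers a query such as SELECT COUNT(*), SUM(amount) with one partial result. The ShardAggregator and Partial names are hypothetical.

```java
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical sketch: merging per-shard COUNT and SUM results in application memory.
final class ShardAggregator {
    // One shard's answer to: SELECT COUNT(*), SUM(amount) FROM orders WHERE ...
    static final class Partial {
        final long count;
        final long sum;

        Partial(long count, long sum) {
            this.count = count;
            this.sum = sum;
        }
    }

    // COUNT and SUM merge by simple addition across shards.
    static Partial merge(List<Partial> partials) {
        long count = 0;
        long sum = 0;
        for (Partial p : partials) {
            count += p.count;
            sum += p.sum;
        }
        return new Partial(count, sum);
    }

    // AVG must be derived from the merged SUM and COUNT;
    // averaging the per-shard averages would weight small shards too heavily.
    static OptionalDouble average(Partial merged) {
        if (merged.count == 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of((double) merged.sum / merged.count);
    }
}
```

The merge itself is cheap for COUNT and SUM. The memory pressure comes from queries whose partial results are large, such as sorts and groupings over many rows.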
Proxy Layer Characteristics
Supports multiple languages but only one backend DB type. Requires managing a separate service for high availability, increasing operational overhead.
Acts as the sole entry point; high stability is critical because heavy aggregation queries can crash the node.
Common Points
Both layers come with a list of supported features (a whitelist) and a list of limitations (a blacklist). A sharded database is therefore a constrained version of the original database.
Usage Constraints
Data Balance
– Distribute data evenly, e.g., use user‑id modulo instead of province.
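A minimal sketch of the user-id modulo rule, assuming numeric user ids that are spread fairly uniformly (for example, sequential ids). The ModuloSharding class and its method names are hypothetical.

```java
// Hypothetical sketch: user-id modulo as the sharding function.
final class ModuloSharding {
    // Maps a user id to a shard index in the range [0, shardCount).
    static int shardFor(long userId, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        // floorMod keeps the result non-negative even for negative ids.
        return (int) Math.floorMod(userId, (long) shardCount);
    }

    // Counts how many ids in [firstId, lastId] land on each shard.
    static int[] distribution(long firstId, long lastId, int shardCount) {
        int[] rowsPerShard = new int[shardCount];
        for (long id = firstId; id <= lastId; id++) {
            rowsPerShard[shardFor(id, shardCount)]++;
        }
        return rowsPerShard;
    }
}
```

For ids 1 through 1000 on four shards, distribution(1, 1000, 4) puts 250 ids on each shard. Sharding by province has no such guarantee, because the row count per shard follows the population of each province.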
Avoid Deep Pagination
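– Pagination queries that lack a sharding key fan out to every shard. At deep offsets each shard has to return offset + limit rows to be merged in memory, which can cause full-table scans and OOM.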
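Why deep pagination is expensive can be shown with a small sketch. It assumes a global ORDER BY id LIMIT offset, pageSize query and models each shard as an already sorted list of ids; the ShardedPagination class is hypothetical and not taken from any middleware.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: serving a global "ORDER BY id LIMIT offset, pageSize" from several shards.
final class ShardedPagination {
    // No shard knows which of its rows fall before the global offset,
    // so each one has to return its first (offset + pageSize) rows.
    static long rowsFetchedPerShard(long offset, long pageSize) {
        return offset + pageSize;
    }

    // Each inner list holds one shard's ids in ascending order.
    static List<Long> page(List<List<Long>> shardRows, int offset, int pageSize) {
        int perShard = offset + pageSize;
        List<Long> merged = new ArrayList<>();
        for (List<Long> rows : shardRows) {
            merged.addAll(rows.subList(0, Math.min(perShard, rows.size())));
        }
        // Everything fetched is held and sorted in application memory.
        Collections.sort(merged);
        int from = Math.min(offset, merged.size());
        int to = Math.min(offset + pageSize, merged.size());
        return new ArrayList<>(merged.subList(from, to));
    }
}
```

With an offset of 1,000,000 and a page size of 10, rowsFetchedPerShard returns 1,000,010, and that many rows per shard are loaded just to show ten. This is the memory growth behind the OOM risk.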
Minimize Sub‑queries
– Sub-queries may not be parsed correctly by the middleware, so keep them to a minimum.
Transaction Minimization
– Keep transactions within a single shard to avoid cross‑shard operations.
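One way to check this rule before opening a transaction is to compute which shards the affected keys map to. The sketch below assumes user-id modulo sharding; the TransactionScope class and its methods are hypothetical.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: detecting whether a unit of work stays on one shard.
final class TransactionScope {
    // Returns the set of shard indexes touched by the given user ids (assumed rule: id modulo shard count).
    static Set<Integer> shardsTouched(Collection<Long> userIds, int shardCount) {
        Set<Integer> shards = new HashSet<>();
        for (long id : userIds) {
            shards.add((int) Math.floorMod(id, (long) shardCount));
        }
        return shards;
    }

    // True when a local (single-database) transaction is sufficient.
    static boolean isSingleShard(Collection<Long> userIds, int shardCount) {
        return shardsTouched(userIds, shardCount).size() <= 1;
    }
}
```

With four shards, user ids 4 and 8 share shard 0 and can be updated in one local transaction, while ids 4 and 5 fall on different shards and would need a cross-shard operation.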
Special Functions
– Constructs such as DISTINCT, HAVING, UNION, IN, and OR are often unsupported or risky.
Product Recommendations
The recommended products are MyCat (proxy layer) and ShardingJDBC (driver layer). Many other middleware solutions exist but are harder to maintain.
Process Solution
Regardless of the entry layer, the sharding workflow includes:
Information Collection
Identify affected business domains and projects; determine the scope of tables to shard.
Team Involvement
Include developers familiar with the existing codebase to assess SQL impact.
Sharding Strategy
Define sharding dimensions and keys early; once chosen, they should not change.
Pre‑Preparation
Data Normalization
– Align table structures and key types.
SQL Scan
– Extract all SQL statements and evaluate compatibility with the sharding key.
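A first pass of the SQL scan can be automated with a rough heuristic like the one below. It is only a sketch: the SqlScan class is hypothetical, the check is regex-based rather than a real SQL parser, and it only looks for the sharding key in a WHERE clause, so statements such as INSERT will be flagged and need manual review.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch: a regex heuristic, not a substitute for a SQL parser.
final class SqlScan {
    // True when the WHERE clause compares the sharding key with "=" or "IN".
    static boolean mentionsKeyInWhere(String sql, String shardingKey) {
        // Collapse whitespace so multi-line statements are handled uniformly.
        String normalized = sql.replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
        int where = normalized.indexOf(" where ");
        if (where < 0) {
            return false;
        }
        Pattern keyPredicate = Pattern.compile(
                "\\b" + Pattern.quote(shardingKey.toLowerCase(Locale.ROOT)) + "\\b\\s*(=|in\\b)");
        return keyPredicate.matcher(normalized.substring(where)).find();
    }

    // Returns the statements that cannot be routed by this heuristic and need review.
    static List<String> flagForReview(List<String> statements, String shardingKey) {
        List<String> flagged = new ArrayList<>();
        for (String sql : statements) {
            if (!mentionsKeyInWhere(sql, shardingKey)) {
                flagged.add(sql);
            }
        }
        return flagged;
    }
}
```

Running flagForReview over the extracted statements with the chosen key (for example "user_id") gives a worklist of queries that would fan out to every shard or that the heuristic could not classify.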
Tool Validation
– Build tools to simulate routing and verify results.
Technical Preparation
– Prototype key features before full adoption.
Implementation Phase
Data Migration
– Use CDC (e.g., Canal) or message queues for incremental sync.
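The incremental sync step can be pictured as replaying change events onto the new shards. The sketch below does not use Canal's API: the IncrementalSync, ChangeEvent, and Op names are hypothetical, the shards are modeled as in-memory maps, and events for the same row are assumed to arrive in order.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: replaying change events onto sharded targets.
final class IncrementalSync {
    enum Op { INSERT, UPDATE, DELETE }

    // One captured change: operation, primary key, and the new row content (null for deletes).
    static final class ChangeEvent {
        final Op op;
        final long id;
        final String row;

        ChangeEvent(Op op, long id, String row) {
            this.op = op;
            this.id = id;
            this.row = row;
        }
    }

    // Shard index -> (primary key -> row); stands in for the real target databases.
    private final Map<Integer, Map<Long, String>> shards = new HashMap<>();
    private final int shardCount;

    IncrementalSync(int shardCount) {
        this.shardCount = shardCount;
    }

    private int shardFor(long id) {
        return (int) Math.floorMod(id, (long) shardCount);
    }

    // INSERT and UPDATE are both applied as upserts, so a replayed event does no harm.
    void apply(ChangeEvent event) {
        Map<Long, String> target = shards.computeIfAbsent(shardFor(event.id), k -> new HashMap<>());
        if (event.op == Op.DELETE) {
            target.remove(event.id);
        } else {
            target.put(event.id, event.row);
        }
    }

    // Returns the row as currently stored on its shard, or null if absent.
    String read(long id) {
        Map<Long, String> target = shards.get(shardFor(id));
        return target == null ? null : target.get(id);
    }
}
```

Treating inserts and updates as upserts is a design choice that makes redelivery safe, which matters because CDC pipelines and message queues commonly deliver an event more than once after a restart.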
Extensive Testing
– Unit and integration tests must cover every SQL path; log routing decisions for review.
SQL Review
– Conduct a thorough audit of all statements.
Dry‑Run
– Rehearse the migration in non‑production environments.
New SQL Guidelines
– Establish standards to prevent unsupported operations after sharding.
Final Note
Sharding is a strategic, often irreversible, technical decision that requires executive sponsorship, experienced architects, and disciplined coordination to avoid project failure.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
