
Why MySQL Can Power JanusGraph: Design, Challenges, and Performance Insights

This article explores how MySQL can serve as a JanusGraph storage backend, detailing the design choices, multi‑tenant architecture, key‑column‑value model implementation, encountered issues such as connection timeouts and deadlocks, and performance testing results that demonstrate its suitability for medium‑scale Data Catalog services.

Volcano Engine Developer Services

DataLeap is a one‑stop data middle‑platform suite that aggregates years of experience in data integration, development, operation, governance, asset management, and security, helping enterprise customers improve data development efficiency and reduce management costs.

Data Catalog, a metadata management service, collects technical metadata and enriches it with business context and semantics, supporting cataloging, search, and detailed browsing. It is a core component of the DataLeap product suite, serving nearly all internal business lines at ByteDance.

The storage layer of Data Catalog relies on Apache Atlas, which in turn depends on JanusGraph. JanusGraph’s storage backend is pluggable; this article discusses the design considerations and practical issues when using MySQL as the storage backend.

Background

In production, the existing storage system incurs high maintenance costs and operational pressure, prompting the search for alternatives.

MySQL offers lower operational costs in B2B scenarios, making it an attractive replacement candidate.

Consequently, a dedicated effort was made to research and implement a MySQL‑based storage backend.

Evaluation of Storage Options

High‑cost stateful clusters (e.g., HBase, Cassandra) were excluded.

Single‑node solutions were ruled out due to scalability concerns.

Solutions requiring extensive development effort (e.g., Redis) were also excluded.

MySQL was ultimately selected for further development.

Theoretical Feasibility of MySQL

Supports both Key‑Value (KV) and Key‑Column‑Value (KCV) models with B+‑tree clustered indexes, enabling range queries without in‑memory re‑sorting.

Existing systems already handle tables with billions of rows, and MySQL’s mature sharding solutions can accommodate such scale.

Write‑throughput requirements are modest because most data is ingested via offline tasks.

Overall Design

A meta table stores tenant‑to‑DataSource mappings and shard configurations.

StoreManager injects tenant information into StoreTransaction during openTransaction and returns the appropriate DataSource.

StoreManager maintains a map of Store objects (e.g., system_properties, tx_log, graphindex, edgestore) keyed by name, providing cross‑tenant capabilities.

All reads and writes go through Store, which obtains tenant info and a database connection from StoreTransaction.

For single‑tenant scenarios, data can be sharded; ShardManager determines the target shard based on a hash of the key.

Each Store’s table has four columns (id, g_key, g_column, g_value); g_key and g_column together form a composite index, so rows for one key are read back in column order.

Tenant context is set before operations and cleared afterward.
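The shard-selection step above can be sketched as follows. This is a minimal illustration, not the article's ShardManager: the class name ShardRouterSketch, the use of Arrays.hashCode, and the `_<shard>` table suffix are all assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical sketch of hash-based shard selection; names are illustrative only.
final class ShardRouterSketch {
    private final int shardCount;

    ShardRouterSketch(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    // Maps a row key to a shard index in [0, shardCount).
    // floorMod keeps the result non-negative even when the hash is negative.
    int shardFor(byte[] key) {
        return Math.floorMod(Arrays.hashCode(key), shardCount);
    }

    // Assumed naming scheme: base table name plus shard index.
    String tableFor(String baseTable, byte[] key) {
        return baseTable + "_" + shardFor(key);
    }
}
```

Because only the key (not the column) is hashed, every column of a given key lands in the same shard, so a range query over one key's columns touches a single table.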

Detailed Design and Issues

Storage Model

JanusGraph expects a column‑family storage (e.g., Cassandra, HBase) where each row consists of a key and multiple column‑value pairs. For non‑column‑family stores, two adaptation models exist:

KCV model: Stores key, column, and value separately; implemented via KeyColumnValueStoreManager.

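KV model: Stores only key and value, with key and column merged into a single key; implemented via KeyValueStoreManager. JanusGraph also provides an ordered variant (OrderedKeyValueStoreManager) and an adapter that exposes an ordered KV store as a KCV store.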

MySQL implements the KCV model: each table has an auto‑increment ID plus three columns representing key, column, and value. The key+column pair forms a composite index, supporting efficient queries and column‑based sorting.
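To make the table layout concrete, here is a rough sketch of the SQL such a store might issue. The column names come from the article; the column types, the index name uk_key_column, the upsert form, and the class KcvSqlSketch itself are assumptions for illustration.

```java
// Hypothetical SQL builder for one KCV table; types and index name are assumed.
final class KcvSqlSketch {

    // Table names are concatenated into SQL, so only a safe character set is accepted.
    private static String quoted(String table) {
        if (!table.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("unexpected table name: " + table);
        }
        return "`" + table + "`";
    }

    static String createTable(String table) {
        return "CREATE TABLE IF NOT EXISTS " + quoted(table) + " ("
            + "id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, "
            + "g_key VARBINARY(255) NOT NULL, "
            + "g_column VARBINARY(255) NOT NULL, "
            + "g_value BLOB, "
            + "UNIQUE KEY uk_key_column (g_key, g_column)"
            + ") ENGINE=InnoDB";
    }

    // Slice read: columns of one key in [start, end), returned in column order.
    static String sliceQuery(String table) {
        return "SELECT g_column, g_value FROM " + quoted(table)
            + " WHERE g_key = ? AND g_column >= ? AND g_column < ?"
            + " ORDER BY g_column LIMIT ?";
    }

    // Insert a key/column pair, or overwrite its value if the pair already exists.
    static String upsert(String table) {
        return "INSERT INTO " + quoted(table) + " (g_key, g_column, g_value) VALUES (?, ?, ?)"
            + " ON DUPLICATE KEY UPDATE g_value = VALUES(g_value)";
    }
}
```

With an index on (g_key, g_column), the slice query can be answered by an index range scan, and the ORDER BY follows the index order rather than requiring a separate sort.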

Multi‑Tenant Support

JanusGraph creates multiple tables per tenant (e.g., tenantA_edgestore). Each tenant has its own MySQL connection configuration, initialized at startup, and tenant information is passed via context for each request.
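One common way to pass tenant information through a request is a thread-local context that is set before the operation and cleared afterward. The sketch below assumes that approach; TenantContextSketch, runAs, and tableName are hypothetical names, and the `<tenant>_<store>` naming simply mirrors the tenantA_edgestore example.

```java
import java.util.function.Supplier;

// Hypothetical thread-local tenant context; not a JanusGraph or Atlas class.
final class TenantContextSketch {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Runs an operation with the tenant set, and always clears it afterward
    // so a pooled thread never leaks one tenant's context into the next request.
    static <T> T runAs(String tenant, Supplier<T> operation) {
        CURRENT.set(tenant);
        try {
            return operation.get();
        } finally {
            CURRENT.remove();
        }
    }

    static String current() {
        String tenant = CURRENT.get();
        if (tenant == null) {
            throw new IllegalStateException("tenant context is not set");
        }
        return tenant;
    }

    // Resolves the per-tenant table for a store, e.g. tenantA_edgestore.
    static String tableName(String store) {
        return current() + "_" + store;
    }
}
```

A request handler would wrap each graph operation in runAs, and the store would call tableName when building its SQL.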

Implementation Classes

MysqlKcvTx: Extends AbstractStoreTransaction, encapsulates the MySQL connection and implements commit/rollback.

MysqlKcvStore: Implements KeyColumnValueStore, builds SQL statements using tenant info and delegates execution to MysqlKcvTx.

MysqlKcvStoreManager: Implements KeyColumnValueStoreManager, manages connections and Store objects, creates transactions per tenant.

public class MysqlKcvStoreManager implements KeyColumnValueStoreManager { ... }

Transaction Handling

All JanusGraph interactions open a transaction. While JanusGraph’s transactions are thread‑safe, ACID guarantees depend on the underlying storage. MySQL transactions are used via MysqlKcvTx, which forwards commit and rollback to the JDBC connection.
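The forwarding described above might look roughly like this. To stay self-contained, the sketch wraps a plain JDBC Connection and does not extend JanusGraph's AbstractStoreTransaction as the real MysqlKcvTx does; the class name JdbcTxSketch and the choice to close the connection when the transaction ends are assumptions.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical stand-in for MysqlKcvTx: forwards commit/rollback to JDBC.
final class JdbcTxSketch implements AutoCloseable {
    private final Connection connection;
    private boolean finished;

    JdbcTxSketch(Connection connection) throws SQLException {
        this.connection = connection;
        // Disable auto-commit so all statements join one MySQL transaction.
        connection.setAutoCommit(false);
    }

    // Stores use this connection to execute their SQL inside the transaction.
    Connection connection() {
        return connection;
    }

    void commit() throws SQLException {
        try {
            connection.commit();
        } finally {
            release();
        }
    }

    void rollback() throws SQLException {
        try {
            connection.rollback();
        } finally {
            release();
        }
    }

    // Returns the connection to the pool (or closes it) once the transaction ends.
    private void release() throws SQLException {
        finished = true;
        connection.close();
    }

    // Rolls back if the caller never committed, e.g. when leaving try-with-resources early.
    @Override
    public void close() throws SQLException {
        if (!finished) {
            rollback();
        }
    }
}
```

Used in a try-with-resources block, the transaction is rolled back automatically unless commit() was reached.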

Connection Pool

Both HikariCP and Druid were evaluated; Druid was chosen for its richer monitoring features despite Hikari’s performance claims.

Encountered Problems

Connection Timeout

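While metadata for large tables was being processed, connections sat idle for long periods and were closed by the MySQL server, so later statements failed with server‑side timeout errors. The fix was to raise wait_timeout on the MySQL server to 3600 s and set the client connection pool’s minimum idle time to 2400 s, so that the pool retires an idle connection before the server drops it.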
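The relationship between the two timeouts can be sketched as a small configuration helper. It assumes that the "minimum idle time" corresponds to Druid's minEvictableIdleTimeMillis setting; the property keys below are the ones Druid's property-based configuration is believed to read and should be checked against the Druid version in use. The eviction interval and validation query are illustrative values not taken from the article, and PoolTimeoutSketch is a hypothetical name.

```java
import java.util.Properties;

// Hypothetical helper: derives pool settings that stay below the server timeout.
final class PoolTimeoutSketch {

    // serverWaitTimeoutSeconds mirrors MySQL's wait_timeout,
    // e.g. after running: SET GLOBAL wait_timeout = 3600;
    static Properties druidProperties(long serverWaitTimeoutSeconds, long poolMinIdleSeconds) {
        if (poolMinIdleSeconds >= serverWaitTimeoutSeconds) {
            throw new IllegalArgumentException(
                "pool idle time must be shorter than the server's wait_timeout");
        }
        Properties props = new Properties();
        // Idle connections older than this become eligible for eviction by the pool.
        props.setProperty("minEvictableIdleTimeMillis",
            String.valueOf(poolMinIdleSeconds * 1000));
        // Assumed values: how often the evictor runs and how idle connections are probed.
        props.setProperty("timeBetweenEvictionRunsMillis", "60000");
        props.setProperty("testWhileIdle", "true");
        props.setProperty("validationQuery", "SELECT 1");
        return props;
    }
}
```

Calling druidProperties(3600, 2400) reproduces the values from the fix; the guard rejects configurations where the pool would keep connections longer than the server allows.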

Parallel Write Deadlocks

Concurrent threads inserting into default_edgestore caused MySQL deadlocks on the __modificationTimestamp column. The resolution was to modify Atlas code so that this system property is set only on creation, not on updates.

Performance Testing

Testing simulated Data Catalog service workloads on a six‑node cluster, with 8 CPUs and 32 GB of memory per node. Scenarios covered single‑tenant sharding, metadata creation, metadata updates, and lineage queries. With 100k tables and billions of rows, response times stayed within the expected ranges, which supports MySQL’s suitability for medium‑scale Data Catalog services.

Conclusion

MySQL offers easy deployment, low operational overhead, and good scalability as a JanusGraph storage backend, meeting performance requirements for small‑to‑medium Data Catalog services. Future work may involve integrating mature MySQL sharding solutions to support larger scales.
