
Master OpenGemini: From Schema Design to Performance Tuning Best Practices

This article summarizes a live session in which Shawn explains how understanding business scenarios drives effective OpenGemini database design, and provides comprehensive best‑practice guidance on database and table design, data ingestion, query optimization, and performance tuning for time‑series workloads.

Huawei Cloud Developer Alliance

In a live session titled “From Database Design to Performance Tuning, Master OpenGemini Application Development Best Practices,” Huawei Cloud DTSE evangelist and OpenGemini community founder Shawn emphasized that grasping business scenarios is the key to sound database design, and then introduced best‑practice guidance for OpenGemini database and table design, data writing, and query execution.

Evaluating a business scenario involves eight factors: data classification, application classification, collection frequency (in seconds), timeline assessment, per‑minute write volume, metrics to collect, query scenarios, and data retention period.

Database and table design: OpenGemini organizes data into shards; each shard has its own index and cache. Adding databases (DB) or retention policies (RP) creates additional shards, which increases concurrency, so Shawn recommends using multiple databases to fully utilize system resources. The number of shards is limited by memory (approximately ≤ total_memory × 0.25 / 60 MB per shard) and by local disk I/O bandwidth.
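The memory limit above can be turned into a quick capacity‑planning check. This is a sketch of the talk's heuristic only; the 60 MB per‑shard figure and 25 % memory budget come from the session, not from a hard kernel limit:

```python
# Capacity-planning sketch for the shard-count heuristic above:
# max_shards ≈ total_memory × 0.25 / 60 MB.
# Treat the result as a planning ceiling, not a guarantee.

def max_shards(total_memory_gb: float, mb_per_shard: int = 60) -> int:
    """Estimate how many shards fit in a 25% memory budget."""
    budget_mb = total_memory_gb * 1024 * 0.25
    return int(budget_mb // mb_per_shard)

# A 64 GB node leaves a ~16 GB shard budget, i.e. roughly 273 shards.
print(max_shards(64))
```

Remember that disk I/O bandwidth is the other limit, so the real ceiling may be lower than this memory-based estimate.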

Performance impact of excessive shards or tables:

More DB/RP → more shards → higher memory consumption and greater disk I/O contention.

More tables → more data files → more OS file handles.

More shards/tables → larger metadata → longer synchronization latency between ts‑sql, ts‑store, and ts‑meta, causing read/write performance fluctuations.

Table design principles:

Design tables based on query scenarios.

If a table exceeds 1,000 metric columns, consider splitting it.
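As a sketch of the splitting advice, a device that reports far more than 1,000 metrics can be routed into several narrower per‑subsystem measurements. All names here (`device_cpu`, `device_mem`, the field names) are hypothetical, chosen only to illustrate the split:

```python
# Hypothetical split of one over-wide measurement into per-subsystem tables.
# Field and measurement names are illustrative only.
wide_record = {
    "cpu_usage": 0.72, "cpu_temp": 54.0,     # CPU subsystem fields
    "mem_used": 3.1e9, "mem_cached": 1.2e9,  # memory subsystem fields
}

SUBSYSTEMS = {
    "cpu": ("cpu_usage", "cpu_temp"),
    "mem": ("mem_used", "mem_cached"),
}

def split_record(record: dict) -> dict:
    """Route each field to a narrower per-subsystem measurement."""
    return {f"device_{name}": {k: record[k] for k in keys if k in record}
            for name, keys in SUBSYSTEMS.items()}

print(split_record(wide_record))
```

Splitting by subsystem also keeps each table aligned with its query scenario, which is the first principle above.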

Data ingestion best practices:

Batch writes on the client side to reduce network round‑trips.

Concurrent writes must keep timelines non‑overlapping to avoid out‑of‑order data.

Determine an optimal BatchSize through experimentation.

Increase the number of concurrent SQL statements to boost throughput.

When write concurrency is high, reduce BatchSize to prevent store‑side data backlog.
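The batching advice can be sketched as a small client‑side buffer that flushes one payload per BatchSize points instead of one round‑trip per point. This assumes OpenGemini's InfluxDB‑compatible line protocol; the measurement, tag, and field names are illustrative, and a real client would POST each flushed payload to the write endpoint:

```python
# Minimal client-side batcher: accumulate line-protocol points and flush
# them in one payload per BatchSize points, reducing network round-trips.

def to_line(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Format one point in InfluxDB-style line protocol (no escaping)."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

class Batcher:
    def __init__(self, batch_size=500, flush=print):
        # flush receives the joined payload; in practice it would POST
        # the body to the server's write endpoint in one request.
        self.batch_size, self.flush, self.buf = batch_size, flush, []

    def add(self, line: str):
        self.buf.append(line)
        if len(self.buf) >= self.batch_size:
            self.flush("\n".join(self.buf))
            self.buf.clear()

batch = Batcher(batch_size=2)
batch.add(to_line("cpu", {"host": "h1"}, {"usage": 0.7}, 1700000000000000000))
batch.add(to_line("cpu", {"host": "h2"}, {"usage": 0.4}, 1700000000000000000))
```

Tuning BatchSize experimentally, as recommended above, then amounts to varying `batch_size` while watching throughput and store‑side backlog.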

Kernel parameters for write performance: Monitor QPS stability; large fluctuations may stem from WAL latency, disk I/O bottlenecks, cache buildup, or compaction blocking.

Query best practices:

Avoid full‑timeline scans without TAG filters.

Be cautious with massive grouping (TAG+Time or fine‑grained Time).

Heavy aggregations on large datasets (except first/last/count/sum/mean/min/max) can cause OOM.

Large timeline queries with tags matching millions of series may also trigger OOM.
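The first two cautions above can be made concrete with a pair of InfluxQL‑style statements. The measurement, tag, and field names are hypothetical; the point is the shape of the WHERE clause:

```python
# Illustrative queries (names are hypothetical).
# Bad: no TAG filter and no time bound, so every timeline in the
# table is scanned across the whole retention window.
bad = "SELECT mean(usage) FROM cpu"

# Better: a TAG filter plus a time range restricts the scan to a few
# series, and the GROUP BY interval is coarse enough to avoid
# fine-grained time grouping.
good = ("SELECT mean(usage) FROM cpu "
        "WHERE host = 'h1' AND time > now() - 1h "
        "GROUP BY time(1m)")
print(good)
```

Even with filters, keep the OOM warnings above in mind when the matched series count runs into the millions.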

Query tips:

For large result sets, add chunked=true&chunk_size=1000 to stream results.

Use the Hint syntax /*+ full_series */ when the query includes all TAGs to direct the request to the specific data node.

In nested queries, filter early using TAGs or time ranges to minimize intermediate result size.
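The chunked‑streaming tip can be sketched as a query URL built with the standard library. The server address, port, and database name are placeholders, and only the `chunked`/`chunk_size` parameters come from the tip above:

```python
# Building a chunked query request as described in the tips above.
# Host, port, and database name are placeholders for illustration.
from urllib.parse import urlencode

params = {
    "db": "mydb",          # hypothetical database name
    "q": "SELECT * FROM cpu WHERE time > now() - 1h",
    "chunked": "true",     # stream the result set back in chunks
    "chunk_size": "1000",  # ~1000 rows per chunk
}
url = "http://127.0.0.1:8086/query?" + urlencode(params)
print(url)
```

The client then reads the response incrementally instead of materializing the full result set in memory at once.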

The session concluded with an invitation to join the OpenGemini community, try the open‑source time‑series database, and contribute to its development.

Written by

Huawei Cloud Developer Alliance

The Huawei Cloud Developer Alliance creates a tech sharing platform for developers and partners, gathering Huawei Cloud product knowledge, event updates, expert talks, and more. Together we continuously innovate to build the cloud foundation of an intelligent world.
