
Choosing the Right Database: MySQL, Redis, HBase, ClickHouse, MongoDB, Elasticsearch, Neo4j, Prometheus & Milvus Explained

Explore nine major database technologies, from the traditional relational MySQL to the NoSQL key‑value store Redis, the column‑oriented HBase and ClickHouse, the document‑oriented MongoDB, the search engine Elasticsearch, the graph database Neo4j, the time‑series database Prometheus, and the vector database Milvus, along with practical best‑practice guides, real‑world polyglot persistence scenarios, and recommended resources for mastering modern data storage.

JD Cloud Developers

Jul 17, 2024


Introduction

In the digital era, data is a core asset for enterprises. Selecting the appropriate storage technology is critical for performance, scalability, and cost. This article reviews nine types of databases, covering their strengths, weaknesses, best practices, and typical use cases.

DB‑Engines Ranking (June 2024)

The ranking reflects popularity and community adoption of each database.

Relational Database – MySQL

MySQL is an open‑source RDBMS known for ACID compliance (with its default InnoDB storage engine), strong consistency, and a rich ecosystem. It excels at transactional workloads such as finance, HR, and inventory systems.

Advantages

Low cost – open source.

Easy to use with familiar SQL syntax.

Large community and tooling.

Disadvantages

Scaling horizontally can be complex.

Performance may degrade with massive concurrent workloads.

Best Practices

Normalize data models.

Regularly purge obsolete data.

Design appropriate indexes.

Monitor performance and tune slow queries.
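To make the normalization and indexing advice concrete, here is a minimal sketch. It assumes Python's built‑in sqlite3 module as a stand‑in for MySQL so it runs without a server; the schema ideas carry over, but MySQL's EXPLAIN output looks different. The table and index names are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL so the sketch runs anywhere; the schema ideas carry
# over, but MySQL's EXPLAIN output format differs from SQLite's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Normalized: orders reference a customer id instead of repeating the name.
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    -- Index the column used by the hot lookup below.
    CREATE INDEX idx_orders_customer ON orders(customer_id);
""")
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                 [(1, 9.5), (1, 20.0), (2, 5.0)])

def query_plan(sql, params=()):
    """Return the detail column of EXPLAIN QUERY PLAN (SQLite's analogue of EXPLAIN)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

lookup = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"
plan = query_plan(lookup, (1,))
total = conn.execute(lookup, (1,)).fetchone()[0]
print(plan)   # expect a SEARCH step that mentions idx_orders_customer
print(total)  # 29.5
```

Checking the plan of each hot query in this way is the habit the "tune slow queries" advice points at; in MySQL the equivalent is prefixing the query with EXPLAIN.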

Typical Scenarios

Web applications (LAMP stack).

Small and medium‑sized businesses (SMBs) needing reliable, cost‑effective storage.

Key‑Value Store – Redis

Redis is an in‑memory key‑value database offering sub‑millisecond latency, rich data structures, persistence options, and built‑in replication.

Advantages

Ultra‑fast read/write.

Supports strings, lists, sets, sorted sets, hashes, bitmaps, HyperLogLog, and geospatial indexes.

High availability via Redis Sentinel and Redis Cluster.

Disadvantages

Memory‑centric – high cost for large datasets.

Limited to simple queries; not suited for complex relational analysis.

Persistence can become a bottleneck under heavy load.

Best Practices

Manage memory with TTL and eviction policies.

Choose RDB or AOF based on durability needs.

Avoid long‑running commands (e.g., KEYS *); use SCAN to iterate over keys incrementally.

Design keys to prevent hotspots.
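Redis enforces TTLs and eviction on the server (EXPIRE on keys plus a maxmemory-policy such as allkeys-lru). The pure‑Python model below only illustrates those semantics and does not talk to Redis. The class name and injectable clock are hypothetical, the limit is a key count rather than a memory budget, and it evicts the exact least‑recently‑used key, whereas Redis approximates LRU by sampling.

```python
import time
from collections import OrderedDict

class TTLLRUCache:
    """Toy model of Redis key expiry plus an LRU eviction policy.

    Differences from Redis (assumptions of this sketch): the limit is a key
    count rather than a memory budget, and LRU is exact rather than sampled.
    """

    def __init__(self, max_keys, clock=time.monotonic):
        self.max_keys = max_keys
        self.clock = clock
        self._data = OrderedDict()  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = self.clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)
        self._data.move_to_end(key)           # mark as most recently used
        while len(self._data) > self.max_keys:
            self._data.popitem(last=False)    # evict the least recently used key

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self.clock() >= expires_at:
            # Lazy expiry on access; Redis also expires keys in a background cycle.
            del self._data[key]
            return None
        self._data.move_to_end(key)
        return value

# Usage with a controllable clock so the example is deterministic.
now = [0.0]
cache = TTLLRUCache(max_keys=2, clock=lambda: now[0])
cache.set("session:1", "alice", ttl=30)
cache.set("session:2", "bob")
cache.get("session:1")             # touching session:1 makes session:2 the LRU key
cache.set("session:3", "carol")    # over the limit: session:2 is evicted
now[0] = 31.0                      # advance past session:1's TTL
```

After these calls, session:2 has been evicted, session:1 has expired, and only session:3 remains readable.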

Typical Scenarios

Caching layer for web services.

Session storage.

Leaderboards, counters, real‑time analytics.
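Leaderboards are usually built on Redis sorted sets: ZINCRBY adds points to a member and ZREVRANGE reads the top entries. The class below is a hypothetical pure‑Python model of just those two commands, meant only to show their semantics; real Redis keeps members in a skip list, so range reads avoid the full sort this sketch performs.

```python
class SortedSetModel:
    """Pure-Python model of the Redis sorted-set commands a leaderboard needs."""

    def __init__(self):
        self._scores = {}  # member -> score

    def zincrby(self, member, increment):
        """Like ZINCRBY: add `increment` to the member's score (creating it at 0)."""
        self._scores[member] = self._scores.get(member, 0) + increment
        return self._scores[member]

    def zrevrange(self, start, stop):
        """Like ZREVRANGE ... WITHSCORES: highest score first, ties broken by
        member in descending lexicographic order, `stop` inclusive.
        Of the negative indexes, only -1 (the last element) is supported here."""
        ordered = sorted(self._scores.items(),
                         key=lambda item: (item[1], item[0]), reverse=True)
        end = len(ordered) if stop == -1 else stop + 1
        return ordered[start:end]

board = SortedSetModel()
board.zincrby("alice", 50)
board.zincrby("bob", 70)
board.zincrby("carol", 20)
board.zincrby("alice", 30)   # alice now has 80
top_two = board.zrevrange(0, 1)
print(top_two)  # [('alice', 80), ('bob', 70)]
```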

Column‑Oriented Store – HBase

HBase is an open‑source, distributed column‑family store built on Hadoop, ideal for massive write‑heavy workloads and random access at PB scale.

Advantages

Linear horizontal scalability.

Fast random reads/writes.

Automatic failover: HDFS replicates the underlying data, and ZooKeeper handles failure detection and master election.

Column‑family model suits analytical queries.

Disadvantages

Operational complexity; steep learning curve.

Memory and I/O intensive.

No multi‑row ACID transactions; atomicity is guaranteed only within a single row.

Best Practices

Design row keys to avoid hotspots.

Group frequently accessed columns in the same family.

Use compression (Snappy, LZ4) to save space.

Monitor cluster health and tune region servers.
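One common way to design row keys that avoid hotspots is salting: prefixing each key with a deterministic bucket number so that sequential keys (timestamps, incrementing order IDs) spread across pre‑split regions instead of all landing in the last one. The sketch below only computes keys and does not call the HBase client API; the bucket count, key format, and function names are hypothetical.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 8  # hypothetical; often matched to the number of pre-split regions

def salted_row_key(natural_key: str, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix a deterministic salt so monotonically increasing keys spread out.

    The salt is derived from the key itself (not random), so a point lookup can
    recompute the full row key from the natural key alone.
    """
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}|{natural_key}".encode("utf-8")

def scan_prefixes(buckets: int = NUM_BUCKETS) -> list:
    """A range read over natural keys must now issue one scan per salt prefix."""
    return [f"{b:02d}|".encode("utf-8") for b in range(buckets)]

# Without salting, these sequential order IDs would all hit the same region.
keys = [salted_row_key(f"order-{20240717000000 + i}") for i in range(1000)]
per_bucket = Counter(key.split(b"|", 1)[0] for key in keys)
print(sorted(per_bucket.items()))
```

The trade‑off is visible in scan_prefixes: point reads stay cheap, but a range scan fans out into one scan per bucket.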

Typical Scenarios

Large‑scale data warehousing.

Real‑time analytics on TB‑PB datasets.

Write‑intensive applications.

Column‑Oriented Store – ClickHouse

ClickHouse is an open‑source OLAP database optimized for real‑time analytical queries on large columnar datasets.

Advantages

High‑performance columnar reads.

Excellent compression reduces storage costs.

Near‑real‑time data ingestion.

Horizontal scalability.

Vectorized query execution and multi‑core utilization.

Full SQL support.

Disadvantages

Write performance can suffer under many small or highly concurrent inserts, because each insert creates a new data part that must later be merged.

Cluster management adds operational complexity.

Limited transactional guarantees.

Best Practices

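Model tables around query patterns and keep schemas narrow; choose the sorting key (ORDER BY) to match the most common filters.

Rely on the sparse primary index, and add data‑skipping indexes sparingly rather than over‑indexing.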

Batch insert data to improve write efficiency.

Monitor performance with built‑in system tables such as system.query_log.

Leverage sharding and replication for scalability and HA.
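Client‑side batching keeps the number of INSERT statements, and therefore the number of new data parts, low. A minimal batcher might look like the sketch below. `flush_fn` is a hypothetical callback that, in a real system, would send the rows as a single INSERT through a ClickHouse client, and the thresholds are illustrative rather than recommended values.

```python
import time

class InsertBatcher:
    """Buffer rows and flush them as one batch when the buffer is full or old.

    `flush_fn` is a hypothetical callback; in practice it would issue a single
    INSERT INTO ... VALUES statement for the whole batch. Age is only checked
    when a row arrives; a production batcher would also flush on a timer.
    """

    def __init__(self, flush_fn, max_rows=10_000, max_age_s=1.0, clock=time.monotonic):
        self.flush_fn = flush_fn
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.clock = clock
        self._rows = []
        self._first_at = None  # time the oldest buffered row arrived

    def add(self, row):
        if not self._rows:
            self._first_at = self.clock()
        self._rows.append(row)
        too_many = len(self._rows) >= self.max_rows
        too_old = self.clock() - self._first_at >= self.max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self):
        if self._rows:
            self.flush_fn(self._rows)
            self._rows = []  # start a fresh list; the flushed one is handed off

sent_batches = []
batcher = InsertBatcher(sent_batches.append, max_rows=3, max_age_s=60.0, clock=lambda: 0.0)
for i in range(7):
    batcher.add((i, f"event-{i}"))
batcher.flush()  # flush the remainder, e.g. on shutdown
print([len(batch) for batch in sent_batches])  # [3, 3, 1]
```

ClickHouse can also buffer on the server side through its asynchronous insert setting (async_insert), which may be simpler when many independent clients write small batches.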

Typical Scenarios

Log analytics, BI dashboards, and ad‑hoc reporting.

Real‑time data analysis for finance, e‑commerce, and monitoring.

Document Store – MongoDB

MongoDB is a popular NoSQL document database that stores data as flexible BSON documents, allowing schema‑less development.

Advantages

Flexible document model; no predefined schema.

Horizontal scalability via sharding.

High performance for read/write heavy workloads.

Built‑in replication and automatic failover.

Disadvantages

Multi‑document transactions (available since MongoDB 4.0) add overhead and can impact performance.

Higher storage consumption compared to relational tables.

Memory‑intensive for hot data.

Complex cluster administration (sharding, replica sets).

Best Practices

Design documents to be as flat as possible.

Create indexes judiciously.

Plan sharding keys to ensure even data distribution.

Monitor with MongoDB Atlas or Ops Manager.
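A small simulation shows why the shard key matters. With ranged sharding on a monotonically increasing key, every new document falls past the last split point, so one shard takes all the writes; hashing the key spreads them out. The split points, shard count, and functions are hypothetical, and taking the hash modulo the shard count is a simplification: MongoDB's hashed sharding actually divides the range of hash values into chunks.

```python
import bisect
import hashlib
from collections import Counter

# Hypothetical split points chosen when existing keys ranged from 0 to 3999.
SPLIT_POINTS = [1000, 2000, 3000]   # shard 0: <1000, 1: <2000, 2: <3000, 3: >=3000
NUM_SHARDS = len(SPLIT_POINTS) + 1

def ranged_shard(key: int) -> int:
    """Ranged sharding: the shard whose key interval contains `key`."""
    return bisect.bisect_right(SPLIT_POINTS, key)

def hashed_shard(key: int) -> int:
    """Hashed sharding (simplified): hash the key, then map it to a shard."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

new_keys = range(4000, 5000)  # e.g. auto-incrementing order numbers
ranged_writes = Counter(ranged_shard(k) for k in new_keys)
hashed_writes = Counter(hashed_shard(k) for k in new_keys)
print(dict(ranged_writes))  # {3: 1000}: every new write hits the last shard
print(dict(hashed_writes))  # roughly 250 per shard
```

Hashing trades away efficient range queries on the key, so a compound shard key is often a better fit when range reads matter.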

Typical Scenarios

Content management systems.

Mobile and web applications with evolving schemas.

IoT data ingestion.

Big data pipelines requiring flexible storage.

Search Engine – Elasticsearch

Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It stores JSON documents and is most commonly used for full‑text search.

Advantages

Fast, scalable full‑text and structured search.

Rich query DSL, aggregations, and near‑real‑time indexing.

Mature Elastic Stack ecosystem (Logstash, Kibana).
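Lucene, and therefore Elasticsearch, answers full‑text queries with an inverted index that maps each term to the documents containing it. The toy version below shows only the idea. Its tokenizer is a simplified stand‑in for Elasticsearch's analyzers, and it performs a boolean AND without relevance scoring; Elasticsearch ranks matches with BM25 by default.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Simplified analyzer: lowercase, then split on runs of non-alphanumerics."""
    return [term for term in re.split(r"[^a-z0-9]+", text.lower()) if term]

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search_all(self, query):
        """Ids of documents containing every query term (boolean AND, no scoring)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Copy the first posting list so intersecting does not mutate the index.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

index = InvertedIndex()
index.add(1, "Red running shoes")
index.add(2, "Blue running jacket")
index.add(3, "Red wool jacket")
print(index.search_all("red jacket"))  # {3}
```

Because lookups touch only the posting lists of the query terms, query cost depends on how many documents match rather than on scanning every document.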

Disadvantages

Resource‑intensive; requires ample RAM and CPU.

Steeper learning curve for advanced features.

Cluster management can be complex at scale.

Best Practices

Index only fields needed for search to reduce storage and improve cache efficiency.

Use appropriate shard and replica counts.

Tune the JVM (e.g., the G1 garbage collector) and monitor heap usage.

Secure clusters with X‑Pack or other plugins.

Integrate with Grafana for richer visualizations.

Typical Scenarios

Full‑text search for e‑commerce sites and documentation portals.

Log aggregation and analysis (ELK stack).

Geospatial queries and recommendation engines.

Graph Database – Neo4j

Neo4j stores data as nodes, relationships, and properties, enabling efficient traversal of highly connected data.
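A typical relationship question is "who is exactly two hops away from this person?", as in friend‑of‑friend recommendations. Neo4j answers it by following stored relationships directly (what Neo4j calls index‑free adjacency) rather than joining tables. The breadth‑first search below mimics that traversal over an in‑memory adjacency list; the graph and function are hypothetical, and the sketch uses neither the Neo4j driver nor Cypher.

```python
from collections import deque

def nodes_at_depth(graph, start, depth):
    """Breadth-first search returning nodes whose shortest hop distance from
    `start` is exactly `depth`. `graph` maps each node to its neighbours."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = set()
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            found.add(node)
            continue  # no need to expand past the requested depth
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, dist + 1))
    return found

# Hypothetical undirected FRIEND graph stored as adjacency lists.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave", "erin"],
    "dave": ["bob", "carol"],
    "erin": ["carol"],
}
print(sorted(nodes_at_depth(friends, "alice", 2)))  # ['dave', 'erin']
```

Each hop only touches the neighbours of the current frontier, which is why traversal cost grows with the explored neighbourhood rather than with the total size of the graph.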

Advantages

Optimized for deep relationship queries.

Intuitive Cypher query language.

Robust ecosystem and language drivers.

ACID‑compliant transactions.

Disadvantages

Performance tuning can be intricate for very large graphs.

Learning curve for developers accustomed to SQL.

Higher memory consumption for graph processing.

Best Practices

Model the graph to reflect real‑world entities and relationships.

Create indexes on frequently queried node properties.

Use bulk import tools for large data loads.

Monitor query performance and adjust cache settings.

Typical Scenarios

Social networks, recommendation systems.

Fraud detection and knowledge graphs.

Network and IT operations dependency mapping.

Time‑Series Database – Prometheus

Prometheus is an open‑source monitoring system that scrapes metrics from targets and stores them as time‑series data.

Advantages

Multi‑dimensional data model with flexible labels.

Powerful PromQL query language.

Built‑in storage optimized for high‑resolution metrics.

Automatic service discovery for dynamic environments.

Rich alerting via alerting rules and Alertmanager.

Disadvantages

Local storage is not designed for long‑term retention; historical data requires an external remote‑storage solution.

Deleting or modifying data is non‑trivial.

Basic UI; advanced visualization requires Grafana.

Best Practices

Avoid excessive label cardinality.

Design concise alerting rules.

Leverage service discovery to reduce manual config.

Integrate with Grafana for dashboards.

Plan retention policies and consider Thanos/Cortex for long‑term storage.
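Label cardinality matters because every distinct combination of label values under a metric name is stored as its own time series, so the worst case is the product of the per‑label value counts. The helper below does that arithmetic; the metric's labels and value counts are hypothetical.

```python
from math import prod

def worst_case_series(label_values):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(len(set(values)) for values in label_values.values())

# Hypothetical labels on an http_requests_total counter.
labels = {
    "method": ["GET", "POST", "PUT", "DELETE"],
    "status": ["200", "201", "400", "404", "500"],
    "instance": [f"10.0.0.{i}:8080" for i in range(10)],
}
print(worst_case_series(labels))  # 4 * 5 * 10 = 200

# Adding an unbounded label such as a user ID multiplies the total.
labels_with_user = dict(labels, user_id=[str(u) for u in range(10_000)])
print(worst_case_series(labels_with_user))  # 200 * 10,000 = 2,000,000
```

Identifiers with unbounded values (user IDs, request IDs, raw URLs) therefore belong in logs or traces rather than in metric labels.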

Typical Scenarios

Cloud‑native application monitoring (Kubernetes).

Infrastructure metrics (CPU, memory, network).

Application performance monitoring.

Business KPI tracking.

Vector Database – Milvus

Milvus is an open‑source vector database designed for AI, machine learning, and similarity search on high‑dimensional vectors.

Advantages

Millisecond‑level vector search using IVF, HNSW, and ANNOY indexes.

Rich client SDKs (Python, Java, Go, etc.).

Horizontal and vertical scalability with data replication.

High availability and fault tolerance.

Active open‑source community.

Disadvantages

Index building is CPU‑ and memory‑intensive.

Steeper learning curve for vector similarity concepts.

Relatively new ecosystem compared to mature databases.

Best Practices

Select index type based on dataset size and latency requirements.

Batch insert vectors to maximize throughput.

Normalize vectors before ingestion when using inner‑product (IP) similarity, so that IP scores are equivalent to cosine similarity.

Monitor system metrics and tune cache/CPU allocation.
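Once vectors are L2‑normalized, their inner product equals their cosine similarity. The sketch below normalizes vectors and runs an exact brute‑force top‑k search by inner product in pure Python. It does not use the pymilvus SDK, and the tiny embeddings are hypothetical; Milvus indexes such as IVF or HNSW exist precisely to approximate this exhaustive search at scale.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def top_k_inner_product(query, vectors, k):
    """Exact (brute-force) top-k by inner product over normalized vectors."""
    q = l2_normalize(query)
    scored = []
    for idx, vec in enumerate(vectors):
        v = l2_normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, v)), idx))
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]

# Hypothetical 3-dimensional embeddings; real ones have hundreds of dimensions.
corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
print(top_k_inner_product([10.0, 1.0, 0.0], corpus, k=2))  # [2, 0]
```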

Typical Scenarios

Image and video similarity search.

Recommendation engines.

Natural language processing embeddings.

Bioinformatics (protein/sequence similarity).

Polyglot Persistence

Modern systems often combine multiple databases to leverage each technology’s strengths. A common pattern is MySQL + Redis for transactional core and caching. For massive data and complex queries, architectures evolve to HBase + Elasticsearch, or HBase + Redis + Elasticsearch, balancing storage cost, write‑throughput, and search capabilities.
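In the MySQL + Redis pattern, the cache is commonly maintained with the cache‑aside approach: read from the cache, fall back to the database on a miss and populate the cache, and on writes update the database and then delete the cached entry. The sketch below uses plain dictionaries as hypothetical stand‑ins for MySQL and Redis so the control flow is visible; a real implementation would also set a TTL on cached entries and handle races between concurrent reads and writes.

```python
class CacheAsideRepository:
    """Cache-aside over two stores; `db` and `cache` are dicts standing in
    for MySQL (source of truth) and Redis (cache) in this sketch."""

    def __init__(self, db, cache):
        self.db = db
        self.cache = cache
        self.db_reads = 0  # counts cache misses that reached the database

    def get(self, key):
        if key in self.cache:
            return self.cache[key]          # cache hit
        self.db_reads += 1
        value = self.db.get(key)            # cache miss: read the source of truth
        if value is not None:
            self.cache[key] = value         # populate; with Redis you would also set a TTL
        return value

    def update(self, key, value):
        self.db[key] = value                # write the database first
        self.cache.pop(key, None)           # then invalidate so the next read reloads

repo = CacheAsideRepository(db={"order:1": "created"}, cache={})
repo.get("order:1")            # miss: loads from the database
repo.get("order:1")            # hit: served from the cache
repo.update("order:1", "shipped")
print(repo.get("order:1"), repo.db_reads)  # shipped 2
```

Deleting the cached entry instead of overwriting it keeps the write path simple and lets the next read repopulate the cache from the source of truth.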

Typical Multi‑Persistence Scenarios

MySQL + Redis for everyday web workloads.

HBase + Elasticsearch for petabyte‑scale storage with rich search.

HBase + Redis + Elasticsearch for high‑concurrency reads and complex queries, with Redis absorbing hot reads as a cache.

Case Study – Logistics Order Center

Initially, a MySQL + Redis stack handled fewer than 10,000 orders per day. As daily volume grew beyond 10 million, the architecture shifted to HBase for cheap massive storage, Elasticsearch for complex order searches, and Redis plus a message queue for peak‑shaving and asynchronous processing, dramatically reducing cost and improving latency.
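Peak‑shaving with a message queue means bursts of incoming orders are buffered and drained by consumers at a rate the storage layer can sustain. The simulation below illustrates the effect with an in‑memory deque standing in for the queue; the per‑tick arrival numbers and consumer capacity are hypothetical, not figures from the case study.

```python
from collections import deque

def simulate_peak_shaving(arrivals_per_tick, capacity_per_tick):
    """Enqueue bursty arrivals and let one consumer write at most
    `capacity_per_tick` orders to storage per tick.

    Returns (writes per tick, queue backlog after each tick).
    """
    queue = deque()
    writes, backlog = [], []
    next_order_id = 0
    for arriving in arrivals_per_tick:
        for _ in range(arriving):
            queue.append(next_order_id)   # producer side: enqueue instead of writing directly
            next_order_id += 1
        done = 0
        while queue and done < capacity_per_tick:
            queue.popleft()               # consumer side: persist one order
            done += 1
        writes.append(done)
        backlog.append(len(queue))
    return writes, backlog

writes, backlog = simulate_peak_shaving([50, 0, 0, 10], capacity_per_tick=20)
print(writes)   # [20, 20, 10, 10]: storage never sees more than 20 writes per tick
print(backlog)  # [30, 10, 0, 0]
```

The burst of 50 orders is absorbed by the queue and spread over three ticks, so the storage layer can be sized for the sustained rate rather than the peak.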

Conclusion

Understanding the characteristics of each database type enables architects to build systems that are performant, scalable, and cost‑effective. Whether the workload demands strong ACID guarantees, ultra‑fast key‑value access, massive analytical queries, graph traversals, time‑series monitoring, or vector similarity search, the right combination, often a polyglot persistence strategy, delivers the best results.

References & Further Reading

Official documentation: MySQL, Redis, HBase, ClickHouse, MongoDB, Elasticsearch, Neo4j, Prometheus, Milvus.

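Books: "High Performance MySQL", "Redis in Action", "HBase in Action", "ClickHouse 原理解析与应用实践" (ClickHouse: Principles and Application Practice), "MongoDB 权威指南" (MongoDB: The Definitive Guide), "Elasticsearch 权威指南" (Elasticsearch: The Definitive Guide), "Graph Databases", "Prometheus: Up & Running", "Vector Databases Unleashed".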

Online articles: "交易日均千万订单的存储架构设计与实践" (Storage Architecture Design and Practice for Tens of Millions of Daily Orders), JD Logistics.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
