Choosing the Right Database: MySQL, Redis, HBase, ClickHouse, MongoDB, Elasticsearch, Neo4j, Prometheus & Milvus Explained
Explore nine major database technologies—from traditional relational MySQL to NoSQL Redis, columnar HBase and ClickHouse, document-oriented MongoDB, search engine Elasticsearch, graph Neo4j, time‑series Prometheus, and vector Milvus—plus practical best‑practice guides, real‑world polyglot persistence scenarios, and recommended resources for mastering modern data storage.
Introduction
In the digital era, data is a core asset for enterprises, and the choice of storage technology directly affects performance, scalability, and cost. This article reviews nine database technologies: their strengths, weaknesses, best practices, and typical use cases.
DB‑Engines Ranking (June 2024)
The ranking reflects popularity and community adoption of each database.
Relational Database – MySQL
MySQL is an open‑source RDBMS known for ACID compliance, strong consistency, and a rich ecosystem. It excels in transactional workloads such as finance, HR, and inventory systems.
Advantages
Low cost – open source.
Easy to use with familiar SQL syntax.
Large community and tooling.
Disadvantages
Scaling horizontally can be complex.
Performance may degrade with massive concurrent workloads.
Best Practices
Normalize data models.
Regularly purge obsolete data.
Design appropriate indexes.
Monitor performance and tune slow queries.
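The normalization and indexing advice above can be sketched in a few lines. This is only an illustration: Python's built-in sqlite3 module stands in for MySQL so the example runs without a server, and the `customers` and `orders` tables and the index name are hypothetical. MySQL's DDL and its `EXPLAIN` output differ in detail, but the workflow of adding an index and then checking the plan is the same.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized model: customer attributes live in one place only.
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total_cents INTEGER NOT NULL
    );
    -- Index the column used in the frequent lookup below.
    CREATE INDEX idx_orders_customer ON orders (customer_id);
""")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
    [(1, 1250), (1, 990)],
)


def query_plan(sql, params=()):
    """Return the human-readable plan steps the engine chose for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description


lookup = "SELECT id, total_cents FROM orders WHERE customer_id = ?"
plan = query_plan(lookup, (1,))
total = sum(row[1] for row in conn.execute(lookup, (1,)))
print(plan, total)
```

If the plan does not mention the index, the query is scanning the whole table, which is the usual first thing to check when tuning a slow query.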
Typical Scenarios
Web applications (LAMP stack).
SMBs needing reliable, cost‑effective storage.
Key‑Value Store – Redis
Redis is an in‑memory key‑value database offering sub‑millisecond latency, rich data structures, persistence options, and built‑in replication.
Advantages
Ultra‑fast read/write.
Supports strings, lists, sets, sorted sets, hashes, bitmaps, hyperloglog, geospatial indexes.
High‑availability via Sentinel and Cluster.
Disadvantages
Memory‑centric – high cost for large datasets.
Limited to simple queries; not suited for complex relational analysis.
Persistence can become a bottleneck under heavy load.
Best Practices
Manage memory with TTL and eviction policies.
Choose RDB snapshots, the AOF append-only log, or both, based on durability needs.
Avoid long-running commands (e.g., KEYS *); iterate with SCAN instead.
Design keys to prevent hotspots.
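The TTL advice above is most often applied through the cache-aside pattern. The sketch below assumes a client object exposing `get(key)` and `setex(key, seconds, value)`, which is the shape of the redis-py client; the `InMemoryCache` class, the key name, and the loader are hypothetical stand-ins so the example runs without a Redis server.

```python
import json


def get_or_load(client, key, loader, ttl_seconds=300):
    """Cache-aside read: try the cache, fall back to the loader, store with a TTL."""
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)
    value = loader()
    # SETEX writes the value and its expiry together, so no key lives forever.
    client.setex(key, ttl_seconds, json.dumps(value))
    return value


class InMemoryCache:
    """Hypothetical stand-in exposing only the two client methods used above."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def setex(self, key, ttl_seconds, value):
        self.store[key] = value  # the TTL is ignored in this stand-in


loader_calls = []


def load_user():
    """Pretend database read; records each call so cache hits are visible."""
    loader_calls.append(1)
    return {"id": 42, "name": "Ada"}


cache = InMemoryCache()
first = get_or_load(cache, "user:42", load_user)
second = get_or_load(cache, "user:42", load_user)  # served from the cache
print(first, len(loader_calls))
```

Against a real server, the same function should work with a redis-py client in place of `InMemoryCache`, with the TTL bounding how stale a cached entry can become.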
Typical Scenarios
Caching layer for web services.
Session storage.
Leaderboards, counters, real‑time analytics.
Column‑Oriented Store – HBase
HBase is an open-source, distributed wide-column (column-family) store built on Hadoop HDFS, suited to massive write-heavy workloads and random access by row key at PB scale.
Advantages
Linear horizontal scalability.
Fast random reads/writes.
Automatic region server failover, coordinated through ZooKeeper, with data replicated by HDFS.
Column-family model lets a read touch only the families it needs.
Disadvantages
Operational complexity; steep learning curve.
Memory and I/O intensive.
Atomicity is guaranteed per row only; there are no built-in multi-row ACID transactions.
Best Practices
Design row keys to avoid hotspots.
Group frequently accessed columns in the same family.
Use compression (Snappy, LZ4) to save space.
Monitor cluster health and tune region servers.
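One common way to follow the row-key advice above is salting: prefixing each key with a small hash-derived bucket so that sequential IDs do not all land in the same region. The sketch below is an assumption about how that might look; the `order_id` format, the bucket count, and the `|` separator are hypothetical choices, not something the source prescribes.

```python
import hashlib
from collections import Counter


def salted_row_key(order_id: str, buckets: int = 16) -> bytes:
    """Prefix the key with a stable bucket number derived from its hash."""
    digest = hashlib.md5(order_id.encode("utf-8")).digest()
    salt = digest[0] % buckets  # same order_id always maps to the same bucket
    return f"{salt:02d}|{order_id}".encode("utf-8")


# Sequential IDs would otherwise sort next to each other and hit one region.
order_ids = [f"order-{n:08d}" for n in range(1000)]
keys = [salted_row_key(oid) for oid in order_ids]

# Count how many keys fall into each salt bucket.
per_bucket = Counter(key.split(b"|", 1)[0] for key in keys)
print(sorted(per_bucket.items()))
```

The trade-off is that a range scan over consecutive order IDs now has to fan out across every bucket, so salting fits write-heavy tables that are mostly read by exact key.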
Typical Scenarios
Large‑scale data warehousing.
Real‑time analytics on TB‑PB datasets.
Write‑intensive applications.
Column‑Oriented Store – ClickHouse
ClickHouse is an open‑source OLAP database optimized for real‑time analytical queries on large columnar datasets.
Advantages
High‑performance columnar reads.
Excellent compression reduces storage costs.
Near‑real‑time data ingestion.
Horizontal scalability.
Vectorized query execution and multi‑core utilization.
Familiar SQL dialect, including joins and aggregations.
Disadvantages
Many small, highly concurrent inserts are inefficient; writes perform best in large batches.
Cluster management adds operational complexity.
Limited transactional guarantees.
Best Practices
Model tables for query patterns; keep schemas narrow.
Choose the sorting (primary) key to match common filters, and add data-skipping indexes only where they measurably help.
Batch insert data to improve write efficiency.
Monitor performance with built‑in tools.
Leverage sharding and replication for scalability and HA.
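The batching advice above can be sketched as a small client-side buffer that collects rows and hands them over in bulk. Everything here is an assumption for illustration: `BatchBuffer` is a hypothetical helper, and the `flush` callback stands in for whatever bulk-insert call the chosen ClickHouse client provides.

```python
class BatchBuffer:
    """Collect rows and pass them to `flush` in batches of up to `max_rows`."""

    def __init__(self, flush, max_rows=10_000):
        self._flush = flush        # hypothetical callback doing the bulk insert
        self._max_rows = max_rows
        self._rows = []

    def add(self, row):
        self._rows.append(row)
        if len(self._rows) >= self._max_rows:
            self.flush()

    def flush(self):
        """Send whatever is buffered; call once more at shutdown."""
        if self._rows:
            batch, self._rows = self._rows, []
            self._flush(batch)


sent_batches = []  # records each batch instead of talking to a server
buffer = BatchBuffer(sent_batches.append, max_rows=3)
for n in range(7):
    buffer.add({"event_id": n})
buffer.flush()  # push the final partial batch

print([len(batch) for batch in sent_batches])
```

A production version would typically also flush on a timer, so that a slow trickle of rows does not sit in memory indefinitely.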
Typical Scenarios
Log analytics, BI dashboards, and ad‑hoc reporting.
Real‑time data analysis for finance, e‑commerce, and monitoring.
Document Store – MongoDB
MongoDB is a popular NoSQL document database that stores data as flexible BSON documents, allowing schema‑less development.
Advantages
Flexible document model; no predefined schema.
Horizontal scalability via sharding.
High performance for read/write heavy workloads.
Built‑in replication and automatic failover.
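A flexible document model means that documents written by different application versions can coexist in one collection, which moves schema handling into the reading code. The sketch below shows one way a reader might cope; the field names and the two document shapes are hypothetical, and plain dictionaries stand in for documents returned by a driver.

```python
def normalize_user(doc: dict) -> dict:
    """Read a user document written by any app version, filling in defaults."""
    return {
        "id": doc["_id"],
        "name": doc.get("name", ""),
        # Newer documents embed an `address` sub-document; older ones only
        # carry a top-level `city` field.
        "city": doc.get("address", {}).get("city", doc.get("city")),
        "tags": list(doc.get("tags", [])),
    }


# Two shapes of the same entity living side by side in one collection.
v1_doc = {"_id": 1, "name": "Ada", "city": "London"}
v2_doc = {
    "_id": 2,
    "name": "Lin",
    "address": {"city": "Beijing", "zip": "100000"},
    "tags": ["vip"],
}

users = [normalize_user(doc) for doc in (v1_doc, v2_doc)]
print(users)
```

Keeping this tolerance in one function makes it easier to migrate old documents later and then delete the fallback branch.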
Disadvantages
Multi-document transactions arrived relatively late (replica sets in 4.0, sharded clusters in 4.2) and can reduce throughput.
Higher storage consumption compared to relational tables.
Memory‑intensive for hot data.
Complex cluster administration (sharding, replica sets).
Best Practices
Design documents to be as flat as possible.
Create indexes judiciously.
Plan sharding keys to ensure even data distribution.
Monitor with MongoDB Atlas or Ops Manager.
Typical Scenarios
Content management systems.
Mobile and web applications with evolving schemas.
IoT data ingestion.
Big data pipelines requiring flexible storage.
Search Engine – Elasticsearch
Elasticsearch is a distributed search and analytics engine built on Apache Lucene, often used as a document store for full‑text search.
Advantages
Fast, scalable full‑text and structured search.
Rich query DSL, aggregations, and near‑real‑time indexing.
Mature Elastic Stack ecosystem (Logstash, Kibana).
Disadvantages
Resource‑intensive; requires ample RAM and CPU.
Steeper learning curve for advanced features.
Cluster management can be complex at scale.
Best Practices
Index only fields needed for search to reduce storage and improve cache efficiency.
Use appropriate shard and replica counts.
Keep JVM tuned (G1GC) and monitor heap usage.
Secure clusters with the built-in security features (formerly packaged as X-Pack) or other plugins.
Integrate with Grafana for richer visualizations.
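To make the indexing and query DSL points above concrete, the sketch below builds a mapping and a search body as plain Python dictionaries, which is the JSON an Elasticsearch client would send. The `products` fields are hypothetical; the mapping marks one field as `"index": False` so it is kept in the stored document but not made searchable.

```python
import json

# Hypothetical mapping for a `products` index.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},        # analyzed for full-text search
            "category": {"type": "keyword"},  # exact-match filtering
            "brand": {"type": "keyword"},     # exact-match and aggregations
            "price": {"type": "float"},
            # Kept in the stored document but not indexed for search.
            "internal_notes": {"type": "text", "index": False},
        }
    }
}


def build_search(text, category, max_price):
    """Full-text match on title, exact filters, plus a per-brand count."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                # Filters do not affect scoring and can be cached.
                "filter": [
                    {"term": {"category": category}},
                    {"range": {"price": {"lte": max_price}}},
                ],
            }
        },
        "aggs": {"by_brand": {"terms": {"field": "brand"}}},
        "size": 10,
    }


body = build_search("wireless headphones", "audio", 200)
print(json.dumps(body, indent=2))
```

Putting exact-match conditions under `filter` rather than `must` is a common choice because they narrow the result set without contributing to relevance scoring.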
Typical Scenarios
Full‑text search for e‑commerce, documentation.
Log aggregation and analysis (ELK stack).
Geospatial queries and recommendation engines.
Graph Database – Neo4j
Neo4j stores data as nodes, relationships, and properties, enabling efficient traversal of highly connected data.
Advantages
Optimized for deep relationship queries.
Intuitive Cypher query language.
Robust ecosystem and language drivers.
ACID‑compliant transactions.
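A relationship query of the kind described above can be illustrated with a small in-memory graph. The sketch finds everyone within two hops of a person using breadth-first search, with a roughly equivalent Cypher query shown in a comment. The `Person` label, the `FRIEND` relationship type, and the sample data are hypothetical.

```python
from collections import deque

# Roughly equivalent Cypher (label and relationship type are hypothetical):
#   MATCH (a:Person {name: $name})-[:FRIEND*1..2]-(b:Person)
#   WHERE b <> a
#   RETURN DISTINCT b.name

friends = {
    "Ada": {"Bob", "Cy"},
    "Bob": {"Ada", "Dee"},
    "Cy": {"Ada"},
    "Dee": {"Bob", "Eve"},
    "Eve": {"Dee"},
}


def within_hops(graph, start, max_hops):
    """Return every node reachable from `start` in at most `max_hops` steps."""
    seen = {start}
    reached = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.add(neighbor)
                queue.append((neighbor, depth + 1))
    return reached


print(sorted(within_hops(friends, "Ada", 2)))  # ['Bob', 'Cy', 'Dee']
```

In a relational schema the same question needs a self-join per hop, which is why traversal-heavy workloads tend to favor a graph store.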
Disadvantages
Performance tuning can be intricate for very large graphs.
Learning curve for developers accustomed to SQL.
Higher memory consumption for graph processing.
Best Practices
Model the graph to reflect real‑world entities and relationships.
Create indexes on frequently queried node properties.
Use bulk import tools for large data loads.
Monitor query performance and adjust cache settings.
Typical Scenarios
Social networks, recommendation systems.
Fraud detection and knowledge graphs.
Network and IT operations dependency mapping.
Time‑Series Database – Prometheus
Prometheus is an open‑source monitoring system that scrapes metrics from targets and stores them as time‑series data.
Advantages
Multi‑dimensional data model with flexible labels.
Powerful PromQL query language.
Built‑in storage optimized for high‑resolution metrics.
Automatic service discovery for dynamic environments.
Rich alerting capabilities.
Disadvantages
Local storage is single-node with a short default retention (15 days); historic data needs an external long-term store.
Deleting or modifying data is non‑trivial.
Basic UI; advanced visualization requires Grafana.
Best Practices
Avoid excessive label cardinality.
Design concise alerting rules.
Leverage service discovery to reduce manual config.
Integrate with Grafana for dashboards.
Plan retention policies and consider Thanos/Cortex for long‑term storage.
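The cardinality warning above comes down to simple arithmetic: in the worst case, one metric produces a separate time series for every combination of label values. The sketch below computes that upper bound; the label names and value counts are hypothetical.

```python
from math import prod


def series_count(label_values: dict) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(len(values) for values in label_values.values())


# A bounded label set: 2 methods x 3 status codes x 10 instances.
bounded = {
    "method": ["GET", "POST"],
    "status": ["200", "404", "500"],
    "instance": [f"app-{i}" for i in range(10)],
}

# Adding an unbounded label such as a user ID multiplies everything.
unbounded = dict(bounded, user_id=[str(i) for i in range(50_000)])

print(series_count(bounded))    # 60
print(series_count(unbounded))  # 3000000
```

This is why identifiers such as user IDs, request IDs, or raw URLs usually belong in logs or traces rather than in metric labels.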
Typical Scenarios
Cloud‑native application monitoring (Kubernetes).
Infrastructure metrics (CPU, memory, network).
Application performance monitoring.
Business KPI tracking.
Vector Database – Milvus
Milvus is an open‑source vector database designed for AI, machine learning, and similarity search on high‑dimensional vectors.
Advantages
Millisecond-level vector search using approximate indexes such as IVF and HNSW (older releases also offered ANNOY).
Rich client SDKs (Python, Java, Go, etc.).
Horizontal and vertical scalability with data replication.
High availability and fault tolerance.
Active open‑source community.
Disadvantages
Resource‑intensive CPU and memory for indexing.
Steeper learning curve for vector similarity concepts.
Relatively new ecosystem compared to mature databases.
Best Practices
Select index type based on dataset size and latency requirements.
Batch insert vectors to maximize throughput.
Normalize vectors before ingestion when ranking by inner product, so that scores match cosine similarity.
Monitor system metrics and tune cache/CPU allocation.
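To show what normalization buys, the sketch below runs an exact brute-force similarity search in plain Python. It is not how Milvus searches (its indexes are approximate and far faster); it only illustrates that once vectors are unit length, the inner product equals cosine similarity. The three sample vectors and their labels are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query, vectors, k=2):
    """Exact search: rank stored vectors by inner product of unit vectors."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(vec))), key)
        for key, vec in vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.1],
    "car": [0.0, 0.2, 0.9],
}

print(top_k([1.0, 0.0, 0.0], vectors, k=2))  # ['cat', 'dog']
```

Brute force like this costs one comparison per stored vector, which is the cost that IVF and HNSW style indexes are designed to avoid at scale.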
Typical Scenarios
Image and video similarity search.
Recommendation engines.
Natural language processing embeddings.
Bioinformatics (protein/sequence similarity).
Polyglot Persistence
Modern systems often combine multiple databases to leverage each technology’s strengths. A common pattern is MySQL + Redis for transactional core and caching. For massive data and complex queries, architectures evolve to HBase + Elasticsearch, or HBase + Redis + Elasticsearch, balancing storage cost, write‑throughput, and search capabilities.
Typical Multi‑Persistence Scenarios
MySQL + Redis for everyday web workloads.
HBase + Elasticsearch for petabyte‑scale storage with rich search.
HBase + Redis + Elasticsearch for high‑concurrency reads, complex queries, and cache‑off‑loading.
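The three-store combination above can be sketched as a read path in which the search index answers "which orders match?", and the cache and primary store answer "what is in order X?". This is a simplified illustration under stated assumptions: `OrderReader` is a hypothetical class, and plain dictionaries stand in for Redis, HBase, and Elasticsearch.

```python
class OrderReader:
    """Read path across three stores, each modeled here as a dict."""

    def __init__(self, cache, store, search_index):
        self.cache = cache                # stands in for Redis
        self.store = store                # stands in for HBase: row key -> order
        self.search_index = search_index  # stands in for Elasticsearch: term -> ids

    def get(self, order_id):
        """Point lookup: cache first, then the primary store."""
        order = self.cache.get(order_id)
        if order is None:
            order = self.store.get(order_id)
            if order is not None:
                self.cache[order_id] = order  # warm the cache for next time
        return order

    def search(self, term):
        """Complex query: the index returns IDs, the stores return the rows."""
        ids = self.search_index.get(term, [])
        return [o for o in (self.get(i) for i in ids) if o is not None]


store = {
    "o1": {"id": "o1", "city": "Beijing"},
    "o2": {"id": "o2", "city": "Shanghai"},
    "o3": {"id": "o3", "city": "Beijing"},
}
search_index = {"beijing": ["o1", "o3"], "shanghai": ["o2"]}
cache = {}

reader = OrderReader(cache, store, search_index)
results = reader.search("beijing")
print([order["id"] for order in results], sorted(cache))
```

Keeping only IDs and searchable fields in the index, with full rows in the primary store, is one way to hold down index size while the cache absorbs repeated point reads.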
Case Study – Logistics Order Center
Initially, a MySQL + Redis stack handled fewer than 10,000 orders per day. As volume grew past 10 million orders per day, the architecture shifted to HBase for low-cost bulk storage, Elasticsearch for complex order searches, and Redis plus a message queue for peak shaving and asynchronous processing, which reduced storage cost and improved latency.
Conclusion
Understanding the characteristics of each database type enables architects to build systems that are performant, scalable, and cost‑effective. Whether the workload demands strong ACID guarantees, ultra‑fast key‑value access, massive analytical queries, graph traversals, time‑series monitoring, or vector similarity search, the right combination—often a polyglot persistence strategy—delivers the best results.
References & Further Reading
Official documentation: MySQL, Redis, HBase, ClickHouse, MongoDB, Elasticsearch, Neo4j, Prometheus, Milvus.
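Books: "High Performance MySQL", "Redis in Action", "HBase in Action", "ClickHouse 原理解析与应用实践" (ClickHouse: Principles Explained and Applied in Practice), "MongoDB 权威指南" (MongoDB: The Definitive Guide), "Elasticsearch 权威指南" (Elasticsearch: The Definitive Guide), "Graph Databases", "Prometheus: Up & Running", "Vector Databases Unleashed".
Online articles: "交易日均千万订单的存储架构设计与实践" (Storage Architecture Design and Practice for Ten Million Orders per Day), JD Logistics.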
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
