Databases 23 min read

One‑Stop Python Database Guide: From MySQL & PostgreSQL to MongoDB, Redis, Neo4j and Vector Stores

This article walks through the strengths, typical use cases, and Python connection code for relational databases (MySQL, PostgreSQL, SQLite), NoSQL stores (MongoDB, Redis, Neo4j), cloud‑native options (DynamoDB), and emerging vector databases such as Milvus, helping you decide which fits your project.

Data STUDIO
Data STUDIO
Data STUDIO
One‑Stop Python Database Guide: From MySQL & PostgreSQL to MongoDB, Redis, Neo4j and Vector Stores

Why a Database Selection Guide?

When starting a new Python project developers often struggle to choose the right database. This guide categorises the most common choices, explains their trade‑offs, and provides minimal, runnable Python snippets for each.

1. Relational Databases – Structured Data Foundations

1.1 MySQL

Typical scenarios : e‑commerce sites, CMS, traditional enterprise apps that need strong ACID guarantees and a mature ecosystem.

Python can connect via two popular drivers:

# Method 1: mysqlclient (requires native client libraries)
# pip install mysqlclient
import MySQLdb
conn = MySQLdb.connect(host='localhost', user='root', password='your_password', database='test_db', charset='utf8mb4')

# Method 2: PyMySQL (pure Python)
# pip install pymysql
import pymysql
conn = pymysql.connect(host='localhost', user='root', password='your_password', database='test_db', charset='utf8mb4')

cursor = conn.cursor()
cursor.execute("SELECT * FROM users WHERE id = %s", (1,))
result = cursor.fetchone()
print(f"Query result: {result}")
cursor.close()
conn.close()

Best practice : Django projects use MySQL out of the box; in production add a connection pool.

1.2 PostgreSQL

Typical scenarios : complex business logic, geographic data, JSON storage, and applications that need advanced features such as window functions and full‑text search.

# psycopg2 (synchronous)
# pip install psycopg2-binary
import psycopg2
conn = psycopg2.connect(host='localhost', database='test_db', user='postgres', password='your_password')
cursor = conn.cursor()
cursor.execute("SELECT id, data->>'name' as name FROM products WHERE data->>'category' = 'electronics'")
for row in cursor.fetchall():
    print(row)
conn.close()

# asyncpg (asynchronous, higher performance)
# pip install asyncpg
import asyncio, asyncpg
async def query_pg():
    conn = await asyncpg.connect(host='localhost', user='postgres', password='your_password', database='test_db')
    rows = await conn.fetch('SELECT * FROM users')
    await conn.close()
    return rows

1.3 SQLite

Typical scenarios : desktop tools, mobile prototypes, unit tests, or any single‑process application where a full server is unnecessary.

# Built‑in sqlite3 module
import sqlite3
conn = sqlite3.connect('my_database.db')
conn.execute('''
    CREATE TABLE IF NOT EXISTS users (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        email TEXT UNIQUE NOT NULL
    )
''')
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", ('张三', '[email protected]'))
conn.commit()
for row in conn.execute('SELECT * FROM users'):
    print(f"ID: {row[0]}, Name: {row[1]}, Email: {row[2]}")
conn.close()

2. NoSQL Databases – Flexible Schemas and High‑Speed Operations

2.1 MongoDB

Typical scenarios : user profiles, content management, log storage, and fast‑iteration products where the schema evolves frequently.

# PyMongo driver
# pip install pymongo
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client['blog_database']
posts = db['posts']
post_data = {
    "title": "Python数据库选型指南",
    "author": "Py-Core",
    "tags": ["Python", "数据库", "NoSQL"],
    "content": "这是一篇关于数据库选型的文章...",
    "created_at": datetime.now(),
    "views": 0,
    "metadata": {"word_count": 1500, "read_time": "5分钟"}
}
result = posts.insert_one(post_data)
print(f"Inserted document ID: {result.inserted_id}")
for post in posts.find({"tags": "Python"}):
    print(f"Title: {post['title']}")

2.2 Redis

Typical scenarios : caching, session storage, leaderboards, simple message queues, and rate limiting.

# redis‑py driver
# pip install redis
import redis, json
r = redis.Redis(host='localhost', port=6379, db=0)
# String example
r.set('user:1001:name', '张三')
print(f"Username: {r.get('user:1001:name')}")
# Hash example (store an object)
user_data = {'name': '李四', 'age': 25, 'email': '[email protected]'}
r.hset('user:1002', mapping=user_data)
print(f"User info: {r.hgetall('user:1002')}")
# List as a simple queue
r.lpush('task_queue', '任务1')
r.lpush('task_queue', '任务2')
print(f"Next task: {r.rpop('task_queue')}")
# Set for deduplication
r.sadd('user:1001:follows', 'user:1002', 'user:1003')
common = r.sinter('user:1001:follows', 'user:1002:follows')
print(f"Common follows: {common}")
# Expire a key after 1 hour
r.setex('session:abc123', 3600, '用户会话数据')

2.3 Neo4j (Graph Database)

Typical scenarios : social networks, recommendation engines, fraud detection, knowledge graphs where relationships are first‑class citizens.

# Official Neo4j driver
# pip install neo4j
from neo4j import GraphDatabase
class Neo4jExample:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))
    def close(self):
        self.driver.close()
    def create_friendship(self, p1, p2):
        with self.driver.session() as session:
            result = session.write_transaction(self._create_and_return_friendship, p1, p2)
            print(f"Created relationship: {result}")
    @staticmethod
    def _create_and_return_friendship(tx, p1, p2):
        query = """
            CREATE (a:Person {name: $p1})
            CREATE (b:Person {name: $p2})
            CREATE (a)-[:FRIEND_OF]->(b)
            RETURN a, b
        """
        return [
            {"a": record["a"]["name"], "b": record["b"]["name"]}
            for record in tx.run(query, p1=p1, p2=p2)
        ]
    def find_friends_of_friends(self, name):
        with self.driver.session() as session:
            result = session.read_transaction(self._find_friends_of_friends, name)
            print(f"{name}的朋友的朋友: {result}")
    @staticmethod
    def _find_friends_of_friends(tx, name):
        query = """
            MATCH (p:Person {name: $name})-[:FRIEND_OF*2]->(fof:Person)
            WHERE p <> fof
            RETURN DISTINCT fof.name AS friend_of_friend
        """
        return [record["friend_of_friend"] for record in tx.run(query, name=name)]

neo = Neo4jExample('bolt://localhost:7687', 'neo4j', 'password')
neo.create_friendship('Alice', 'Bob')
neo.create_friendship('Bob', 'Charlie')
neo.find_friends_of_friends('Alice')
neo.close()

3. Cloud‑Native Databases – Managed, Serverless Options

3.1 DynamoDB (AWS)

Typical scenarios : serverless applications on AWS, workloads that need virtually unlimited scaling with pay‑per‑request pricing.

# boto3 driver
# pip install boto3
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource(
    'dynamodb',
    region_name='us-east-1',
    aws_access_key_id='YOUR_KEY',
    aws_secret_access_key='YOUR_SECRET'
)
# Create a table
table = dynamodb.create_table(
    TableName='Users',
    KeySchema=[
        {'AttributeName': 'user_id', 'KeyType': 'HASH'},
        {'AttributeName': 'timestamp', 'KeyType': 'RANGE'}
    ],
    AttributeDefinitions=[
        {'AttributeName': 'user_id', 'AttributeType': 'S'},
        {'AttributeName': 'timestamp', 'AttributeType': 'N'}
    ],
    BillingMode='PAY_PER_REQUEST'
)
# Insert an item
table.put_item(Item={
    'user_id': 'user_001',
    'timestamp': 1633046400,
    'name': '张三',
    'email': '[email protected]',
    'preferences': {'theme': 'dark', 'notifications': True}
})
# Query by primary key
response = table.query(KeyConditionExpression=Key('user_id').eq('user_001'))
print(f"Query result: {response['Items']}")

4. Emerging Vector Databases – Semantic Search for AI Applications

Vector stores turn unstructured data (text, images, audio) into high‑dimensional embeddings using models such as BERT or CLIP. They excel at similarity search, RAG, and multimodal retrieval.

4.1 Milvus Example (Open‑Source)

# Install the client (includes Milvus Lite)
# pip install -U pymilvus
from pymilvus import MilvusClient, model
client = MilvusClient('milvus_demo.db')
# Create a collection with 768‑dimensional vectors
client.create_collection(collection_name='article_collection', dimension=768)
# Load a default embedding function (for demo purposes)
embedding_fn = model.DefaultEmbeddingFunction()
knowledge_base = [
    "Python是一种流行的高级编程语言,以简洁易读著称。",
    "向量数据库专门用于处理由深度学习模型生成的高维向量。",
    "大语言模型如GPT-4在理解和生成自然语言方面能力突出。"
]
vectors = embedding_fn.encode_documents(knowledge_base)
# Prepare data records
data_to_insert = [
    {"id": 0, "vector": vectors[0], "text": knowledge_base[0]},
    {"id": 1, "vector": vectors[1], "text": knowledge_base[1]},
    {"id": 2, "vector": vectors[2], "text": knowledge_base[2]}
]
res = client.insert(collection_name='article_collection', data=data_to_insert)
print(f"Inserted {res['insert_count']} records.")
# Perform a similarity search
user_question = "有什么专门处理AI模型数据的数据库?"
question_vector = embedding_fn.encode_queries([user_question])
search_results = client.search(
    collection_name='article_collection',
    data=question_vector,
    limit=2,
    output_fields=['text']
)
for hits in search_results:
    for hit in hits:
        print(f"Score: {hit['distance']:.4f}, Text: {hit['entity']['text']}")

Other notable options include Chroma, Qdrant, Pinecone (managed), and the PostgreSQL extension pgvector. The article also notes a market trend: major RDBMS vendors now embed vector‑search capabilities, turning vector search into a generic “ability” rather than a separate product.

5. Decision‑Making Guidance

A Python function illustrates a simple decision tree based on project requirements. The function returns a sensible default (PostgreSQL) when no specific need matches.

def choose_database(requirements):
    """Database selection decision function"""
    if requirements["need_acid"] and requirements["structured_data"]:
        if requirements["small_project"]:
            return "SQLite"
        elif requirements["need_advanced_features"]:
            return "PostgreSQL"
        else:
            return "MySQL/MariaDB"
    elif requirements["flexible_schema"]:
        if requirements["need_native_json"]:
            return "MongoDB"
        elif requirements["offline_sync"]:
            return "CouchDB"
    elif requirements["caching_needed"]:
        return "Redis"
    elif requirements["graph_relationships"]:
        return "Neo4j"
    elif requirements["high_write_throughput"]:
        return "Cassandra"
    elif requirements["serverless_aws"]:
        return "DynamoDB"
    elif requirements["vector_search"]:
        return "LanceDB or a dedicated vector DB"
    else:
        return "PostgreSQL"  # safe default

my_project = {
    "need_acid": True,
    "structured_data": True,
    "small_project": False,
    "need_advanced_features": True,
    "flexible_schema": False,
    "caching_needed": True,
    "graph_relationships": False,
}
print(f"Recommended database: {choose_database(my_project)}")

For large‑scale e‑commerce systems a hybrid architecture is common:

"""
Typical e‑commerce stack:
1. PostgreSQL – core business data (users, orders, products)
2. Redis – shopping‑cart, session cache, hot data
3. MongoDB – user‑behavior logs, product reviews
4. Neo4j – recommendation graph
"""

A concrete login workflow demonstrates how to orchestrate Redis, PostgreSQL, and MongoDB together:

def user_login(user_id, password):
    # 1. Check Redis cache for an existing session
    cached = redis_client.get(f"session:{user_id}")
    if cached:
        return json.loads(cached)
    # 2. Verify credentials in PostgreSQL
    user = postgres_query(
        "SELECT * FROM users WHERE id = %s AND password_hash = %s",
        (user_id, hash_password(password))
    )
    if not user:
        return None
    # 3. Log the login event to MongoDB
    mongo_collection.insert_one({
        "user_id": user_id,
        "action": "login",
        "timestamp": datetime.now(),
        "ip": get_client_ip()
    })
    # 4. Populate Redis session (1‑hour TTL)
    session_data = {"user_id": user_id, "name": user["name"], "permissions": user["permissions"]}
    redis_client.setex(f"session:{user_id}", 3600, json.dumps(session_data))
    # 5. Optionally fetch personalized recommendations from Neo4j
    recommendations = neo4j_get_recommendations(user_id)
    return {**session_data, "recommendations": recommendations}

6. Quick Reference Matrix

Small / prototype : SQLite – zero‑config, file‑based.

Typical web applications : PostgreSQL – rich feature set, strong consistency.

Rapid iteration / flexible schema : MongoDB – schema‑less JSON documents.

High‑throughput caching : Redis – in‑memory speed.

Complex relationship analysis : Neo4j – graph‑centric queries.

AWS‑centric stack : DynamoDB (+ Aurora if relational features needed).

7. Next Steps

Beginners : start with PostgreSQL + Redis; they cover >90 % of use‑cases.

Intermediate : add MongoDB or Neo4j based on specific data‑model needs.

Production : implement backup, monitoring, and performance tuning; consider vector‑search capabilities when building AI‑enhanced features.

By following the comparative analysis, code examples, and decision guidance in this guide, you can select a database that aligns with your project's functional and operational requirements.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

PythondatabaseRedisMySQLvector-searchPostgreSQLMongoDB
Data STUDIO
Written by

Data STUDIO

Click to receive the "Python Study Handbook"; reply "benefit" in the chat to get it. Data STUDIO focuses on original data science articles, centered on Python, covering machine learning, data analysis, visualization, MySQL and other practical knowledge and project case studies.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.