9 Underrated Python Libraries That Can Boost Your Development Speed Tenfold

This article introduces nine lesser-known Python packages—msgspec, glom, watchfiles, beartype, pyinstrument, duckdb, fakeredis, boltons, and returns—and shows how each tackles a common prototyping pain point: typed data serialization, nested-data navigation, file watching, runtime type safety, performance profiling, SQL querying without a server, mock Redis, everyday utilities, and explicit error handling, accelerating development by up to ten times along the way.


1. Data and Serialization: Escape "glue code" hell

1. msgspec: Fast, type-safe serialization

When building a prototype, the first hurdle is handling data boundaries—JSON, YAML, or nested dictionaries. The msgspec library provides schema-based serialization and deserialization that benchmarks roughly 10-50× faster than the standard json module while enforcing strict type safety.

Validation built in: define a structure once and every decode is checked against it.

Extreme performance: the speedup is tangible when processing large volumes of data.

Types as documentation: the code itself becomes the best spec.

import msgspec
import json
import time

# 1. Define a user structure
class User(msgspec.Struct):
    id: int
    name: str
    email: str | None = None
    active: bool = True

# 2. Decode JSON bytes into a strongly typed object
raw_json = b'{"id": 123, "name": "小明", "email": "[email protected]"}'
user = msgspec.json.decode(raw_json, type=User)
print(user)  # User(id=123, name='小明', email='[email protected]', active=True)

# 3. Encode back to JSON
encoded = msgspec.json.encode(user)
print(encoded.decode('utf-8'))  # {"id":123,"name":"小明","email":"[email protected]","active":true}

# 4. Simple performance comparison
data = [{"id": i, "name": f"user_{i}"} for i in range(10000)]
json_bytes = json.dumps(data).encode()

start = time.time()
for _ in range(1000):
    msgspec.json.decode(json_bytes, type=list[User])
msgspec_time = time.time() - start

start = time.time()
for _ in range(1000):
    json.loads(json_bytes)
json_time = time.time() - start

print(f"[Performance] Deserialize 10,000 items 1,000 times:")
print(f"  msgspec: {msgspec_time:.2f} s")
print(f"  json:    {json_time:.2f} s")
print(f"  msgspec is about {json_time/msgspec_time:.1f}× faster")

2. glom: Declarative data navigation

Deeply nested dictionary access like data['a']['b'][0]['c'] often leads to KeyError. glom lets you describe the path declaratively, making the intent clear and handling missing keys gracefully.

Clear intent: code reads like “what data I want”.

Fault-tolerant: missing nodes can be handled with Coalesce.

Powerful transformation: you can reshape data while extracting it.

from glom import glom, Coalesce, PathAccessError

# Simulated complex API response
api_response = {
    "status": "success",
    "data": {
        "users": [
            {"profile": {"name": "Alice", "age": 30, "id": 1}},
            {"profile": {"name": "Bob", "id": 2}},  # Bob lacks age
            {"profile": {"name": "Charlie", "age": 25, "id": 3}},
        ],
        "metadata": {"page": 1, "total": 3},
    },
}

# 1. Extract the first username (simple path)
first_name = glom(api_response, 'data.users.0.profile.name')
print(f"First user name: {first_name}")  # Alice

# 2. Extract all usernames (iterative path)
all_names = glom(api_response, ('data.users', ['profile.name']))
print(f"All usernames: {all_names}")  # ['Alice', 'Bob', 'Charlie']

# 3. Safe extraction with default for missing field
ages = glom(api_response, (
    'data.users',
    [Coalesce('profile.age', default=-1)]
))
print(f"Ages with default -1: {ages}")  # [30, -1, 25]

# 4. Complex restructuring
summary = glom(api_response, {
    'page': 'data.metadata.page',
    'user_count': ('data.users', len),
    'user_list': ('data.users', [{
        'id': 'profile.id',
        'name': 'profile.name',
    }])
})
print("
Restructured summary:")
print(summary)
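
The PathAccessError imported above also deserves a demonstration: when a path is missing and no Coalesce default applies, glom raises it with a readable account of exactly where the lookup failed, rather than a bare KeyError. Continuing the script:

# 5. A missing path without a Coalesce default raises PathAccessError
try:
    glom(api_response, 'data.users.0.profile.address.city')
except PathAccessError as e:
    print(f"Lookup failed: {e}")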

2. Development and Debugging: Keep the prototype alive

3. watchfiles: Hot file watching

When developing web apps or data scripts you often have to stop and restart manually after a file change. watchfiles offers cross‑platform, high‑performance file‑system event listening with automatic reload.

Incredibly simple: a few lines set up watching.

Strong performance: built on Rust, far faster than pure-Python loops.

Broad applicability: suitable for servers, build tools, data-pipeline monitors, etc.

from watchfiles import watch, Change
import time

print("Monitoring .txt files in the current directory…")
for changes in watch('./', watch_filter=lambda change, path: path.endswith('.txt')):
    for change_type, file_path in changes:
        # change_type is a watchfiles.Change enum (added / modified / deleted)
        action = {Change.added: 'Created', Change.modified: 'Modified',
                  Change.deleted: 'Deleted'}[change_type]
        print(f"[{time.strftime('%H:%M:%S')}] File {action}: {file_path}")
    print("--- Monitoring continues... (Ctrl+C to exit) ---")

4. beartype: Runtime type guardian

Python’s dynamic typing gives flexibility but can cause TypeError at runtime. beartype provides a near‑zero‑overhead decorator that enforces type hints during execution, catching bugs early without a heavy test suite.

Early bug capture: errors surface at the point of failure.

Confidence boost: you can trust function interfaces.

Near-zero overhead: checks are constant-time and cheap enough to leave enabled in production.

from beartype import beartype
from typing import List, Dict

@beartype
def calculate_stats(scores: List[float], weight: float) -> Dict[str, float]:
    """Compute average and weighted total."""
    if not scores:
        return {"average": 0.0, "weighted_total": 0.0}
    average = sum(scores) / len(scores)
    weighted_total = average * weight
    return {"average": average, "weighted_total": weighted_total}

print(calculate_stats([85.5, 90.0, 78.5], 1.1))

# Trigger a type error: weight is a str, not a float. (Note: beartype
# type-checks container *items* by O(1) random sampling, so a bad element
# buried in a list is not guaranteed to be caught on every call; a wrong
# argument type like this one always is.)
try:
    calculate_stats([85.5, 90.0, 78.5], "1.1")
except Exception as e:
    print(f"\nCaught type error: {type(e).__name__}: {e}")
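
beartype goes beyond plain hints, too: its validator API (beartype.vale) combines with typing.Annotated to enforce value constraints at call time. A small sketch, where Percent is an illustrative alias rather than a beartype built-in:

from typing import Annotated

from beartype import beartype
from beartype.vale import Is

# A float constrained to the range [0, 100]
Percent = Annotated[float, Is[lambda x: 0.0 <= x <= 100.0]]

@beartype
def set_progress(value: Percent) -> None:
    print(f"Progress: {value}%")

set_progress(42.0)    # fine
# set_progress(150.0) # raises a beartype violation at the call site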

5. pyinstrument: Find the performance culprit in one second

When a prototype runs slowly, pinpointing the bottleneck is hard. pyinstrument produces a clear, tree‑structured performance report that highlights the most time‑consuming functions, far more readable than raw cProfile output.

Out-of-the-box: no complex configuration needed.

Intuitive results: a visual tree shows the call hierarchy and time percentages.

Low overhead: minimal impact on program speed.

from pyinstrument import Profiler
import time, random

def slow_function():
    """Simulated slow function with many useless loops."""
    time.sleep(0.1)  # I/O wait
    data = [random.random() for _ in range(50000)]
    sorted_data = sorted(data)  # potential bottleneck
    return sum(sorted_data[::1000])

def fast_function():
    """Simulated fast function."""
    time.sleep(0.01)
    return 42

def main_workflow():
    total = 0
    for i in range(5):
        if i % 2 == 0:
            total += slow_function()
        else:
            total += fast_function()
    return total

profiler = Profiler()
profiler.start()
result = main_workflow()
print(f"Result: {result}")
profiler.stop()
print("
" + "="*50)
print("Pyinstrument Performance Report")
print("="*50)
print(profiler.output_text(unicode=True, color=True))
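
Two more ergonomic forms are worth knowing: Profiler doubles as a context manager, and installing the package also gives you a pyinstrument command-line tool that profiles a whole script with no code changes:

# Context-manager form of the same measurement
with Profiler() as profiler:
    main_workflow()
print(profiler.output_text(unicode=True))

# Or from the shell, with zero code changes:
#   pyinstrument your_script.py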

3. Data Processing and Simulation: Magic to skip cumbersome setup

6. duckdb: SQL engine without a database

If you want powerful SQL over CSV, Parquet, or Pandas DataFrames without installing PostgreSQL or MySQL, duckdb offers an in‑process OLAP engine that runs queries directly on files at blazing speed.

Zero infrastructure: no server installation or management.

Rich syntax: supports standard SQL and advanced analytic functions.

Seamless integration: works hand-in-hand with Pandas, CSV, etc.

import duckdb
import pandas as pd

# 1. Query a Pandas DataFrame directly
df = pd.DataFrame({
    'country': ['中国', '美国', '中国', '英国', '美国', '中国'],
    'sales': [100, 150, 200, 80, 120, 300]
})
print("Original data:")
print(df)

print("
DuckDB query (sales by country):")
result_df = duckdb.sql('''
    SELECT country,
           SUM(sales) AS total_sales,
           AVG(sales) AS avg_sales,
           COUNT(*) AS order_count
    FROM df
    GROUP BY country
    ORDER BY total_sales DESC
''').df()
print(result_df)

# 2. Direct query on a CSV file (requires an actual file)
# print("
Query directly from CSV:")
# result = duckdb.sql("SELECT country, SUM(sales) FROM 'sales.csv' GROUP BY country").df()
# print(result)
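
Everything above runs in memory; when a prototype needs results to survive restarts, duckdb.connect can write to a single local database file instead, still with no server to manage. A minimal sketch, where 'proto.duckdb' is an arbitrary filename:

# 3. Optional persistence: a single-file database, still serverless
con = duckdb.connect('proto.duckdb')
con.register('sales_data', df)  # expose the DataFrame to SQL by name
con.execute("CREATE TABLE IF NOT EXISTS sales AS SELECT * FROM sales_data")
print(con.sql("SELECT COUNT(*) AS row_count FROM sales").df())
con.close()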

7. fakeredis: Need Redis? No, just its behavior

When a prototype uses Redis for caching or queues but you don't want to install and run a server locally, fakeredis provides a pure-Python mock that closely mimics the redis-py client API.

Zero-dependency deployment: scripts run anywhere without a real Redis server.

Perfect for testing: behavior is consistent and repeatable.

Smooth migration: swap the connection string after validation to use a real Redis instance.

import fakeredis
import json, time

# Create a mock Redis client – no server needed
cache = fakeredis.FakeRedis()

def get_user_profile(user_id: int):
    """Fetch user profile with caching."""
    cache_key = f"user_profile:{user_id}"
    cached = cache.get(cache_key)
    if cached:
        print(f"Cache hit for user {user_id}")
        return json.loads(cached)
    print(f"Cache miss, querying DB for user {user_id}")
    time.sleep(0.5)  # simulate slow DB query
    user_data = {"id": user_id, "name": f"User{user_id}", "score": user_id * 10}
    cache.setex(cache_key, 5, json.dumps(user_data))
    print(f"Cached data for user {user_id}")
    return user_data

print("First request (hits DB):")
print(get_user_profile(1))

print("
Second request (should hit cache):")
print(get_user_profile(1))

print("
After 6 seconds (cache expired, hits DB again):")
time.sleep(6)
print(get_user_profile(1))

# Demonstrate list operations
list_key = "my_list"
cache.lpush(list_key, "task1", "task2")
print(f"
Mock Redis list contents: {cache.lrange(list_key, 0, -1)}")
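
To make the "smooth migration" bullet concrete, hide the client behind a tiny factory so that switching to a real server is a single environment variable. A sketch, where REDIS_URL and make_cache are illustrative names rather than anything fakeredis ships:

import os

def make_cache():
    """Return a real redis-py client when REDIS_URL is set, else a fake one."""
    url = os.environ.get("REDIS_URL")
    if url:
        import redis  # the real client exposes the same API surface
        return redis.Redis.from_url(url)
    return fakeredis.FakeRedis()

cache = make_cache()  # calling code never knows which one it got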

4. Engineering Enhancements: Stop reinventing the wheel

8. boltons: Your "Swiss‑army‑knife" toolbox

boltons bundles over 200 practical functions and classes that fill gaps left by the standard library, ranging from iteration helpers to advanced dictionaries and atomic file saving.

Fewer reinvented wheels: functions like chunked, flatten, and unique are ready to use.

More robust code: utilities are battle-tested in real projects.

Improved readability: expressive names make intent clear.

from boltons.iterutils import chunked, flatten, unique
from boltons.dictutils import OMD
from boltons.fileutils import atomic_save
import json

# 1. Chunked processing
big_list = list(range(20))
print("Chunked:", list(chunked(big_list, 6)))

# 2. Flatten & deduplicate
nested = [[1, 2], [3, [4, 5]], 6]
flat = list(flatten(nested))
print("Flattened:", flat)
print("Unique:", list(unique([1, 2, 2, 3, 3, 3])))

# 3. Ordered multi‑value dict (OMD)
omd = OMD()
omd.add('language', 'Python')
omd.add('language', 'Java')
omd.add('language', 'Go')
print("
Ordered multi‑value dict:", list(omd.items()))
print(omd['language'])   # last value
print(omd.getlist('language'))  # all values

# 4. Atomic file save
data = {"project": "prototype", "status": "awesome"}
try:
    with atomic_save('config.json', text_mode=True) as f:
        json.dump(data, f, indent=2)
    print("
File atomically saved to 'config.json'")
except Exception as e:
    print(f"Save failed, original file untouched: {e}")

9. returns: Make error handling part of the design

Typical try...except blocks mix success and failure logic. The returns library introduces functional containers like Result and Maybe, forcing explicit handling of success and failure paths.

Explicit over implicit: function signatures reveal possible failures.

Encourages composition: utilities such as bind and map let you safely chain operations.

Reduces bugs: you cannot accidentally ignore error cases.

from returns.result import Result, Success, Failure
from returns.pipeline import flow, is_successful
from returns.pointfree import bind

# Traditional approach – exceptions hidden deep inside
def parse_divide_1(a_str: str, b_str: str) -> float:
    a = int(a_str)
    b = int(b_str)
    return a / b

# Using returns – explicit success/failure
def parse_int(value: str) -> Result[int, str]:
    """Parse a string into an int, returning Success or Failure."""
    try:
        return Success(int(value))
    except ValueError:
        return Failure(f"Cannot parse '{value}' as int")

def safe_divide(a: int, b: int) -> Result[float, str]:
    """Safe division returning a Result container."""
    if b == 0:
        return Failure("Division by zero")
    return Success(a / b)

# Compose parsing and division with flow + bind
def parse_divide_2(a_str: str, b_str: str) -> Result[float, str]:
    return flow(
        parse_int(a_str),
        bind(lambda a: parse_int(b_str).bind(lambda b: safe_divide(a, b)))
    )

# Test cases
test_cases = [("10", "2"), ("ten", "2"), ("10", "0"), ("10", "2.5")]
for a, b in test_cases:
    print(f"
Computing {a} / {b}:")
    result = parse_divide_2(a, b)
    result.match(
        on_success=lambda v: print(f"  Success: result = {v}"),
        on_failure=lambda e: print(f"  Failure: reason = {e}")
    )
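
When wrapping every helper in try/except feels heavy, returns also provides the @safe decorator, which converts any raised exception into a Failure automatically; parse_int_terse below is an illustrative name:

from returns.result import safe

@safe
def parse_int_terse(value: str) -> int:
    # a raised ValueError becomes Failure(ValueError(...)) automatically
    return int(value)

print(parse_int_terse("42"))    # <Success: 42>
print(parse_int_terse("oops"))  # <Failure: invalid literal for int()...>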

Conclusion

The core tension of prototype development is needing to validate ideas quickly while losing excessive time to boilerplate, infrastructure, and error handling. These nine libraries—msgspec, glom, watchfiles, beartype, pyinstrument, duckdb, fakeredis, boltons, and returns—address those pain points one by one, from data boundaries to engineering hygiene, turning tedious chores into a few lines of code and dramatically accelerating development.
