
Build a Quantitative Analysis Database with Python and PostgreSQL: Step‑by‑Step Guide

This tutorial walks through installing PostgreSQL, connecting it to Python via psycopg2 and SQLAlchemy, fetching daily stock records in bulk with Tushare, storing and updating them in a PostgreSQL database, and querying and visualizing the results for quantitative finance strategies. Code examples accompany each step.

Python Crawling & Data Mining

Data is the foundation of quantitative finance; this article demonstrates how to create a quant‑analysis database using Python and PostgreSQL.

Introduction

Quantitative financial analysis relies on large datasets such as historical stock trades, fundamentals, and macroeconomic and industry data. Open‑source databases such as MySQL, PostgreSQL, MongoDB and SQLite are all widely used for this purpose; this article uses PostgreSQL.

Installing PostgreSQL

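Download the appropriate installer from the official PostgreSQL website, set a password for the postgres superuser when prompted, and accept the default options. The examples below use "123456" as a placeholder password; choose a stronger one for anything beyond a local test setup. After installation, pgAdmin4 is available as a web‑based graphical tool for managing the database.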

Install the Python libraries needed for database interaction:

pip install psycopg2 sqlalchemy
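
The password chosen during installation ends up inside the SQLAlchemy connection URL used later in this article. As a minimal sketch, the URL can be assembled from environment variables instead of being hard-coded. The helper name build_pg_url is hypothetical, and the variable names (PGUSER, PGPASSWORD, PGHOST, PGPORT, PGDATABASE) are a choice made for this sketch that mirrors the names libpq conventionally reads.

```python
import os
from urllib.parse import quote


def build_pg_url(env=None):
    """Assemble a postgresql+psycopg2 URL from environment-style settings."""
    env = os.environ if env is None else env
    user = env.get("PGUSER", "postgres")
    # Percent-encode the password so characters like '@' or ':' cannot break the URL.
    password = quote(env.get("PGPASSWORD", ""), safe="")
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    database = env.get("PGDATABASE", "postgres")
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}"


# Example with an explicit mapping instead of the real environment.
print(build_pg_url({"PGPASSWORD": "p@ss:word"}))
```

The resulting string can be passed to create_engine in place of a literal URL.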

Fetching Stock Data

Use the Tushare API to obtain daily market data for thousands of stocks. The data is retrieved into pandas DataFrames, which can then be written to PostgreSQL.

Code Examples

# import libraries
import tushare as ts
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

# render Chinese labels and minus signs correctly in matplotlib
mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['axes.unicode_minus'] = False

# set the Tushare token; set_token stores it so that ts.pro_bar can find it too
token = 'YOUR_TOKEN'
ts.set_token(token)
pro = ts.pro_api()

# fetch forward-adjusted (qfq) daily bars for one stock code
def get_data(code, start='20190101', end='20190425'):
    df = ts.pro_bar(ts_code=code, adj='qfq', start_date=start, end_date=end)
    return df

# fetch the codes of all currently listed stocks
def get_code():
    codes = pro.stock_basic(list_status='L').ts_code.values
    return codes
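
Calling a remote API once per stock for thousands of codes means some requests may fail or be throttled. The following is a minimal sketch of a generic retry wrapper with exponential backoff; call_with_retry and its parameters are hypothetical names introduced here and are not part of Tushare.

```python
import time


def call_with_retry(func, *args, retries=3, delay=1.0, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying when it raises an exception.

    Waits `delay` seconds after the first failure and doubles the wait each
    time. Re-raises the last exception once `retries` attempts have failed.
    `sleep` is injectable so the wrapper can be tested without real waiting.
    """
    for attempt in range(1, retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == retries:
                raise
            sleep(delay)
            delay *= 2
```

With the functions above it could be used as call_with_retry(get_data, code) inside the download loop, so that a single transient failure does not silently drop a stock.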

Data Insertion and Update Functions

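from sqlalchemy import create_engine
import psycopg2  # driver that SQLAlchemy uses for the postgresql+psycopg2 URL

# user "postgres", password "123456", local server, default port and database
engine = create_engine('postgresql+psycopg2://postgres:123456@localhost:5432/postgres')

# write a DataFrame to the table named db_name (appends by default)
def insert_sql(data, db_name, if_exists='append'):
    if data is None or data.empty:
        return  # nothing was fetched for this code
    try:
        data.to_sql(db_name, engine, index=False, if_exists=if_exists)
    except Exception as exc:
        # report the failure instead of silently dropping the data
        print(f"insert into {db_name} failed: {exc}")

# fetch every stock for the given date range and append it to the table
def update_sql(start, end, db_name):
    for code in get_code():
        data = get_data(code, start, end)
        insert_sql(data, db_name)
    print(f"{start}:{end} data updated successfully")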
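
update_sql expects its start and end arguments as YYYYMMDD strings. As a sketch of how an incremental update might choose them, the helper below derives the next window from the latest date already stored. The name next_update_window is hypothetical, and it assumes trade dates are kept as YYYYMMDD strings.

```python
from datetime import date, datetime, timedelta


def next_update_window(last_trade_date, today=None):
    """Return (start, end) as YYYYMMDD strings, or None if already up to date.

    last_trade_date: latest date already stored, e.g. '20190425'.
    today: injectable for testing; defaults to the current date.
    """
    today = date.today() if today is None else today
    start = datetime.strptime(last_trade_date, "%Y%m%d").date() + timedelta(days=1)
    if start > today:
        return None
    return start.strftime("%Y%m%d"), today.strftime("%Y%m%d")


print(next_update_window("20190425", today=date(2019, 5, 6)))
```

The last stored date could come from a query such as select max(trade_date) from stock_data, after which the returned pair is passed to update_sql.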

Example Application

Download data for the period 2019‑01‑01 to 2019‑04‑25 and store it in the table stock_data:

for code in get_code():
    data = get_data(code)
    insert_sql(data, 'stock_data')

Read the entire table back into pandas:

df = pd.read_sql('stock_data', engine)
print(len(df))  # 270998 rows
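
Running the download loop twice with if_exists='append' stores every row twice. One way to guard against that is a composite key on (ts_code, trade_date) combined with an insert that skips existing keys. The sketch below uses the standard-library sqlite3 module as a stand-in so it runs without a server, and the rows are made up for illustration. The ON CONFLICT ... DO NOTHING clause is also valid PostgreSQL syntax, but a table created by to_sql has no primary key, so the key would have to be added separately.

```python
import sqlite3

# SQLite stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE stock_data (
        ts_code    TEXT NOT NULL,
        trade_date TEXT NOT NULL,
        close      REAL,
        PRIMARY KEY (ts_code, trade_date)
    )
    """
)


def insert_rows(conn, rows):
    """Insert (ts_code, trade_date, close) tuples, skipping keys already stored."""
    conn.executemany(
        "INSERT INTO stock_data (ts_code, trade_date, close) VALUES (?, ?, ?) "
        "ON CONFLICT (ts_code, trade_date) DO NOTHING",
        rows,
    )
    conn.commit()


# Made-up rows for illustration only.
rows = [("000001.SZ", "20190424", 13.9), ("000001.SZ", "20190425", 13.5)]
insert_rows(conn, rows)
insert_rows(conn, rows)  # the second run adds nothing
row_count = conn.execute("SELECT COUNT(*) FROM stock_data").fetchone()[0]
print(row_count)  # 2
```

Note that sqlite3 uses ? as its placeholder while psycopg2 uses %s, so the statement needs that small change when run against PostgreSQL.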

Query and Visualization

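Define a helper that uses pyecharts to plot, for each trading day, how many stocks satisfy a given condition:

def plot_data(condition, title):
    # pyecharts 0.5.x API; in pyecharts 1.0+ Bar lives in pyecharts.charts
    from pyecharts import Bar
    # condition is pasted into the SQL text, so only pass trusted strings
    data = pd.read_sql("select * from stock_data where " + condition, engine)
    # number of matching stocks per trading day
    count = data.groupby('trade_date')['ts_code'].count()
    bar = Bar(title, title_text_size=15)
    bar.add('', count.index, count.values, is_splitline_show=False, linewidth=2)
    return bar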

Examples:

c1 = "close<2"
t1 = "Stocks with price below 2"
plot_data(c1, t1)
[Figure: Price below 2 chart]
c2 = "pct_chg>9.5"
t2 = "Daily rise > 9.5%"
plot_data(c2, t2)
[Figure: Rise >9.5% chart]
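
The helper pulls every matching row into pandas and counts there, with the condition concatenated into the SQL text. An alternative sketch lets the database do the counting and passes the threshold as a bound parameter. It uses sqlite3 and made-up rows so that it runs without a PostgreSQL server, and count_below is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_data (ts_code TEXT, trade_date TEXT, close REAL)")
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO stock_data VALUES (?, ?, ?)",
    [
        ("A", "20190424", 1.8),
        ("B", "20190424", 1.9),
        ("C", "20190424", 5.0),
        ("A", "20190425", 2.1),
        ("B", "20190425", 1.7),
    ],
)


def count_below(conn, price):
    """Return [(trade_date, count)] of stocks whose close is below `price`."""
    query = (
        "SELECT trade_date, COUNT(ts_code) FROM stock_data "
        "WHERE close < ? GROUP BY trade_date ORDER BY trade_date"
    )
    # The threshold travels as a bound parameter, not as part of the SQL text.
    return conn.execute(query, (price,)).fetchall()


counts = count_below(conn, 2)
print(counts)  # [('20190424', 2), ('20190425', 1)]
```

Against PostgreSQL the same query shape can be passed to pd.read_sql together with its params argument, so only one row per trading day travels back to Python.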

Stock Selection Strategy

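First filter stocks by listing date, ST status, positive PE, market capitalization and similar criteria, then keep those whose latest close is above the 20‑day moving average. The code assumes a helper get_new_code(date) that performs this filtering and a DataFrame df_all_data holding the daily data; neither is defined in this article: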

import talib as ta

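def find_stock(date):
    selected = []
    for code in get_new_code(date):
        try:
            df = df_all_data.loc[df_all_data.ts_code == code].copy()
            df.index = pd.to_datetime(df.trade_date)
            df = df.sort_index()
            # 20-day moving average of the closing price
            df['ma_20'] = ta.MA(df.close, timeperiod=20)
            # keep the stock if its latest close is above the average
            if df.iloc[-1]['close'] > df.iloc[-1]['ma_20']:
                selected.append(code)
        except Exception as exc:
            # skip stocks whose data cannot be processed, but report why
            print(f"skipping {code}: {exc}")
    return selected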
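
To make the selection rule concrete without TA‑Lib, here is a minimal pure-Python sketch of a simple moving average (the default type computed by ta.MA) and the "latest close above the average" check. Both function names are hypothetical and the inputs are plain lists of closing prices ordered oldest first.

```python
def simple_moving_average(values, period):
    """Return a list as long as `values`; entries before a full window are None."""
    out = []
    window_sum = 0.0
    for i, v in enumerate(values):
        window_sum += v
        if i >= period:
            # Drop the value that just left the window.
            window_sum -= values[i - period]
        out.append(window_sum / period if i >= period - 1 else None)
    return out


def closes_above_ma(closes, period=20):
    """True if the latest close is strictly above its `period`-day simple average."""
    ma = simple_moving_average(closes, period)
    return bool(ma) and ma[-1] is not None and closes[-1] > ma[-1]


print(simple_moving_average([1, 2, 3, 4, 5], 3))  # [None, None, 2.0, 3.0, 4.0]
```

A stock with fewer than `period` observations has no average yet and is therefore never selected by this check.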

Insert the selected codes into a new table:

fs = find_stock('20190305')
if fs:
    df_find = pd.DataFrame(fs, columns=['ts_code'])
    insert_sql(df_find, 'find_stocks', if_exists='replace')

Further Analysis

Retrieve data for the chosen stocks, compute daily returns, and plot cumulative net value:

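codes = fs  # assumption: the codes selected by find_stock above
select_data = pd.DataFrame()
for code in codes:
    try:
        df_ = df_all_data[df_all_data.ts_code == code].copy()
        df_.index = pd.to_datetime(df_.trade_date)
        df_ = df_.sort_index()
        select_data[code] = df_.close
    except Exception as exc:
        print(f"skipping {code}: {exc}")
# forward-fill gaps such as days on which a stock did not trade
select_data = select_data.ffill()
# daily returns and their running product (net value of 1 unit invested)
ret = select_data.pct_change().dropna()
prod_ret = (1 + ret).cumprod()
prod_ret.plot(figsize=(12,5))
plt.title('Portfolio Cumulative Net Value')
plt.show()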
[Figure: Cumulative net value chart]
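
The arithmetic behind pct_change and cumprod can be checked by hand on a tiny series. The sketch below reproduces both steps in plain Python on made-up prices; the function names are chosen for this illustration only.

```python
def pct_change(prices):
    """Simple returns between consecutive prices: p[t] / p[t-1] - 1."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]


def cumulative_net_value(returns):
    """Running product of (1 + r): the value of 1 unit invested at the start."""
    value, out = 1.0, []
    for r in returns:
        value *= 1 + r
        out.append(value)
    return out


prices = [100.0, 110.0, 99.0]  # made-up closing prices
net_value = cumulative_net_value(pct_change(prices))
print([round(v, 4) for v in net_value])  # [1.1, 0.99]
```

The product telescopes, so each net value equals that day's price divided by the first price, which gives a quick sanity check for the plotted curves.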

Conclusion

The article provides a practical introduction to using Python with PostgreSQL for building a quantitative analysis database, covering installation, data acquisition, storage, querying, visualization, and a simple 20‑day moving‑average stock‑selection strategy. While the data volume in the example is modest, the same principles scale to much larger datasets, making database skills essential for serious quant work.
