Build a Quantitative Analysis Database with Python and PostgreSQL: Step‑by‑Step Guide
This tutorial walks through installing PostgreSQL, connecting it to Python via psycopg2 and SQLAlchemy, fetching hundreds of thousands of daily stock records with Tushare, storing and updating the data in a PostgreSQL database, and querying and visualizing the results for quantitative finance strategies, all with complete code examples.
Data is the foundation of quantitative finance; this article demonstrates how to create a quant‑analysis database using Python and PostgreSQL.
Introduction
Quantitative financial analysis relies on large datasets such as historical trade data, fundamentals, and macroeconomic and industry data. Open‑source databases such as MySQL, PostgreSQL, MongoDB and SQLite are widely used for this; this guide uses PostgreSQL.
Installing PostgreSQL
Download the appropriate installer from the official PostgreSQL website, set a password (e.g., "123456"), and accept default options. After installation, pgAdmin4 is available as a web‑based graphical tool for managing the database.
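The password chosen here ends up inside the connection URL that is passed to SQLAlchemy's create_engine later in this article. As a minimal sketch, assuming you would rather not hard-code it, the URL can be assembled from parts with the password URL-escaped. The helper name build_pg_url and the environment variable PG_PASSWORD are hypothetical, not part of the original code.

```python
import os
from urllib.parse import quote


def build_pg_url(user, password, host="localhost", port=5432, dbname="postgres"):
    """Return a SQLAlchemy-style PostgreSQL URL with the password URL-escaped."""
    # quote(..., safe="") escapes characters such as '@', ':' and '/' that
    # would otherwise be misread as URL delimiters.
    escaped = quote(password, safe="")
    return f"postgresql+psycopg2://{user}:{escaped}@{host}:{port}/{dbname}"


# PG_PASSWORD is a hypothetical variable name; "123456" mirrors the example
# password used in this article.
password = os.environ.get("PG_PASSWORD", "123456")
url = build_pg_url("postgres", password)
print(url)
```

The resulting string can be passed to create_engine in place of a literal URL.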
Install the Python libraries needed for database interaction:
pip install psycopg2 sqlalchemy
Fetching Stock Data
Use the Tushare API to obtain daily market data for thousands of stocks. The data is retrieved into pandas DataFrames, which can then be written to PostgreSQL.
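Fetching thousands of stocks one code at a time means thousands of API calls. If your Tushare account has a per-minute call quota (an assumption here; check the limits that apply to your account), a small throttle around the loop may help. The sketch below is one possible approach; throttled is a hypothetical helper, and the stock codes are sample values.

```python
import time


def throttled(items, max_per_minute, clock=time.monotonic, sleep=time.sleep):
    """Yield items no faster than max_per_minute, sleeping between them."""
    min_interval = 60.0 / max_per_minute
    last = None
    for item in items:
        if last is not None:
            # time still owed before the next item may be released
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item


# Demo with a high limit so it finishes quickly.
for code in throttled(["000001.SZ", "600000.SH", "000002.SZ"], max_per_minute=6000):
    print("would fetch", code)  # in this article, the call would be get_data(code)
```

The clock and sleep parameters exist only so the timing can be replaced in tests; in normal use the defaults are enough.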
Code Examples
# import libraries
import tushare as ts
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pylab import mpl
mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['axes.unicode_minus'] = False
# set Tushare token
token = 'YOUR_TOKEN'
pro = ts.pro_api(token)
# function to get data for a stock code
def get_data(code, start='20190101', end='20190425'):
    df = ts.pro_bar(ts_code=code, adj='qfq', start_date=start, end_date=end)
    return df
# function to get all stock codes
def get_code():
    codes = pro.stock_basic(list_status='L').ts_code.values
    return codes
Data Insertion and Update Functions
from sqlalchemy import create_engine
import psycopg2
engine = create_engine('postgresql+psycopg2://postgres:123456@localhost:5432/postgres')
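def insert_sql(data, db_name, if_exists='append'):
    # db_name is the name of the target table in the connected database
    try:
        data.to_sql(db_name, engine, index=False, if_exists=if_exists)
    except Exception as e:
        # report the failure instead of silently discarding it
        print(f"insert into {db_name} failed: {e}")
def update_sql(start, end, db_name):
    for code in get_code():
        data = get_data(code, start, end)
        insert_sql(data, db_name)
    print(f"{start}:{end} data updated successfully")
Example Application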
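One thing to keep in mind when the functions above are run more than once: insert_sql appends by default, so an update whose date range overlaps earlier downloads stores the same rows twice. A common remedy is a unique key on (ts_code, trade_date) plus an insert that skips conflicts. The sketch below illustrates the idea with in-memory SQLite as a stand-in for PostgreSQL and made-up rows; insert_rows is a hypothetical helper, not the article's code.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE stock_data (
        ts_code    TEXT NOT NULL,
        trade_date TEXT NOT NULL,
        close      REAL,
        PRIMARY KEY (ts_code, trade_date)
    )
    """
)

INSERT_SQL = """
    INSERT INTO stock_data (ts_code, trade_date, close)
    VALUES (?, ?, ?)
    ON CONFLICT (ts_code, trade_date) DO NOTHING
"""


def insert_rows(rows):
    """Insert rows, skipping any (ts_code, trade_date) pair already stored."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(INSERT_SQL, rows)


# Made-up sample rows; the second batch overlaps the first by one day.
insert_rows([("000001.SZ", "20190102", 9.19), ("000001.SZ", "20190103", 9.28)])
insert_rows([("000001.SZ", "20190103", 9.28), ("000001.SZ", "20190104", 9.75)])

row_count = conn.execute("SELECT COUNT(*) FROM stock_data").fetchone()[0]
print(row_count)  # 3: the overlapping day was stored only once
```

PostgreSQL accepts the same ON CONFLICT ... DO NOTHING clause; with psycopg2 the placeholders are written %s rather than ?.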
Download data for the period 2019‑01‑01 to 2019‑04‑25 and store it in the table stock_data:
for code in get_code():
    data = get_data(code)
    insert_sql(data, 'stock_data')
Read the entire table back into pandas:
df = pd.read_sql('stock_data', engine)
print(len(df))  # 270998 rows
Query and Visualization
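The charts in this section count, for each trading day, how many stocks satisfy a condition. That count can also be computed inside the database with GROUP BY, with the threshold passed as a bound parameter rather than concatenated into the SQL text. The sketch below uses in-memory SQLite as a stand-in for PostgreSQL and made-up rows; count_by_date and the CONDITIONS whitelist are hypothetical names, not part of the original code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_data (ts_code TEXT, trade_date TEXT, close REAL, pct_chg REAL)"
)
conn.executemany(
    "INSERT INTO stock_data VALUES (?, ?, ?, ?)",
    [  # made-up sample rows
        ("000001.SZ", "20190102", 1.80, -0.5),
        ("000002.SZ", "20190102", 1.95, 9.9),
        ("000003.SZ", "20190102", 5.10, 10.0),
        ("000001.SZ", "20190103", 2.10, 9.8),
    ],
)

# Whitelisted SQL fragments; only the numeric threshold comes from the caller.
CONDITIONS = {
    "close_below": "close < ?",
    "pct_chg_above": "pct_chg > ?",
}


def count_by_date(kind, threshold):
    """Return (trade_date, matching stock count) pairs, ordered by date."""
    sql = (
        "SELECT trade_date, COUNT(ts_code) FROM stock_data "
        f"WHERE {CONDITIONS[kind]} GROUP BY trade_date ORDER BY trade_date"
    )
    return conn.execute(sql, (threshold,)).fetchall()


print(count_by_date("close_below", 2))
print(count_by_date("pct_chg_above", 9.5))
```

Aggregating in SQL returns one row per trading day instead of every matching record, which matters more as the table grows.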
Define a helper to plot distribution of a condition using pyecharts:
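def plot_data(condition, title):
    # pyecharts 0.5.x API; pyecharts 1.x and later use a different, incompatible interface
    from pyecharts import Bar
    # condition is concatenated into the SQL text, so pass only trusted strings
    data = pd.read_sql("select * from stock_data where " + condition, engine)
    # number of matching stocks per trading day
    count = data.groupby('trade_date')['ts_code'].count()
    bar = Bar(title, title_text_size=15)
    bar.add('', count.index, count.values, is_splitline_show=False, linewidth=2)
    return bar
Examples: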
c1 = "close<2"
t1 = "Stocks with price below 2"
plot_data(c1, t1)
c2 = "pct_chg>9.5"
t2 = "Daily rise > 9.5%"
plot_data(c2, t2)
Stock Selection Strategy
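The selection rule used in this section compares a stock's latest close with its 20‑day moving average. As a plain-Python sketch of that comparison, assuming a simple (unweighted) moving average, which is the default type for TA-Lib's MA function: sma and above_ma are hypothetical helper names, and the prices are made up.

```python
def sma(values, period):
    """Simple moving average; positions without a full window are None."""
    out = []
    window_sum = 0.0
    for i, v in enumerate(values):
        window_sum += v
        if i >= period:
            # drop the value that just left the window
            window_sum -= values[i - period]
        out.append(window_sum / period if i >= period - 1 else None)
    return out


def above_ma(closes, period=20):
    """True when the latest close is strictly above its moving average."""
    ma = sma(closes, period)
    return bool(ma) and ma[-1] is not None and closes[-1] > ma[-1]


closes = [10, 11, 12, 13, 14]  # made-up prices, 3-day window for brevity
print(sma(closes, 3))          # [None, None, 11.0, 12.0, 13.0]
print(above_ma(closes, 3))     # True: 14 > 13.0
```

A stock with fewer closes than the window length has no moving average yet, so the sketch treats it as not selected.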
Filter stocks based on listing date, ST status, positive PE, market cap, etc., then apply a 20‑day moving‑average strategy:
import talib as ta
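def find_stock(date):
    # get_new_code(date) and df_all_data are not defined in this article
    selected = []
    for code in get_new_code(date):
        try:
            df = df_all_data.loc[df_all_data.ts_code == code].copy()
            df.index = pd.to_datetime(df.trade_date)
            df = df.sort_index()
            df['ma_20'] = ta.MA(df.close, timeperiod=20)
            # keep the stock if its latest close is above the 20-day moving average
            if df.iloc[-1]['close'] > df.iloc[-1]['ma_20']:
                selected.append(code)
        except Exception as e:
            # skip this code, but report why
            print(f"{code} skipped: {e}")
    return selected
Insert the selected codes into a new table: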
fs = find_stock('20190305')
if fs:
    df_find = pd.DataFrame(fs, columns=['ts_code'])
    insert_sql(df_find, 'find_stocks', if_exists='replace')
Further Analysis
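The cumulative net value computed in this section is the running product of (1 + daily return). A small worked sketch of that arithmetic in plain Python, with made-up closing prices; pct_change and cumulative_net_value are hypothetical helpers that mirror the pandas operations of the same purpose.

```python
def pct_change(prices):
    """Daily simple returns: p[t] / p[t-1] - 1."""
    return [prices[i] / prices[i - 1] - 1 for i in range(1, len(prices))]


def cumulative_net_value(returns):
    """Running product of (1 + r), starting from a net value of 1."""
    nav, out = 1.0, []
    for r in returns:
        nav *= 1 + r
        out.append(nav)
    return out


closes = [10.0, 11.0, 9.9, 10.89]  # made-up prices
rets = pct_change(closes)          # roughly +10%, -10%, +10%
nav = cumulative_net_value(rets)
print(round(nav[-1], 4))           # 1.089
```

Because the product telescopes, the final net value equals the last close divided by the first (10.89 / 10.0), which is a quick sanity check on the plotted curves.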
Retrieve data for the chosen stocks, compute daily returns, and plot cumulative net value:
select_data = pd.DataFrame()
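# codes: the selected ts_code values (not defined in this article; presumably the find_stock result)
for code in codes:
    try:
        df_ = df_all_data[df_all_data.ts_code == code].copy()
        df_.index = pd.to_datetime(df_.trade_date)
        df_ = df_.sort_index()
        select_data[code] = df_.close
    except Exception as e:
        print(f"{code} skipped: {e}")
# forward-fill missing closes; fillna(method='ffill') is deprecated in recent pandas
select_data = select_data.ffill()
ret = select_data.pct_change().dropna()
prod_ret = (1 + ret).cumprod()
prod_ret.plot(figsize=(12,5))
plt.title('Portfolio Cumulative Net Value')
plt.show()
Conclusion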
The article provides a practical introduction to using Python with PostgreSQL for building a quantitative analysis database, covering installation, data acquisition, storage, querying, visualization, and a simple 20‑day moving‑average stock‑selection strategy. While the data volume in the example is modest, the same principles scale to much larger datasets, making database skills essential for serious quant work.