How to Quickly Analyze and Predict Stock Prices with Python in 12 Minutes

This tutorial shows how to fetch historical stock data from Yahoo Finance using pandas, compute moving averages and returns, explore correlations among major tech stocks, engineer features, train linear, polynomial, and K‑Nearest‑Neighbour models with scikit‑learn, evaluate their accuracy, and visualize both historical prices and future forecasts, all in a concise, step‑by‑step guide.

Python Crawling & Data Mining

Jun 24, 2020

Using Python to Quickly Analyze, Visualize, and Predict Stock Prices

A friend once suggested that stock investing is a key to financial freedom. Many amateur traders share that hope and look for simple ways to analyze stocks. The main questions are which stocks to choose, how to analyze them, and how to weigh risk against return. This guide demonstrates a fast, Python‑based workflow that can be followed in about 12 minutes.

2 Load Yahoo Finance Dataset

Pandas DataReader provides easy access to financial data sources such as Yahoo Finance. The following code downloads Apple (AAPL) daily price data, including the adjusted close, from January 1, 2010 to January 11, 2017.

import pandas as pd
import datetime
import pandas_datareader.data as web

start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2017, 1, 11)

df = web.DataReader("AAPL", "yahoo", start, end)
print(df.tail())

3 Explore Moving Average and Returns

3.1 Moving Average

Rolling (moving) averages smooth price fluctuations and help identify trends. The code below computes a 100‑day moving average of the adjusted close price and plots it alongside the raw series.

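close_px = df['Adj Close']
mavg = close_px.rolling(window=100).mean()   # 100-day moving average
close_px.plot(label='AAPL')                  # raw adjusted close
mavg.plot(label='mavg')                      # smoothed series on the same axes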
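
As a minimal sketch of what a rolling mean returns, the following uses a short made‑up price series (not AAPL data) and a window of 3 instead of 100. The names prices and mavg3 are illustrative only.

```python
import pandas as pd

# Hypothetical prices, used only to illustrate the rolling window
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# Each value is the mean of the current price and the two before it
mavg3 = prices.rolling(window=3).mean()

print(mavg3.tolist())  # [nan, nan, 11.0, 12.0, 13.0]
```

The first window − 1 values are NaN because the window is not yet full. With window=100, the moving average therefore starts 99 rows after the price series does.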

3.2 Returns

Daily returns are calculated as the percentage change between consecutive closing prices.

rets = close_px / close_px.shift(1) - 1
rets.plot(label='return')
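
The return formula above can be checked against the pandas shortcut pct_change(). This is a small sketch on made‑up prices; the variable names are illustrative.

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0])  # hypothetical closing prices

manual = prices / prices.shift(1) - 1   # formula used in this section
builtin = prices.pct_change()           # pandas shortcut for the same quantity

print(manual.round(4).tolist())  # [nan, 0.1, -0.1]
```

The first value is NaN because the first day has no previous close to compare against.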

4 Analyze Competitor Stocks

Data for Apple, GE, Google, IBM, and Microsoft are retrieved together for comparative analysis.

dfcomp = web.DataReader(["AAPL","GE","GOOG","IBM","MSFT"], "yahoo", start, end)["Adj Close"]

4.1 Correlation Analysis

Percentage changes are computed, and a correlation matrix is derived. Scatter plots (e.g., Apple vs. GE) and a scatter‑matrix with KDE on the diagonal visualize relationships. A heat‑map displays the correlation strengths.

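import matplotlib.pyplot as plt

retscomp = dfcomp.pct_change()   # daily returns per ticker
corr = retscomp.corr()           # pairwise correlation of returns
plt.scatter(retscomp.AAPL, retscomp.GE)   # Apple vs. GE daily returns
pd.plotting.scatter_matrix(retscomp, diagonal='kde', figsize=(10, 10))
plt.figure()                     # draw the heat map on its own figure
plt.imshow(corr, cmap='hot', interpolation='none')
plt.colorbar()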
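
To make the correlation matrix easier to read, here is a sketch on synthetic returns rather than real tickers. Columns A and B share a made‑up common component, while C is independent noise; all names and parameters are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(0, 0.01, 500)   # shared "market" component (made up)

rets_demo = pd.DataFrame({
    "A": base + rng.normal(0, 0.002, 500),  # mostly driven by base
    "B": base + rng.normal(0, 0.002, 500),  # mostly driven by base
    "C": rng.normal(0, 0.01, 500),          # independent noise
})

corr_demo = rets_demo.corr()   # pairwise Pearson correlation
print(corr_demo.round(2))
```

A and B should come out strongly correlated and C close to uncorrelated with both. The matrix is symmetric with ones on the diagonal, which is why the heat map mirrors itself across that diagonal.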

4.2 Returns and Risk

The mean daily return of each stock (x‑axis) is plotted against the standard deviation of its daily returns (y‑axis, used here as a measure of risk), with annotations identifying each ticker.

plt.scatter(retscomp.mean(), retscomp.std())
for label, x, y in zip(retscomp.columns, retscomp.mean(), retscomp.std()):
    plt.annotate(label, xy=(x, y), xytext=(20, -20),
                 textcoords='offset points', ha='right', va='bottom',
                 bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5),
                 arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
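
The two quantities on the plot can also be tabulated. The sketch below uses two made‑up tickers, STEADY and JUMPY, that have the same mean return but very different volatility; the names and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical daily returns for two made-up tickers
rets_demo = pd.DataFrame({
    "STEADY": [0.01, 0.01, 0.01, 0.01],
    "JUMPY": [0.05, -0.03, 0.04, -0.02],
})

summary = pd.DataFrame({
    "mean_return": rets_demo.mean(),
    "risk_std": rets_demo.std(),   # sample standard deviation (ddof=1)
})
print(summary)
```

On the scatter plot, these two would sit at the same x position but at different heights, which is the comparison the chart is meant to support.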

5 Predict Stock Price

5.1 Feature Engineering

Two additional features are created: high‑low percentage (HL_PCT) and percent change from open to close (PCT_change).

dfreg = df.loc[:,['Adj Close','Volume']]
dfreg['HL_PCT'] = (df['High'] - df['Low']) / df['Close'] * 100.0
dfreg['PCT_change'] = (df['Close'] - df['Open']) / df['Open'] * 100.0
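
A worked example on a single made‑up trading day shows what the two features measure. The prices are hypothetical.

```python
import pandas as pd

# One hypothetical trading day
day = pd.DataFrame({"Open": [100.0], "High": [105.0], "Low": [95.0], "Close": [102.0]})

hl_pct = (day["High"] - day["Low"]) / day["Close"] * 100.0       # intraday range relative to close
pct_change = (day["Close"] - day["Open"]) / day["Open"] * 100.0  # open-to-close move

print(round(hl_pct.iloc[0], 2), round(pct_change.iloc[0], 2))  # 9.8 2.0
```

HL_PCT captures how widely the price swung during the day, and PCT_change captures the net direction of the move.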

5.2 Preprocessing and Cross‑Validation

Missing values are filled, a label column (future adjusted close) is created, features are scaled, and the data are split into training and testing sets.

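import math
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Fill missing values with a sentinel (rows are kept, not dropped)
dfreg.fillna(value=-99999, inplace=True)

# Forecast horizon: 1% of the number of rows
forecast_out = int(math.ceil(0.01 * len(dfreg)))
forecast_col = 'Adj Close'
# Label: the adjusted close forecast_out rows ahead
dfreg['label'] = dfreg[forecast_col].shift(-forecast_out)

# Scale the features, then hold back the last rows (no label yet) for forecasting
X = np.array(dfreg.drop(columns=['label']))
X = preprocessing.scale(X)
X_lately = X[-forecast_out:]
X = X[:-forecast_out]

y = np.array(dfreg['label'])
y = y[:-forecast_out]

# Train/test split; test_size=0.2 is an assumption, the source does not show this step
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)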
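
The label construction is easier to see on a tiny frame. This sketch uses ten made‑up rows and a 20% horizon (instead of 1%) so that the shift is visible; demo is a hypothetical stand‑in for dfreg.

```python
import math
import pandas as pd

demo = pd.DataFrame({"Adj Close": [float(p) for p in range(100, 110)]})  # 10 made-up rows

forecast_out = int(math.ceil(0.2 * len(demo)))   # 20% of 10 rows gives 2
demo["label"] = demo["Adj Close"].shift(-forecast_out)

print(demo["label"].tolist())
# [102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0, nan, nan]
```

Each row's label is the price forecast_out rows later. The last forecast_out rows have no label, which is why they are cut from X and y and kept aside as X_lately for forecasting.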

5.3 Model Generation

Linear regression, polynomial (ridge) regression of degree 2 and 3, and K‑Nearest‑Neighbour regression models are instantiated.

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

5.4 Linear and Quadratic Regression

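Linear regression fits a straight line (a hyperplane when there are several features). The two polynomial models expand the features to degree 2 (quadratic) and degree 3 (cubic) and fit them with ridge regression.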

clfreg = LinearRegression(n_jobs=-1)
clfreg.fit(X_train, y_train)

clfpoly2 = make_pipeline(PolynomialFeatures(2), Ridge())
clfpoly2.fit(X_train, y_train)

clfpoly3 = make_pipeline(PolynomialFeatures(3), Ridge())
clfpoly3.fit(X_train, y_train)
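
To illustrate why a polynomial pipeline can help, the sketch below fits a plain linear model and a degree‑2 ridge pipeline to synthetic data with a quadratic shape. The data and variable names are made up and are unrelated to the stock features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X_demo = np.linspace(-3, 3, 200).reshape(-1, 1)
y_demo = X_demo.ravel() ** 2 + rng.normal(0, 0.1, 200)   # made-up quadratic target

linear = LinearRegression().fit(X_demo, y_demo)
poly2 = make_pipeline(PolynomialFeatures(2), Ridge()).fit(X_demo, y_demo)

print("linear R^2:", round(linear.score(X_demo, y_demo), 3))
print("degree-2 R^2:", round(poly2.score(X_demo, y_demo), 3))
```

The straight line cannot follow the curve, so its R² stays near zero, while the degree‑2 pipeline fits it closely. On real price data the gap is usually far smaller, and higher degrees can overfit.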

5.5 K‑Nearest‑Neighbour

KNN predicts a value based on the nearest neighbours in feature space.

clfknn = KNeighborsRegressor(n_neighbors=2)
clfknn.fit(X_train, y_train)
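
A minimal sketch of how a 2‑neighbour regressor arrives at a prediction, using four made‑up points in one dimension:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X_demo = np.array([[1.0], [2.0], [3.0], [10.0]])
y_demo = np.array([10.0, 20.0, 30.0, 100.0])

knn = KNeighborsRegressor(n_neighbors=2).fit(X_demo, y_demo)

# The two nearest neighbours of 2.4 are x=2 and x=3, so the prediction is (20 + 30) / 2
print(knn.predict([[2.4]]))  # [25.]
```

With the default uniform weights, the prediction is the plain average of the neighbours' targets. Because distances drive the result, scaling the features first (as done in 5.2) matters for this model.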

5.6 Evaluation

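Each model's score method returns the coefficient of determination (R²) on the test set, which the code below labels "confidence". In the source's run, all four scores exceed 0.92, with the linear and polynomial models above 0.96.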

confidence_reg = clfreg.score(X_test, y_test)
confidence_poly2 = clfpoly2.score(X_test, y_test)
confidence_poly3 = clfpoly3.score(X_test, y_test)
confidence_knn = clfknn.score(X_test, y_test)
print('Linear regression confidence:', confidence_reg)
print('Polynomial (degree 2) regression confidence:', confidence_poly2)
print('Polynomial (degree 3) regression confidence:', confidence_poly3)
print('KNN regression confidence:', confidence_knn)
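
The value returned by score can be reproduced by hand. The sketch below computes R² from its definition on four made‑up points and compares it with scikit‑learn's result.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X_demo = np.array([[1.0], [2.0], [3.0], [4.0]])
y_demo = np.array([1.1, 1.9, 3.2, 3.8])   # made-up, roughly linear

model = LinearRegression().fit(X_demo, y_demo)
pred = model.predict(X_demo)

ss_res = np.sum((y_demo - pred) ** 2)            # residual sum of squares
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)   # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(np.isclose(r2_manual, model.score(X_demo, y_demo)))  # True
```

R² measures the share of variance in the target that the model explains. It is not a probability that a given forecast is correct.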

6 Plot Predictions

The forecasted values are appended to the dataframe and plotted together with the historical adjusted close prices.

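# Predict the held-back rows (using the linear model here is an assumption)
forecast_set = clfreg.predict(X_lately)
dfreg['Forecast'] = np.nan

# Append one row per forecast, dated one calendar day after the previous row
last_date = dfreg.iloc[-1].name
next_date = last_date + datetime.timedelta(days=1)
for i in forecast_set:
    dfreg.loc[next_date] = [np.nan] * (len(dfreg.columns) - 1) + [i]
    next_date += datetime.timedelta(days=1)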

dfreg['Adj Close'].tail(500).plot()
dfreg['Forecast'].tail(500).plot()
plt.legend(loc=4)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()
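
The loop in this section advances one calendar day per forecast, so forecast points can land on weekends. One possible alternative, not part of the original tutorial, is to build the future index from business days with pandas. The sketch below uses made‑up history and made‑up forecast values; hist, forecast_demo, and future_idx are hypothetical names.

```python
import numpy as np
import pandas as pd

# Made-up history: 5 business days of prices, Monday to Friday
hist = pd.DataFrame({"Adj Close": [10.0, 10.5, 10.2, 10.8, 11.0]},
                    index=pd.bdate_range("2017-01-02", periods=5))
forecast_demo = np.array([11.2, 11.1, 11.4])   # hypothetical model output

# Business-day dates starting after the last observed row
future_idx = pd.bdate_range(hist.index[-1] + pd.Timedelta(days=1),
                            periods=len(forecast_demo))
future = pd.DataFrame({"Forecast": forecast_demo}, index=future_idx)

# History rows get NaN in Forecast; future rows get NaN in Adj Close
combined = pd.concat([hist, future])
print(combined)
```

This skips weekends but not exchange holidays, so it is still only an approximation of the trading calendar.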

7 Future Improvements / Challenges

Incorporate qualitative factors such as news sentiment analysis.

Analyze quantitative macro‑economic indicators (e.g., HPI, income inequality) alongside stock data.

