How to Quickly Analyze and Predict Stock Prices with Python in 12 Minutes
This tutorial shows how to fetch historical stock data from Yahoo Finance using pandas, compute moving averages and returns, explore correlations among major tech stocks, engineer features, train linear, polynomial, and K‑Nearest‑Neighbour models with scikit‑learn, evaluate their accuracy, and visualize both historical prices and future forecasts, all in a concise, step‑by‑step guide.
Using Python to Quickly Analyze, Visualize, and Predict Stock Prices
A friend suggested that stock investing is key to financial freedom, prompting many amateur traders to seek simple analysis methods. The main questions are which stocks to choose, how to analyze them, and how to assess risk versus return. This guide demonstrates a fast, Python‑based workflow that can be followed in about 12 minutes.
2 Load Yahoo Finance Dataset
Pandas DataReader provides easy access to financial data sources such as Yahoo Finance. The following code downloads daily Apple (AAPL) price data, including the adjusted close, from January 1, 2010 to January 11, 2017.
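Remote data sources change over time, so the DataReader call may not work in every environment. As a fallback for experimenting offline, here is a minimal sketch that builds a random-walk frame with the same columns the rest of this guide relies on (High, Low, Open, Close, Volume, Adj Close). The helper name make_synthetic_prices and all of its parameters are hypothetical, and the numbers it produces are not real market data.

```python
import numpy as np
import pandas as pd


def make_synthetic_prices(n_days=300, seed=0):
    """Hypothetical helper: random-walk prices shaped like the downloaded frame."""
    rng = np.random.default_rng(seed)
    index = pd.bdate_range("2010-01-01", periods=n_days, name="Date")
    # Geometric random walk for the closing price
    close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, n_days)))
    open_ = close * (1 + rng.normal(0, 0.005, n_days))
    # High/low are forced to bracket both open and close
    high = np.maximum(open_, close) * (1 + rng.uniform(0, 0.01, n_days))
    low = np.minimum(open_, close) * (1 - rng.uniform(0, 0.01, n_days))
    volume = rng.integers(1_000_000, 5_000_000, n_days)
    return pd.DataFrame(
        {"High": high, "Low": low, "Open": open_, "Close": close,
         "Volume": volume, "Adj Close": close},
        index=index,
    )


df_demo = make_synthetic_prices()
print(df_demo.tail())
```

Assigning the result to df instead of df_demo lets the later steps run unchanged, although any scores obtained that way say nothing about real stocks.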
import pandas as pd
import datetime
import pandas_datareader.data as web
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2017, 1, 11)
df = web.DataReader("AAPL", "yahoo", start, end)
print(df.tail())
3 Explore Moving Average and Returns
3.1 Moving Average
Rolling (moving) averages smooth price fluctuations and help identify trends. The code below computes a 100‑day moving average of the adjusted close price and plots it alongside the raw series.
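To see what a rolling mean does before applying it to AAPL, here is a minimal sketch on five made-up prices with a window of 3. The first window - 1 entries are NaN because a full window is not yet available, so a 100-day window leaves the first 99 rows empty.

```python
import pandas as pd

# Toy prices, not real data
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# Each value is the mean of the current row and the two rows before it
mavg_demo = prices.rolling(window=3).mean()
print(mavg_demo)
```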
close_px = df['Adj Close']
mavg = close_px.rolling(window=100).mean()
close_px.plot(label='AAPL')
mavg.plot(label='mavg')
3.2 Returns
Daily returns are calculated as the percentage change between consecutive closing prices.
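As a quick check of the formula, the sketch below computes returns by hand on three made-up prices and compares them with pandas' built-in pct_change(). The first return is NaN because there is no previous close.

```python
import pandas as pd

# Toy closing prices: up 10%, then down 10%
close_demo = pd.Series([100.0, 110.0, 99.0])

# r_t = p_t / p_{t-1} - 1
rets_manual = close_demo / close_demo.shift(1) - 1
rets_builtin = close_demo.pct_change()

print(rets_manual)
print(rets_builtin)
```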
rets = close_px / close_px.shift(1) - 1
rets.plot(label='return')
4 Analyze Competitor Stocks
Data for Apple, GE, Google, IBM, and Microsoft are retrieved together for comparative analysis.
dfcomp = web.DataReader(["AAPL","GE","GOOG","IBM","MSFT"], "yahoo", start, end)["Adj Close"]
4.1 Correlation Analysis
Percentage changes are computed, and a correlation matrix is derived. Scatter plots (e.g., Apple vs. GE) and a scatter‑matrix with KDE on the diagonal visualize relationships. A heat‑map displays the correlation strengths.
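To make the correlation matrix easier to read, here is a minimal sketch on synthetic returns. The tickers A, B, and C are hypothetical: A and B share a common "market" component and should correlate strongly, while C is independent noise.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
market = rng.normal(0, 0.01, 500)  # shared daily move

# Hypothetical tickers, synthetic data only
rets_demo = pd.DataFrame({
    "A": market + rng.normal(0, 0.002, 500),
    "B": market + rng.normal(0, 0.002, 500),
    "C": rng.normal(0, 0.01, 500),
})

# Pairwise Pearson correlation: symmetric, with ones on the diagonal
corr_demo = rets_demo.corr()
print(corr_demo.round(2))
```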
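import matplotlib.pyplot as plt
retscomp = dfcomp.pct_change()
corr = retscomp.corr()
# Scatter plot of Apple vs. GE daily returns
plt.scatter(retscomp.AAPL, retscomp.GE)
# Correlation heat-map on its own figure
plt.figure()
plt.imshow(corr, cmap='hot', interpolation='none')
plt.colorbar()
4.2 Returns and Risk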
Mean returns and standard deviations (risk) are plotted for each stock, with annotations identifying each ticker.
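The same two quantities can also be read as a table. The sketch below uses synthetic returns for two hypothetical tickers with equal expected return but different volatility, and sorts them by risk.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical tickers: same mean return, different volatility
rets_demo = pd.DataFrame({
    "LOWVOL": rng.normal(0.0005, 0.005, 1000),
    "HIGHVOL": rng.normal(0.0005, 0.02, 1000),
})

# Mean daily return and its standard deviation (used here as the risk proxy)
summary = pd.DataFrame({
    "mean_return": rets_demo.mean(),
    "risk": rets_demo.std(),
}).sort_values("risk")
print(summary)
```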
plt.figure()
plt.scatter(retscomp.mean(), retscomp.std())
plt.xlabel('Expected returns')
plt.ylabel('Risk')
# Label each point with its ticker
for label, x, y in zip(retscomp.columns, retscomp.mean(), retscomp.std()):
    plt.annotate(label, xy=(x, y), xytext=(20, -20),
                 textcoords='offset points', ha='right', va='bottom',
                 bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5),
                 arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
5 Predict Stock Price
5.1 Feature Engineering
Two additional features are created: high‑low percentage (HL_PCT) and percent change from open to close (PCT_change).
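A worked example on a single made-up trading day shows what the two features measure: HL_PCT is the day's high-low range as a percentage of the close, and PCT_change is the open-to-close move as a percentage of the open.

```python
import pandas as pd

# One made-up daily bar
bar = pd.DataFrame({"Open": [98.0], "High": [105.0], "Low": [95.0], "Close": [100.0]})

# (105 - 95) / 100 * 100 = 10.0
bar["HL_PCT"] = (bar["High"] - bar["Low"]) / bar["Close"] * 100.0
# (100 - 98) / 98 * 100, roughly 2.04
bar["PCT_change"] = (bar["Close"] - bar["Open"]) / bar["Open"] * 100.0

print(bar[["HL_PCT", "PCT_change"]])
```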
# Copy so that adding columns does not write into a view of df
dfreg = df.loc[:, ['Adj Close', 'Volume']].copy()
dfreg['HL_PCT'] = (df['High'] - df['Low']) / df['Close'] * 100.0
dfreg['PCT_change'] = (df['Close'] - df['Open']) / df['Open'] * 100.0
5.2 Preprocessing and Cross‑Validation
Missing values are filled, a label column (future adjusted close) is created, features are scaled, and the data are split into training and testing sets.
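The label construction is easiest to follow on a tiny example. In the sketch below (toy prices, with a hypothetical horizon of 2 rows), shifting the price column up by the horizon pairs each row with the price two rows later. The last two rows have no future price yet, so they are set aside for forecasting rather than training.

```python
import pandas as pd

demo = pd.DataFrame({"Adj Close": [10.0, 11.0, 12.0, 13.0, 14.0]})
horizon = 2  # hypothetical forecast horizon, in rows

# Row i's label is the price at row i + horizon
demo["label"] = demo["Adj Close"].shift(-horizon)

trainable = demo.iloc[:-horizon]  # rows with a known future price
latest = demo.iloc[-horizon:]     # rows whose label is still unknown (NaN)

print(demo)
```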
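import math
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
# Fill missing values with an outlier sentinel (they are not dropped)
dfreg.fillna(value=-99999, inplace=True)
# Use 1% of the rows as the forecast horizon
forecast_out = int(math.ceil(0.01 * len(dfreg)))
forecast_col = 'Adj Close'
dfreg['label'] = dfreg[forecast_col].shift(-forecast_out)
X = np.array(dfreg.drop(columns=['label']))
X = preprocessing.scale(X)
# The most recent rows have no label yet; keep them for forecasting
X_lately = X[-forecast_out:]
X = X[:-forecast_out]
y = np.array(dfreg['label'])
y = y[:-forecast_out]
# Split used by the models below; test_size=0.2 is an assumed value
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
5.3 Model Generation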
Linear regression, polynomial (ridge) regression of degree 2 and 3, and K‑Nearest‑Neighbour regression models are instantiated.
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
5.4 Linear and Quadratic Regression
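Linear regression fits a straight line (a hyperplane when there are several features). The polynomial models expand the features to degree 2 (quadratic) or degree 3 (cubic) and fit the expanded features with ridge regression.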
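To illustrate the difference, here is a minimal sketch on synthetic data that follows a noisy parabola. A straight line cannot capture the curve, while the degree-2 pipeline can. The scores are computed in-sample purely for illustration, and the data and variable names are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200).reshape(-1, 1)
# Synthetic target: a parabola plus a little noise
y_curve = 0.5 * x.ravel() ** 2 + rng.normal(0, 0.1, 200)

linear = LinearRegression().fit(x, y_curve)
# PolynomialFeatures(2) adds x^2 so a linear model can fit the curve
quadratic = make_pipeline(PolynomialFeatures(2), Ridge()).fit(x, y_curve)

print("linear R^2:   ", linear.score(x, y_curve))
print("quadratic R^2:", quadratic.score(x, y_curve))
```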
clfreg = LinearRegression(n_jobs=-1)
clfreg.fit(X_train, y_train)
clfpoly2 = make_pipeline(PolynomialFeatures(2), Ridge())
clfpoly2.fit(X_train, y_train)
clfpoly3 = make_pipeline(PolynomialFeatures(3), Ridge())
clfpoly3.fit(X_train, y_train)
5.5 K‑Nearest‑Neighbour
KNN predicts a value based on the nearest neighbours in feature space.
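A small sketch with made-up points shows the mechanics. With n_neighbors=2 and the default uniform weights, the prediction is the plain average of the targets of the two closest training points.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Four toy training points on a line
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([0.0, 10.0, 20.0, 30.0])

knn = KNeighborsRegressor(n_neighbors=2).fit(X_demo, y_demo)

# x = 1.4 is closest to x = 1 and x = 2, so the prediction is (10 + 20) / 2
pred = knn.predict(np.array([[1.4]]))
print(pred)
```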
clfknn = KNeighborsRegressor(n_neighbors=2)
clfknn.fit(X_train, y_train)
5.6 Evaluation
Each model’s score (R²) on the test set is printed; all exceed 0.92, with linear and quadratic models above 0.96.
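For scikit-learn regressors, score returns the coefficient of determination R², which is not a probability even though the variables here are named "confidence". The sketch below computes R² by hand on four made-up values and checks it against sklearn.metrics.r2_score.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up targets and predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))
```

A score of 1.0 means perfect predictions, 0.0 is no better than always predicting the mean, and the value can be negative for a model that does worse than that.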
confidence_reg = clfreg.score(X_test, y_test)
confidence_poly2 = clfpoly2.score(X_test, y_test)
confidence_poly3 = clfpoly3.score(X_test, y_test)
confidence_knn = clfknn.score(X_test, y_test)
print('Linear regression confidence:', confidence_reg)
print('Quadratic regression 2 confidence:', confidence_poly2)
print('Quadratic regression 3 confidence:', confidence_poly3)
print('KNN regression confidence:', confidence_knn)
6 Plot Predictions
The forecasted values are appended to the dataframe and plotted together with the historical adjusted close prices.
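Stepping forward one calendar day at a time can place forecasts on weekends. As an alternative sketch, not the article's method, the example below builds the future index from business days with pandas and appends the forecasts in one step. The prices and forecast values are made up, and business days skip weekends but not market holidays.

```python
import numpy as np
import pandas as pd

# Toy history ending on Wednesday, 2017-01-11
hist = pd.DataFrame(
    {"Adj Close": [100.0, 101.0, 102.0]},
    index=pd.to_datetime(["2017-01-09", "2017-01-10", "2017-01-11"]),
)
forecast_demo = np.array([103.0, 104.0, 105.0])  # made-up forecasts

# One business day per forecast, starting after the last historical date
future_index = pd.bdate_range(
    start=hist.index[-1] + pd.offsets.BDay(1), periods=len(forecast_demo)
)
future = pd.DataFrame({"Forecast": forecast_demo}, index=future_index)

# Historical rows get NaN in Forecast; future rows get NaN in Adj Close
combined = pd.concat([hist, future])
print(combined)
```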
# Forecast the held-back rows; the linear model is used here as an assumption
forecast_set = clfreg.predict(X_lately)
dfreg['Forecast'] = np.nan
last_date = dfreg.iloc[-1].name
last_unix = last_date
next_unix = last_unix + datetime.timedelta(days=1)
for i in forecast_set:
    next_date = next_unix
    next_unix += datetime.timedelta(days=1)
    # New row: NaN for every column except the last one (Forecast)
    dfreg.loc[next_date] = [np.nan] * (len(dfreg.columns) - 1) + [i]
dfreg['Adj Close'].tail(500).plot()
dfreg['Forecast'].tail(500).plot()
plt.legend(loc=4)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()
7 Future Improvements / Challenges
Incorporate qualitative factors such as news sentiment analysis.
Analyze quantitative macro‑economic indicators (e.g., HPI, income inequality) alongside stock data.