Stock Market Prediction using Polynomial regression – Part II
In the first article, we saw the scary stock market predictions made by a linear regression model. It predicted that the stock market will not grow over the next ten years: year-on-year returns would be near zero. These hard-to-digest predictions came about because we tried to fit the stock market to a first-degree polynomial equation, i.e. a straight line. Since the growth of a stock market is never linear, a first-degree equation is the wrong model here. We need higher-order polynomial equations.
In statistics, polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modelled as an nth-degree polynomial in x. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y | x), and has been used to describe nonlinear phenomena such as the growth rate of tissues, the distribution of carbon isotopes in lake sediments, and the progression of disease epidemics.
Here, I have used a polynomial equation of degree five as the mathematical model. Let's see how it performs and what it predicts. The degree-five equation looks like this: y = c0 + c1*x + c2*x**2 + ... + c5*x**5
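To make the degree-five model concrete, here is a minimal sketch on synthetic data (not market data). It assumes scikit-learn is installed; the sample polynomial and the variable names are made up for illustration. It expands x into the columns [1, x, x**2, ..., x**5] and lets a linear regression recover the coefficients c0 to c5.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (illustrative only): points on a known degree-5 polynomial
# y = 1 + 2*x - 0.5*x**3 + 0.25*x**5
x = np.linspace(-2.0, 2.0, 50)
y = 1.0 + 2.0 * x - 0.5 * x**3 + 0.25 * x**5

# Expand x into the six columns [1, x, x**2, x**3, x**4, x**5]
poly = PolynomialFeatures(degree=5)
X_poly = poly.fit_transform(x.reshape(-1, 1))

# The column of ones already carries the constant term c0,
# so the separate intercept is switched off here
model = LinearRegression(fit_intercept=False)
model.fit(X_poly, y)

coefficients = model.coef_  # c0, c1, ..., c5 in that order
print(np.round(coefficients, 3))
```

Polynomial regression is therefore still a linear regression; only the inputs are transformed. The fitted coefficients should come out close to the ones used to generate the data.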
The first thing we need is the historical data. We can download it from the BSE India Archives. I have downloaded the data from 1989 to the current date. The downloaded sheet has five columns: Date, Open, High, Low and Close. For simplicity, we delete the Open, High and Low columns and keep only Date and Close.
The complete Python code follows. It is commented to explain each step. Copy and paste it into your Python IDE and run it to see the predictions as a graph.
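Instead of deleting the columns by hand in the sheet, the same trimming can be done while loading the file. The sketch below uses a tiny in-memory CSV with made-up values in place of the real download, and it assumes the date column parses with pandas' default date parsing; the real file's date format may need extra handling.

```python
import io

import pandas as pd

# Made-up rows standing in for the downloaded sheet (values are illustrative)
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close\n"
    "2019-01-01,100.0,105.0,99.0,104.0\n"
    "2019-01-02,104.0,106.0,101.0,102.5\n"
)

# usecols keeps only the two columns the model needs;
# parse_dates converts the Date column to datetime values
df = pd.read_csv(sample_csv, usecols=["Date", "Close"], parse_dates=["Date"])

print(df.columns.tolist())
print(df.dtypes)
```

With a real file, the file path would replace `sample_csv` in the `pd.read_csv` call.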
# Import packages
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
# For polynomial regression
from sklearn.preprocessing import PolynomialFeatures
# For plotting
import matplotlib.pyplot as plt


# Function to calculate the compound annual growth rate, in percent
def CAGR(first, last, periods):
    return ((last / first) ** (1 / periods) - 1) * 100


# Read the data file
df = pd.read_csv('D:\\python3\\data\\SensexHistoricalData.csv')

# Set the date as the index
df['Date'] = pd.to_datetime(df.Date)
df.index = df['Date']

# Convert dates into the number of days since 1970-01-01, as dates
# cannot be passed directly to a regression model
df.index = (df.index - pd.to_datetime('1970-01-01')).days

# Convert the pandas series into numpy arrays; x is reshaped into a
# single column below before it is sent to the regression model
y = np.asarray(df['Close'])
x = np.asarray(df.index.values)

# Model initialization.
# On its own, LinearRegression fits a first-degree equation, y = mx + c,
# which is the equation of a line.
regression_model = LinearRegression()

# Choose the order of the polynomial. Here the degree is set to 5,
# hence the mathematical model equation is
# y = c0 + c1*x + c2*x**2 + ... + c5*x**5
poly = PolynomialFeatures(5)

# Expand x into its higher-degree polynomial terms
X_transform = poly.fit_transform(x.reshape(-1, 1))

# Fit the data (train the model). y stays one-dimensional so that the
# predictions are plain numbers rather than one-element arrays.
regression_model.fit(X_transform, y)

# Prediction for historical dates. Let's call these the learned values.
y_learned = regression_model.predict(X_transform)

# Now add future dates to the date index and pass that index to the
# regression model for future prediction. As the date index has been
# converted into day numbers, we just need to extend it by 3650 days
# (roughly 10 years). x[-1] is the last value of the series.
newindex = np.asarray(pd.RangeIndex(start=x[-1], stop=x[-1] + 3650))

# Expand the extended x with the same polynomial transformer
X_extended_transform = poly.transform(newindex.reshape(-1, 1))

# Prediction for future dates. Let's call these the predicted values.
y_predict = regression_model.predict(X_extended_transform)

# Print the last predicted value
print("Closing price at 2029 would be around ", y_predict[-1])

# Convert the day numbers back to dates for plotting the graph
x = pd.to_datetime(df.index, origin='1970-01-01', unit='D')
future_x = pd.to_datetime(newindex, origin='1970-01-01', unit='D')

# Print the CAGR for the next ten years
print('Your investments will have a CAGR of ', CAGR(y[-1], y_predict[-1], 10), '%')

# Create the figure and plot the actual data
plt.figure(figsize=(16, 8))
plt.plot(x, df['Close'], label='Close Price History')

# Plot the regression model
plt.plot(x, y_learned, color='r', label='Mathematical Model')

# Plot the future predictions
plt.plot(future_x, y_predict, color='g', label='Future Predictions')

# Set the title of the graph
plt.suptitle('Stock Market Predictions', fontsize=16)

# Set the title of the graph window
fig = plt.gcf()
fig.canvas.manager.set_window_title('Stock Market Predictions')

# Display the legend
plt.legend()

# Display the graph
plt.show()
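The listing turns dates into day numbers before fitting and back into dates before plotting. As a quick sanity check of that round trip, here is a small standalone sketch; the two sample dates are arbitrary and chosen only for illustration.

```python
import pandas as pd

# Two arbitrary sample dates (illustrative only)
dates = pd.to_datetime(["1989-01-02", "2019-06-28"])
epoch = pd.Timestamp("1970-01-01")

# Dates -> whole days elapsed since 1970-01-01
days = (dates - epoch).days

# Day numbers -> dates again, as done before plotting
restored = pd.to_datetime(days, unit="D", origin="1970-01-01")

print(list(days))
print(restored)
```

Because the day numbers are plain integers, extending the series into the future is just a matter of counting further from the last value.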
What does the image above tell us?
The X-axis of the graph shows the dates from 1989 to 2029, and the Y-axis shows the market closing price.
The blue curve displays the closing price history from 1989 to 2019. The red curve represents the best-fit polynomial of degree five for the market data. The green curve shows the future predictions.
What are the predictions?
- The market price after ten years would be around 58312.
- In actual terms, there will be 3.97% year-on-year growth over the next ten years.
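The two figures above are tied together by the compound annual growth rate formula. The sketch below repeats that formula and adds a hypothetical helper, `implied_start`, which is not part of the original code, to back out the starting level implied by a 58312 close and 3.97% annual growth over ten years.

```python
def cagr(first, last, periods):
    """Compound annual growth rate, in percent."""
    return ((last / first) ** (1 / periods) - 1) * 100


def implied_start(last, rate_percent, periods):
    """Starting value that grows to `last` at `rate_percent` per period.

    Hypothetical helper for illustration; it simply inverts cagr().
    """
    return last / (1 + rate_percent / 100) ** periods


# Starting level implied by the two predictions quoted in the article
start = implied_start(58312, 3.97, 10)
print(round(start))

# Feeding that level back in reproduces the quoted growth rate
print(round(cagr(start, 58312, 10), 2))
```

The implied starting level comes out at roughly 39,500, so the model expects the index to grow by a little under half over the whole decade.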
Prediction graph analysis:
Curves have always been smarter objects on this planet, and here too they show some degree of smartness. In this exercise we have moved a little closer to reality. If you look at the graph, you can see that the red line now turns with the market numbers. It is not as straight as it was with the linear regression. However, the overall picture predicted by the fifth-order polynomial is also gloomy: a CAGR of only 4% from the market over the next ten years is still horrible.
Can we rely on these numbers? Are regression techniques capable of solving this problem, or do we need something better, like a prophet to do time series prophecies? I mean the Prophet! Yes, the Prophet is coming in my next article to make better predictions. Till then, happy predicting.