Stock Market Prediction using Python – Part I
With the advent of high-speed computers, Python has become an immensely powerful tool for performing complex mathematical operations. Python has strong support from its development community, which keeps enriching it with powerful packages for numerical computing and machine learning, such as NumPy, pandas, scikit-learn (sklearn) and Keras. As a result, artificial intelligence and machine learning have become far more accessible to developers. In this article, we will see how Python can be used to predict stock market behavior.
We can predict the future of systems that follow some kind of pattern, such as real estate prices, economic booms and recessions, and gold prices. These systems go through cycles of ups and downs. By analyzing the historical data of such a system, we can build a mathematical model of its behavior, and then use that model to predict the future.
Let’s look at two examples to understand what can and cannot be predicted. Suppose the price of gold increased by x dollars last month and by y dollars this month. What will the increase be next month? This problem can be solved with a simple machine learning technique: linear regression.
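A minimal sketch of the gold example with scikit-learn’s LinearRegression. The monthly prices below are made-up numbers for illustration, not real gold prices. Month numbers are used as the single input feature, and the model extrapolates the trend one month ahead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical gold prices (USD) for six consecutive months; made-up values, not market data.
months = np.arange(1, 7).reshape(-1, 1)  # feature: month number, shape (6, 1)
prices = np.array([1300.0, 1310.0, 1320.0, 1330.0, 1340.0, 1350.0])

model = LinearRegression()
model.fit(months, prices)

# Predict the price for month 7 and the increase over month 6.
next_price = model.predict(np.array([[7]]))[0]
increase = next_price - prices[-1]
print(f"Predicted price: {next_price:.2f}, increase: {increase:.2f}")
# → Predicted price: 1360.00, increase: 10.00
```

With real data the monthly changes are not this regular, so the fitted line only captures the average trend.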
Now consider the example of Ramesh and Suresh, two good friends. Ramesh recently bought a dog named Bruno. When Suresh went to Ramesh’s house, Bruno barked at him twice: “Bhoo Bhoo” :). The next time Suresh went to Ramesh’s house, Bruno barked at him three times: “Bhoo Bhoo Bhoo” :). So how many times will Bruno bark at Suresh if he goes to Ramesh’s house again? Two times? Three times? No, linear regression cannot predict this, at least not as far as I am aware. Bruno may not bark at all, because by now he may have become familiar with Suresh. This is a very specific case that current regression techniques cannot solve. Regression techniques are used for more general problems, such as stock market prediction, which we are going to tackle now.
Let’s start with a simple prediction using linear regression. I am writing this article on 02-July-2019. Today, the BSE Sensex is at 39671 points. Where will the Sensex be 10 years from today? What year-on-year returns will we get from the Sensex? If we knew the answers to these questions today, we could plan our investments efficiently.
The first thing we need is historical data. We can download it from the BSE-India Archives. I have downloaded the data from 1989 to the current date. The downloaded data sheet has five columns: Date, Open, High, Low and Close. For simplicity, we delete the Open, High and Low columns from the data sheet and keep only the Date and Close columns.
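Instead of deleting the columns by hand, pandas can keep only the needed columns while reading the file. This is a sketch that assumes the CSV header is exactly Date, Open, High, Low, Close. An in-memory string with placeholder values stands in for the downloaded file, so these rows are not real Sensex data.

```python
import io

import pandas as pd

# Placeholder stand-in for the downloaded CSV: same five columns, made-up values.
csv_text = """Date,Open,High,Low,Close
2019-06-28,100.0,110.0,90.0,105.0
2019-07-01,105.0,115.0,95.0,110.0
2019-07-02,110.0,120.0,100.0,115.0
"""

# usecols keeps only Date and Close; parse_dates converts Date to datetimes while reading.
df = pd.read_csv(io.StringIO(csv_text), usecols=['Date', 'Close'], parse_dates=['Date'])
print(df.columns.tolist())  # → ['Date', 'Close']
```

With the real file, the same call would be `pd.read_csv('D:\\python3\\data\\SensexHistoricalData.csv', usecols=['Date', 'Close'])`.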
Here is the complete Python code. It is commented well enough to explain each step of the problem. Copy and paste it into your Python IDE to see the predictions in graphical form.
# import packages
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
# for plotting
import matplotlib.pyplot as plt

# read the data file
df = pd.read_csv('D:\\python3\\data\\SensexHistoricalData.csv')

# parse the dates and make sure the rows are in chronological order,
# so that x[-1] below is the most recent date
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values('Date')

# setting index as date
df.index = df['Date']

# converting dates into number of days since 1970-01-01, as dates cannot be passed directly to any regression model
df.index = (df.index - pd.to_datetime('1970-01-01')).days

# convert the pandas series into numpy arrays; we need to reshape them before sending them to the regression model
y = np.asarray(df['Close'])
x = np.asarray(df.index.values)

# Model initialization
# LinearRegression fits a first-degree (straight-line) model,
# hence the mathematical model equation is y = mx + c, which is the equation of a line.
regression_model = LinearRegression()

# Fit the data (train the model); scikit-learn expects a 2-D feature array
regression_model.fit(x.reshape(-1, 1), y.reshape(-1, 1))

# Prediction for historical dates. Let's call these the learned values.
y_learned = regression_model.predict(x.reshape(-1, 1))

# Now, add future dates to the date index and pass that index to the regression model for future prediction.
# As we have converted the date index into day numbers, here we just need to add 3650 days (roughly 10 years)
# to the last index value. x[-1] gives the last value of the series.
newindex = np.asarray(pd.RangeIndex(start=x[-1], stop=x[-1] + 3650))

# Prediction for future dates. Let's call these the predicted values.
y_predict = regression_model.predict(newindex.reshape(-1, 1))

# print the last predicted value
print("Closing price in 2029 would be around", y_predict[-1, 0])

# convert the day numbers back to dates for plotting the graph
x = pd.to_datetime(df.index, origin='1970-01-01', unit='D')
future_x = pd.to_datetime(newindex, origin='1970-01-01', unit='D')

# setting figure size
plt.figure(figsize=(16, 8))
# plot the actual data
plt.plot(x, df['Close'], label='Close Price History')
# plot the regression model
plt.plot(x, y_learned, color='r', label='Mathematical Model')
# plot the future predictions
plt.plot(future_x, y_predict, color='g', label='Future predictions')
plt.suptitle('Stock Market Predictions', fontsize=16)
# set the window title (fig.canvas.set_window_title was deprecated and later removed in newer Matplotlib releases)
fig = plt.gcf()
if fig.canvas.manager is not None:
    fig.canvas.manager.set_window_title('Stock Market Predictions')
plt.legend()
plt.show()
What does the graph produced by this code tell us?
The X-axis of the graph shows the dates from 1989 to 2029, and the Y-axis shows the market closing price.
The blue line shows the closing price history from 1989 to 2019. The red line is the best-fit linear equation for the market data. The green line shows the future predictions.
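The red line is the equation y = mx + c, and a fitted LinearRegression exposes m as `coef_` and c as `intercept_`. The sketch below fits a perfectly straight synthetic series (made-up values, not Sensex data), so you can see where the two numbers live. Because x is measured in days, m is in points per day, and multiplying by 365 gives an approximate yearly slope.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: day numbers and a straight "close price" line y = 2.5*x + 100 (made-up values).
days = np.arange(0, 1000)
close = 2.5 * days + 100.0

model = LinearRegression()
model.fit(days.reshape(-1, 1), close)

m = model.coef_[0]    # slope: points gained per day
c = model.intercept_  # intercept: value of the line at day 0
print(f"m = {m:.3f} points/day ({m * 365:.1f} points/year), c = {c:.1f}")
# → m = 2.500 points/day (912.5 points/year), c = 100.0
```

In the article’s code, y is reshaped into a column before fitting, so the same values are found in `regression_model.coef_[0, 0]` and `regression_model.intercept_[0]`.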
What are the predictions?
1. The market price after ten years would be around 39079.
2. In effect, there will be 0% year-on-year growth over the next ten years. The predicted value is even slightly below today’s 39671.
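The zero-growth conclusion can be checked with the compound annual growth rate (CAGR) formula, (end / start) ** (1 / years) - 1. This sketch uses only the two figures stated in the article: today’s Sensex value and the model’s 2029 prediction.

```python
# Figures from the article: Sensex on 02-July-2019 and the model's prediction about 10 years later.
current = 39671.0
predicted_2029 = 39079.0
years = 10

# Compound annual growth rate: (end / start) ** (1 / years) - 1
cagr = (predicted_2029 / current) ** (1 / years) - 1
print(f"Implied year-on-year growth: {cagr:.2%}")
# → Implied year-on-year growth: -0.15%
```

A yearly rate of about -0.15% is effectively zero, which is what the second prediction says.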
I know these predictions are not very accurate! That is because we used linear regression with a first-degree polynomial equation, y = mx + c. In reality, the stock market goes through a lot of ups and downs, and a straight line cannot mimic that behavior. We need a more flexible model: polynomial regression.
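As a preview of the idea, here is a minimal sketch comparing a straight line with a degree-3 polynomial. It uses scikit-learn’s PolynomialFeatures followed by LinearRegression, on a synthetic series built from a trend plus a sine wave. The data are made up, not market data, and the degree of 3 is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic series with ups and downs (made-up, not market data): a trend plus a cycle.
x = np.linspace(0, 10, 200).reshape(-1, 1)
y = 3.0 * x.ravel() + 5.0 * np.sin(x.ravel())

# Degree 1: the straight line y = m*x + c used in the article.
linear = LinearRegression().fit(x, y)

# Degree 3: expand x into [1, x, x^2, x^3], then fit a linear model on those features.
cubic = make_pipeline(PolynomialFeatures(degree=3), LinearRegression()).fit(x, y)

linear_mse = mean_squared_error(y, linear.predict(x))
cubic_mse = mean_squared_error(y, cubic.predict(x))
print(f"degree 1 MSE: {linear_mse:.2f}, degree 3 MSE: {cubic_mse:.2f}")
```

The higher-degree curve follows the historical data more closely. A lower error on past data does not by itself guarantee better forecasts, though, because polynomials can swing wildly outside the range they were fitted on.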
Another issue is that we have used only one dimension, time, to predict the future. In reality, the stock market is governed by many factors, which we must consider to get more accurate predictions.
For now, we will stick to only one dimension: time. In the next article, I will show you how to use a polynomial regression model and what predictions it gives. Till then, happy predicting!