Stock market prediction using neural network algorithm

The entire script of the famous show Game of Thrones revolves around the arrival of winter. From season one to season seven, the characters continuously prepare themselves for it. Winter finally came in the last season, and it lasted only a few episodes. This disappointed many GOT fans, including me. We wanted to enjoy that entertainment for a few more seasons and hoped for a complete season dedicated to the winter. That was drama, though; in reality we would pray for the shortest possible winter if we ever had to face one.

The winter of Westeros taught us an important lesson: no matter how short the winter is, it is always devastating. It brings into play monstrous characters like the Night King and the Mad Queen. Winterfell was destroyed, King’s Landing was burnt to ashes, and millions of people died. In the end, however, it gave rise to a new kingdom and new hopes. This drama has a lot in common with the economic world.

The recession is coming for all of us, and this time it is for real. Do you want to sit back and wait for its arrival, or do you want to prepare yourself for it? The decision is yours. Yes, some time is still left.

In earlier days, only economists and pundits predicted recessions. We, the common people, came to know about a recession only after its arrival, and believed in it only after losing jobs and the money invested in stocks. We have heard the story of a former governor of the RBI who predicted the 2008 recession in a paper written in 2005. We can also see many economists boasting on the internet about how they were the first to predict the next recession.

With all due respect to their knowledge and abilities, times have changed, and we common people can now forecast the next recession too, with the help of machine learning algorithms! We are going to see that now!
In the previous three articles we tried various regression techniques and time series forecasting algorithms like Prophet from Facebook. However, we realized that the stock market problem is too complex to be solved by these techniques. Hence, in this article we explore neural networks to find out whether they can make realistic predictions.

A neural network is a network or circuit of neurons or, in the modern sense, an artificial neural network composed of artificial neurons or nodes. Thus a neural network is either a biological neural network, made up of real biological neurons, or an artificial neural network used for solving artificial intelligence (AI) problems.

Unlike von Neumann model computations, artificial neural networks do not separate memory and processing; they operate via the flow of signals through the net connections, somewhat akin to biological networks.

These artificial networks may be used for predictive modelling, adaptive control and other applications where they can be trained via a dataset. Self-learning resulting from experience can occur within networks, which can derive conclusions from a complex and seemingly unrelated set of information.

Let’s try to understand a neuron, a neural network and a neural network computing algorithm using a logical AND gate. The following diagram is a representation of a logical neuron. It has two inputs, x1 and x2, which accept binary values, i.e. either 0 or 1. The neuron also has one bias node fixed at +1. Let’s say our target function is y = x1 AND x2.
In this neural network with just a single neuron, our hypothesis function is hɵ (x) = g(w1 * 1 + w2 * x1 + w3 * x2). Here g(x) is a step (threshold) activation function which produces 1 if x is positive and 0 otherwise. In C this can be written as “int g(double x) { return x > 0 ? 1 : 0; }”.

At the beginning of the computation, we need to have the design of our neural network ready, and we also need a predefined hypothesis function, which is generally a sum of products of the input values and the weights associated with them. Initially the weights w1, w2 and w3 are unknown.

In the next step we train the neural network by feeding it sample data. In our case, to train it to behave like an AND gate, we feed it the following data.
x1   x2   y = x1 AND x2
0    0    0
0    1    0
1    0    0
1    1    1
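The article does not name the training algorithm, so the following is only a minimal sketch that assumes the classic perceptron learning rule. The names `predict`, `train_perceptron` and `AND_SAMPLES`, the learning rate and the epoch limit are all hypothetical choices made for illustration.

```python
# Minimal sketch: learn AND-gate weights with the perceptron learning rule.
# The choice of algorithm, the learning rate and the epoch limit are assumptions.

def g(z):
    """Step activation: 1 for a positive input, 0 otherwise."""
    return 1 if z > 0 else 0

def predict(weights, x1, x2):
    """Hypothesis g(w1 * 1 + w2 * x1 + w3 * x2); w1 belongs to the bias node."""
    w1, w2, w3 = weights
    return g(w1 * 1 + w2 * x1 + w3 * x2)

def train_perceptron(samples, learning_rate=1, max_epochs=100):
    """Nudge the weights after every wrong answer until a full pass is error free."""
    weights = [0, 0, 0]
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), target in samples:
            error = target - predict(weights, x1, x2)
            if error != 0:
                mistakes += 1
                weights[0] += learning_rate * error * 1   # bias node input is +1
                weights[1] += learning_rate * error * x1
                weights[2] += learning_rate * error * x2
        if mistakes == 0:
            break
    return weights

# Training data: ((x1, x2), y) with y = x1 AND x2
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

and_weights = train_perceptron(AND_SAMPLES)
print("learned weights:", and_weights)
for (x1, x2), target in AND_SAMPLES:
    print(x1, x2, "->", predict(and_weights, x1, x2), "expected", target)
```

The learned weights need not match any particular hand-picked set; many different weight combinations separate the AND data equally well.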
Now, the job of the neural network computing algorithm is to find proper values for the weights w1, w2 and w3 so that the hypothesis function produces the desired output. In our case it should behave like an AND gate. After analyzing the input data the algorithm may come up with the values w1 = -3, w2 = 2 and w3 = 2. The hypothesis function then becomes hɵ (x) = g(-3 + 2 * x1 + 2 * x2).

Now, let’s see what this little single-neuron network computes for the four possible combinations of x1 and x2. If x1 and x2 are both equal to 0 then hɵ (x) = g(-3 + 2 * 0 + 2 * 0) = g(-3), and g(-3) produces 0. If exactly one of x1 and x2 is equal to 1 then hɵ (x) = g(-3 + 2) = g(-1), and g(-1) produces 0. When both x1 and x2 are equal to 1 then hɵ (x) = g(-3 + 2 * 1 + 2 * 1) = g(1), and g(1) produces 1. If you look at the output hɵ (x) column in the figure below you can see that this is exactly the behavior of a logical AND function.
x1   x2   -3 + 2 * x1 + 2 * x2   hɵ (x)
0    0    -3                     0
0    1    -1                     0
1    0    -1                     0
1    1     1                     1
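The evaluation above can be checked with a few lines of Python. This is just a sketch; the function name `neuron` and the list `and_outputs` are hypothetical, while the weights -3, 2 and 2 are the ones quoted in the text.

```python
# Sketch: evaluate the single neuron with the weights w1 = -3, w2 = 2, w3 = 2.

def g(z):
    """Step activation: 1 for a positive input, 0 otherwise."""
    return 1 if z > 0 else 0

def neuron(x1, x2, w1=-3, w2=2, w3=2):
    """Single neuron: w1 weights the bias node (+1), w2 and w3 weight the inputs."""
    return g(w1 * 1 + w2 * x1 + w3 * x2)

and_outputs = []
for x1 in (0, 1):
    for x2 in (0, 1):
        output = neuron(x1, x2)
        and_outputs.append(output)
        print(x1, x2, "->", output)
```

Only the input pair (1, 1) pushes the weighted sum above zero, which is why the neuron behaves like an AND gate.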
Here, we have constructed one of the fundamental operations in computers by using a small neural network rather than an actual AND gate. Neural networks can also be used to simulate all the other logical gates. The same neuron can be trained to behave like a logical OR gate. In that case the computed weights may be w1 = -1, w2 = 2 and w3 = 2, and the hypothesis will be hɵ (x) = g(-1 + 2 * x1 + 2 * x2). If you write a truth table for this new hypothesis function you will find that it behaves exactly like a logical OR gate.

Now, let’s get back to our riddle of stock market prediction. In this exercise we are going to explore a neural network algorithm called “LSTM”. Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning. Unlike standard feed-forward neural networks, LSTM has feedback connections that make it a “general purpose computer” (that is, it can in principle compute anything that a Turing machine can).

A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell.

LSTM networks are well-suited to classifying, processing and making predictions based on time series data, since there can be lags of unknown duration between important events in a time series.

Unlike the previously tried time series prediction algorithms, where we fed the stock market data directly to the algorithm, LSTM needs some preprocessing of the stock market data. The range of the input dataset becomes the prediction boundary for the LSTM: while making predictions, it does not come up with figures that lie outside the range of the input dataset.
For example, if the lowest value in the input dataset is 600 points and the highest value is 39000 points, then the future predictions will lie in the range of 600 to 39000 only. That is unrealistic for a forecast covering the next ten years, because in the long run the stock market has historically gone up. Hence, we have to transform the stock market data into relative increase and decrease percentages with respect to the previous close value. By doing this we nullify the range boundary: as we are now working with relative data, there is virtually no boundary, and the LSTM can make realistic predictions using the relative percentage data. Let’s understand this with the help of the following example.

Table A shows the stock market dataset. Table B is the transformed version of Table A.
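[Image: Table A, the stock market dataset (date and closing value), and Table B, its transformed version (date and relative percentage change)]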
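The transformation from Table A to Table B, and the way back, can be sketched in a few lines. The four closing values (100, 102, 105, 103) are the ones used in the worked example for 1-Jan to 4-Jan; the function names `to_percentage_change` and `from_percentage_change` are hypothetical and exist only for this illustration.

```python
# Sketch: convert closing values to relative percentage changes and back.

def to_percentage_change(values):
    """Percentage change of each value relative to the previous one (first entry is 0)."""
    changes = [0.0]
    for previous, current in zip(values, values[1:]):
        changes.append((current - previous) / abs(previous) * 100.0)
    return changes

def from_percentage_change(base_value, changes):
    """Rebuild absolute values by applying each percentage to the previous rebuilt value."""
    values = []
    current = float(base_value)
    for change in changes:
        current = current + current * change / 100.0
        values.append(current)
    return values

closes = [100.0, 102.0, 105.0, 103.0]           # 1-Jan, 2-Jan, 3-Jan, 4-Jan
percentages = to_percentage_change(closes)
print([round(p, 2) for p in percentages])       # [0.0, 2.0, 2.94, -1.9]

restored = from_percentage_change(closes[0], percentages)
print([round(v, 2) for v in restored])          # [100.0, 102.0, 105.0, 103.0]
```

Because every entry is expressed relative to its predecessor, the series no longer carries the absolute minimum and maximum of the index, yet the original values can still be recovered from the first close.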
For the first item, 1-Jan, we do not have any previous value to compare with, hence the relative percentage change is 0%. For 2-Jan the current close is 102 points and the previous close was 100 points, hence the relative percentage change is +2%. For 4-Jan the current close is 103 points and the previous close was 105 points, hence the relative percentage change is (103 - 105) / 105 = -1.90%. Similarly, we have calculated the values for all the other entries.

Now, let’s try this neural network solution on real data. We can download it from BSE India – Archives. I have downloaded the data from the year 1989 to the current date. The downloaded data sheet has five columns in total: Date, Open, High, Low and Close. For simplicity we delete the three columns Open, High and Low from the data sheet and keep only the Date and Close columns.

Following is the complete Python code. The code is commented to explain each aspect of this problem. Copy and paste it into your Python IDE and run it to see the predictions in graphical format.
#importing required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
#to plot within notebook
import matplotlib.pyplot as plt

# function to calculate percentage difference considering baseValue as 100%
def percentageChange(baseValue, currentValue):
    return((float(currentValue)-baseValue) / abs(baseValue)) *100.00

# function to get the actual value using baseValue and percentage
def reversePercentageChange(baseValue, percentage):
    return float(baseValue) + float(baseValue * percentage / 100.00)

# function to transform a list of values into the list of percentages. For calculating percentages for each element in the list
# the base is always the previous element in the list.
def transformToPercentageChange(x):
    baseValue = x[0]
    x[0] = 0
    for i in range(1,len(x)):
        pChange = percentageChange(baseValue,x[i])
        baseValue = x[i]
        x[i] = pChange

# function to transform a list of percentages to the list of actual values. For calculating actual values for each element in the list
# the base is always the previous calculated element in the list.
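def reverseTransformToPercentageChange(baseValue, x):
    x_transform = []
    # iterate by position so that lists, pandas Series and numpy arrays all work
    for percentage in np.asarray(x, dtype=float).flatten():
        value = reversePercentageChange(baseValue, percentage)
        x_transform.append(value)
        baseValue = value
    return x_transform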

#read the data file
df = pd.read_csv('D:\\python3\\data\\SensexHistoricalData.csv')
# store the first element in the series as the base value for future use.
baseValue = df['Close'][0]

# create a new dataframe which is then transformed into relative percentages
data = df.sort_index(ascending=True, axis=0)
new_data = pd.DataFrame(index=range(0,len(df)),columns=['Date', 'Close'])
new_data['Date'] = data['Date'].values
new_data['Close'] = data['Close'].values

# transform the 'Close' series into relative percentages
close_values = new_data['Close'].astype(float).tolist()
transformToPercentageChange(close_values)
new_data['Close'] = close_values

# set Date column as the index
new_data.index = new_data.Date
new_data.drop('Date', axis=1, inplace=True)

# create train and test sets
dataset = new_data.values
train, valid = train_test_split(dataset, train_size=0.99, test_size=0.01, shuffle=False)

# convert dataset into x_train and y_train.
# prediction_window_size is the size of days windows which will be considered for predicting a future value.
prediction_window_size = 60
x_train, y_train = [], []
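for i in range(prediction_window_size,len(train)):
    # the previous prediction_window_size values are the input, the current value is the target
    x_train.append(train[i-prediction_window_size:i,0])
    y_train.append(train[i,0])
x_train, y_train = np.array(x_train, dtype=float), np.array(y_train, dtype=float)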
x_train = np.reshape(x_train, (x_train.shape[0],x_train.shape[1],1))

# create and fit the LSTM network
# Initialising the RNN
model = Sequential()
# Adding the first LSTM layer and some Dropout regularisation
# (the dropout rate of 0.2 used below is an assumed, commonly used value)
model.add(LSTM(units = 50, return_sequences = True, input_shape = (x_train.shape[1], 1)))
model.add(Dropout(0.2))

# Adding a second LSTM layer and some Dropout regularisation
model.add(LSTM(units = 50, return_sequences = True))
model.add(Dropout(0.2))

# Adding a third LSTM layer and some Dropout regularisation
model.add(LSTM(units = 50, return_sequences = True))
model.add(Dropout(0.2))

# Adding a fourth LSTM layer and some Dropout regularisation
model.add(LSTM(units = 50))
model.add(Dropout(0.2))

# Adding the output layer
model.add(Dense(units = 1))
# Compiling the RNN
model.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fitting the RNN to the Training set
model.fit(x_train, y_train, epochs = 100, batch_size = 32)


#predicting future values, using past 60 from the train data
# for next 10 yrs total_prediction_days is set to 3650 days
total_prediction_days = 3650
inputs = new_data[-total_prediction_days:].values
inputs = inputs.reshape(-1,1)

# create future predict list which is a two dimensional list of values.
# the first dimension is the total number of future days
# the second dimension is the list of values of prediction_window_size size
X_predict = []
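for i in range(prediction_window_size,inputs.shape[0]):
    # each prediction uses the previous prediction_window_size values as its input window
    X_predict.append(inputs[i-prediction_window_size:i,0])
X_predict = np.array(X_predict, dtype=float)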

# predict the future
X_predict = np.reshape(X_predict, (X_predict.shape[0],X_predict.shape[1],1))
future_closing_price = model.predict(X_predict)

train, valid = train_test_split(new_data, train_size=0.99, test_size=0.01, shuffle=False)
date_index = pd.to_datetime(train.index)

#converting dates into number of days as dates cannot be passed directly to any regression model
x_days = (date_index - pd.to_datetime('1970-01-01')).days

# we are doing prediction for next 5 years hence prediction_for_days is set to 1500 days.
prediction_for_days = 1500
future_closing_price = future_closing_price[:prediction_for_days]

# create a data index for future dates
x_predict_future_dates = np.asarray(pd.RangeIndex(start=x_days[-1] + 1, stop=x_days[-1] + 1 + (len(future_closing_price))))
future_date_index = pd.to_datetime(x_predict_future_dates, origin='1970-01-01', unit='D')

# transform a list of relative percentages to the actual values
train_transform = reverseTransformToPercentageChange(baseValue, train['Close'])

# for future dates the base value is the value of the last element from the training set.
baseValue = train_transform[-1]
valid_transform = reverseTransformToPercentageChange(baseValue, valid['Close'])
future_closing_price_transform = reverseTransformToPercentageChange(baseValue, future_closing_price)

# recession peak date is the date on which the index is at the bottom most position.
recessionPeakDate =  future_date_index[future_closing_price_transform.index(min(future_closing_price_transform))]
minCloseInFuture = min(future_closing_price_transform)
print("The stock market will reach to its lowest bottom on", recessionPeakDate)
print("The lowest index the stock market will fall to is ", minCloseInFuture)

# plot the graphs
df_x = pd.to_datetime(new_data.index)
plt.plot(date_index,train_transform, label='Close Price History')
plt.plot(future_date_index,future_closing_price_transform, label='Predicted Close')

# set the title of the graph
plt.suptitle('Stock Market Predictions', fontsize=16)

# set the title of the graph window
fig = plt.gcf()
fig.canvas.manager.set_window_title('Stock Market Predictions')

#display the legends
plt.legend()

#display the graph
plt.show()
Stock market prediction using neural networks

How to read the above image? The X-axis of the graph shows the dates from the year 1989 to the year 2025 and the Y-axis shows the market closing price. The blue graph displays the close price history from 1989 to 2019. The orange graph represents the future predictions from 2019 to 2025.

What are the predictions?

1. The lowest index the stock market will fall to is 17,574.
2. On 18th December 2020 the stock market will reach its lowest point.
3. The economy will be in recession from the second quarter of 2020.
4. The stock market will take another five years to recover from the recession.
5. Forget about the returns from the stock market; just worry about safeguarding your investments!

Prediction graph analysis:

After looking at the predictions I am both happy and worried. I am happy because we could find a technique, using ML algorithms, to predict a recession. I am worried because we will really have to face this scary phenomenon.

Unlike the predictions made earlier using various regression techniques, the predictions made using the LSTM algorithm seem realistic. The orange prediction graph is as dynamic as the blue real-life graph. We have now cracked the most difficult part of the stock market prediction riddle by identifying a suitable algorithm. The next challenge is to fine-tune it for more accurate predictions.

All these predictions are made using only one dimension, and that is Time. As wise men say, Time has answers for all queries. Here, we used Time to predict the coming of the next recession. But let’s keep the philosophy aside and come to reality. The reality is that there are many dimensions, including Time, which govern the stock market’s behavior, and we must not ignore them. The above predictions might not be accurate and the dates may differ, but the bottom line won’t change. The bottom line is that the recession is coming.
The world generated the most wealth in the 1980s, the 1990s and the early 2000s, and then came our generation, which started earning in the 2010s. We are all aware of the lackluster performance of the stock market in the last decade, and all of our predictions show a gloomy picture ahead. Our generation is cursed! We have started earning in the trough of a long-term economic cycle. Let’s accept it: we cannot generate wealth using the stock market now. We should find some other ways. Maybe something shiny, something glittery might make us wealthy. Maybe the stocks we own need a golden touch to make us rich. Yes, a golden touch. But remember, the golden touch was a curse for the mighty King Midas. Can it be a boon for us, however? Let’s find out in the next article, “The Golden Touch”. Till then, happy predicting...



14+ years of professional experience in developing products like Siemens Teamcenter, Siemens Product Master Manager, Intuit Quicken and Kosmix using Python, C++, JAVA, Oracle SQL. Pythonist | Python Aficionado | Machine Learning Enthusiast | Freelancer
