STOCK MARKET PREDICTION
Stock market prediction is the act of trying to determine the future value of a company stock or other financial instrument traded on an exchange. The successful prediction of a stock's future price could yield significant profit. A stock is an equity investment that represents part ownership in a corporation or company; it entitles the holder to part of that company's earnings and assets.
DATASET
The historical data consists of Google stock prices, and it is used to predict future stock prices. To build the stock market prediction model, we will use the Google Stock Price Train dataset.
#importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime
Next, we read the dataset.
dataset = pd.read_csv("Google_Stock_Price_Train.csv", index_col="Date", parse_dates=True)
dataset.head()
OUTPUT
Since I've used dataset.head(), you can see the top five rows. Had I used dataset.tail() instead, you would see the bottom five.
Next, I'll check whether any of the data is missing. The isna() function detects missing values: it returns a boolean object of the same size indicating which values are missing (NaN). Chaining any() onto it, as in dataset.isna().any(), reports for each column whether it contains at least one missing value.
OUTPUT
Open      False
High      False
Low       False
Close     False
Volume    False
dtype: bool
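To make the missing-value check concrete, here is a minimal sketch on a toy DataFrame. The numbers are made up for illustration and are not taken from the Google dataset; the names toy, missing_per_column and missing_counts are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up numbers) with one missing value in the 'Close' column
toy = pd.DataFrame({
    "Open": [10.0, 11.0, 12.0],
    "Close": [10.5, np.nan, 12.5],
})

# isna() marks every missing cell; any() collapses each column to a single flag
missing_per_column = toy.isna().any()
print(missing_per_column)

# Number of missing cells per column, useful when any() reports True
missing_counts = toy.isna().sum()
print(missing_counts)
```

If any column comes back True, the per-column counts show how many rows would need to be dropped or filled before training.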
Now we plot the opening price over the period covered by the data, 2012 to 2017. If I'm not mistaken, Google's stock price rose by almost 85% between 2014 and 2017, going from about $820 to about $1519 in three years.
dataset['Open'].plot(figsize=(16,6))
OUTPUT
The next thing we're interested in is the 7-day rolling mean of the stock price. For every row we look back over a window of 7 trading days (the current row and the six before it) and take the average of each column over that window. Luckily, this is extremely easy to achieve with pandas.
# 7 day rolling mean
dataset.rolling(7).mean().head(20)
OUTPUT
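As a small illustration of how the rolling window behaves, the sketch below applies the same rolling(7).mean() call to a made-up series of ten prices. The series and the name rolling_7 are assumptions for the example only.

```python
import pandas as pd

# Made-up prices for ten consecutive trading days
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])

rolling_7 = prices.rolling(7).mean()

# The first six rows have fewer than 7 observations, so they are NaN
print(rolling_7.isna().sum())

# Row 6 averages rows 0..6, row 7 averages rows 1..7, and so on
print(rolling_7.iloc[6], rolling_7.iloc[7])
```

This is why the first rows of a rolling mean are empty: the window is not yet full.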
Now compare the previous graph with a rolling mean. The second line below plots the moving average of the closing price over the past 30 days on top of the opening price.
dataset['Open'].plot(figsize=(16,6))
dataset.rolling(window=30).mean()['Close'].plot()
OUTPUT
Let's plot the Close column against the 30-day moving average of the Close column.
dataset['Close: 30 Day Mean'] = dataset['Close'].rolling(window=30).mean()
dataset[['Close','Close: 30 Day Mean']].plot(figsize=(16,6))
OUTPUT
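I also had the option of specifying a minimum number of periods. The expanding mean below averages every value from the start of the series up to the current row, and min_periods=1 lets it produce a value from the very first row.
# Optional: specify a minimum number of periods
dataset['Close'].expanding(min_periods=1).mean().plot(figsize=(16,6))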
OUTPUT
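The difference between an expanding window and a fixed rolling window is easier to see on a tiny series. This is a sketch with made-up values; the window size of 3 is chosen only to keep the arithmetic readable.

```python
import pandas as pd

values = pd.Series([2.0, 4.0, 6.0, 8.0])

# Expanding mean: the window grows from the first row up to the current row
expanding_mean = values.expanding(min_periods=1).mean()

# Rolling mean with a fixed window of 3; min_periods=1 avoids NaN at the start
rolling_mean = values.rolling(window=3, min_periods=1).mean()

print(expanding_mean.tolist())
print(rolling_mean.tolist())
```

The two agree until the rolling window is full. After that the rolling mean drops the oldest value, while the expanding mean keeps every value seen so far.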
With that done, we create the training set: we take the Open column of the dataset and put it in a pandas DataFrame.
training_set = dataset['Open']
training_set = pd.DataFrame(training_set)
DATA PREPROCESSING: The pre-processing stage involves data discretization, data transformation, data cleaning and data integration. After the dataset is transformed into a clean dataset, it is divided into training and testing sets for evaluation. We start by cleaning the data, which means repeating the earlier check for missing values.
# Data cleaning
dataset.isna().any()
We then move on to feature scaling, for which we import MinMaxScaler from sklearn (scikit-learn), a machine learning library for Python. MinMaxScaler transforms features by scaling each of them to a given range, here 0 to 1.
# feature Scaling
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0,1))
training_set_scaled = sc.fit_transform(training_set)
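To show what scaling to the range 0 to 1 does to the numbers, here is a hand-written NumPy sketch of the min-max formula, (x - min) / (max - min), applied per column. It is an illustration, not a replacement for MinMaxScaler, and the prices are made up.

```python
import numpy as np

# Made-up opening prices, shaped (n_samples, 1) like training_set
opens = np.array([[100.0], [150.0], [200.0], [125.0]])

col_min = opens.min(axis=0)
col_max = opens.max(axis=0)

# Scale to [0, 1]: (x - min) / (max - min), computed per column
scaled = (opens - col_min) / (col_max - col_min)

# Inverting the scaling recovers the original prices
restored = scaled * (col_max - col_min) + col_min

print(scaled.ravel())
print(restored.ravel())
```

The inverse step matters later: the network predicts scaled values, which have to be mapped back to prices using the minimum and maximum learned from the training set.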
Finally, we create a data structure with 60 timesteps and 1 output. We take the data from day 1 to day 60 and use it to predict day 61, then take day 2 to day 61 to predict day 62, and so on.
# Creating a data structure with 60 timesteps and 1 output
x_train = []
y_train = []
for i in range(60,1258):  # 1258 = number of rows in the training set
    x_train.append(training_set_scaled[i-60:i, 0])  # previous 60 values
    y_train.append(training_set_scaled[i, 0])       # value to predict
x_train, y_train = np.array(x_train), np.array(y_train)
# Reshaping to (samples, timesteps, features), the 3D input the LSTM layers expect
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
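The windowing step can be checked on a short series where the result is easy to read. The helper make_windows below is hypothetical (it is not part of the original code) and uses 3 timesteps instead of 60 so the output stays small.

```python
import numpy as np

def make_windows(series, timesteps):
    """Hypothetical helper: split a 1D array into (window, next value) pairs."""
    x, y = [], []
    for i in range(timesteps, len(series)):
        x.append(series[i - timesteps:i])  # the previous `timesteps` values
        y.append(series[i])                # the value that follows them
    x = np.array(x)
    y = np.array(y)
    # Add a trailing feature axis: (samples, timesteps, 1)
    return x.reshape(x.shape[0], x.shape[1], 1), y

series = np.arange(10, dtype=float)
x_demo, y_demo = make_windows(series, timesteps=3)

print(x_demo.shape, y_demo.shape)
print(x_demo[0].ravel(), y_demo[0])
```

A series of length n yields n minus timesteps samples, which is why the training loop above starts at index 60.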
FEATURE EXTRACTION: In this stage, only the features that are to be fed to the neural network are chosen.
# Part 2 - Building the RNN
#Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
Next we initialise the RNN. Predicting a price is a regression problem, because the output is a continuous value, so the model is named regressor. It is created as a Keras Sequential model, to which layers are added one at a time.
# Initialising the RNN
regressor = Sequential()
TRAINING NEURAL NETWORK: In this stage, the data is fed to the neural network, whose weights are initialised randomly, and the network is trained for prediction. The LSTM model is composed of four stacked LSTM layers of 50 units each, every one followed by a Dropout layer that drops 20% of the units during training, and finally a dense output layer with a single unit and a linear activation.
# Adding the first LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (x_train.shape[1],1)))
regressor.add(Dropout(0.2))
# Adding the second LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
# Adding the third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))
# Adding the fourth LSTM layer and some Dropout regularisation
# (no return_sequences: only the last timestep is passed to the output layer)
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))
# Adding the output layer
regressor.add(Dense(units = 1))
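To get a feel for the size of this network, the sketch below counts its trainable parameters by hand. It assumes the standard LSTM formulation (four gates, each with input weights, recurrent weights and one bias vector), which as far as I know matches the Keras defaults. The helper names lstm_params and dense_params are hypothetical, and Dropout layers add no parameters.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input per unit, plus one bias per unit
    return input_dim * units + units

layer_params = [
    lstm_params(1, 50),    # first LSTM layer: 1 feature per timestep
    lstm_params(50, 50),   # second LSTM layer: 50 inputs from the layer below
    lstm_params(50, 50),   # third LSTM layer
    lstm_params(50, 50),   # fourth LSTM layer
    dense_params(50, 1),   # output layer
]
total_params = sum(layer_params)
print(layer_params, total_params)
```

Calling regressor.summary() on the real model should be the authoritative way to check these numbers.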
Next we compile the RNN with the Adam optimizer and a mean squared error loss, then fit it to the training set.
# Compiling the RNN
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fitting the RNN to the Training set
regressor.fit(x_train, y_train, epochs = 100, batch_size = 32)
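For reference, here is what the mean squared error loss computes, written out in NumPy on made-up scaled values. It illustrates the formula only, not how Keras implements it internally.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between targets and predictions
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [0.2, 0.4, 0.6]
y_pred = [0.1, 0.4, 0.8]
mse = mean_squared_error(y_true, y_pred)
print(mse)
```

Because the errors are squared, a few large misses weigh more than many small ones. Keep in mind that the loss reported during training is on the scaled prices, not on dollars.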
VISUALIZATION: A rolling analysis of a time series model is often used to assess the model's stability over time. When analyzing financial time series data using a statistical model, a key assumption is that the parameters of the model are constant over time.
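# Part 3 - Making the prediction and visualising the results
# Getting the real stock price of 2017
# Assumes the 2017 test data is stored in Google_Stock_Price_Test.csv (file name is an assumption)
dataset_test = pd.read_csv("Google_Stock_Price_Test.csv", index_col="Date", parse_dates=True)
# With Date as the index, 'Open' is column 0
real_stock_price = dataset_test.iloc[:, 0:1].values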
OUTPUT
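Here again we take the Open column of the test set and put it in a DataFrame. The Volume column is stored as text with comma separators, so the commas are stripped and the column is converted to float.
dataset_test["Volume"] = dataset_test["Volume"].str.replace(',', '').astype(float)
test_set=dataset_test['Open']
test_set=pd.DataFrame(test_set)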
Finally, to get the predicted stock price of 2017, we concatenate the training set and the test set along axis 0, take the last 60 training values plus the test values as inputs, and scale them with the scaler that was fitted on the training set.
# getting the predicted stock price of 2017
dataset_total = pd.concat((dataset['Open'], dataset_test['Open']), axis = 0)
# last 60 training values followed by all test values
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1,1)
inputs = sc.transform(inputs)
x_test = []
for i in range(60,80):  # one 60-step window per test day (20 days)
    x_test.append(inputs[i-60:i, 0])
x_test = np.array(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
predicted_stock_price = regressor.predict(x_test)
# map the scaled predictions back to prices
predicted_stock_price = sc.inverse_transform(predicted_stock_price)
predicted_stock_price = pd.DataFrame(predicted_stock_price)
predicted_stock_price.info()
OUTPUT
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       20 non-null     float32
dtypes: float32(1)
memory usage: 208.0 bytes
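The index arithmetic used to build the test inputs is easy to get wrong, so here is a sketch of the same slicing on toy data, with 3 timesteps instead of 60. The series train and test are made-up stand-ins for dataset['Open'] and dataset_test['Open'], and .iloc is used to make the positional slicing explicit.

```python
import pandas as pd

timesteps = 3
train = pd.Series(range(10))       # stands in for dataset['Open']
test = pd.Series(range(10, 14))    # stands in for dataset_test['Open']

total = pd.concat((train, test), axis=0)

# Keep the last `timesteps` training values plus every test value
start = len(total) - len(test) - timesteps
inputs = total.iloc[start:].values

# One window per test value: each window holds the values just before it
windows = [inputs[i - timesteps:i] for i in range(timesteps, len(inputs))]

print(inputs.tolist())
print([w.tolist() for w in windows])
```

The inputs have length len(test) + timesteps, and the loop yields exactly len(test) windows. This matches the 20 predictions shown in the output above for a 60-step window.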
Finally, we use matplotlib to visualize the predicted stock price against the real stock price.
#visualising the results
plt.plot(real_stock_price, color='red', label = 'Real Google Stock Price')
plt.plot(predicted_stock_price, color='blue', label = 'Predicted Google Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()
OUTPUT
Introduction about project developer
The Stock Prediction project was developed by Sakshi Singh from Maharani Girls Engineering College, Jaipur. It was developed during the Goeduhub online summer training in Artificial Intelligence, Machine Learning and Deep Learning.