Stock Prediction using Machine Learning and Python

Question


asked Jun 12, 2020 in Computer Science and Engineering - Information Technologies by sakshi singh (117 points)

Machine learning has significant applications in stock price prediction. In this machine learning project, we will talk about predicting stock prices. This is a very complex task with a lot of uncertainty. We will learn how to predict a stock's price using an LSTM (long short-term memory) neural network.

1 Answer

answered Jun 14, 2020 by sakshi singh (117 points)
selected Aug 9, 2020 by Goeduhub

Best answer

STOCK MARKET PREDICTION

Stock market prediction is the act of trying to determine the future value of a company's stock or another financial instrument traded on an exchange. Successfully predicting a stock's future price could yield significant profit. A stock is an equity investment that represents part ownership of a company; it entitles you to a share of that company's earnings and assets.

DATASET

The historical data comes from Google's stock price history, and we use it to predict future stock prices. To build the stock market prediction model, we use the Google Stock Price Train dataset (Google_Stock_Price_Train.csv).

#importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime

Next, we read the dataset, using the Date column as the index and parsing it as dates.

dataset = pd.read_csv("Google_Stock_Price_Train.csv", index_col="Date",parse_dates=True)
dataset.head()
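To make the effect of index_col and parse_dates concrete, here is a small sketch that reads an in-memory CSV instead of the real file. The rows are made-up illustrative values in the same column layout (Date, Open, High, Low, Close, Volume), not real Google prices. The thousands="," argument is our addition, shown on the assumption that Volume may be written with thousands separators, as the later str.replace(',', '') step on the test set suggests.

```python
import io

import pandas as pd

# Made-up rows in the same layout as the stock CSV (illustrative values only)
csv_text = """Date,Open,High,Low,Close,Volume
1/3/2017,100.0,102.0,99.0,101.0,"1,500"
1/4/2017,101.0,103.0,100.0,102.5,"2,250"
1/5/2017,102.5,104.0,101.5,103.0,"1,750"
"""

# index_col="Date" makes the Date column the row index,
# parse_dates=True converts that index to datetime values,
# thousands="," (an assumption about the file format) parses "1,500" as 1500.
sample = pd.read_csv(io.StringIO(csv_text), index_col="Date",
                     parse_dates=True, thousands=",")

print(sample.index)   # a DatetimeIndex rather than plain strings
print(sample.dtypes)  # Volume is numeric rather than object
```

With Volume parsed as a number, rolling means over the whole DataFrame include it instead of failing on text values.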


Since we used dataset.head(), we see the first five rows; dataset.tail() would show the last five.

Next, we check whether any of the data is missing (NA, "not available"). isna() returns a boolean object of the same size as the DataFrame, with True wherever a value is missing, and any() then reports, for each column, whether it contains at least one missing value.

dataset.isna().any()

OUTPUT

Open      False
High      False
Low       False
Close     False
Volume    False
dtype: bool
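A tiny sketch of the two steps on a made-up frame with one deliberately missing value, showing that isna() keeps the frame's shape while any() collapses it to one boolean per column:

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value (NaN) in "Close"
df = pd.DataFrame({
    "Open": [10.0, 11.0, 12.0],
    "Close": [10.5, np.nan, 12.5],
})

mask = df.isna()         # same shape as df: True where a value is missing
per_column = mask.any()  # one boolean per column: does it contain any NaN?

print(mask)
print(per_column)
```

For the stock data, every column came back False, so there is nothing to fill or drop.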

Next, we plot the opening price to see how the stock grew over the period covered by the dataset. If I'm not mistaken, Google's stock price rose by almost 85% between 2014 and 2017, going from about $820 to $1519 in three years.

dataset['Open'].plot(figsize=(16,6))
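The 85% figure is simple percentage growth, (end - start) / start * 100. A quick check using the two prices quoted above; the helper name percent_growth is ours, for illustration:

```python
def percent_growth(start_price: float, end_price: float) -> float:
    """Percentage change from start_price to end_price."""
    return (end_price - start_price) / start_price * 100


growth = percent_growth(820, 1519)
print(f"{growth:.1f}%")  # → 85.2%
```

The same helper could be applied to dataset['Open'].iloc[0] and dataset['Open'].iloc[-1] to measure growth over the whole dataset.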


Next, we look at the 7-day rolling mean of the stock price: for each row, we take the 7 rows ending at that row (the current trading day and the 6 before it) and average each column. The first 6 rows do not have a full window, so their mean is NaN. Luckily, this is extremely easy to do with pandas.

# 7 day rolling mean
dataset.rolling(7).mean().head(20)
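Because the dataset has one row per trading day, rolling(7) means "the last 7 rows", not "the last 7 calendar days". A minimal sketch on made-up prices contrasting this count-based window with pandas' time-based window ("7D"), which works on a DatetimeIndex such as the parsed Date index:

```python
import pandas as pd

# Ten trading days (weekdays only) with prices 1..10, purely illustrative
days = pd.bdate_range("2017-01-02", periods=10)
demo_prices = pd.Series(range(1, 11), index=days, dtype=float)

# Count-based window: the last 7 rows, NaN until 7 rows are available
row_mean = demo_prices.rolling(7).mean()

# Time-based window: every row in the last 7 calendar days (about 5 trading days)
time_mean = demo_prices.rolling("7D").mean()

print(pd.DataFrame({"price": demo_prices,
                    "rolling(7)": row_mean,
                    "rolling('7D')": time_mean}))
```

The tutorial's rolling(7) therefore covers roughly a week and a half of calendar time.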


Now compare the previous plot of the opening price with a rolling mean: the second line overlays the 30-day moving average of the closing price (a window of 30 rows, i.e. 30 trading days).

dataset['Open'].plot(figsize=(16,6))
dataset['Close'].rolling(window=30).mean().plot()


Let's plot the Close column against its 30-day moving average, stored as a new column.

dataset['Close: 30 Day Mean'] = dataset['Close'].rolling(window=30).mean()
dataset[['Close','Close: 30 Day Mean']].plot(figsize=(16,6))


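We also have the option of an expanding window, which averages every value from the start of the series up to each day. min_periods sets how many values are required before a mean is produced; with min_periods=1, every row gets a value.

# Expanding mean; min_periods (optional) is the minimum number of values needed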
dataset['Close'].expanding(min_periods=1).mean().plot(figsize=(16,6))
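An expanding mean is just a cumulative average. A small sketch on made-up values showing that it equals the cumulative sum divided by the number of values seen so far, and how a larger min_periods leaves leading NaNs:

```python
import numpy as np
import pandas as pd

demo_close = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0])  # illustrative values

# Expanding mean: average of every value from the start up to each row
expanding_mean = demo_close.expanding(min_periods=1).mean()

# The same thing written out: cumulative sum / number of values so far
manual_mean = demo_close.cumsum() / np.arange(1, len(demo_close) + 1)

# With min_periods=3, the first two positions are NaN instead
expanding_mean_3 = demo_close.expanding(min_periods=3).mean()

print(pd.DataFrame({"expanding": expanding_mean,
                    "manual": manual_mean,
                    "min_periods=3": expanding_mean_3}))
```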


Next, we create the training set: a DataFrame containing only the Open column of the dataset, since the opening price is what the model will learn to predict.

training_set = dataset['Open']
training_set = pd.DataFrame(training_set)

DATA PREPROCESSING: The pre-processing stage can involve data discretization, data transformation, data cleaning and data integration. Once the dataset is clean, it is divided into training and testing sets for evaluation. We start by cleaning the data, doing the same check as before: testing whether any values are missing.

# Data cleaning
dataset.isna().any()

Then we move on to feature scaling. We import MinMaxScaler from scikit-learn (sklearn), a machine learning library for Python. MinMaxScaler transforms each feature by scaling it to a given range, here 0 to 1; fit_transform learns the minimum and maximum from the training set and applies the scaling in one step.

# feature Scaling
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0,1))
training_set_scaled = sc.fit_transform(training_set)
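With feature_range=(0, 1), MinMaxScaler computes x' = (x - min) / (max - min), using the min and max it learned during fitting. A small sketch on made-up prices confirming this by hand, and showing that inverse_transform maps scaled values (such as model predictions) back to prices:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative "opening prices" as a single-column 2-D array (sklearn expects 2-D input)
demo_prices = np.array([[300.0], [400.0], [500.0], [350.0]])

demo_scaler = MinMaxScaler(feature_range=(0, 1))
scaled = demo_scaler.fit_transform(demo_prices)

# Same result by hand: x' = (x - min) / (max - min)
manual = (demo_prices - demo_prices.min()) / (demo_prices.max() - demo_prices.min())

# inverse_transform undoes the scaling
restored = demo_scaler.inverse_transform(scaled)

print(scaled.ravel())
```

This is why the same fitted sc is reused later on the test inputs and for converting predictions back to prices.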

Finally, we create a data structure with 60 timesteps and 1 output: we take the data from day 1 to day 60 and predict day 61, then take day 2 to day 61 and predict day 62, and so on to the end of the training set. The inputs are then reshaped to the 3-D shape (samples, timesteps, features) that Keras LSTM layers expect.

# Creating a data structure with 60 timesteps and 1 output
x_train = []
y_train = []
for i in range(60, len(training_set_scaled)):  # 1258 rows in this dataset
    x_train.append(training_set_scaled[i-60:i, 0])  # the previous 60 days
    y_train.append(training_set_scaled[i, 0])       # the day to predict
x_train, y_train = np.array(x_train), np.array(y_train)

# Reshaping
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
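The sliding-window idea is easier to see with a short series and 3 timesteps instead of 60. A minimal sketch; make_windows is a hypothetical helper that mirrors the training loop and reshape:

```python
import numpy as np


def make_windows(series: np.ndarray, timesteps: int):
    """Split a 1-D series into (input window, next value) pairs.

    Hypothetical helper for illustration, mirroring the tutorial's loop.
    """
    x, y = [], []
    for i in range(timesteps, len(series)):
        x.append(series[i - timesteps:i])  # the previous `timesteps` values
        y.append(series[i])                # the value right after the window
    x, y = np.array(x), np.array(y)
    # Keras LSTM layers expect 3-D input: (samples, timesteps, features)
    return x.reshape(x.shape[0], x.shape[1], 1), y


# Toy series 0, 1, ..., 9 with 3 timesteps
x_demo, y_demo = make_windows(np.arange(10, dtype=float), timesteps=3)
print(x_demo.shape, y_demo.shape)  # → (7, 3, 1) (7,)
print(x_demo[0].ravel(), y_demo[0])
```

With 1258 rows and 60 timesteps, the same logic yields 1258 - 60 = 1198 training samples.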

FEATURE EXTRACTION: In this stage, only the features that are to be fed to the neural network are chosen; here, that is the scaled opening price.

# Part 2 - Building the RNN

#Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout

Next, we initialise the RNN. Predicting a stock price is a regression problem (the output is a continuous value), so we name the model regressor. Keras's Sequential model is a linear stack of layers, to which we add the layers one at a time.

# Initialising the RNN
regressor = Sequential()

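TRAINING NEURAL NETWORK: In this stage, the data is fed to the neural network, whose weights and biases start out random and are adjusted during training. This model is a stack of four LSTM layers with 50 units each, each followed by a Dropout layer for regularisation, and finally a Dense output layer with a single unit and linear activation. The first three LSTM layers use return_sequences = True so that the next LSTM layer receives the full 60-step sequence; the last LSTM layer returns only its final output.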

# Adding the first LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (x_train.shape[1],1)))
regressor.add(Dropout(0.2))  # randomly drop 20% of the units during training

# Adding the second LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

# Adding the third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

# Adding the fourth LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))

# Adding the output layer
regressor.add(Dense(units = 1))
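One way to sanity-check the architecture is to count its trainable parameters. A minimal sketch, assuming Keras's standard LSTM layout (for each of the four gates, an input kernel, a recurrent kernel and one bias vector) and default layer settings; under that assumption the total should match the "Total params" line of regressor.summary(). The helper names lstm_params and dense_params are ours.

```python
def lstm_params(input_dim: int, units: int) -> int:
    # 4 gates, each with an input kernel, a recurrent kernel and a bias vector
    return 4 * units * (input_dim + units + 1)


def dense_params(input_dim: int, units: int) -> int:
    return input_dim * units + units  # weights + biases


# Output shapes: LSTMs 1-3 -> (batch, 60, 50), LSTM 4 -> (batch, 50), Dense -> (batch, 1)
layers = [
    ("LSTM 1 (input: 60 x 1)", lstm_params(1, 50)),
    ("LSTM 2", lstm_params(50, 50)),
    ("LSTM 3", lstm_params(50, 50)),
    ("LSTM 4 (last step only)", lstm_params(50, 50)),
    ("Dense output", dense_params(50, 1)),
]
# Dropout layers have no trainable parameters, so they are not listed
total = sum(count for _, count in layers)

for name, count in layers:
    print(f"{name:30s} {count:>7,d}")
print(f"{'Total':30s} {total:>7,d}")  # total is 71,051
```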

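Next, we compile the RNN with the Adam optimizer and the mean squared error loss, and fit it to the training set for 100 epochs with a batch size of 32.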

# Compiling the RNN
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')

#fitting the RNN to the Training set
regressor.fit(x_train, y_train, epochs = 100, batch_size = 32)
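To make the compile and fit settings concrete, here is a small worked example: the mean squared error loss on made-up values, and how many weight updates 100 epochs with batch_size = 32 amount to, assuming the 1198 training samples produced by the 60-day windows over 1258 rows.

```python
import math

import numpy as np

# Mean squared error: average of squared differences (illustrative values)
y_true = np.array([0.50, 0.60, 0.70])  # e.g. scaled real prices
y_pred = np.array([0.40, 0.60, 0.90])  # e.g. scaled model outputs
mse = np.mean((y_true - y_pred) ** 2)  # (0.01 + 0 + 0.04) / 3

# Batches per epoch: 1258 rows minus the 60-day warm-up gives 1198 samples
n_samples = 1258 - 60
batch_size = 32
steps_per_epoch = math.ceil(n_samples / batch_size)  # the last batch is smaller
total_updates = steps_per_epoch * 100                # epochs = 100

print(round(mse, 4), steps_per_epoch, total_updates)  # → 0.0167 38 3800
```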

VISUALIZATION: A rolling analysis of a time series model is often used to assess the model's stability over time. When analyzing financial time series data using a statistical model, a key assumption is that the parameters of the model are constant over time.

# Part 3 - Making the prediction and visualising the results

# Getting the real stock price of 2017
dataset_test = pd.read_csv("Google_Stock_Price_Test.csv", index_col="Date", parse_dates=True)

real_stock_price = dataset_test.iloc[:, 0:1].values  # 'Open' is column 0 because Date is the index

dataset_test.head()


Next, we clean up the test set: the Volume column is converted to numbers by removing the thousands separators, and the opening prices are put into a DataFrame.

dataset_test["Volume"] = dataset_test["Volume"].str.replace(',', '').astype(float)

test_set=dataset_test['Open']
test_set=pd.DataFrame(test_set)

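Finally, to get the predicted stock price for 2017, we merge the training and test opening prices along axis 0 (one after the other). Each test day needs the 60 previous days as input, so we take the last 60 training days plus all the test days, scale them with the scaler already fitted on the training set (transform, not fit_transform), build one 60-day window per test day, predict, and convert the predictions back to prices with inverse_transform.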

# getting the predicted stock price of 2017
dataset_total = pd.concat((dataset['Open'], dataset_test['Open']), axis = 0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1,1)
inputs = sc.transform(inputs)
x_test = []
for i in range(60, 60 + len(dataset_test)):  # one 60-day window per test day
    x_test.append(inputs[i-60:i, 0])
x_test = np.array(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
predicted_stock_price = regressor.predict(x_test)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)

predicted_stock_price = pd.DataFrame(predicted_stock_price)
predicted_stock_price.info()

OUTPUT

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       20 non-null     float32
dtypes: float32(1)
memory usage: 208.0 bytes
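The window arithmetic for the test set can be checked on stand-in data. A minimal sketch with made-up "prices" 0..99 for training and 100..119 for 20 test days: every test day gets exactly one 60-day window, the first window is the last 60 training days, and because the scaler keeps the training min and max, test prices above the training maximum scale to values greater than 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

timesteps = 60
train_open = np.arange(100, dtype=float)      # stand-in for the training 'Open' prices
test_open = np.arange(100, 120, dtype=float)  # stand-in for 20 test days

# Fit the scaler on training data only, as in the tutorial
demo_sc = MinMaxScaler(feature_range=(0, 1))
demo_sc.fit(train_open.reshape(-1, 1))

# Last 60 training days + all test days: enough history for every test day
total = np.concatenate([train_open, test_open])
demo_inputs = total[len(total) - len(test_open) - timesteps:].reshape(-1, 1)
demo_inputs = demo_sc.transform(demo_inputs)  # reuse training min/max, no refit

demo_x_test = np.array([demo_inputs[i - timesteps:i, 0]
                        for i in range(timesteps, timesteps + len(test_open))])
demo_x_test = demo_x_test.reshape(demo_x_test.shape[0], demo_x_test.shape[1], 1)

print(demo_x_test.shape)  # one 60-day window per test day
print(demo_inputs.max())  # test values above the training max scale to > 1
```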

Finally, we use matplotlib to plot the predicted stock price against the real stock price.

#visualising the results
plt.plot(real_stock_price, color='red', label = 'Real Google Stock Price')
plt.plot(predicted_stock_price, color='blue', label = 'Predicted Google Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()


Introduction about project developer

The Stock Prediction project is developed by Sakshi Singh from Maharani Girls Engineering College, Jaipur. This project is developed during Goeduhub online Summer training in Artificial Intelligence, Machine Learning and Deep learning.
