Stock market prediction aims to estimate the future value of a company's stock. Why is stock market prediction necessary? The answer is simple: successfully predicting a stock's future price can help us make a good profit. Stock prices essentially reflect all currently available information, along with price changes that are inherently unpredictable. Good predictions are therefore essential for earning an appreciable profit.
Let us see how we can implement stock market prediction using machine learning and Python. We will walk step by step through the entire code and show how each part works. Let's roll!
Stock Prediction using LSTM
First, import the Python libraries our project needs initially. They are:
- NumPy: NumPy (Numerical Python) is a library that supports large, multi-dimensional arrays and matrices.
- Matplotlib: Matplotlib is a plotting library for Python and its numerical mathematics extension, NumPy.
- Pandas: Pandas is one of the most widely used Python libraries for data manipulation and analysis.
- Datetime: datetime is a Python module for working with dates and times.
So let us import them! We might need a few more libraries; we will introduce them as we go.
The dataset that we are going to use here is Google_Stock_Price_Train.csv. The dataset has 6 columns.
We will read the CSV file into the variable datasets using pandas.
To get to know our dataset, let us look at its first 5 rows.
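A minimal sketch of the reading step, assuming the file has a Date column followed by Open, High, Low, Close and Volume (treat this layout as an assumption). So that the snippet runs on its own, a few made-up rows stand in for the real file via io.StringIO; with the real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Made-up rows in the assumed layout; these are NOT real Google prices.
# With the real file: pd.read_csv("Google_Stock_Price_Train.csv", index_col="Date", parse_dates=True)
csv_text = """Date,Open,High,Low,Close,Volume
2012-01-03,100.00,102.00,99.00,"1,001.00","1,200,000"
2012-01-04,101.00,103.00,100.00,"1,002.00","1,150,000"
2012-01-05,102.00,104.00,101.00,"1,003.00","1,300,000"
2012-01-06,103.00,105.00,102.00,"1,004.00","1,250,000"
2012-01-09,104.00,106.00,103.00,"1,005.00","1,100,000"
2012-01-10,105.00,107.00,104.00,"1,006.00","1,050,000"
"""

# Use the Date column as a DatetimeIndex so rows can later be selected by date
dataset = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# Show the first 5 rows
print(dataset.head())
```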
We need to make sure that every entry in our data is usable. So let us check the dataset for missing (NA) values.
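A small sketch of the missing-value check on a made-up frame that contains one deliberate gap. In pandas, isna() marks missing entries; any() reduces that to one True/False per column, and sum() counts them.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame with one deliberately missing value (NaN) in "Open"
dataset = pd.DataFrame(
    {
        "Open": [100.0, 101.0, np.nan, 103.0],
        "High": [102.0, 103.0, 104.0, 105.0],
    },
    index=pd.date_range("2012-01-03", periods=4, freq="D"),
)

# True for every column that contains at least one missing value
print(dataset.isna().any())

# Number of missing values per column
missing_counts = dataset.isna().sum()
print(missing_counts)
```

The yes/no view is a quick sanity check; the counts tell you how many rows are affected, which helps you decide between dropping them with dropna() and filling them in.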
Now let us visualise our dataset. Let us plot our data from 2012 to 2017 with Matplotlib to get a better idea of the statistics of our dataset.
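A minimal Matplotlib sketch of plotting a price series against its dates. The series is a synthetic random walk, not the real data; with the real DataFrame you would plot one of its columns against its DatetimeIndex the same way. The non-interactive Agg backend is an assumption chosen so the snippet also runs on a machine without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: renders to files, needs no display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in series (random walk) on business days; NOT real prices
dates = pd.bdate_range("2012-01-02", "2017-12-29")
rng = np.random.default_rng(0)
open_prices = pd.Series(300 + np.cumsum(rng.normal(0.0, 2.0, len(dates))), index=dates)

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(open_prices.index, open_prices.values, label="Open")
ax.set_title("Opening price over time")
ax.set_xlabel("Date")
ax.set_ylabel("Price")
ax.legend()
fig.tight_layout()
fig.savefig("open_prices.png")  # with an interactive backend, plt.show() instead
```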
An important step in our project is to homogenize our data, that is, to make sure all the columns share the same numeric data type. For that we need to know the data types of the columns in our dataset. Let us find them using the ‘info’ method.
Now let us homogenize it. From the above output, it is clear that the first 4 columns have the data type ‘float’ while the next 2 are ‘object’. We need to convert those into ‘float’ too. Let us do that.
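A sketch of the conversion, assuming (as in the made-up rows below) that the ‘object’ columns are Close and Volume and that they are strings only because of thousands separators such as "1,001.00". The commas are stripped and the strings cast to float.

```python
import io

import pandas as pd

# Made-up rows in the assumed layout; the commas inside Close and Volume
# make pandas read those two columns as strings instead of numbers.
csv_text = """Date,Open,High,Low,Close,Volume
2012-01-03,100.00,102.00,99.00,"1,001.00","1,200,000"
2012-01-04,101.00,103.00,100.00,"1,002.00","1,150,000"
2012-01-05,102.00,104.00,101.00,"1,003.00","1,300,000"
"""
dataset = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
dataset.info()  # Close and Volume are read as strings ('object' in pandas 1.x/2.x)

# Strip the thousands separators, then cast the strings to float
for col in ["Close", "Volume"]:
    dataset[col] = dataset[col].str.replace(",", "", regex=False).astype(float)

print(dataset.dtypes)  # every column is now float64
```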
The next step is to compute the 7-day rolling mean of our data. If you are wondering what a ‘7-day rolling mean’ is, it is the average of the values over the most recent 7 days, recomputed as the window slides forward one day at a time. Let us display it for the first 20 rows of our dataset.
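A small pandas sketch of a 7-row rolling mean on 20 made-up values. The window includes the current row and the 6 rows before it, so the first 6 results are NaN: a full window is not available yet.

```python
import pandas as pd

# 20 made-up daily values, one per business day (illustration only)
values = pd.Series(
    [float(v) for v in range(100, 120)],
    index=pd.bdate_range("2012-01-02", periods=20),
)

# Each entry is the mean of the current row and the 6 rows before it;
# the first 6 entries are NaN because the window is not yet full.
rolling_mean_7 = values.rolling(window=7).mean()
print(rolling_mean_7.head(20))
```

Note that rolling(window=7) counts rows, not calendar days: on trading data, 7 rows can span more than a week because weekends and holidays are missing. On a DatetimeIndex, pandas also accepts an offset string such as rolling("7D") if a calendar window is what you want.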
Now let us create our training set using pandas; after that we are all set for preprocessing. Also cross-check whether any missing (NA) values are still left. This is just to double-check! Let us move on.
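A sketch of building the training set as a one-column DataFrame and re-checking for gaps. The text does not say which column is modelled, so this sketch assumes the opening price ("Open"). Dropping incomplete rows is one option among several, and the data is made up.

```python
import numpy as np
import pandas as pd

# Made-up frame; "Open" as the target series is an ASSUMPTION of this sketch
dataset = pd.DataFrame(
    {
        "Open": [100.0, 101.0, np.nan, 103.0, 104.0],
        "Close": [101.0, 102.0, 103.0, 104.0, 105.0],
    },
    index=pd.bdate_range("2012-01-02", periods=5),
)

# Double brackets keep a one-column DataFrame (2-D), the shape scalers expect
training_set = dataset[["Open"]]
print(training_set.isna().any())  # reports True for Open: a gap is still there

# One option: drop the incomplete rows before scaling
training_set = training_set.dropna()
print(training_set.isna().any())  # now reports False for Open
```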
We are going to preprocess our data using scikit-learn. From scikit-learn we import MinMaxScaler, which transforms features by scaling each feature to a given range. This estimator scales and translates each feature individually such that it lies in the given range on the training set. Here we have chosen the range to be between zero and one.
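With scikit-learn this step is typically sc = MinMaxScaler(feature_range=(0, 1)) followed by sc.fit_transform(training_set). To show what the scaler actually computes, here is the same arithmetic written out in NumPy on made-up prices. The helper names scale and unscale are hypothetical. The key point is that the minimum and maximum are learned from the training data only and then reused unchanged for any later data.

```python
import numpy as np

# Made-up training prices, shape (n_samples, 1) as MinMaxScaler expects
train = np.array([[300.0], [320.0], [310.0], [350.0], [340.0]])

lo, hi = 0.0, 1.0               # feature_range=(0, 1)
data_min = train.min(axis=0)    # learned on the training set only
data_max = train.max(axis=0)

def scale(x):
    """Min-max formula used by MinMaxScaler.transform (hypothetical helper)."""
    x_std = (x - data_min) / (data_max - data_min)
    return x_std * (hi - lo) + lo

def unscale(x_scaled):
    """Inverse of scale(), like MinMaxScaler.inverse_transform (hypothetical helper)."""
    return (x_scaled - lo) / (hi - lo) * (data_max - data_min) + data_min

train_scaled = scale(train)
print(train_scaled.ravel())
```

A price above the training maximum, such as 360, maps to 1.2 here, outside the [0, 1] range; MinMaxScaler.transform behaves the same way unless the scaler is created with clip=True.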
We create a data structure with 60 timesteps and a single output: we look at the data of the past 60 days and predict the very next day's value. Let us see how to implement this using a for loop. Don't forget to reshape the result into the three-dimensional form (samples, timesteps, features) that the LSTM layer expects!
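A sketch of the sliding-window loop on a made-up scaled series of 100 rows. Each input holds 60 consecutive values and its target is the value that follows them, so 100 rows yield 100 - 60 = 40 samples. The function name make_windows is hypothetical.

```python
import numpy as np

TIMESTEPS = 60  # look back 60 days to predict the next one

def make_windows(scaled, timesteps=TIMESTEPS):
    """Turn a (n_rows, 1) scaled series into LSTM inputs and targets.

    X[k] holds rows k .. k+timesteps-1 and y[k] is row k+timesteps.
    """
    X, y = [], []
    for i in range(timesteps, len(scaled)):
        X.append(scaled[i - timesteps:i, 0])  # the previous 60 values
        y.append(scaled[i, 0])                # the value that follows them
    X, y = np.array(X), np.array(y)
    # Keras LSTM layers expect 3-D input: (samples, timesteps, features)
    X = np.reshape(X, (X.shape[0], X.shape[1], 1))
    return X, y

# Made-up scaled series with 100 rows
scaled = np.linspace(0.0, 1.0, 100).reshape(-1, 1)
X_train, y_train = make_windows(scaled)
print(X_train.shape, y_train.shape)  # (40, 60, 1) (40,)
```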
Let us perform feature extraction on our data, that is, extract what we actually need for our model. Then we start building our RNN (recurrent neural network). We need to import a few classes from Keras to proceed. Keras is TensorFlow's high-level API for building and training machine learning models. Go ahead and import them.
Initialise our model as Sequential. A Sequential model is a linear stack of layers; you can create one by passing a list of layers to its constructor.
We now move on to the most crucial part of our machine learning model. Starting from randomly initialised weights and biases, we train our Sequential model. We are going to use LSTM layers. Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in deep learning. Unlike standard feedforward neural networks, an LSTM also has feedback connections, so it can process not only single data points but also entire sequences of data. Let us now declare the various layers of our model using LSTM. We also use dropout here to reduce overfitting: we don't want the model to simply memorise the data. The output layer is a Dense layer with units=1, since we require only one output. Let's code.
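To make "feedback connections" and "processing entire sequences" concrete, here is a minimal NumPy sketch of what one LSTM layer computes while it reads a window, plus the inverted-dropout rule that Keras's Dropout layer applies during training (kept entries are scaled by 1 / (1 - rate)). It illustrates the mechanism, not the Keras implementation: the weight shapes, the gate ordering and the helper names lstm_layer and inverted_dropout are choices made for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer(x_seq, W, U, b):
    """Run one LSTM layer over x_seq of shape (timesteps, n_features).

    W: (n_features, 4*n_units) input weights, U: (n_units, 4*n_units) recurrent
    weights, b: (4*n_units,) biases. Gate order in this sketch: input, forget,
    candidate, output. Returns the final hidden state h, shape (n_units,).
    """
    n_units = U.shape[0]
    h = np.zeros(n_units)  # hidden state, fed back at every step
    c = np.zeros(n_units)  # cell state, the long-term memory
    for x_t in x_seq:
        z = x_t @ W + h @ U + b                  # h @ U is the feedback connection
        i = sigmoid(z[:n_units])                 # input gate
        f = sigmoid(z[n_units:2 * n_units])      # forget gate
        g = np.tanh(z[2 * n_units:3 * n_units])  # candidate values
        o = sigmoid(z[3 * n_units:])             # output gate
        c = f * c + i * g                        # keep part of old memory, add new
        h = o * np.tanh(c)
    return h

def inverted_dropout(x, rate, rng, training=True):
    """Zero a fraction `rate` of entries and scale the rest by 1 / (1 - rate).

    At inference time (training=False) the input passes through unchanged.
    """
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
n_features, n_units, timesteps = 1, 4, 60
W = rng.normal(0.0, 0.1, (n_features, 4 * n_units))
U = rng.normal(0.0, 0.1, (n_units, 4 * n_units))
b = np.zeros(4 * n_units)

window = rng.random((timesteps, n_features))  # one made-up 60-step input window
h = lstm_layer(window, W, U, b)
print(h.shape)  # (4,)
print(inverted_dropout(h, rate=0.2, rng=rng))
```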
Let us compile and fit our model. We have chosen the Adam (Adaptive Moment Estimation) optimizer here; feel free to explore other optimization techniques. Our loss is mean squared error. The number of epochs is set to 100 and the batch size to 32. The number of epochs is the number of complete passes the model makes over the entire training dataset. The batch size is the number of training examples used in a single iteration, that is, in one weight update. Let us run the code.
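In Keras, this step is model.compile(optimizer="adam", loss="mean_squared_error") followed by model.fit(X_train, y_train, epochs=100, batch_size=32). The NumPy sketch below spells out what those settings mean on a made-up linear regression instead of the LSTM. Each epoch shuffles the data and walks through it in batches of 32, each batch produces one Adam update that minimises the batch mean squared error, and the update count is epochs × ceil(n_samples / batch_size). The learning rate of 0.01 is an assumption chosen so the toy problem converges quickly; Keras's Adam defaults to 0.001.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up regression problem: y = X @ [0.5, -1.0, 2.0] + 0.3
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + 0.3
Xb = np.hstack([X, np.ones((len(X), 1))])  # column of ones for the bias
theta = np.zeros(Xb.shape[1])              # weights + bias, trained together

# Adam hyperparameters (learning rate raised from Keras's 0.001 for this toy)
lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-7
m = np.zeros_like(theta)  # running mean of gradients (1st moment)
v = np.zeros_like(theta)  # running mean of squared gradients (2nd moment)

epochs, batch_size = 100, 32
step = 0
epoch_losses = []
for epoch in range(epochs):              # one epoch = one full pass over the data
    order = rng.permutation(len(Xb))     # shuffle each epoch
    for start in range(0, len(Xb), batch_size):
        idx = order[start:start + batch_size]    # one batch = one iteration
        err = Xb[idx] @ theta - y[idx]
        grad = 2.0 * Xb[idx].T @ err / len(idx)  # gradient of the batch MSE
        step += 1
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** step)          # bias correction
        v_hat = v / (1 - beta2 ** step)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    epoch_losses.append(float(np.mean((Xb @ theta - y) ** 2)))

print(step)  # 700 updates: 100 epochs x 7 batches (ceil(200 / 32))
```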
Having trained our model, let us test it on our test dataset. Let's load the test dataset. We can inspect the test data and the information about its data types too. Let's do it.
We have to convert the object-typed columns to float, just as we did for the training data. We also load our test set using pandas.
Setting the number of timesteps to 60, we reshape the data into the desired form. Then we predict the test data using the model we have built. Let us dive into the code.
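Each test day needs the 60 days before it, and the first test days reach back into the training period. A common way to handle this is sketched below with made-up prices and a single assumed "Open" series. Concatenate the training and test series, take the last len(test) + 60 values, and scale them with the minimum and maximum learned from the training data (scaler.transform, never a fresh fit).

```python
import numpy as np
import pandas as pd

TIMESTEPS = 60

# Made-up opening prices; using "Open" as the target is an assumption here
train_open = pd.Series(np.linspace(300.0, 350.0, 100))
test_open = pd.Series(np.linspace(350.0, 360.0, 20))

# Scaling statistics come from the TRAINING data only (what scaler.transform reuses)
data_min, data_max = train_open.min(), train_open.max()

# Stitch train and test together so the first test days can look back 60 days
total = pd.concat((train_open, test_open), axis=0, ignore_index=True)
inputs = total.values[len(total) - len(test_open) - TIMESTEPS:].reshape(-1, 1)
inputs = (inputs - data_min) / (data_max - data_min)

# One 60-step window per test day, reshaped to (samples, timesteps, features)
X_test = np.array([inputs[i - TIMESTEPS:i, 0] for i in range(TIMESTEPS, len(inputs))])
X_test = X_test.reshape(X_test.shape[0], TIMESTEPS, 1)
print(X_test.shape)  # (20, 60, 1)

# With the trained Keras model (not defined in this sketch):
# predicted_scaled = model.predict(X_test)
```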
Lastly, let us analyse our predicted data and visualize it.
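Before plotting, the predictions have to be mapped back from the [0, 1] range to prices (with scikit-learn, scaler.inverse_transform(predicted_scaled)). A single error number such as the root mean squared error (RMSE) then makes different runs easy to compare. Below is a sketch with made-up predictions and made-up scaler statistics.

```python
import numpy as np

# Made-up scaler statistics learned from the training prices
data_min, data_max = 300.0, 350.0

# Made-up model outputs in the scaled space, shape (n, 1) like model.predict returns
predicted_scaled = np.array([[1.00], [1.02], [1.05]])
real_prices = np.array([351.0, 352.0, 352.5])

# Inverse of min-max scaling to the range [0, 1]
predicted_prices = predicted_scaled * (data_max - data_min) + data_min

errors = predicted_prices.ravel() - real_prices
rmse = float(np.sqrt(np.mean(errors ** 2)))
print(f"RMSE: {rmse:.4f}")  # RMSE: 0.8165

# To visualise: plt.plot(real_prices, label="Real"); plt.plot(predicted_prices, label="Predicted")
```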
Hurray! We now have an amazing model for our stock prediction. You can customise the layers and various hyperparameters as you wish. Feel free to play with the data. Train your model using various other techniques too and compare their errors on the test set. Find which one suits your data best. Explore more!
Code Credits: Edureka!