All About Recurrent Neural Network (RNN)

When we are asked to write the lyrics of a song or sing it from the beginning, it is easy. But given a line or a word from the middle, it becomes difficult and takes more time to recognise the song and sing it. It is even harder when we are asked to sing a song backwards. The reason is that we haven’t practised it that way: we learned, or were trained on, the song as a sequence. That is what Recurrent Neural Networks are all about.

For example, we know very easily that B comes after A, but the reverse is harder. Memory matters when our data is a sequence of some kind. Sequential data is ordered data in which related things follow each other. RNNs are undoubtedly one of the most popular kinds of neural network in the world.

Unlike feedforward neural networks, RNNs use their internal state (memory) to process a series of inputs. Because RNNs work on sequential data, they are applicable to tasks such as handwriting recognition[1] or speech recognition.

Difference between a Feedforward Neural Network and a Recurrent Neural Network

A feedforward neural network has a hidden layer which receives input and produces output. Information moves only from left to right (from input to output) through the network, which makes it impossible to touch a node twice.

In a recurrent neural network, each element (time step) of the input sequence is used to create a hidden layer (via forward propagation), and each hidden layer is then used to produce the final output. The hidden layer at each step is a combination of the current input and the previous hidden layer; this carried-over state is called memory.

A feedforward neural network makes its predictions only from what it learned on previous training inputs; it keeps no memory of earlier inputs in the same sequence. Recurrent neural networks address this limitation. They are networks with loops in them, allowing information to persist. This chain-like structure shows that RNNs are intimately related to sequences and lists.

In an RNN, the hidden state is looped back into the network at the next time step. The hidden state represents information from all previous steps. The output of the RNN can then be given as input to a feedforward network to make a prediction. The final hidden state contains information from all previous steps.
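The loop described above can be sketched in a few lines of Python. This is a minimal sketch, assuming a single hidden unit (a scalar state), a tanh activation, and the illustrative names `rnn_forward`, `w_x`, `w_h` and `b`, none of which come from a particular library. It shows the hidden state being fed back at every step while the same weights are reused.

```python
import math

def rnn_forward(xs, w_x, w_h, b, h0=0.0):
    """Run a scalar (1-unit) vanilla RNN over the sequence xs.

    Assumed update rule: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).
    Returns the list of hidden states h_1..h_T.
    """
    h = h0
    states = []
    for x in xs:
        # The new state mixes the current input with the previous state (memory)
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# The same three weights are reused at every time step.
states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print([round(s, 3) for s in states])
```

Even though the second and third inputs are zero, their hidden states stay positive because the first input is remembered. Feeding the same values in reverse order ([0.0, 0.0, 1.0]) ends in a different final state, which mirrors the song example: what the network remembers depends on order.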

Key points

[Figure: An unrolled recurrent neural network.]

  • RNNs are used for modelling sequential data.

  • RNNs are good at processing sequence data and making predictions. Examples of sequence data include audio, video and English sentences.

  • RNNs have sequential memory.

  • RNNs are a robust type of neural network.

  • An RNN has an internal memory (its hidden state) that stores information about earlier inputs.

  • In an RNN, information travels through cycles using loops: each decision is made with the help of the current input and the previous inputs.

  • Recurrent neural networks add the immediate past to the present.

  • RNNs apply weights both to the current input and to the previous hidden state.
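Written as equations, the points above correspond to the standard (vanilla) RNN update. The notation is an assumption for illustration, not taken from this article: $x_t$ is the input at step $t$, $h_t$ the hidden state, $y_t$ the output, $W_{xh}$, $W_{hh}$ and $W_{hy}$ weight matrices, and $b_h$, $b_y$ biases.

```latex
\begin{aligned}
h_t &= \tanh\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right) \\
y_t &= W_{hy}\, h_t + b_y
\end{aligned}
```

The same weights are shared across all time steps, and the term $W_{hh}\, h_{t-1}$ is what adds the immediate past to the present.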

Backpropagation

Backpropagation is basically going backwards through a neural network to find the partial derivatives of the error with respect to the weights. Those derivatives are then used by gradient descent, which subtracts a fraction of each derivative from its weight, adjusting the weight up or down depending on which direction decreases the error. That is exactly how a neural network learns during the training process: with backpropagation you work out how to tweak the weights of your model, and gradient descent applies those tweaks.
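The backward pass can be illustrated on a tiny network. This is a minimal sketch, assuming a hypothetical two-weight model y = w2 * tanh(w1 * x) with a squared-error loss; the function names and numbers are illustrative. It applies the chain rule from the error back to each weight, then takes one gradient descent step.

```python
import math

def forward(w1, w2, x):
    # Tiny two-layer network: hidden h = tanh(w1 * x), output y = w2 * h
    h = math.tanh(w1 * x)
    return h, w2 * h

def backward(w1, w2, x, target):
    # Walk backwards from the error to each weight using the chain rule
    h, y = forward(w1, w2, x)
    dL_dy = 2.0 * (y - target)           # L = (y - target)^2
    dL_dw2 = dL_dy * h                   # y = w2 * h
    dL_dh = dL_dy * w2
    dL_dw1 = dL_dh * (1.0 - h * h) * x   # d tanh(z)/dz = 1 - tanh(z)^2
    return dL_dw1, dL_dw2

def loss(w1, w2, x, target):
    return (forward(w1, w2, x)[1] - target) ** 2

w1, w2, x, target, lr = 0.5, 0.5, 1.0, 0.8, 0.1
g1, g2 = backward(w1, w2, x, target)
# Gradient descent step: subtract a fraction of each derivative from its weight
new_w1, new_w2 = w1 - lr * g1, w2 - lr * g2
print(loss(w1, w2, x, target), loss(new_w1, new_w2, x, target))
```

The second printed loss is lower than the first. A finite-difference check (nudging each weight by a tiny amount and measuring the change in error) is a common way to confirm that hand-written derivatives like these are correct.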

Gradient descent

A gradient is the vector of partial derivatives of a function with respect to its inputs: it measures how much the output changes when the inputs change a little bit. It can also be thought of as the slope of the function. The higher the gradient, the steeper the slope and the faster a model can learn; when the slope is zero, the model stops learning. In training, the gradient measures how the error changes with respect to a change in each weight.
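To make the slope picture concrete, here is a sketch assuming a one-dimensional error function f(w) = (w - 3)^2, an illustrative choice that is minimised at w = 3. The gradient is estimated numerically as the change in output for a small change in input, and gradient descent repeatedly steps against it.

```python
def f(w):
    # A simple error surface with its minimum at w = 3
    return (w - 3.0) ** 2

def slope(func, w, eps=1e-6):
    # Numerical gradient: change in output for a small change in input
    return (func(w + eps) - func(w - eps)) / (2 * eps)

def gradient_descent(w, lr=0.1, steps=100):
    for _ in range(steps):
        g = slope(f, w)
        w -= lr * g  # steep slope -> big step; zero slope -> no step
    return w

print(slope(f, 10.0), slope(f, 3.0))  # steep far from the minimum, ~0 at it
print(gradient_descent(10.0))
```

Far from the minimum the slope is large and the updates are big; as w approaches 3 the slope, and therefore each update, shrinks towards zero.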


Issue with RNN: short-term memory (the vanishing gradient problem)

During backpropagation, gradients can shrink exponentially as they flow back through the time steps, so the further back we move, the less learning takes place, until there is effectively none. This is called the vanishing gradient problem. Gradients are used to adjust the weights, and a small gradient means only a small adjustment.
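The shrinking can be seen directly in a single-unit tanh RNN. In this sketch the weight w_h = 0.9, the constant input and the function name `state_gradient` are assumptions for illustration. Backpropagation through time multiplies one factor w_h * (1 - h_t^2) per step, and because every factor here is below 1, the gradient reaching the first step decays exponentially with sequence length.

```python
import math

def state_gradient(T, w_h=0.9, w_x=1.0, x=0.5):
    """d h_T / d h_0 for a scalar tanh RNN fed the constant input x.

    Each step contributes a factor w_h * tanh'(z_t) = w_h * (1 - h_t**2),
    and backpropagation through time multiplies these factors together.
    """
    h, grad = 0.0, 1.0
    for _ in range(T):
        h = math.tanh(w_x * x + w_h * h)
        grad *= w_h * (1.0 - h * h)  # local derivative of this step
    return grad

for T in (1, 5, 10, 20, 50):
    print(T, state_gradient(T))
```

The printed values fall by many orders of magnitude as T grows, so the earliest inputs barely influence the weight updates.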

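RNNs therefore suffer from short-term memory. To overcome this, Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs) were introduced. They address the problem through gates, which decide what information should be kept and passed on and what should be discarded.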
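A gated cell can be sketched in a few lines. This is a minimal scalar GRU step; gate conventions differ between papers and libraries (some swap the roles of z and 1 - z), and the weight names and values in `params` are hypothetical. With the update gate biased almost closed, the cell carries its state across 20 steps with little change, which is how gating preserves memory.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h, p):
    """One step of a scalar GRU cell; p holds hypothetical scalar weights."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])  # update gate: how much to overwrite
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])  # reset gate: how much past to use
    h_cand = math.tanh(p["w"] * x + p["u"] * (r * h) + p["b"])  # candidate state
    return (1.0 - z) * h + z * h_cand  # blend old state and candidate

# Bias the update gate strongly closed, so the cell mostly keeps its state.
params = dict(wz=0.0, uz=0.0, bz=-6.0, wr=0.0, ur=0.0, br=0.0, w=1.0, u=1.0, b=0.0)
h = 0.9
for _ in range(20):
    h = gru_step(0.0, h, params)
print(h)
```

In a trained GRU the gate weights are learned, so the network itself decides when to keep its state and when to overwrite it.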


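Exploding gradients occur when gradients grow very large, so the algorithm assigns far too much importance to the weights and the updates become unstable. Fortunately, this problem can be solved fairly easily by truncating (clipping) the gradients.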
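Clipping by norm is one common way of truncating gradients. In this sketch (the function name `clip_by_norm` and the threshold are illustrative), if the combined L2 norm of the gradients exceeds a threshold, they are rescaled to that norm, which keeps their direction but caps the size of the step. Another variant clips each value to a fixed range independently; deep learning libraries provide these built in, for example PyTorch's `torch.nn.utils.clip_grad_norm_`.

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale a list of gradient values so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)  # already small enough, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]

exploded = [300.0, -400.0]          # L2 norm 500
print(clip_by_norm(exploded, 5.0))  # same direction, norm 5
```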


Vanishing gradients occur when the values of a gradient are too small, so the model stops learning or takes far longer to train. This is harder to solve than exploding gradients; it was largely solved through the LSTM architecture.
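The reason LSTMs help can be sketched with a scalar LSTM cell. The key is the additive cell-state update c_t = f * c_{t-1} + i * g, so along the direct cell-state path the gradient is multiplied by the forget gate f at each step rather than by a factor like w_h * (1 - h_t^2) in a vanilla RNN. The weight names and values in `params` are hypothetical, chosen so that the gates ignore h and the path gradient equals f exactly.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h, c, p):
    """One step of a scalar LSTM cell; p holds hypothetical scalar weights."""
    f = sigmoid(p["wf"] * x + p["uf"] * h + p["bf"])   # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h + p["bi"])   # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h + p["bo"])   # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h + p["bg"])  # candidate values
    c_new = f * c + i * g  # additive update of the cell state
    h_new = o * math.tanh(c_new)
    return h_new, c_new, f

# Forget gate biased open (f close to 1), input gate biased mostly closed.
params = dict(wf=0.0, uf=0.0, bf=5.0, wi=0.0, ui=0.0, bi=-5.0,
              wo=0.0, uo=0.0, bo=0.0, wg=1.0, ug=0.0, bg=0.0)
h, c, path_grad = 0.0, 1.0, 1.0
for _ in range(50):
    h, c, f = lstm_step(0.0, h, c, params)
    path_grad *= f  # with these weights, d c_t / d c_{t-1} is exactly f
print(c, path_grad)
```

After 50 steps the gradient along the cell-state path is still a sizeable fraction of 1, whereas a product of 50 factors well below 1 would have all but vanished.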


  • Recurrent neural networks (RNNs) are used in speech recognition, language translation, stock prediction and image captioning (describing the contents of pictures).

  • RNNs are used by Google Voice Search and Apple’s Siri.

Advantages of Recurrent Neural Network

  • An RNN remembers information through time.

  • It is useful in time series prediction because it can remember previous inputs. The Long Short-Term Memory (LSTM) variant extends this memory over much longer sequences.

  • Recurrent neural networks are even used with convolutional layers to extend the effective pixel neighbourhood.

Disadvantages of Recurrent Neural Network

  • Vanishing and exploding gradient problems.

  • Training an RNN is a very difficult task.

  • It cannot process very long sequences when tanh or ReLU is used as the activation function.

  • Training relies on backpropagation through time (BPTT), a gradient-based technique that unrolls the network over every time step, which makes training slow and memory-hungry on long sequences.
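Backpropagation through time can be sketched for a single-unit tanh RNN whose loss depends only on its final state (both simplifying assumptions; the names and values are hypothetical). The network is unrolled, the error is passed backwards one step at a time, and because the same weights are used at every step, their gradient contributions are summed.

```python
import math

def run(xs, w_x, w_h):
    # Forward pass of a scalar tanh RNN, keeping every state for the backward pass
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w_x * x + w_h * hs[-1]))
    return hs

def bptt(xs, w_x, w_h, target):
    """Gradients of L = (h_T - target)^2 w.r.t. w_x and w_h via BPTT."""
    hs = run(xs, w_x, w_h)
    dh = 2.0 * (hs[-1] - target)      # dL/dh_T
    g_wx = g_wh = 0.0
    for t in range(len(xs), 0, -1):   # walk the unrolled network backwards
        dz = dh * (1.0 - hs[t] ** 2)  # through tanh at step t
        g_wx += dz * xs[t - 1]        # the same weight is reused at every step,
        g_wh += dz * hs[t - 1]        # so its contributions are summed
        dh = dz * w_h                 # pass the gradient on to h_{t-1}
    return g_wx, g_wh

xs, w_x, w_h, target = [0.5, -1.0, 0.3, 0.8], 0.7, 0.4, 0.2
print(bptt(xs, w_x, w_h, target))
```

Every hidden state must be stored and revisited during the backward pass, so both time and memory grow with the sequence length.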


Now you have a proper understanding of how a Recurrent Neural Network works, which enables you to decide whether it is the right algorithm for a given Machine Learning problem. You can also check out our post on Neural Style Transfer.


Aswath Rao

Currently pursuing an MSc in Data Science
