An Introduction to Recurrent Neural Networks
Feedforward neural networks are great and help computers predict many useful things for us humans. Whether it's diagnosing a condition from a medical image or deciding if it's worth going outside, they're capable of things devices ordinarily can't do.
However, they have a downside: they're not designed for sequential data. Sequential data is everywhere in fields like natural language processing, so this is a limitation that needs to be overcome.
For example, imagine you had to predict the next word in this sequence of words. Or, put simply, a sentence.
“The rain poured down heavily, the cement paving now ___”
If you had to guess the word just from “now ___”, it’d be extremely difficult to predict something that made contextual sense.
For example, “now dry” makes grammatical sense on its own, but “The rain poured down heavily, the cement paving now dry” probably isn’t what we’re looking for.
So how do we fix this?
Introducing Recurrent Neural Networks!
With RNNs, we can teach our neural network to look at the entire sentence, not just the word “now”. This is because it’s able to work with sequential data.
Now, given the input "The rain poured down heavily, the cement paving now __", it could recognize the word "rain" and predict the blank as "wet".
How do they do this?
The architecture of an RNN is similar to that of a feedforward network, but the major difference is the use of loops.
What this means is that, when our blue node is provided some input, it not only produces an output, it also remembers prior inputs.
Let’s unroll this chunk for further understanding.
Here, what's shown is that each 'A' node passes along what it has seen so far to the next one. That means the third 'A' node effectively has access to all three inputs: X0, X1, and X2 (the first two carried forward through its hidden state).
This is just the leftmost node with the loop expanded out over time. In this case, the network receives 3 different inputs and considers all of them to produce the output h.
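To make this concrete, here's a minimal sketch of that unrolled loop in plain NumPy. The sizes and weights below are purely illustrative (nothing is trained); the point is that the same weights are reused at every step, and the hidden state h carries information from earlier inputs forward.

```python
import numpy as np

# Illustrative sizes -- not from the article
input_size, hidden_size = 4, 3

rng = np.random.default_rng(0)
W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the "loop")
b = np.zeros(hidden_size)

def rnn_forward(inputs):
    """Unrolled RNN: the same weights are applied at every time step."""
    h = np.zeros(hidden_size)                 # hidden state starts empty
    for x_t in inputs:                        # X0, X1, X2, ...
        h = np.tanh(W_x @ x_t + W_h @ h + b)  # h now reflects every input seen so far
    return h                                  # summarizes the whole sequence

sequence = [rng.normal(size=input_size) for _ in range(3)]  # stand-ins for X0, X1, X2
print(rnn_forward(sequence))
```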
Let’s go back to our example of rain. In an RNN, each word in “The rain poured down heavily, the cement paving now” is treated as an input.
After each of these words has been fed in and looped through, our 'A' node has built up an understanding of what the sentence is saying and can make an accurate prediction.
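Here's roughly what that looks like for our rain sentence, again as a toy NumPy sketch. The embeddings and weights are random and untrained, so the printed prediction is meaningless here; the point is only to show each word being processed one time step at a time, with the hidden state h still reflecting "rain" by the time we reach "now".

```python
import numpy as np

sentence = "the rain poured down heavily the cement paving now".split()
vocab = sorted(set(sentence) | {"wet", "dry"})
word_to_idx = {w: i for i, w in enumerate(vocab)}

embed_size, hidden_size = 8, 16
rng = np.random.default_rng(1)
embeddings = rng.normal(scale=0.1, size=(len(vocab), embed_size))  # toy, untrained
W_x = rng.normal(scale=0.1, size=(hidden_size, embed_size))
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_out = rng.normal(scale=0.1, size=(len(vocab), hidden_size))      # hidden state -> word scores

h = np.zeros(hidden_size)
for word in sentence:                    # one word per time step
    x_t = embeddings[word_to_idx[word]]
    h = np.tanh(W_x @ x_t + W_h @ h)     # "rain", fed in earlier, is still reflected in h

scores = W_out @ h                       # a trained model would give "wet" the highest score
print(vocab[int(np.argmax(scores))])
```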
The Problem with Long-Term Dependencies
There is an issue, however, with recurrent neural networks.
We had a simple sentence in our example, but what if we were dealing with an entire paragraph? What if we discussed, for 1000 words, the history of cement, and then tried to get our RNN to predict that the cement was wet from the first sentence about rain?
This would be very hard, virtually impossible even, for RNNs, because they can't keep track of that much context. This issue is referred to as short-term memory.
Why does this happen?
All of this is caused by the vanishing gradient problem. As the RNN processes more inputs, it has trouble retaining the earlier ones. This comes down to backpropagation, the process we use to train our neural networks.
As we train the weights of our nodes to minimize loss/error, each adjustment is proportional to the gradient flowing back from the steps after it. If those gradients are small, the gradient passed further back becomes even smaller.
Eventually, by the time we reach the beginning of our network, the adjustments are so tiny that the weights tied to the earliest inputs barely change at all. As a result, our RNN "forgets" them.
The triangle represents our gradient. Small gradients lead to small adjustments, causing our initial nodes to not learn.
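A rough back-of-the-envelope way to see this: backpropagating through time multiplies the gradient by a factor at every step, and if that factor is below 1 the signal shrinks exponentially. The 0.5 below is picked purely for illustration.

```python
# Backpropagating through time multiplies the gradient by a factor per step.
# 0.5 is an arbitrary illustrative value below 1.
factor = 0.5
gradient = 1.0
for step in range(1, 21):
    gradient *= factor
    if step in (1, 5, 10, 20):
        print(f"gradient reaching {step} step(s) back: {gradient:.8f}")
# By 20 steps back the gradient is under a millionth of its original size,
# so the weights handling the earliest inputs barely get updated.
```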
We can solve this issue with a different type of RNN architecture called the LSTM, which stands for Long Short-Term Memory. Stay tuned for an article on them!
Key Takeaways
- RNNs enable the use of sequential data
- Inputs are looped so that nodes can retain prior information
- Due to the vanishing gradient problem, RNNs face “memory loss”
- The solution to this issue is the LSTM, which will be covered later on!
If you enjoyed this article, feel free to reach out to me on LinkedIn or at joshuapayne1275@gmail.com. Thanks for reading!