Recurrent Neural Networks
03 Jan 2019
Vanilla RNNs struggle to retain information over long sequences. The solution is to use gated units, i.e., neurons equipped with a system of masks/filters/… that controls the information flowing through them. A classical first read on the subject is Colah’s blog post. The main difference with a typical cell is the presence, in addition to the input and the hidden state, of a cell state, intended to capture the long-term memory of the network. That cell state is then modified through 3 gates:
- forget gate layer: multiplies the cell state pointwise with the output of a sigmoid, thereby deciding which values of the cell state to let through.
- input gate layer: same construction, except that the sigmoid output is multiplied pointwise with the output of a tanh layer (the candidate values), and the result is added pointwise to the cell state produced by the forget gate layer.
- output gate layer: decides what is emitted as the hidden state. The cell state is not modified further after the forget and input gate layers; the hidden state is obtained by passing the cell state through a tanh (squashing values between -1 and 1) and filtering it with a sigmoid (pointwise multiplication). So in short, an LSTM unit decides what to keep from the cell state, how to update some of its entries, and then, from that cell state, what to output as the hidden state (see the sketch below).
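To make the three gates concrete, here is a minimal NumPy sketch of a single LSTM step. The weight layout (one stacked matrix `W`, bias `b`) and the variable names are my own choice for illustration, not the convention of any particular library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*hidden, input+hidden), b has shape (4*hidden,)."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0*hidden:1*hidden])   # forget gate: which cell-state entries to keep
    i = sigmoid(z[1*hidden:2*hidden])   # input gate: which entries to update
    g = np.tanh(z[2*hidden:3*hidden])   # candidate values for the update
    o = sigmoid(z[3*hidden:4*hidden])   # output gate: what to expose as hidden state
    c = f * c_prev + i * g              # new cell state
    h = o * np.tanh(c)                  # new hidden state, read off the cell state
    return h, c

# Toy usage with random weights (hypothetical sizes)
rng = np.random.default_rng(0)
n_in, n_hid = 3, 5
W = rng.standard_normal((4 * n_hid, n_in + n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.standard_normal((7, n_in)):  # a short sequence of 7 inputs
    h, c = lstm_step(x, h, c, W, b)
```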
There exist, of course, variants:
- peepholes: some or all of the gates have direct access to the cell state
- coupled forget & input gates: only update (input) the entries you forgot (the forget and input weights are one minus each other)
- GRU: combines the above idea (coupled forget and input gates) with a merging of the cell state and hidden state into a single state (sketched below).
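To illustrate the coupling, here is a minimal NumPy sketch of one GRU step; the update gate plays both the forget and input roles (via z and 1 - z), and there is only one state h. Again, the weight layout and names are assumptions for the example, not a library API.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, W, b):
    """One GRU step. W has shape (3*hidden, input+hidden), b has shape (3*hidden,)."""
    hidden = h_prev.shape[0]
    xh = np.concatenate([x, h_prev])
    z = sigmoid(W[:hidden] @ xh + b[:hidden])                  # update gate: forget & input in one
    r = sigmoid(W[hidden:2*hidden] @ xh + b[hidden:2*hidden])  # reset gate
    h_cand = np.tanh(W[2*hidden:] @ np.concatenate([x, r * h_prev]) + b[2*hidden:])
    # keep (1 - z) of the old state, write z of the candidate: the two weights sum to one
    return (1 - z) * h_prev + z * h_cand
```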
deeplearning
rnn
lstm