Recurrent Neural Networks#

The neural networks we have investigated so far are sometimes called feedforward neural networks. These networks can be limiting because each input is processed independently of the others. Recurrent neural networks address this by feeding the hidden state computed for one data point back in as an input for the following data point. This makes recurrent neural networks uniquely suited to modelling sequential data, such as text, speech and time series. As a result, recurrent neural networks have significantly influenced the transformer networks behind today's large language models.

We will put together a simple recurrent neural network using a time series dataset of electricity production. You can download the dataset here.

import pandas as pd
import matplotlib.pyplot as plt

timeseries = pd.read_csv('../data/electric.csv')
timeseries['date'] = pd.to_datetime(timeseries['date'])

fig, ax = plt.subplots()
timeseries.plot(x='date', y='elec-prod', ax=ax)
ax.tick_params(axis='x', labelrotation=45)
plt.show()
[Figure: line plot of elec-prod against date.]

Feedback#

As mentioned above, the difference between a traditional feedforward neural network and a recurrent neural network is that the latter includes the hidden state of the previous data point in the activation function. That means that, for some activation function, \(f\), the traditional network has the equation,

\[ h_i = f(W_x x_i + b), \]

while for a recurrent network, we add the previous hidden state,

\[ h_i = f(W_h h_{i-1} + W_x x_i + b). \]

We implement a simple feedback perceptron with Python below, using a tanh activation function.

import numpy as np

class SimpleRNNPerceptron:
    """
    A simple RNN perceptron with a single hidden layer.
    
    :param input_size: The size of the input vector
    :param hidden_size: The size of the hidden layer
    """
    def __init__(self, input_size, hidden_size):
        self.hidden_size = hidden_size

        # Input-to-hidden weights, hidden-to-hidden (recurrent) weights and bias.
        self.W_x = np.random.randn(hidden_size, input_size) * 0.1
        self.W_h = np.random.randn(hidden_size, hidden_size) * 0.1
        self.b = np.zeros((hidden_size, 1))
        # The hidden state starts at zero and is updated at every step.
        self.h_i = np.zeros((hidden_size, 1))

    def step(self, x_i):
        """
        Processes a single step of the sequence.

        :param x_i: The input vector
        :return: The updated hidden state
        """
        x_i = x_i.reshape(-1, 1)
        # h_i = tanh(W_x x_i + W_h h_{i-1} + b)
        self.h_i = np.tanh(np.dot(self.W_x, x_i) + np.dot(self.W_h, self.h_i) + self.b)
        return self.h_i

rnn = SimpleRNNPerceptron(input_size=1, hidden_size=5)

We can run this through a single year of our time series data. The hidden size, which we set to 5, is the number of neurons in the hidden layer.

timeseries_2017 = timeseries.loc[timeseries['date'].dt.year == 2017, 'elec-prod']
timeseries_2017 = timeseries_2017 / timeseries_2017.max()

for i, x_i in enumerate(timeseries_2017):
    h_i = rnn.step(np.array([x_i])) 
    print(f"Step {i+1}: Hidden State = {h_i.ravel()}")
Step 1: Hidden State = [ 0.00589006  0.20614755 -0.04396368 -0.04361022  0.09026075]
Step 2: Hidden State = [ 0.01045484  0.1828243  -0.05634081 -0.06326329  0.07953521]
Step 3: Hidden State = [ 0.00859395  0.18702286 -0.05304004 -0.05654218  0.07370454]
Step 4: Hidden State = [ 0.0099536   0.16373959 -0.04884112 -0.05380353  0.06592301]
Step 5: Hidden State = [ 0.00859549  0.17024794 -0.04803106 -0.05168423  0.06784802]
Step 6: Hidden State = [ 0.00983018  0.18778578 -0.05272445 -0.05704393  0.07691414]
Step 7: Hidden State = [ 0.01037696  0.20579509 -0.05800958 -0.06264078  0.08431554]
Step 8: Hidden State = [ 0.01075281  0.20058813 -0.05824744 -0.06338376  0.08145325]
Step 9: Hidden State = [ 0.01005464  0.18251945 -0.05368026 -0.05849586  0.07279433]
Step 10: Hidden State = [ 0.00950314  0.17326675 -0.05021146 -0.05455615  0.06905931]
Step 11: Hidden State = [ 0.00947502  0.17953613 -0.05096214 -0.05515233  0.07246156]
Step 12: Hidden State = [ 0.01046753  0.21007359 -0.0582401  -0.06263817  0.08633191]

The larger the hidden size, the more capacity the network has to learn complex patterns, but this extra capacity comes at a computational cost.
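
To make this concrete, we can count the parameters of SimpleRNNPerceptron for a few hidden sizes; the recurrent weight matrix grows quadratically with the hidden size.

# Parameters: W_x (hidden_size x input_size), W_h (hidden_size x hidden_size) and b (hidden_size).
input_size = 1
for hidden_size in [5, 50, 500]:
    n_params = hidden_size * input_size + hidden_size ** 2 + hidden_size
    print(f"hidden_size={hidden_size}: {n_params} parameters")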

Implementation with pytorch#

The Python library pytorch comes with its own implementation of recurrent neural networks, torch.nn.RNN. We advise you to study the documentation to understand how a recurrent neural network may be implemented using pytorch, and to appreciate how configurable this nn.Module object is.
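
As a minimal sketch (assuming pytorch is installed and the normalised timeseries_2017 series from above is still available), the same recurrence can be set up with torch.nn.RNN, mirroring the sizes of our NumPy perceptron.

import torch
from torch import nn

# The pytorch equivalent of SimpleRNNPerceptron(input_size=1, hidden_size=5).
rnn = nn.RNN(input_size=1, hidden_size=5, nonlinearity='tanh', batch_first=True)

# Shape the series as (batch, sequence_length, input_size).
x = torch.tensor(timeseries_2017.values, dtype=torch.float32).reshape(1, -1, 1)

# output holds the hidden state at every step; h_n is the final hidden state.
output, h_n = rnn(x)
print(output.shape, h_n.shape)

Unlike our implementation, nn.RNN processes the whole sequence in a single call and can be configured with, for example, multiple stacked layers (num_layers) or a different nonlinearity.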