Understanding Simple Recurrent Neural Networks in Keras
Last Updated on January 6, 2023
This tutorial is designed for anyone looking for an understanding of how recurrent neural networks (RNN) work and how to use them via the Keras deep learning library. While Keras provides all the methods required for solving problems and building applications, it is also important to gain an insight into how everything works. In this article, the computations taking place in the RNN model are shown step by step. Next, a complete end-to-end system for time series prediction is developed.
After completing this tutorial, you will know:
- The structure of an RNN
- How an RNN computes the output when given an input
- How to prepare data for a SimpleRNN in Keras
- How to train a SimpleRNN model
Kick-start your project with my book Building Transformer Models with Attention. It provides self-study tutorials with working code to guide you into building a fully-working transformer model that can
translate sentences from one language to another…
Let’s get started.

Understanding simple recurrent neural networks in Keras. Photo by Mehreen Saeed, some rights reserved.
Tutorial Overview
This tutorial is divided into two parts; they are:
- The structure of the RNN
- Different weights and biases associated with different layers of the RNN
- How computations are performed to compute the output when given an input
- A complete application for time series prediction
Prerequisites
It is assumed that you have a basic understanding of RNNs before you start implementing them. An Introduction to Recurrent Neural Networks and the Math That Powers Them gives you a quick overview of RNNs.
Let's now get down to the implementation part.
Import Section
To start the implementation of RNNs, let's add the import section.
```python
from pandas import read_csv
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import math
import matplotlib.pyplot as plt
```
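Note: depending on your environment, the standalone keras package may not be installed separately. With TensorFlow 2, the same classes are available under tensorflow.keras, so an equivalent set of imports (assuming a TensorFlow 2 setup) would be:

```python
# Equivalent imports when Keras is used through TensorFlow 2 (tf.keras)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, SimpleRNN
```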
Want to Get Started With Building Transformer Models with Attention?
Take my free 12-day email crash course now (with sample code).
Click to sign-up and also get a free PDF Ebook version of the course.
Keras SimpleRNN
The function below returns a model that includes a SimpleRNN layer and a Dense layer for learning sequential data. The input_shape argument specifies the shape (time_steps x features). We'll simplify everything and use univariate data, i.e., one feature only; the time steps are discussed below.
```python
def create_RNN(hidden_units, dense_units, input_shape, activation):
    model = Sequential()
    model.add(SimpleRNN(hidden_units, input_shape=input_shape, activation=activation[0]))
    model.add(Dense(units=dense_units, activation=activation[1]))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

demo_model = create_RNN(2, 1, (3,1), activation=['linear', 'linear'])
```
The object demo_model is returned with two hidden units created through the SimpleRNN layer and one dense unit created through the Dense layer. The input_shape is set at 3×1, and a linear activation function is used in both layers for simplicity. Just to recall, the linear activation function $f(x) = x$ makes no change to the input. The network looks as follows:
If we have $m$ hidden units ($m = 2$ in the above case), then:
- Input: $x \in \mathbb{R}$
- Hidden unit: $h \in \mathbb{R}^m$
- Weights for the input units: $w_x \in \mathbb{R}^m$
- Weights for the hidden units: $w_h \in \mathbb{R}^{m \times m}$
- Bias for the hidden units: $b_h \in \mathbb{R}^m$
- Weight for the dense layer: $w_y \in \mathbb{R}^m$
- Bias for the dense layer: $b_y \in \mathbb{R}$
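Putting these pieces together, the SimpleRNN computation (written here in the standard form that the hand computation below also follows) is:

$$
h_t = f(x_t w_x + h_{t-1} w_h + b_h), \qquad o_T = g(h_T w_y + b_y),
$$

where $h_0$ is the zero vector, $f$ is the activation function of the recurrent layer, and $g$ is the activation function of the dense layer (both linear in this example).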
Let's now take a look at the above weights. Note: As the weights are randomly initialized, the results posted here will be different from yours. The important thing is to learn what the structure of each object being used looks like and how it interacts with the others to produce the final output.
```python
wx = demo_model.get_weights()[0]
wh = demo_model.get_weights()[1]
bh = demo_model.get_weights()[2]
wy = demo_model.get_weights()[3]
by = demo_model.get_weights()[4]

print('wx = ', wx, ' wh = ', wh, ' bh = ', bh, ' wy =', wy, 'by = ', by)
```
```
wx =  [[ 0.18662322 -1.2369459 ]]  wh =  [[ 0.86981213 -0.49338293]
 [ 0.49338293  0.8698122 ]]  bh =  [0. 0.]  wy = [[-0.4635998]
 [ 0.6538409]] by =  [0.]
```
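As an optional cross-check (not part of the original listing), these array sizes can be confirmed with the model summary: the SimpleRNN layer should report 2×(2 + 1 + 1) = 8 parameters (wx, wh, and bh combined), and the Dense layer 2 + 1 = 3 parameters (wy and by).

```python
# Optional: confirm the parameter counts of the two layers
# SimpleRNN: units*(units + features + 1) = 2*(2 + 1 + 1) = 8
# Dense:     inputs*units + bias          = 2*1 + 1       = 3
demo_model.summary()
```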
Now let's do a simple experiment to see how the SimpleRNN and Dense layers produce an output. Keep this figure in view.

Layers of a recurrent neural network
We'll input x for three time steps and let the network generate an output. The values of the hidden units at time steps 1, 2, and 3 will be computed. $h_0$ is initialized to the zero vector. The output $o_3$ is computed from $h_3$ and $w_y$. An activation function is not required as we are using linear units.
```python
x = np.array([1, 2, 3])
# Reshape the input to the required sample_size x time_steps x features
x_input = np.reshape(x, (1, 3, 1))
y_pred_model = demo_model.predict(x_input)

m = 2
h0 = np.zeros(m)
h1 = np.dot(x[0], wx) + h0 + bh
h2 = np.dot(x[1], wx) + np.dot(h1, wh) + bh
h3 = np.dot(x[2], wx) + np.dot(h2, wh) + bh
o3 = np.dot(h3, wy) + by

print('h1 = ', h1, 'h2 = ', h2, 'h3 = ', h3)
print("Prediction from network ", y_pred_model)
print("Prediction from our computation ", o3)
```
```
h1 =  [[ 0.18662322 -1.23694587]] h2 =  [[-0.07471441 -3.64187904]] h3 =  [[-1.30195881 -6.84172557]]
Prediction from network  [[-3.8698118]]
Prediction from our computation  [[-3.86981216]]
```
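To see the agreement explicitly, the last step can be checked by hand with the printed values of $h_3$, $w_y$, and $b_y$ (rounded here to five decimal places):

$$
o_3 = h_3 w_y + b_y \approx (-1.30196)(-0.46360) + (-6.84173)(0.65384) + 0 \approx -3.8698,
$$

which matches the network's prediction up to floating-point rounding.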
Running the RNN on Sunspots Dataset
Now that we understand how the SimpleRNN and Dense layers are put together, let's run a complete RNN on a simple time series dataset. We'll need to follow these steps:
- Read the dataset from a given URL
- Split the data into training and test sets
- Prepare the input in the required Keras format
- Create an RNN model and train it
- Make predictions on the training and test sets and print the root mean square error on both sets
- View the result
Step 1, 2: Reading Data and Splitting Into Train and Test
The following function reads the train and test data from a given URL and splits it into a given percentage of train and test data. It returns single-dimensional arrays for train and test data after scaling the data between 0 and 1 using the MinMaxScaler from scikit-learn.
```python
# Parameter split_percent defines the ratio of training examples
def get_train_test(url, split_percent=0.8):
    df = read_csv(url, usecols=[1], engine='python')
    data = np.array(df.values.astype('float32'))
    scaler = MinMaxScaler(feature_range=(0, 1))
    data = scaler.fit_transform(data).flatten()
    n = len(data)
    # Point for splitting data into train and test
    split = int(n*split_percent)
    train_data = data[range(split)]
    test_data = data[split:]
    return train_data, test_data, data

sunspots_url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-sunspots.csv'
train_data, test_data, data = get_train_test(sunspots_url)
```
Step 3: Reshaping Data for Keras
The next step is to prepare the data for Keras model training. The input array should be shaped as: total_samples x time_steps x features.

There are several ways of preparing time series data for training. We'll create input rows with non-overlapping time steps. An example for time steps = 2 is shown in the figure below. Here, time steps denotes the number of previous time steps to use for predicting the next value of the time series data.

How data is prepared for the sunspots example
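As a small illustration of this non-overlapping layout (a toy example with made-up numbers, separate from the sunspots code), seven values split with time steps = 2 give three input rows and three targets:

```python
import numpy as np

# Toy series: non-overlapping windows of length 2, each predicting the next value
series = np.array([10, 20, 30, 40, 50, 60, 70], dtype='float32')
steps = 2
Y_ind = np.arange(steps, len(series), steps)            # indices [2, 4, 6]
Y = series[Y_ind]                                       # targets [30., 50., 70.]
X = series[:steps * len(Y)].reshape(len(Y), steps, 1)   # shape (3, 2, 1)
# X[0] = [[10.], [20.]] -> target 30., X[1] = [[30.], [40.]] -> target 50., ...
```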
The following function get_XY() takes a one-dimensional array as input and converts it to the required input X and target Y arrays. We'll use 12 time_steps for the sunspots dataset as sunspots generally have a cycle of 12 months. You can experiment with other values of time_steps.
```python
# Prepare the input X and target Y
def get_XY(dat, time_steps):
    # Indices of target array
    Y_ind = np.arange(time_steps, len(dat), time_steps)
    Y = dat[Y_ind]
    # Prepare X
    rows_x = len(Y)
    X = dat[range(time_steps*rows_x)]
    X = np.reshape(X, (rows_x, time_steps, 1))
    return X, Y

time_steps = 12
trainX, trainY = get_XY(train_data, time_steps)
testX, testY = get_XY(test_data, time_steps)
```
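A quick check of the resulting shapes helps confirm that the arrays follow the samples x time_steps x features layout that Keras expects (the exact number of samples depends on the train/test split):

```python
# Sanity check: X should be (samples, time_steps, 1) and Y should be (samples,)
print('trainX:', trainX.shape, 'trainY:', trainY.shape)
print('testX: ', testX.shape, 'testY: ', testY.shape)
```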
Step 4: Create RNN Model and Train
For this step, you can reuse the create_RNN() function that was defined above.
```python
model = create_RNN(hidden_units=3, dense_units=1, input_shape=(time_steps,1),
                   activation=['tanh', 'tanh'])
model.fit(trainX, trainY, epochs=20, batch_size=1, verbose=2)
```
Step 5: Compute and Print the Root Mean Square Error
The function print_error() computes the root mean square error between the actual and predicted values.
```python
def print_error(trainY, testY, train_predict, test_predict):
    # Error of predictions
    train_rmse = math.sqrt(mean_squared_error(trainY, train_predict))
    test_rmse = math.sqrt(mean_squared_error(testY, test_predict))
    # Print RMSE
    print('Train RMSE: %.3f RMSE' % (train_rmse))
    print('Test RMSE: %.3f RMSE' % (test_rmse))

# make predictions
train_predict = model.predict(trainX)
test_predict = model.predict(testX)
# Root mean square error
print_error(trainY, testY, train_predict, test_predict)
```
```
Train RMSE: 0.058 RMSE
Test RMSE: 0.077 RMSE
```
Step 6: View the Result
The following function plots the actual target values and the predicted values. The red line separates the training and test data points.
```python
# Plot the result
def plot_result(trainY, testY, train_predict, test_predict):
    actual = np.append(trainY, testY)
    predictions = np.append(train_predict, test_predict)
    rows = len(actual)
    plt.figure(figsize=(15, 6), dpi=80)
    plt.plot(range(rows), actual)
    plt.plot(range(rows), predictions)
    plt.axvline(x=len(trainY), color='r')
    plt.legend(['Actual', 'Predictions'])
    plt.xlabel('Observation number after given time steps')
    plt.ylabel('Sunspots scaled')
    plt.title('Actual and Predicted Values. The Red Line Separates The Training And Test Examples')

plot_result(trainY, testY, train_predict, test_predict)
```
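The listing above does not call plt.show(); in a notebook the figure usually renders inline, but when running the code as a plain Python script you will likely need to display (or save) the figure explicitly:

```python
plt.show()                          # display the figure when running as a script
# plt.savefig('sunspots_rnn.png')   # or save it to a file instead
```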
The following plot is generated:
Consolidated Code
Given below is the complete code for this tutorial. Try it out at your end and experiment with different numbers of hidden units and time steps. You can add a second SimpleRNN to the network and see how it behaves. You can also use the scaler object to rescale the data back to its normal range. Two short sketches for these experiments are given after the listing.
```python
from pandas import read_csv
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import math
import matplotlib.pyplot as plt

# Parameter split_percent defines the ratio of training examples
def get_train_test(url, split_percent=0.8):
    df = read_csv(url, usecols=[1], engine='python')
    data = np.array(df.values.astype('float32'))
    scaler = MinMaxScaler(feature_range=(0, 1))
    data = scaler.fit_transform(data).flatten()
    n = len(data)
    # Point for splitting data into train and test
    split = int(n*split_percent)
    train_data = data[range(split)]
    test_data = data[split:]
    return train_data, test_data, data

# Prepare the input X and target Y
def get_XY(dat, time_steps):
    Y_ind = np.arange(time_steps, len(dat), time_steps)
    Y = dat[Y_ind]
    rows_x = len(Y)
    X = dat[range(time_steps*rows_x)]
    X = np.reshape(X, (rows_x, time_steps, 1))
    return X, Y

def create_RNN(hidden_units, dense_units, input_shape, activation):
    model = Sequential()
    model.add(SimpleRNN(hidden_units, input_shape=input_shape, activation=activation[0]))
    model.add(Dense(units=dense_units, activation=activation[1]))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

def print_error(trainY, testY, train_predict, test_predict):
    # Error of predictions
    train_rmse = math.sqrt(mean_squared_error(trainY, train_predict))
    test_rmse = math.sqrt(mean_squared_error(testY, test_predict))
    # Print RMSE
    print('Train RMSE: %.3f RMSE' % (train_rmse))
    print('Test RMSE: %.3f RMSE' % (test_rmse))

# Plot the result
def plot_result(trainY, testY, train_predict, test_predict):
    actual = np.append(trainY, testY)
    predictions = np.append(train_predict, test_predict)
    rows = len(actual)
    plt.figure(figsize=(15, 6), dpi=80)
    plt.plot(range(rows), actual)
    plt.plot(range(rows), predictions)
    plt.axvline(x=len(trainY), color='r')
    plt.legend(['Actual', 'Predictions'])
    plt.xlabel('Observation number after given time steps')
    plt.ylabel('Sunspots scaled')
    plt.title('Actual and Predicted Values. The Red Line Separates The Training And Test Examples')

sunspots_url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-sunspots.csv'
time_steps = 12
train_data, test_data, data = get_train_test(sunspots_url)
trainX, trainY = get_XY(train_data, time_steps)
testX, testY = get_XY(test_data, time_steps)

# Create model and train
model = create_RNN(hidden_units=3, dense_units=1, input_shape=(time_steps,1),
                   activation=['tanh', 'tanh'])
model.fit(trainX, trainY, epochs=20, batch_size=1, verbose=2)

# make predictions
train_predict = model.predict(trainX)
test_predict = model.predict(testX)

# Print error
print_error(trainY, testY, train_predict, test_predict)

# Plot result
plot_result(trainY, testY, train_predict, test_predict)
```
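As a starting point for the experiments suggested above, here is a minimal sketch (not part of the original listing) of a stacked variant of create_RNN(); the first recurrent layer must return the full sequence so that the second one still receives a time dimension:

```python
# Sketch: stacking a second SimpleRNN layer (same imports as above assumed)
def create_stacked_RNN(hidden_units, dense_units, input_shape, activation):
    model = Sequential()
    model.add(SimpleRNN(hidden_units, input_shape=input_shape,
                        activation=activation[0], return_sequences=True))
    model.add(SimpleRNN(hidden_units, activation=activation[0]))
    model.add(Dense(units=dense_units, activation=activation[1]))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
```

And a sketch of rescaling the predictions back to the original sunspot range; this assumes you modify get_train_test() to also return the fitted scaler (for example, return train_data, test_data, data, scaler):

```python
# Sketch: map scaled predictions back to the original range with the fitted MinMaxScaler
train_predict_orig = scaler.inverse_transform(train_predict)        # predictions are (n, 1)
test_predict_orig = scaler.inverse_transform(test_predict)
trainY_orig = scaler.inverse_transform(trainY.reshape(-1, 1))       # targets are 1-D, reshape first
testY_orig = scaler.inverse_transform(testY.reshape(-1, 1))
```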
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Books
- Deep Learning Essentials by Wei Di, Anurag Bhardwaj, and Jianing Wei.
- Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
Articles
- Wikipedia article on BPTT
- A Tour of Recurrent Neural Network Algorithms for Deep Learning
- A Gentle Introduction to Backpropagation Through Time
- How to Prepare Univariate Time Series Data for Long Short-Term Memory Networks
Summary
In this tutorial, you discovered recurrent neural networks and their various architectures.
Specifically, you learned:
- The structure of an RNN
- How the RNN computes an output from previous inputs
- How to implement an end-to-end system for time series forecasting using an RNN
Do you have any questions about the RNNs discussed in this post? Ask your questions in the comments below, and I will do my best to answer.