Understanding Simple Recurrent Neural Networks in Keras
Last Updated on January 6, 2023
This tutorial is designed for anyone looking for an understanding of how recurrent neural networks (RNN) work and how to use them via the Keras deep learning library. While Keras provides all the methods required for solving problems and building applications, it is also important to gain an insight into how everything works. In this article, the computations taking place in the RNN model are shown step by step. Next, a complete end-to-end system for time series prediction is developed.
After completing this tutorial, you will know:
- The structure of an RNN
- How an RNN computes the output when given an input
- How to prepare data for a SimpleRNN in Keras
- How to train a SimpleRNN model
Kick-start your project with my book Building Transformer Models with Attention. It provides self-study tutorials with working code to guide you into building a fully-working transformer model that can
translate sentences from one language to another…
Let’s get started.

Understanding simple recurrent neural networks in Keras. Photo by Mehreen Saeed, some rights reserved.
Tutorial Overview
This tutorial is divided into two parts; they are:
- The structure of the RNN
- Different weights and biases associated with different layers of the RNN
- How computations are performed to compute the output when given an input
- A complete application for time series prediction
Prerequisites
It is assumed that you have a basic understanding of RNNs before you start implementing them. An Introduction to Recurrent Neural Networks and the Math That Powers Them gives you a quick overview of RNNs.
Let's now get down to the implementation part.
Import Section
To start the implementation of RNNs, let's add the import section.
```python
from pandas import read_csv
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import math
import matplotlib.pyplot as plt
```
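Note: depending on your environment, the standalone keras package may not be installed separately. With TensorFlow 2, the same classes are available under tensorflow.keras, so an equivalent set of imports (assuming a TensorFlow 2 setup) would be:

```python
# Equivalent imports when Keras is used through TensorFlow 2 (tf.keras)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, SimpleRNN
```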
Want to Get Started With Building Transformer Models with Attention?
Take my free 12-day email crash course now (with sample code).
Click to sign-up and also get a free PDF Ebook version of the course.
Keras SimpleRNN
The function below returns a model that includes a SimpleRNN layer and a Dense layer for learning sequential data. The input_shape argument specifies the shape (time_steps x features). We'll simplify everything and use univariate data, i.e., one feature only; the time steps are discussed below.
```python
def create_RNN(hidden_units, dense_units, input_shape, activation):
    model = Sequential()
    model.add(SimpleRNN(hidden_units, input_shape=input_shape, activation=activation[0]))
    model.add(Dense(units=dense_units, activation=activation[1]))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

demo_model = create_RNN(2, 1, (3,1), activation=['linear', 'linear'])
```
The object demo_model is returned with two hidden units created through the SimpleRNN layer and one dense unit created through the Dense layer. The input_shape is set at 3×1, and a linear activation function is used in both layers for simplicity. Just to recall, the linear activation function $f(x) = x$ makes no change to the input. The network looks as follows:
If we have $m$ hidden units ($m = 2$ in the above case), then:
- Input: $x \in \mathbb{R}$
- Hidden unit: $h \in \mathbb{R}^m$
- Weights for the input units: $w_x \in \mathbb{R}^m$
- Weights for the hidden units: $w_h \in \mathbb{R}^{m \times m}$
- Bias for the hidden units: $b_h \in \mathbb{R}^m$
- Weight for the dense layer: $w_y \in \mathbb{R}^m$
- Bias for the dense layer: $b_y \in \mathbb{R}$
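Putting these pieces together, the SimpleRNN computation (written here in the standard form that the hand computation below also follows) is:

$$
h_t = f(x_t w_x + h_{t-1} w_h + b_h), \qquad o_T = g(h_T w_y + b_y),
$$

where $h_0$ is the zero vector, $f$ is the activation function of the recurrent layer, and $g$ is the activation function of the dense layer (both linear in this example).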
Let's now take a look at the above weights. Note: As the weights are randomly initialized, the results posted here will be different from yours. The important thing is to learn what the structure of each object being used looks like and how it interacts with the others to produce the final output.
```python
wx = demo_model.get_weights()[0]
wh = demo_model.get_weights()[1]
bh = demo_model.get_weights()[2]
wy = demo_model.get_weights()[3]
by = demo_model.get_weights()[4]

print('wx = ', wx, ' wh = ', wh, ' bh = ', bh, ' wy =', wy, 'by = ', by)
```
```
wx =  [[ 0.18662322 -1.2369459 ]]  wh =  [[ 0.86981213 -0.49338293]
 [ 0.49338293  0.8698122 ]]  bh =  [0. 0.]  wy = [[-0.4635998]
 [ 0.6538409]] by =  [0.]
```
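As an optional cross-check (not part of the original listing), these array sizes can be confirmed with the model summary: the SimpleRNN layer should report 2×(2 + 1 + 1) = 8 parameters (wx, wh, and bh combined), and the Dense layer 2 + 1 = 3 parameters (wy and by).

```python
# Optional: confirm the parameter counts of the two layers
# SimpleRNN: units*(units + features + 1) = 2*(2 + 1 + 1) = 8
# Dense:     inputs*units + bias          = 2*1 + 1       = 3
demo_model.summary()
```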
Now let's do a simple experiment to see how the SimpleRNN and Dense layers produce an output. Keep this figure in view.

Layers of a recurrent neural network
We'll input x for three time steps and let the network generate an output. The values of the hidden units at time steps 1, 2, and 3 will be computed. $h_0$ is initialized to the zero vector. The output $o_3$ is computed from $h_3$ and $w_y$. An activation function is not required as we are using linear units.
```python
x = np.array([1, 2, 3])
# Reshape the input to the required sample_size x time_steps x features
x_input = np.reshape(x, (1, 3, 1))
y_pred_model = demo_model.predict(x_input)

m = 2
h0 = np.zeros(m)
h1 = np.dot(x[0], wx) + h0 + bh
h2 = np.dot(x[1], wx) + np.dot(h1, wh) + bh
h3 = np.dot(x[2], wx) + np.dot(h2, wh) + bh
o3 = np.dot(h3, wy) + by

print('h1 = ', h1, 'h2 = ', h2, 'h3 = ', h3)
print("Prediction from network ", y_pred_model)
print("Prediction from our computation ", o3)
```
```
h1 =  [[ 0.18662322 -1.23694587]] h2 =  [[-0.07471441 -3.64187904]] h3 =  [[-1.30195881 -6.84172557]]
Prediction from network  [[-3.8698118]]
Prediction from our computation  [[-3.86981216]]
```
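To see the agreement explicitly, the last step can be checked by hand with the printed values of $h_3$, $w_y$, and $b_y$ (rounded here to five decimal places):

$$
o_3 = h_3 w_y + b_y \approx (-1.30196)(-0.46360) + (-6.84173)(0.65384) + 0 \approx -3.8698,
$$

which matches the network's prediction up to floating-point rounding.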
Running the RNN on Sunspots Dataset
Now that we understand how the SimpleRNN and Dense layers are put together, let's run a complete RNN on a simple time series dataset. We'll need to follow these steps:
- Read the dataset from a given URL
- Split the data into training and test sets
- Prepare the input in the required Keras format
- Create an RNN model and train it
- Make predictions on the training and test sets and print the root mean square error on both sets
- View the result
Step 1, 2: Reading Data and Splitting Into Train and Test
The following function reads the train and test data from a given URL and splits it into a given percentage of train and test data. It returns single-dimensional arrays for train and test data after scaling the data between 0 and 1 using the MinMaxScaler from scikit-learn.
```python
# Parameter split_percent defines the ratio of training examples
def get_train_test(url, split_percent=0.8):
    df = read_csv(url, usecols=[1], engine='python')
    data = np.array(df.values.astype('float32'))
    scaler = MinMaxScaler(feature_range=(0, 1))
    data = scaler.fit_transform(data).flatten()
    n = len(data)
    # Point for splitting data into train and test
    split = int(n*split_percent)
    train_data = data[range(split)]
    test_data = data[split:]
    return train_data, test_data, data

sunspots_url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-sunspots.csv'
train_data, test_data, data = get_train_test(sunspots_url)
```
Step 3: Reshaping Data for Keras
The next step is to prepare the data for Keras model training. The input array should be shaped as: total_samples x time_steps x features.

There are several ways of preparing time series data for training. We'll create input rows with non-overlapping time steps. An example for time steps = 2 is shown in the figure below. Here, time steps denotes the number of previous time steps to use for predicting the next value of the time series data.

How data is prepared for the sunspots example
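As a small illustration of this non-overlapping layout (a toy example with made-up numbers, separate from the sunspots code), seven values split with time steps = 2 give three input rows and three targets:

```python
import numpy as np

# Toy series: non-overlapping windows of length 2, each predicting the next value
series = np.array([10, 20, 30, 40, 50, 60, 70], dtype='float32')
steps = 2
Y_ind = np.arange(steps, len(series), steps)            # indices [2, 4, 6]
Y = series[Y_ind]                                       # targets [30., 50., 70.]
X = series[:steps * len(Y)].reshape(len(Y), steps, 1)   # shape (3, 2, 1)
# X[0] = [[10.], [20.]] -> target 30., X[1] = [[30.], [40.]] -> target 50., ...
```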
The following function get_XY() takes a one-dimensional array as input and converts it to the required input X and target Y arrays. We'll use 12 time_steps for the sunspots dataset as sunspots generally have a cycle of 12 months. You can experiment with other values of time_steps.
```python
# Prepare the input X and target Y
def get_XY(dat, time_steps):
    # Indices of target array
    Y_ind = np.arange(time_steps, len(dat), time_steps)
    Y = dat[Y_ind]
    # Prepare X
    rows_x = len(Y)
    X = dat[range(time_steps*rows_x)]
    X = np.reshape(X, (rows_x, time_steps, 1))
    return X, Y

time_steps = 12
trainX, trainY = get_XY(train_data, time_steps)
testX, testY = get_XY(test_data, time_steps)
```
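A quick check of the resulting shapes helps confirm that the arrays follow the samples x time_steps x features layout that Keras expects (the exact number of samples depends on the train/test split):

```python
# Sanity check: X should be (samples, time_steps, 1) and Y should be (samples,)
print('trainX:', trainX.shape, 'trainY:', trainY.shape)
print('testX: ', testX.shape, 'testY: ', testY.shape)
```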
Step 4: Create RNN Model and Train
For this step, you can reuse the create_RNN() function that was defined above.
```python
model = create_RNN(hidden_units=3, dense_units=1, input_shape=(time_steps,1),
                   activation=['tanh', 'tanh'])
model.fit(trainX, trainY, epochs=20, batch_size=1, verbose=2)
```
Step 5: Compute and Print the Root Mean Square Error
The function print_error() computes the root mean square error between the actual and predicted values.
```python
def print_error(trainY, testY, train_predict, test_predict):
    # Error of predictions
    train_rmse = math.sqrt(mean_squared_error(trainY, train_predict))
    test_rmse = math.sqrt(mean_squared_error(testY, test_predict))
    # Print RMSE
    print('Train RMSE: %.3f RMSE' % (train_rmse))
    print('Test RMSE: %.3f RMSE' % (test_rmse))

# make predictions
train_predict = model.predict(trainX)
test_predict = model.predict(testX)
# Root mean square error
print_error(trainY, testY, train_predict, test_predict)
```
```
Train RMSE: 0.058 RMSE
Test RMSE: 0.077 RMSE
```
Step 6: View the Result
The following function plots the actual target values and the predicted values. The red line separates the training and test data points.
```python
# Plot the result
def plot_result(trainY, testY, train_predict, test_predict):
    actual = np.append(trainY, testY)
    predictions = np.append(train_predict, test_predict)
    rows = len(actual)
    plt.figure(figsize=(15, 6), dpi=80)
    plt.plot(range(rows), actual)
    plt.plot(range(rows), predictions)
    plt.axvline(x=len(trainY), color='r')
    plt.legend(['Actual', 'Predictions'])
    plt.xlabel('Observation number after given time steps')
    plt.ylabel('Sunspots scaled')
    plt.title('Actual and Predicted Values. The Red Line Separates The Training And Test Examples')

plot_result(trainY, testY, train_predict, test_predict)
```
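The listing above does not call plt.show(); in a notebook the figure usually renders inline, but when running the code as a plain Python script you will likely need to display (or save) the figure explicitly:

```python
plt.show()                          # display the figure when running as a script
# plt.savefig('sunspots_rnn.png')   # or save it to a file instead
```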
The following plot is generated:
Consolidated Code
Given below is the complete code for this tutorial. Try it out at your end and experiment with different numbers of hidden units and time steps. You can add a second SimpleRNN to the network and see how it behaves. You can also use the scaler object to rescale the data back to its normal range. Two short sketches for these experiments are given after the listing.
```python
from pandas import read_csv
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import math
import matplotlib.pyplot as plt

# Parameter split_percent defines the ratio of training examples
def get_train_test(url, split_percent=0.8):
    df = read_csv(url, usecols=[1], engine='python')
    data = np.array(df.values.astype('float32'))
    scaler = MinMaxScaler(feature_range=(0, 1))
    data = scaler.fit_transform(data).flatten()
    n = len(data)
    # Point for splitting data into train and test
    split = int(n*split_percent)
    train_data = data[range(split)]
    test_data = data[split:]
    return train_data, test_data, data

# Prepare the input X and target Y
def get_XY(dat, time_steps):
    Y_ind = np.arange(time_steps, len(dat), time_steps)
    Y = dat[Y_ind]
    rows_x = len(Y)
    X = dat[range(time_steps*rows_x)]
    X = np.reshape(X, (rows_x, time_steps, 1))
    return X, Y

def create_RNN(hidden_units, dense_units, input_shape, activation):
    model = Sequential()
    model.add(SimpleRNN(hidden_units, input_shape=input_shape, activation=activation[0]))
    model.add(Dense(units=dense_units, activation=activation[1]))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

def print_error(trainY, testY, train_predict, test_predict):
    # Error of predictions
    train_rmse = math.sqrt(mean_squared_error(trainY, train_predict))
    test_rmse = math.sqrt(mean_squared_error(testY, test_predict))
    # Print RMSE
    print('Train RMSE: %.3f RMSE' % (train_rmse))
    print('Test RMSE: %.3f RMSE' % (test_rmse))

# Plot the result
def plot_result(trainY, testY, train_predict, test_predict):
    actual = np.append(trainY, testY)
    predictions = np.append(train_predict, test_predict)
    rows = len(actual)
    plt.figure(figsize=(15, 6), dpi=80)
    plt.plot(range(rows), actual)
    plt.plot(range(rows), predictions)
    plt.axvline(x=len(trainY), color='r')
    plt.legend(['Actual', 'Predictions'])
    plt.xlabel('Observation number after given time steps')
    plt.ylabel('Sunspots scaled')
    plt.title('Actual and Predicted Values. The Red Line Separates The Training And Test Examples')

sunspots_url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-sunspots.csv'
time_steps = 12
train_data, test_data, data = get_train_test(sunspots_url)
trainX, trainY = get_XY(train_data, time_steps)
testX, testY = get_XY(test_data, time_steps)

# Create model and train
model = create_RNN(hidden_units=3, dense_units=1, input_shape=(time_steps,1),
                   activation=['tanh', 'tanh'])
model.fit(trainX, trainY, epochs=20, batch_size=1, verbose=2)

# make predictions
train_predict = model.predict(trainX)
test_predict = model.predict(testX)

# Print error
print_error(trainY, testY, train_predict, test_predict)

# Plot result
plot_result(trainY, testY, train_predict, test_predict)
```
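As a starting point for the experiments suggested above, here is a minimal sketch (not part of the original listing) of a stacked variant of create_RNN(); the first recurrent layer must return the full sequence so that the second one still receives a time dimension:

```python
# Sketch: stacking a second SimpleRNN layer (same imports as above assumed)
def create_stacked_RNN(hidden_units, dense_units, input_shape, activation):
    model = Sequential()
    model.add(SimpleRNN(hidden_units, input_shape=input_shape,
                        activation=activation[0], return_sequences=True))
    model.add(SimpleRNN(hidden_units, activation=activation[0]))
    model.add(Dense(units=dense_units, activation=activation[1]))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
```

And a sketch of rescaling the predictions back to the original sunspot range; this assumes you modify get_train_test() to also return the fitted scaler (for example, return train_data, test_data, data, scaler):

```python
# Sketch: map scaled predictions back to the original range with the fitted MinMaxScaler
train_predict_orig = scaler.inverse_transform(train_predict)        # predictions are (n, 1)
test_predict_orig = scaler.inverse_transform(test_predict)
trainY_orig = scaler.inverse_transform(trainY.reshape(-1, 1))       # targets are 1-D, reshape first
testY_orig = scaler.inverse_transform(testY.reshape(-1, 1))
```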
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Books
- Deep Learning Essentials by Wei Di, Anurag Bhardwaj, and Jianing Wei.
- Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
Articles
- Wikipedia article on BPTT
- A Tour of Recurrent Neural Network Algorithms for Deep Learning
- A Gentle Introduction to Backpropagation Through Time
- How to Prepare Univariate Time Series Data for Long Short-Term Memory Networks
Summary
In this tutorial, you discovered recurrent neural networks and their various architectures.
Specifically, you learned:
- The structure of an RNN
- How the RNN computes an output from previous inputs
- How to implement an end-to-end system for time series forecasting using an RNN
Do you have any questions about the RNNs discussed in this post? Ask your questions in the comments below, and I will do my best to answer.