The Transformer Positional Encoding Layer in Keras, Part 2
Last Updated on January 6, 2023
In part 1, a gentle introduction to positional encoding in transformer models, we discussed the positional encoding layer of the transformer model. We also showed how you can implement this layer and its functions yourself in Python. In this tutorial, you'll implement the positional encoding layer in Keras and TensorFlow. You can then use this layer in a complete transformer model.
After completing this tutorial, you will know:
- Text vectorization in Keras
- Embedding layer in Keras
- How to subclass the embedding layer and write your own positional encoding layer
Kick-start your project with my book Building Transformer Models with Attention. It provides self-study tutorials with working code to guide you through building a fully working transformer model that can
translate sentences from one language to another...
Let’s get started.

The transformer positional encoding layer in Keras, part 2
Photo by Ijaz Rafi. Some rights reserved.
Tutorial Overview
This tutorial is divided into three parts; they are:
- Text vectorization and embedding layer in Keras
- Writing your own positional encoding layer in Keras
- Randomly initialized and tunable embeddings
- Fixed weight embeddings from Attention Is All You Need
- Graphical view of the output of the positional encoding layer
The Import Section
First, let's write the section to import all the required libraries:
```python
import tensorflow as tf
from tensorflow import convert_to_tensor, string
from tensorflow.keras.layers import TextVectorization, Embedding, Layer
from tensorflow.data import Dataset
import numpy as np
import matplotlib.pyplot as plt
```
The Text Vectorization Layer
Let's start with a set of English phrases that are already preprocessed and cleaned. The text vectorization layer creates a dictionary of words and replaces each word with its corresponding index in the dictionary. Let's see how you can map these two sentences using the text vectorization layer:
- I am a robot
- you too robot
Note the text has already been converted to lowercase with all the punctuation marks and noise removed. Next, convert these two phrases to vectors of a fixed length of 5. The TextVectorization
layer of Keras requires a maximum vocabulary size and the required length of the output sequence for initialization. The output of the layer is a tensor of shape:
(number of sentences, output sequence length)
The following code snippet uses the adapt
method to generate a vocabulary. It then creates a vectorized representation of the text.
```python
output_sequence_length = 5
vocab_size = 10
sentences = [["I am a robot"], ["you too robot"]]
sentence_data = Dataset.from_tensor_slices(sentences)
# Create the TextVectorization layer
vectorize_layer = TextVectorization(
                  output_sequence_length=output_sequence_length,
                  max_tokens=vocab_size)
# Train the layer to create a dictionary
vectorize_layer.adapt(sentence_data)
# Convert all sentences to tensors
word_tensors = convert_to_tensor(sentences, dtype=tf.string)
# Use the word tensors to get vectorized phrases
vectorized_words = vectorize_layer(word_tensors)
print("Vocabulary: ", vectorize_layer.get_vocabulary())
print("Vectorized words: ", vectorized_words)
```
```
Vocabulary:  ['', '[UNK]', 'robot', 'you', 'too', 'i', 'am', 'a']
Vectorized words:  tf.Tensor(
[[5 6 7 2 0]
 [3 4 2 0 0]], shape=(2, 5), dtype=int64)
```
Want to Get Started With Building Transformer Models with Attention?
Take my free 12-day email crash course now (with sample code).
Click to sign-up and also get a free PDF Ebook version of the course.
The Embedding Layer
The Keras Embedding
layer converts integers to dense vectors. This layer maps these integers to random numbers, which are later tuned during the training phase. However, you also have the option to set the mapping to some predefined weight values (shown later). To initialize this layer, you need to specify the maximum value of an integer to map, along with the length of the output sequence.
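As a quick illustration of the predefined-weights option (a sketch, not part of the original listing; the weight matrix here is just random numbers standing in for a real encoding), you can pass a weight matrix and freeze the layer:
```python
# Sketch only: an Embedding layer whose mapping is fixed to predefined weights.
# The matrix below is random and purely illustrative; later in this tutorial the
# same mechanism is used with a sinusoidal positional encoding matrix.
predefined_weights = np.random.random((10, 6))   # (vocabulary size, embedding size)
frozen_embedding = Embedding(input_dim=10, output_dim=6,
                             weights=[predefined_weights],
                             trainable=False)
```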
The Word Embeddings
Let's see how the layer converts vectorized_words to tensors.
```python
output_length = 6
word_embedding_layer = Embedding(vocab_size, output_length)
embedded_words = word_embedding_layer(vectorized_words)
print(embedded_words)
```
The output has been annotated with some comments, as shown below. Note that you will see a different output every time you run this code because the weights are initialized randomly.

Word embeddings. This output will be different every time you run the code because of the random numbers involved.
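The annotated screenshot is not reproduced here, but you can verify the structure of the output yourself; a minimal check, assuming the code above, is:
```python
# The embedding output has shape (number of sentences, sequence length, output_length)
print(embedded_words.shape)   # expected: (2, 5, 6)
```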
The Position Embeddings
You also need the embeddings for the corresponding positions. The maximum number of positions corresponds to the output sequence length of the TextVectorization
layer.
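The visualization code later in this tutorial also uses a PositionEmbeddingLayer class for the randomly initialized, tunable case. Its definition is not shown in this excerpt; a minimal sketch, assuming it simply sums a trainable word embedding and a trainable position embedding, could look like this:
```python
class PositionEmbeddingLayer(Layer):
    def __init__(self, sequence_length, vocab_size, output_dim, **kwargs):
        super(PositionEmbeddingLayer, self).__init__(**kwargs)
        # Trainable embedding for the word indices
        self.word_embedding_layer = Embedding(
            input_dim=vocab_size, output_dim=output_dim
        )
        # Trainable embedding for the position indices
        self.position_embedding_layer = Embedding(
            input_dim=sequence_length, output_dim=output_dim
        )

    def call(self, inputs):
        position_indices = tf.range(tf.shape(inputs)[-1])
        embedded_words = self.word_embedding_layer(inputs)
        embedded_indices = self.position_embedding_layer(position_indices)
        return embedded_words + embedded_indices
```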
If you want to use the fixed positional encoding scheme from Attention Is All You Need, recall from part 1 that the encoding of position k along dimensions 2i and 2i+1 is given by:
\begin{eqnarray}
P(k, 2i) &=& \sin\Big(\frac{k}{n^{2i/d}}\Big)\\
P(k, 2i+1) &=& \cos\Big(\frac{k}{n^{2i/d}}\Big)
\end{eqnarray}
where n is a user-defined scalar (10,000 in the paper) and d is the dimension of the embedding output.
When specifying the Embedding
layer, you need to provide the positional encoding matrix as weights, along with trainable=False
. Let's create another positional embedding class that does exactly that.
```python
class PositionEmbeddingFixedWeights(Layer):
    def __init__(self, sequence_length, vocab_size, output_dim, **kwargs):
        super(PositionEmbeddingFixedWeights, self).__init__(**kwargs)
        word_embedding_matrix = self.get_position_encoding(vocab_size, output_dim)
        position_embedding_matrix = self.get_position_encoding(sequence_length, output_dim)
        self.word_embedding_layer = Embedding(
            input_dim=vocab_size, output_dim=output_dim,
            weights=[word_embedding_matrix],
            trainable=False
        )
        self.position_embedding_layer = Embedding(
            input_dim=sequence_length, output_dim=output_dim,
            weights=[position_embedding_matrix],
            trainable=False
        )

    def get_position_encoding(self, seq_len, d, n=10000):
        P = np.zeros((seq_len, d))
        for k in range(seq_len):
            for i in np.arange(int(d/2)):
                denominator = np.power(n, 2*i/d)
                P[k, 2*i] = np.sin(k/denominator)
                P[k, 2*i+1] = np.cos(k/denominator)
        return P

    def call(self, inputs):
        position_indices = tf.range(tf.shape(inputs)[-1])
        embedded_words = self.word_embedding_layer(inputs)
        embedded_indices = self.position_embedding_layer(position_indices)
        return embedded_words + embedded_indices
```
Next, we set up everything to run this layer.
```python
attnisallyouneed_embedding = PositionEmbeddingFixedWeights(output_sequence_length,
                                                           vocab_size, output_length)
attnisallyouneed_output = attnisallyouneed_embedding(vectorized_words)
print("Output from my_embedded_layer: ", attnisallyouneed_output)
```
```
Output from my_embedded_layer:  tf.Tensor(
[[[-0.9589243   1.2836622   0.23000172  1.9731903   0.01077196  1.9999421 ]
  [ 0.56205547  1.5004725   0.3213085   1.9603932   0.01508068  1.9999142 ]
  [ 1.566284    0.3377554   0.41192317  1.9433732   0.01938933  1.999877  ]
  [ 1.0504174  -1.4061394   0.2314966   1.9860148   0.01077211  1.9999698 ]
  [-0.7568025   0.3463564   0.18459873  1.982814    0.00861763  1.9999628 ]]

 [[ 0.14112     0.0100075   0.1387981   1.9903207   0.00646326  1.9999791 ]
  [ 0.08466846 -0.11334133  0.23099795  1.9817369   0.01077207  1.9999605 ]
  [ 1.8185948  -0.8322937   0.185397    1.9913884   0.00861771  1.9999814 ]
  [ 0.14112     0.0100075   0.1387981   1.9903207   0.00646326  1.9999791 ]
  [-0.7568025   0.3463564   0.18459873  1.982814    0.00861763  1.9999628 ]]], shape=(2, 5, 6), dtype=float32)
```
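As a quick sanity check (not part of the original listing), you can verify a couple of these numbers by hand: the first token of the first sentence has word index 5 and sits at position 0, so under the sinusoidal scheme its first two output values should be sin(5) + sin(0) and cos(5) + cos(0).
```python
# Sanity check (sketch): word index 5 at position 0, with n=10000 and d=6.
# For the i=0 pair, the sum of word and position encodings reduces to
# sin(5) + sin(0) and cos(5) + cos(0).
print(np.sin(5) + np.sin(0))   # ~ -0.9589, matches the first output value above
print(np.cos(5) + np.cos(0))   # ~  1.2837, matches the second output value above
```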
Visualizing the Final Embedding
In order to visualize the embeddings, let's take two bigger sentences: one technical and the other one just a quote. We'll set up the TextVectorization
layer along with the positional encoding layer and see what the final output looks like.
```python
technical_phrase = "to understand machine learning algorithms you need" +\
                   " to understand concepts such as gradient of a function "+\
                   "Hessians of a matrix and optimization etc"
wise_phrase = "patrick henry said give me liberty or give me death "+\
              "when he addressed the second virginia convention in march"

total_vocabulary = 200
sequence_length = 20
final_output_len = 50
phrase_vectorization_layer = TextVectorization(
                  output_sequence_length=sequence_length,
                  max_tokens=total_vocabulary)
# Learn the dictionary
phrase_vectorization_layer.adapt([technical_phrase, wise_phrase])
# Convert all sentences to tensors
phrase_tensors = convert_to_tensor([technical_phrase, wise_phrase],
                                   dtype=tf.string)
# Use the word tensors to get vectorized phrases
vectorized_phrases = phrase_vectorization_layer(phrase_tensors)

random_weights_embedding_layer = PositionEmbeddingLayer(sequence_length,
                                                        total_vocabulary,
                                                        final_output_len)
fixed_weights_embedding_layer = PositionEmbeddingFixedWeights(sequence_length,
                                                              total_vocabulary,
                                                              final_output_len)
random_embedding = random_weights_embedding_layer(vectorized_phrases)
fixed_embedding = fixed_weights_embedding_layer(vectorized_phrases)
```
Now let's see what the random embeddings look like for both phrases.
```python
fig = plt.figure(figsize=(15, 5))
title = ["Tech Phrase", "Wise Phrase"]
for i in range(2):
    ax = plt.subplot(1, 2, 1+i)
    matrix = tf.reshape(random_embedding[i, :, :], (sequence_length, final_output_len))
    cax = ax.matshow(matrix)
    plt.gcf().colorbar(cax)
    plt.title(title[i], y=1.2)
fig.suptitle("Random Embedding")
plt.show()
```

Random embeddings
The embedding from the fixed-weights layer is visualized below.
```python
fig = plt.figure(figsize=(15, 5))
title = ["Tech Phrase", "Wise Phrase"]
for i in range(2):
    ax = plt.subplot(1, 2, 1+i)
    matrix = tf.reshape(fixed_embedding[i, :, :], (sequence_length, final_output_len))
    cax = ax.matshow(matrix)
    plt.gcf().colorbar(cax)
    plt.title(title[i], y=1.2)
fig.suptitle("Fixed Weight Embedding from Attention is All You Need")
plt.show()
```

Embedding using sinusoidal positional encoding
You can see that the embedding layer initialized with the default parameters outputs random values. On the other hand, the fixed weights generated using sinusoids create a unique signature for every phrase, with information on each word position encoded within it.
You can experiment with tunable or fixed-weight implementations for your particular application, as sketched below.
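For example, a common variant (a sketch, not part of the original listing; the class name is hypothetical) keeps the sinusoidal position encoding fixed but leaves the word embedding trainable, as in the original Transformer:
```python
class PositionEmbeddingTrainableWords(PositionEmbeddingFixedWeights):
    """Sketch: trainable word embeddings plus the fixed sinusoidal position encoding."""
    def __init__(self, sequence_length, vocab_size, output_dim, **kwargs):
        super().__init__(sequence_length, vocab_size, output_dim, **kwargs)
        # Replace the frozen word embedding with a trainable one; the fixed
        # sinusoidal position embedding built by the parent class is kept as-is.
        self.word_embedding_layer = Embedding(input_dim=vocab_size,
                                              output_dim=output_dim)

# Usage mirrors the layers above:
mixed_embedding_layer = PositionEmbeddingTrainableWords(sequence_length,
                                                        total_vocabulary,
                                                        final_output_len)
mixed_embedding = mixed_embedding_layer(vectorized_phrases)
```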
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Books
- Transformers for Natural Language Processing, by Denis Rothman
Papers
- Attention Is All You Need, 2017
Articles
- The Transformer Attention Mechanism
- The Transformer Model
- Transformer Model for Language Understanding
- Using Pre-Trained Word Embeddings in a Keras Model
- English-to-Spanish translation with a sequence-to-sequence Transformer
- A Gentle Introduction to Positional Encoding in Transformer Models, Part 1
Summary
In this tutorial, you discovered the implementation of the positional encoding layer in Keras.
Specifically, you found:
- Text vectorization layer in Keras
- Positional encoding layer in Keras
- Creating your own class for positional encoding
- Setting your own weights for the positional encoding layer in Keras
Do you have any questions about positional encoding discussed in this post? Ask your questions in the comments below, and I will do my best to answer.
Learn Transformers and Attention!
Teach your deep learning model to read a sentence
...using transformer models with attention
Discover how in my new Ebook:
Building Transformer Models with Attention
It provides self-study tutorials with working code to guide you through building a fully working transformer model that can
translate sentences from one language to another...
Give magical power of understanding human language to
Your Projects
See What’s Inside