The Transformer Positional Encoding Layer in Keras, Part 2
Last Updated on January 6, 2023
In part 1, a gentle introduction to positional encoding in transformer models, we discussed the positional encoding layer of the transformer model. We also showed how you can implement this layer and its functions yourself in Python. In this tutorial, you'll implement the positional encoding layer in Keras and TensorFlow. You can then use this layer in a complete transformer model.
After completing this tutorial, you will know:
- Text vectorization in Keras
- Embedding layer in Keras
- How to subclass the embedding layer and write your own positional encoding layer
Kick-start your project with my book Building Transformer Models with Attention. It provides self-study tutorials with working code to guide you through building a fully working transformer model that can
translate sentences from one language to another...
Let’s get started.

The transformer positional encoding layer in Keras, part 2
Photo by Ijaz Rafi. Some rights reserved.
Tutorial Overview
This tutorial is divided into three parts; they are:
- Text vectorization and embedding layer in Keras
- Writing your own positional encoding layer in Keras
- Randomly initialized and tunable embeddings
- Fixed weight embeddings from Attention Is All You Need
- Graphical view of the output of the positional encoding layer
The Import Section
First, let's write the section to import all the required libraries:
```python
import tensorflow as tf
from tensorflow import convert_to_tensor, string
from tensorflow.keras.layers import TextVectorization, Embedding, Layer
from tensorflow.data import Dataset
import numpy as np
import matplotlib.pyplot as plt
```
The Text Vectorization Layer
Let's start with a set of English phrases that are already preprocessed and cleaned. The text vectorization layer creates a dictionary of words and replaces each word with its corresponding index in the dictionary. Let's see how you can map these two sentences using the text vectorization layer:
- I am a robot
- you too robot
Note the text has already been converted to lowercase with all the punctuation marks and noise removed. Next, convert these two phrases to vectors of a fixed length of 5. The TextVectorization
layer of Keras requires a maximum vocabulary size and the required length of the output sequence for initialization. The output of the layer is a tensor of shape:
(number of sentences, output sequence length)
The following code snippet uses the adapt
method to generate a vocabulary. It then creates a vectorized representation of the text.
```python
output_sequence_length = 5
vocab_size = 10
sentences = [["I am a robot"], ["you too robot"]]
sentence_data = Dataset.from_tensor_slices(sentences)
# Create the TextVectorization layer
vectorize_layer = TextVectorization(
                  output_sequence_length=output_sequence_length,
                  max_tokens=vocab_size)
# Train the layer to create a dictionary
vectorize_layer.adapt(sentence_data)
# Convert all sentences to tensors
word_tensors = convert_to_tensor(sentences, dtype=tf.string)
# Use the word tensors to get vectorized phrases
vectorized_words = vectorize_layer(word_tensors)
print("Vocabulary: ", vectorize_layer.get_vocabulary())
print("Vectorized words: ", vectorized_words)
```
```
Vocabulary:  ['', '[UNK]', 'robot', 'you', 'too', 'i', 'am', 'a']
Vectorized words:  tf.Tensor(
[[5 6 7 2 0]
 [3 4 2 0 0]], shape=(2, 5), dtype=int64)
```
Want to Get Started With Building Transformer Models with Attention?
Take my free 12-day email crash course now (with sample code).
Click to sign-up and also get a free PDF Ebook version of the course.
The Embedding Layer
The Keras Embedding
layer converts integers to dense vectors. This layer maps these integers to random numbers, which are later tuned during the training phase. However, you also have the option to set the mapping to some predefined weight values (shown later). To initialize this layer, you need to specify the maximum value of an integer to map, along with the length of the output sequence.
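As a quick illustration of the predefined-weights option (a sketch, not part of the original listing; the weight matrix here is just random numbers standing in for a real encoding), you can pass a weight matrix and freeze the layer:
```python
# Sketch only: an Embedding layer whose mapping is fixed to predefined weights.
# The matrix below is random and purely illustrative; later in this tutorial the
# same mechanism is used with a sinusoidal positional encoding matrix.
predefined_weights = np.random.random((10, 6))   # (vocabulary size, embedding size)
frozen_embedding = Embedding(input_dim=10, output_dim=6,
                             weights=[predefined_weights],
                             trainable=False)
```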
The Word Embeddings
Let's see how the layer converts vectorized_words to tensors.
```python
output_length = 6
word_embedding_layer = Embedding(vocab_size, output_length)
embedded_words = word_embedding_layer(vectorized_words)
print(embedded_words)
```
The output has been annotated with some comments, as shown below. Note that you will see a different output every time you run this code because the weights are initialized randomly.

Word embeddings. This output will be different every time you run the code because of the random numbers involved.
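The annotated screenshot is not reproduced here, but you can verify the structure of the output yourself; a minimal check, assuming the code above, is:
```python
# The embedding output has shape (number of sentences, sequence length, output_length)
print(embedded_words.shape)   # expected: (2, 5, 6)
```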
The Position Embeddings
You also need the embeddings for the corresponding positions. The maximum number of positions corresponds to the output sequence length of the TextVectorization
layer.
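The visualization code later in this tutorial also uses a PositionEmbeddingLayer class for the randomly initialized, tunable case. Its definition is not shown in this excerpt; a minimal sketch, assuming it simply sums a trainable word embedding and a trainable position embedding, could look like this:
```python
class PositionEmbeddingLayer(Layer):
    def __init__(self, sequence_length, vocab_size, output_dim, **kwargs):
        super(PositionEmbeddingLayer, self).__init__(**kwargs)
        # Trainable embedding for the word indices
        self.word_embedding_layer = Embedding(
            input_dim=vocab_size, output_dim=output_dim
        )
        # Trainable embedding for the position indices
        self.position_embedding_layer = Embedding(
            input_dim=sequence_length, output_dim=output_dim
        )

    def call(self, inputs):
        position_indices = tf.range(tf.shape(inputs)[-1])
        embedded_words = self.word_embedding_layer(inputs)
        embedded_indices = self.position_embedding_layer(position_indices)
        return embedded_words + embedded_indices
```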
If you want to use the fixed positional encoding scheme from Attention Is All You Need, recall from part 1 that the encoding of position k along dimensions 2i and 2i+1 is given by:
\begin{eqnarray}
P(k, 2i) &=& \sin\Big(\frac{k}{n^{2i/d}}\Big)\\
P(k, 2i+1) &=& \cos\Big(\frac{k}{n^{2i/d}}\Big)
\end{eqnarray}
where n is a user-defined scalar (10,000 in the paper) and d is the dimension of the embedding output.
When specifying the Embedding
layer, you need to provide the positional encoding matrix as weights, along with trainable=False
. Let's create another positional embedding class that does exactly that.
```python
class PositionEmbeddingFixedWeights(Layer):
    def __init__(self, sequence_length, vocab_size, output_dim, **kwargs):
        super(PositionEmbeddingFixedWeights, self).__init__(**kwargs)
        word_embedding_matrix = self.get_position_encoding(vocab_size, output_dim)
        position_embedding_matrix = self.get_position_encoding(sequence_length, output_dim)
        self.word_embedding_layer = Embedding(
            input_dim=vocab_size, output_dim=output_dim,
            weights=[word_embedding_matrix],
            trainable=False
        )
        self.position_embedding_layer = Embedding(
            input_dim=sequence_length, output_dim=output_dim,
            weights=[position_embedding_matrix],
            trainable=False
        )

    def get_position_encoding(self, seq_len, d, n=10000):
        P = np.zeros((seq_len, d))
        for k in range(seq_len):
            for i in np.arange(int(d/2)):
                denominator = np.power(n, 2*i/d)
                P[k, 2*i] = np.sin(k/denominator)
                P[k, 2*i+1] = np.cos(k/denominator)
        return P

    def call(self, inputs):
        position_indices = tf.range(tf.shape(inputs)[-1])
        embedded_words = self.word_embedding_layer(inputs)
        embedded_indices = self.position_embedding_layer(position_indices)
        return embedded_words + embedded_indices
```
Next, we set up everything to run this layer.
```python
attnisallyouneed_embedding = PositionEmbeddingFixedWeights(output_sequence_length,
                                                           vocab_size, output_length)
attnisallyouneed_output = attnisallyouneed_embedding(vectorized_words)
print("Output from my_embedded_layer: ", attnisallyouneed_output)
```
```
Output from my_embedded_layer:  tf.Tensor(
[[[-0.9589243   1.2836622   0.23000172  1.9731903   0.01077196  1.9999421 ]
  [ 0.56205547  1.5004725   0.3213085   1.9603932   0.01508068  1.9999142 ]
  [ 1.566284    0.3377554   0.41192317  1.9433732   0.01938933  1.999877  ]
  [ 1.0504174  -1.4061394   0.2314966   1.9860148   0.01077211  1.9999698 ]
  [-0.7568025   0.3463564   0.18459873  1.982814    0.00861763  1.9999628 ]]

 [[ 0.14112     0.0100075   0.1387981   1.9903207   0.00646326  1.9999791 ]
  [ 0.08466846 -0.11334133  0.23099795  1.9817369   0.01077207  1.9999605 ]
  [ 1.8185948  -0.8322937   0.185397    1.9913884   0.00861771  1.9999814 ]
  [ 0.14112     0.0100075   0.1387981   1.9903207   0.00646326  1.9999791 ]
  [-0.7568025   0.3463564   0.18459873  1.982814    0.00861763  1.9999628 ]]], shape=(2, 5, 6), dtype=float32)
```
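As a quick sanity check (not part of the original listing), you can verify a couple of these numbers by hand: the first token of the first sentence has word index 5 and sits at position 0, so under the sinusoidal scheme its first two output values should be sin(5) + sin(0) and cos(5) + cos(0).
```python
# Sanity check (sketch): word index 5 at position 0, with n=10000 and d=6.
# For the i=0 pair, the sum of word and position encodings reduces to
# sin(5) + sin(0) and cos(5) + cos(0).
print(np.sin(5) + np.sin(0))   # ~ -0.9589, matches the first output value above
print(np.cos(5) + np.cos(0))   # ~  1.2837, matches the second output value above
```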
Visualizing the Final Embedding
In order to visualize the embeddings, let's take two bigger sentences: one technical and the other one just a quote. We'll set up the TextVectorization
layer along with the positional encoding layer and see what the final output looks like.
```python
technical_phrase = "to understand machine learning algorithms you need" +\
                   " to understand concepts such as gradient of a function "+\
                   "Hessians of a matrix and optimization etc"
wise_phrase = "patrick henry said give me liberty or give me death "+\
              "when he addressed the second virginia convention in march"

total_vocabulary = 200
sequence_length = 20
final_output_len = 50
phrase_vectorization_layer = TextVectorization(
                  output_sequence_length=sequence_length,
                  max_tokens=total_vocabulary)
# Learn the dictionary
phrase_vectorization_layer.adapt([technical_phrase, wise_phrase])
# Convert all sentences to tensors
phrase_tensors = convert_to_tensor([technical_phrase, wise_phrase],
                                   dtype=tf.string)
# Use the word tensors to get vectorized phrases
vectorized_phrases = phrase_vectorization_layer(phrase_tensors)

random_weights_embedding_layer = PositionEmbeddingLayer(sequence_length,
                                                        total_vocabulary,
                                                        final_output_len)
fixed_weights_embedding_layer = PositionEmbeddingFixedWeights(sequence_length,
                                                              total_vocabulary,
                                                              final_output_len)
random_embedding = random_weights_embedding_layer(vectorized_phrases)
fixed_embedding = fixed_weights_embedding_layer(vectorized_phrases)
```
Now let's see what the random embeddings look like for both phrases.
```python
fig = plt.figure(figsize=(15, 5))
title = ["Tech Phrase", "Wise Phrase"]
for i in range(2):
    ax = plt.subplot(1, 2, 1+i)
    matrix = tf.reshape(random_embedding[i, :, :], (sequence_length, final_output_len))
    cax = ax.matshow(matrix)
    plt.gcf().colorbar(cax)
    plt.title(title[i], y=1.2)
fig.suptitle("Random Embedding")
plt.show()
```

Random embeddings
The embedding from the fixed-weights layer is visualized below.
```python
fig = plt.figure(figsize=(15, 5))
title = ["Tech Phrase", "Wise Phrase"]
for i in range(2):
    ax = plt.subplot(1, 2, 1+i)
    matrix = tf.reshape(fixed_embedding[i, :, :], (sequence_length, final_output_len))
    cax = ax.matshow(matrix)
    plt.gcf().colorbar(cax)
    plt.title(title[i], y=1.2)
fig.suptitle("Fixed Weight Embedding from Attention is All You Need")
plt.show()
```

Embedding using sinusoidal positional encoding
You can see that the embedding layer initialized with the default parameters outputs random values. On the other hand, the fixed weights generated using sinusoids create a unique signature for every phrase, with information on each word position encoded within it.
You can experiment with tunable or fixed-weight implementations for your particular application, as sketched below.
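For example, a common variant (a sketch, not part of the original listing; the class name is hypothetical) keeps the sinusoidal position encoding fixed but leaves the word embedding trainable, as in the original Transformer:
```python
class PositionEmbeddingTrainableWords(PositionEmbeddingFixedWeights):
    """Sketch: trainable word embeddings plus the fixed sinusoidal position encoding."""
    def __init__(self, sequence_length, vocab_size, output_dim, **kwargs):
        super().__init__(sequence_length, vocab_size, output_dim, **kwargs)
        # Replace the frozen word embedding with a trainable one; the fixed
        # sinusoidal position embedding built by the parent class is kept as-is.
        self.word_embedding_layer = Embedding(input_dim=vocab_size,
                                              output_dim=output_dim)

# Usage mirrors the layers above:
mixed_embedding_layer = PositionEmbeddingTrainableWords(sequence_length,
                                                        total_vocabulary,
                                                        final_output_len)
mixed_embedding = mixed_embedding_layer(vectorized_phrases)
```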
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Books
- Transformers for Natural Language Processing, by Denis Rothman
Papers
- Attention Is All You Need, 2017
Articles
- The Transformer Attention Mechanism
- The Transformer Model
- Transformer Model for Language Understanding
- Using Pre-Trained Word Embeddings in a Keras Model
- English-to-Spanish translation with a sequence-to-sequence Transformer
- A Gentle Introduction to Positional Encoding in Transformer Models, Part 1
Summary
In this tutorial, you discovered the implementation of the positional encoding layer in Keras.
Specifically, you found:
- Text vectorization layer in Keras
- Positional encoding layer in Keras
- Creating your own class for positional encoding
- Setting your own weights for the positional encoding layer in Keras
Do you have any questions about positional encoding discussed in this post? Ask your questions in the comments below, and I will do my best to answer.
Learn Transformers and Attention!
Teach your deep learning model to read a sentence
...using transformer models with attention
Discover how in my new Ebook:
Building Transformer Models with Attention
It provides self-study tutorials with working code to guide you through building a fully working transformer model that can
translate sentences from one language to another...
Give magical power of understanding human language to
Your Projects
See What’s Inside