Gradient Descent With Nesterov Momentum From Scratch


Last Updated on October 12, 2023

Gradient descent is an optimization algorithm that follows the negative gradient of an objective function in order to locate the minimum of the function.

A limitation of gradient descent is that it can get stuck in flat regions or bounce around if the objective function returns noisy gradients. Momentum is an approach that accelerates the progress of the search to skim across flat regions and smooth out bouncy gradients.

In some cases, the acceleration of momentum can cause the search to miss or overshoot the minima at the bottom of basins or valleys. Nesterov momentum is an extension of momentum that involves calculating the decaying moving average of the gradients of projected positions in the search space rather than the actual positions themselves.

This has the effect of harnessing the accelerating benefit of momentum, whilst allowing the search to slow down when approaching the optima and reducing the likelihood of missing or overshooting it.

In this tutorial, you will discover how to develop the gradient descent optimization algorithm with Nesterov Momentum from scratch.

After completing this tutorial, you will know:

  • Gradient descent is an optimization algorithm that uses the gradient of the objective function to navigate the search space.
  • The convergence of the gradient descent optimization algorithm can be accelerated by extending the algorithm and adding Nesterov Momentum.
  • How to implement the Nesterov Momentum optimization algorithm from scratch and apply it to an objective function and evaluate the results.

Kick-start your project with my new book Optimization for Machine Learning, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Gradient Descent With Nesterov Momentum From Scratch
Photo by Bonnie Moreland, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Gradient Descent
  2. Nesterov Momentum
  3. Gradient Descent With Nesterov Momentum
    1. Two-Dimensional Test Problem
    2. Gradient Descent Optimization With Nesterov Momentum
    3. Visualization of Nesterov Momentum

Gradient Descent

Gradient descent is an optimization algorithm.

It is technically referred to as a first-order optimization algorithm as it explicitly makes use of the first-order derivative of the target objective function.

First-order methods rely on gradient information to help direct the search for a minimum …

— Page 69, Algorithms for Optimization, 2019.

The first-order derivative, or simply the “derivative,” is the rate of change or slope of the objective function at a specific point, e.g. for a specific input.

If the objective function takes multiple input variables, it is referred to as a multivariate function and the input variables can be thought of as a vector. In turn, the derivative of a multivariate objective function may also be taken as a vector and is referred to generally as the “gradient.”

  • Gradient: First-order derivative for a multivariate objective function.

The derivative or the gradient points in the direction of the steepest ascent of the objective function for a specific input.

Gradient descent refers to a minimization optimization algorithm that follows the negative of the gradient downhill of the objective function to locate the minimum of the function.

The gradient descent algorithm requires an objective function that is being optimized and the derivative function for the objective function. The objective function f() returns a score for a given set of inputs, and the derivative function f'() gives the derivative of the objective function for a given set of inputs.

The gradient descent algorithm requires a starting point (x) in the problem, such as a randomly selected point in the input space.

The derivative is then calculated and a step is taken in the input space that is expected to result in a downhill movement in the objective function, assuming we are minimizing the objective function.

A downhill movement is made by first calculating how far to move in the input space, calculated as the step size (called alpha or the learning rate) multiplied by the gradient. This is then subtracted from the current point, ensuring we move against the gradient, or down the objective function.

  • x(t+1) = x(t) – step_size * f'(x(t))

The steeper the objective function at a given point, the larger the magnitude of the gradient and, in turn, the larger the step taken in the search space. The size of the step taken is scaled using a step size hyperparameter.

  • Step Size (alpha): Hyperparameter that controls how far to move in the search space against the gradient each iteration of the algorithm.

If the step size is too small, the movement in the search space will be small and the search will take a long time. If the step size is too large, the search may bounce around the search space and skip over the optima.
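To make the update rule concrete, the short sketch below applies it to the one-dimensional function f(x) = x^2, whose derivative is 2 * x. The function, starting point, and step size here are illustrative choices, not part of the tutorial's worked example.

# minimal gradient descent sketch on f(x) = x^2 (illustrative example)
def f(x):
    return x ** 2.0

def f_prime(x):
    return 2.0 * x

x = 1.0          # starting point
step_size = 0.1  # alpha, the learning rate
for i in range(20):
    # x(t+1) = x(t) - step_size * f'(x(t))
    x = x - step_size * f_prime(x)
    print(i, x, f(x))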

Now that we are familiar with the gradient descent optimization algorithm, let's take a look at Nesterov Momentum.

Want to Get Started With Optimization Algorithms?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

Nesterov Momentum

Nesterov Momentum is an extension to the gradient descent optimization algorithm.

The technique was described by (and named for) Yurii Nesterov in his 1983 paper titled “A Method For Solving The Convex Programming Problem With Convergence Rate O(1/k^2).”

Ilya Sutskever, et al. are responsible for popularizing the application of Nesterov Momentum in the training of neural networks with stochastic gradient descent, described in their 2013 paper “On The Importance Of Initialization And Momentum In Deep Learning.” They referred to the approach as “Nesterov’s Accelerated Gradient,” or NAG for short.

Nesterov Momentum is just like more traditional momentum except that the update is performed using the partial derivative of the projected update rather than the derivative of the current variable value.

While NAG is not typically thought of as a type of momentum, it indeed turns out to be closely related to classical momentum, differing only in the precise update of the velocity vector …

— On The Importance Of Initialization And Momentum In Deep Learning, 2013.

Traditional momentum involves maintaining an additional variable that represents the last update performed to the variable, an exponentially decaying moving average of past gradients.

The momentum algorithm accumulates an exponentially decaying moving average of past gradients and continues to move in their direction.

— Page 296, Deep Learning, 2016.

This last update or last change to the variable is then added to the variable, scaled by a “momentum” hyperparameter that controls how much of the last change to add, e.g. 0.9 for 90 percent.

It is easier to think about this update in terms of two steps, e.g. calculate the change in the variable using the partial derivative, then calculate the new value for the variable, as sketched after the two equations below.

  • change(t+1) = (momentum * change(t)) – (step_size * f'(x(t)))
  • x(t+1) = x(t) + change(t+1)
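As a small illustration, the helper below performs one classical momentum update using the two equations above. The function name and the default hyperparameter values are illustrative assumptions, not part of the tutorial's own code.

# classical momentum update as a small helper (illustrative sketch)
def momentum_step(x, change, f_prime, step_size=0.1, momentum=0.9):
    # change(t+1) = (momentum * change(t)) - (step_size * f'(x(t)))
    new_change = momentum * change - step_size * f_prime(x)
    # x(t+1) = x(t) + change(t+1)
    return x + new_change, new_change

# example: a few momentum steps on f(x) = x^2, whose derivative is 2 * x
x, change = 1.0, 0.0
for _ in range(10):
    x, change = momentum_step(x, change, lambda v: 2.0 * v)
print(x)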

We can think of momentum in terms of a ball rolling downhill that accelerates and continues to roll in the same direction even in the presence of small hills.

Momentum can be interpreted as a ball rolling down a nearly horizontal incline. The ball naturally gathers momentum as gravity causes it to accelerate, just as the gradient causes momentum to accumulate in this descent method.

— Page 75, Algorithms for Optimization, 2019.

A problem with momentum is that the acceleration can sometimes cause the search to overshoot the minima at the bottom of a basin or valley floor.

Nesterov Momentum can be thought of as a modification to momentum to overcome this problem of overshooting the minima.

It involves first calculating the projected position of the variable using the change from the last iteration, and using the derivative of the projected position in the calculation of the new position for the variable.

Calculating the gradient of the projected position acts like a correction factor for the acceleration that has been accumulated.

With Nesterov momentum the gradient is evaluated after the current velocity is applied. Thus one can interpret Nesterov momentum as attempting to add a correction factor to the standard method of momentum.

— Page 300, Deep Learning, 2016.

Nesterov Momentum is easier to think about in terms of four steps:

  1. Project the position of the solution.
  2. Calculate the gradient of the projection.
  3. Calculate the change in the variable using the partial derivative.
  4. Update the variable.

Let's go through these steps in more detail.

First, the projected position of the entire solution is calculated using the change calculated in the last iteration of the algorithm.

  • projection(t+1) = x(t) + (momentum * change(t))

We can then calculate the gradient for this new position.

  • gradient(t+1) = f'(projection(t+1))

Now we can calculate the new position of each variable using the gradient of the projection, first by calculating the change in each variable.

  • change(t+1) = (momentum * change(t)) – (step_size * gradient(t+1))

And finally, we calculate the new value for each variable using the calculated change; the sketch after the next equation puts the four steps together.

  • x(t+1) = x(t) + change(t+1)
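The helper below performs one Nesterov Momentum update for a single variable, following the four steps. As before, the function name and default hyperparameter values are illustrative assumptions rather than the tutorial's own code.

# one Nesterov Momentum update as a small helper (illustrative sketch)
def nesterov_step(x, change, f_prime, step_size=0.1, momentum=0.3):
    # 1. project the position of the solution
    projection = x + momentum * change
    # 2. calculate the gradient of the projection
    gradient = f_prime(projection)
    # 3. calculate the change in the variable
    new_change = momentum * change - step_size * gradient
    # 4. update the variable
    return x + new_change, new_change

# example: a few steps on f(x) = x^2, whose derivative is 2 * x
x, change = 1.0, 0.0
for _ in range(10):
    x, change = nesterov_step(x, change, lambda v: 2.0 * v)
print(x)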

In the field of convex optimization more generally, Nesterov Momentum is known to improve the rate of convergence of the optimization algorithm (e.g. reduce the number of iterations required to find the solution).

Like momentum, NAG is a first-order optimization method with better convergence rate guarantee than gradient descent in certain situations.

— On The Importance Of Initialization And Momentum In Deep Learning, 2013.

Although the technique is effective in training neural networks, it may not have the same general effect of accelerating convergence.

Unfortunately, in the stochastic gradient case, Nesterov momentum does not improve the rate of convergence.

— Page 300, Deep Learning, 2016.

Now that we are familiar with the Nesterov Momentum algorithm, let's explore how we might implement it and evaluate its performance.

Gradient Descent With Nesterov Momentum

In this section, we will explore how to implement the gradient descent optimization algorithm with Nesterov Momentum.

Two-Dimensional Test Problem

First, let's define an optimization function.

We will use a simple two-dimensional function that squares the input of each dimension and define the range of valid inputs from -1.0 to 1.0.

The objective() function below implements this function.
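A minimal sketch consistent with that description is:

# objective function: squares the input of each dimension
def objective(x, y):
    return x**2.0 + y**2.0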

We can create a three-dimensional plot of the dataset to get a feeling for the curvature of the response surface.

The complete example of plotting the objective function is listed below.
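The sketch below assumes NumPy and Matplotlib are used for the surface plot; the sampling increment and color map are illustrative choices.

# 3d surface plot of the test objective function (sketch)
from numpy import arange, meshgrid
from matplotlib import pyplot

# objective function
def objective(x, y):
    return x**2.0 + y**2.0

# define the range for input
r_min, r_max = -1.0, 1.0
# sample the input range uniformly at 0.1 increments
xaxis = arange(r_min, r_max, 0.1)
yaxis = arange(r_min, r_max, 0.1)
# create a mesh from the axes
x, y = meshgrid(xaxis, yaxis)
# compute targets
results = objective(x, y)
# create a surface plot with the jet color scheme
figure = pyplot.figure()
axis = figure.add_subplot(projection='3d')
axis.plot_surface(x, y, results, cmap='jet')
# show the plot
pyplot.show()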

Running the example creates a three-dimensional surface plot of the objective function.

We can see the familiar bowl shape with the global minima at f(0, 0) = 0.

Three-Dimensional Plot of the Test Objective Function

We can also create a two-dimensional plot of the function. This will be helpful later when we want to plot the progress of the search.

The example below creates a contour plot of the objective function.
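Again assuming NumPy and Matplotlib, a sketch of the contour plot is:

# contour plot of the test objective function (sketch)
from numpy import arange, meshgrid
from matplotlib import pyplot

# objective function
def objective(x, y):
    return x**2.0 + y**2.0

# define the range for input
r_min, r_max = -1.0, 1.0
# sample the input range uniformly at 0.1 increments
xaxis = arange(r_min, r_max, 0.1)
yaxis = arange(r_min, r_max, 0.1)
# create a mesh from the axes
x, y = meshgrid(xaxis, yaxis)
# compute targets
results = objective(x, y)
# create a filled contour plot with 50 levels and the jet color scheme
pyplot.contourf(x, y, results, levels=50, cmap='jet')
# show the plot
pyplot.show()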

Running the example creates a two-dimensional contour plot of the objective function.

We can see the bowl shape compressed to contours shown with a color gradient. We will use this plot to plot the specific points explored during the progress of the search.

Two-Dimensional Contour Plot of the Test Objective Function

Now that we have a test objective function, let's look at how we might implement the Nesterov Momentum optimization algorithm.

Gradient Descent Optimization With Nesterov Momentum

We can apply gradient descent with Nesterov Momentum to the test problem.

First, we need a function that calculates the derivative for this function.

The derivative of x^2 is x * 2 in each dimension, and the derivative() function implements this below.
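A sketch of such a function, returning the partial derivatives as a NumPy array, is:

from numpy import asarray

# derivative of the objective function
def derivative(x, y):
    return asarray([x * 2.0, y * 2.0])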

Next, we can implement gradient descent optimization.

First, we can select a random point within the bounds of the problem as a starting point for the search.

This assumes we have an array that defines the bounds of the search with one row for each dimension, where the first column defines the minimum and the second column defines the maximum of the dimension.
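For example, a random starting point within such a bounds array could be drawn as follows (a sketch; the variable names are illustrative):

from numpy import asarray
from numpy.random import rand

# bounds of the search: one row per dimension, columns are [min, max]
bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])
# generate a random initial point within the bounds
solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])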

Next, we need to calculate the projected point from the current position and calculate its derivative.

We can then create the new solution, one variable at a time.

First, the change in the variable is calculated using the partial derivative and the learning rate, with the momentum from the last change in the variable. This change is stored for the next iteration of the algorithm. The change is then used to calculate the new value for the variable.

This is repeated for each variable of the objective function, then repeated for each iteration of the algorithm.

This new solution can then be evaluated using the objective() function and the performance of the search can be reported.
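The standalone sketch below writes one such iteration out step by step; the starting point and hyperparameter values are illustrative assumptions.

from numpy import asarray

# objective function and its derivative
def objective(x, y):
    return x**2.0 + y**2.0

def derivative(x, y):
    return asarray([x * 2.0, y * 2.0])

# illustrative hyperparameters, current solution, and last changes
step_size, momentum = 0.1, 0.3
solution = asarray([0.5, -0.5])
change = [0.0, 0.0]
# calculate the projected solution and its gradient
projected = [solution[i] + momentum * change[i] for i in range(solution.shape[0])]
gradient = derivative(projected[0], projected[1])
# build the new solution one variable at a time
new_solution = list()
for i in range(solution.shape[0]):
    # calculate the change in this variable and store it for the next iteration
    change[i] = (momentum * change[i]) - step_size * gradient[i]
    new_solution.append(solution[i] + change[i])
# evaluate the new solution and report performance
solution = asarray(new_solution)
solution_eval = objective(solution[0], solution[1])
print('f(%s) = %.5f' % (solution, solution_eval))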

And that’s it.

We can tie all of this together into a function named nesterov() that takes the names of the objective function and the derivative function, an array with the bounds of the domain, and hyperparameter values for the total number of algorithm iterations, the learning rate, and the momentum, and returns the final solution and its evaluation.

This complete function is listed below.
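A sketch of the nesterov() function, consistent with the description above (the argument order and internal variable names are assumptions), is:

from numpy import asarray
from numpy.random import rand

# gradient descent algorithm with nesterov momentum
def nesterov(objective, derivative, bounds, n_iter, step_size, momentum):
    # generate a random initial point within the bounds
    solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    # keep track of the change made to each variable
    change = [0.0 for _ in range(bounds.shape[0])]
    # run the gradient descent
    for it in range(n_iter):
        # calculate the projected solution
        projected = [solution[i] + momentum * change[i] for i in range(solution.shape[0])]
        # calculate the gradient for the projection
        gradient = derivative(projected[0], projected[1])
        # build a solution one variable at a time
        new_solution = list()
        for i in range(solution.shape[0]):
            # calculate the change in this variable and store it for the next iteration
            change[i] = (momentum * change[i]) - step_size * gradient[i]
            # calculate the new position in this variable and store it
            new_solution.append(solution[i] + change[i])
        # evaluate the candidate point
        solution = asarray(new_solution)
        solution_eval = objective(solution[0], solution[1])
        # report progress
        print('>%d f(%s) = %.5f' % (it, solution, solution_eval))
    return [solution, solution_eval]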

Note, we have intentionally used lists and an imperative coding style instead of vectorized operations for readability. Feel free to adapt the implementation to a vectorized implementation with NumPy arrays for better performance.

We can then define our hyperparameters and call the nesterov() function to optimize our test objective function.

In this case, we will use 30 iterations of the algorithm with a learning rate of 0.1 and momentum of 0.3. These hyperparameter values were found after a little trial and error.

Tying all of this together, the complete example of gradient descent optimization with Nesterov Momentum is listed below.
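A self-contained sketch of what such a complete example could look like, combining the objective(), derivative(), and nesterov() sketches above, is:

# gradient descent optimization with nesterov momentum for a 2d test function (sketch)
from numpy import asarray
from numpy.random import rand, seed

# objective function
def objective(x, y):
    return x**2.0 + y**2.0

# derivative of the objective function
def derivative(x, y):
    return asarray([x * 2.0, y * 2.0])

# gradient descent algorithm with nesterov momentum
def nesterov(objective, derivative, bounds, n_iter, step_size, momentum):
    # generate a random initial point within the bounds
    solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    # keep track of the change made to each variable
    change = [0.0 for _ in range(bounds.shape[0])]
    # run the gradient descent
    for it in range(n_iter):
        # calculate the projected solution and its gradient
        projected = [solution[i] + momentum * change[i] for i in range(solution.shape[0])]
        gradient = derivative(projected[0], projected[1])
        # build a solution one variable at a time
        new_solution = list()
        for i in range(solution.shape[0]):
            change[i] = (momentum * change[i]) - step_size * gradient[i]
            new_solution.append(solution[i] + change[i])
        # evaluate the candidate point and report progress
        solution = asarray(new_solution)
        solution_eval = objective(solution[0], solution[1])
        print('>%d f(%s) = %.5f' % (it, solution, solution_eval))
    return [solution, solution_eval]

# seed the pseudo random number generator
seed(1)
# define the range for input
bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])
# define the total iterations, step size, and momentum
n_iter = 30
step_size = 0.1
momentum = 0.3
# perform the gradient descent search with nesterov momentum
best, score = nesterov(objective, derivative, bounds, n_iter, step_size, momentum)
print('Done!')
print('f(%s) = %f' % (best, score))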

Running the example applies the optimization algorithm with Nesterov Momentum to our test problem and reports the performance of the search for each iteration of the algorithm.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that a near optimal solution was found after perhaps 15 iterations of the search, with input values near 0.0 and 0.0, evaluating to 0.0.

Visualization of Nesterov Momentum

We can plot the progress of the Nesterov Momentum search on a contour plot of the domain.

This can provide an intuition for the progress of the search over the iterations of the algorithm.

We must update the nesterov() function to maintain a list of all solutions found during the search, then return this list at the end of the search.

The updated version of the function with these changes is listed below.
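A sketch of the updated function, which appends each candidate solution to a list and returns that list, is:

from numpy import asarray
from numpy.random import rand

# gradient descent with nesterov momentum, keeping a list of all solutions found
def nesterov(objective, derivative, bounds, n_iter, step_size, momentum):
    # track all solutions found during the search
    solutions = list()
    # generate a random initial point within the bounds
    solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    # keep track of the change made to each variable
    change = [0.0 for _ in range(bounds.shape[0])]
    # run the gradient descent
    for it in range(n_iter):
        # calculate the projected solution and its gradient
        projected = [solution[i] + momentum * change[i] for i in range(solution.shape[0])]
        gradient = derivative(projected[0], projected[1])
        # build a solution one variable at a time
        new_solution = list()
        for i in range(solution.shape[0]):
            change[i] = (momentum * change[i]) - step_size * gradient[i]
            new_solution.append(solution[i] + change[i])
        # evaluate and store the candidate point
        solution = asarray(new_solution)
        solution_eval = objective(solution[0], solution[1])
        solutions.append(solution)
        # report progress
        print('>%d f(%s) = %.5f' % (it, solution, solution_eval))
    return solutions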

We can then execute the search as before, and this time retrieve the list of solutions instead of the best final solution.

We can then create a contour plot of the objective function, as before.

Finally, we can plot each solution found during the search as a white dot connected by a line.

Tying this all together, the complete example of performing the Nesterov Momentum optimization on the test problem and plotting the results on a contour plot is listed below.
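A self-contained sketch of the visualization example, reusing the solution-tracking nesterov() function above, might look like the following; the plotting choices (50 contour levels, jet color map, white markers) are illustrative.

# plot the nesterov momentum search on a contour plot of the test function (sketch)
from numpy import arange, asarray, meshgrid
from numpy.random import rand, seed
from matplotlib import pyplot

# objective function
def objective(x, y):
    return x**2.0 + y**2.0

# derivative of the objective function
def derivative(x, y):
    return asarray([x * 2.0, y * 2.0])

# gradient descent with nesterov momentum, keeping a list of all solutions found
def nesterov(objective, derivative, bounds, n_iter, step_size, momentum):
    solutions = list()
    # generate a random initial point within the bounds
    solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    # keep track of the change made to each variable
    change = [0.0 for _ in range(bounds.shape[0])]
    for it in range(n_iter):
        # calculate the projected solution and its gradient
        projected = [solution[i] + momentum * change[i] for i in range(solution.shape[0])]
        gradient = derivative(projected[0], projected[1])
        # build a solution one variable at a time
        new_solution = list()
        for i in range(solution.shape[0]):
            change[i] = (momentum * change[i]) - step_size * gradient[i]
            new_solution.append(solution[i] + change[i])
        # evaluate and store the candidate point
        solution = asarray(new_solution)
        solutions.append(solution)
        print('>%d f(%s) = %.5f' % (it, solution, objective(solution[0], solution[1])))
    return solutions

# seed the pseudo random number generator
seed(1)
# define the range for input and the hyperparameters
bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])
n_iter = 30
step_size = 0.1
momentum = 0.3
# perform the search and collect the solutions
solutions = nesterov(objective, derivative, bounds, n_iter, step_size, momentum)
# sample the input range uniformly at 0.1 increments and compute targets
xaxis = arange(bounds[0, 0], bounds[0, 1], 0.1)
yaxis = arange(bounds[1, 0], bounds[1, 1], 0.1)
x, y = meshgrid(xaxis, yaxis)
results = objective(x, y)
# create a filled contour plot with 50 levels and the jet color scheme
pyplot.contourf(x, y, results, levels=50, cmap='jet')
# plot the solutions found during the search as white dots connected by a line
solutions = asarray(solutions)
pyplot.plot(solutions[:, 0], solutions[:, 1], '.-', color='w')
# show the plot
pyplot.show()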

Running the example performs the search as before, except in this case the contour plot of the objective function is created.

In this case, we can see that a white dot is shown for each solution found during the search, starting above the optima and progressively getting closer to the optima at the center of the plot.

Contour Plot of the Test Objective Function With Nesterov Momentum Search Results Shown

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Papers

  • A Method For Solving The Convex Programming Problem With Convergence Rate O(1/k^2), 1983.
  • On The Importance Of Initialization And Momentum In Deep Learning, 2013.

Books

  • Algorithms for Optimization, 2019.
  • Deep Learning, 2016.


Summary

In this tutorial, you discovered how to develop gradient descent optimization with Nesterov Momentum from scratch.

Specifically, you learned:

  • Gradient descent is an optimization algorithm that uses the gradient of the objective function to navigate the search space.
  • The convergence of the gradient descent optimization algorithm can be accelerated by extending the algorithm and adding Nesterov Momentum.
  • How to implement the Nesterov Momentum optimization algorithm from scratch and apply it to an objective function and evaluate the results.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

Get a Handle on Modern Optimization Algorithms!

Optimization for Machine Learning

Develop Your Understanding of Optimization

…with just a few lines of Python code

Discover how in my new Ebook:
Optimization for Machine Learning

It provides self-study tutorials with full working code on:
Gradient Descent, Genetic Algorithms, Hill Climbing, Curve Fitting, RMSProp, Adam,
and far more…

Bring Modern Optimization Algorithms to
Your Machine Learning Projects

See What’s Inside




