Explore the Behavior of Optimizer

More complex loss functions and interesting ideas are welcome.


TODO

  • Stabilize the Adam optimizer; it still has many numerical problems.
  • Add the AdamW optimizer.
  • Add the Nesterov momentum optimizer.
  • Generate GIFs.
  • Write documentation.

Visualization of some complex functions f(x, y)

In this repository, I use the following functions to explore the behavior of different optimizers.

(Contour plots: loss1, loss2, booth, rastrigin, ackley, complex, himmelblau, mccormick)

Formulas of the functions

loss1

$$f(x, y) = x^2 + y^2$$

loss2

$$f(x, y)=0.5x^2 + 10y^2 + x + 2y$$

booth

$$f(x, y) = (x + 2y - 7)^2 + (2x + y - 5)^2$$

rastrigin

$$f(x, y) = 20 + x^2 - 10 \cos(2\pi x) + y^2 - 10 \cos(2\pi y)$$

ackley

$$f(x, y) = -20 \exp\left(-0.2 \sqrt{0.5(x^2 + y^2)}\right) - \exp\left(0.5(\cos(2\pi x) + \cos(2\pi y))\right) + 20 + e$$

complex

$$f(x, y) = (x^3 - 3x^2 + 3y^2 - y^3)^2 + 0.1\cos(5x) + 0.1\sin(5y)$$

himmelblau

$$f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2$$

mccormick

$$f(x, y) = \sin(x + y) + (x - y)^2 - 1.5x + 2.5y + 1$$
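
As a reference, here is a minimal Python sketch of a few of these functions (the names and signatures are illustrative; the repository's own definitions may differ):

```python
import math

def loss1(x, y):
    return x ** 2 + y ** 2

def loss2(x, y):
    return 0.5 * x ** 2 + 10 * y ** 2 + x + 2 * y

def rastrigin(x, y):
    return (20 + x ** 2 - 10 * math.cos(2 * math.pi * x)
            + y ** 2 - 10 * math.cos(2 * math.pi * y))

def himmelblau(x, y):
    return (x ** 2 + y - 11) ** 2 + (x + y ** 2 - 7) ** 2
```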

Experiment Results

The behavior of different optimizers on these functions is shown below.

Optimizer configs

  • config for loss1, loss2, booth: lr=0.01, epochs=200, init_x=5, init_y=5, beta in Momentum is 0.9, beta1 in Adam is 0.9, beta2 in Adam is 0.999.
  • config for rastrigin: lr=0.01, epochs=300, init_x=5, init_y=5, beta in Momentum is 0.9, beta1 in Adam is 0.9, beta2 in Adam is 0.999.
  • config for ackley: lr=0.01, epochs=400, init_x=7.5, init_y=7.5, beta in Momentum is 0.9, beta1 in Adam is 0.9, beta2 in Adam is 0.999.
  • config for complex: lr=0.001, epochs=200, init_x=4.3, init_y=-1.5, beta in Momentum is 0.9, beta1 in Adam is 0.9, beta2 in Adam is 0.999.
  • config for himmelblau: lr=0.001, epochs=200, init_x=7.5, init_y=7.5, beta in Momentum is 0.9, beta1 in Adam is 0.9, beta2 in Adam is 0.999.
  • config for mccormick: lr=0.1, epochs=200, init_x=7.5, init_y=7.5 for SGD. lr=0.01, epochs=200, init_x=7.5, init_y=7.5, beta in Momentum is 0.9 for Momentum.
| Function   | SGD | Momentum | Adam |
| ---------- | --- | -------- | ---- |
| loss1      |     |          |      |
| loss2      |     |          |      |
| booth      |     |          |      |
| rastrigin  |     |          |      |
| ackley     |     |          | Fail |
| complex    |     |          |      |
| himmelblau |     |          |      |
| mccormick  |     |          | TBD  |

(The blank cells contain the corresponding result images.)
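
For intuition about how these runs are set up, the sketch below reproduces one table cell in plain Python: SGD on loss2 with the config above (lr=0.01, epochs=200, start at (5, 5)). It is an independent illustration, not the repository's demo.py implementation.

```python
def loss2_grad(x, y):
    # gradient of loss2(x, y) = 0.5*x**2 + 10*y**2 + x + 2*y
    return x + 1, 20 * y + 2

x, y, lr = 5.0, 5.0, 0.01
for _ in range(200):                 # epochs = 200
    gx, gy = loss2_grad(x, y)
    x, y = x - lr * gx, y - lr * gy  # plain SGD step
print(f"end point: ({x:.3f}, {y:.3f})")  # moves toward the minimum at (-1, -0.1)
```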

As the table shows, the Adam optimizer does not work well on these functions. But does this mean that Adam is unsuitable for simple functions? Let's tune its beta parameters below.

Different beta values of Adam on loss2

loss2 is a simple function, yet Adam performs badly on it. Let's tune Adam's beta parameters to see whether it can do better; my guess is that it can.

1. Tune only beta1 to see what happens. Let lr=1e-2, beta2=0.999, epochs=200.

beta1=0.5 beta1=0.6 beta1=0.7 beta1=0.8 beta1=0.9

beta1=0.8 works well. Let's perform a finer-grained tuning.

beta1=0.8 beta1=0.81 beta1=0.805 beta1=0.85

beta1=0.81 works well. It can be seen that beta1 controls how strongly the existing trend is followed: if beta1 is large, the optimizer keeps moving along the old trend; if beta1 is small, it adjusts more according to the current gradient.

2. Now tune beta2 to see whether there are any surprises. Let lr=1e-2, beta1=0.81, epochs=200.

beta2 = 0.8 beta2=0.9 beta2=0.99 beta2=0.999 beta2=0.9999

My explanation is that beta2 controls the optimizer's step size, determining how far it moves along the trend. It seems that beta2=0.999 is a good choice.
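
To make the roles of beta1 and beta2 concrete, here is a minimal sketch of the standard Adam update for a 2-D point (the repository's implementation may differ in details such as bias correction or epsilon):

```python
import math

def adam_step(params, grads, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update over a list of scalar parameters."""
    state["t"] += 1
    out = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # beta1: how much of the accumulated direction (the "trend") is kept
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        # beta2: how slowly the per-coordinate step scale adapts
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g ** 2
        m_hat = state["m"][i] / (1 - beta1 ** state["t"])  # bias-corrected first moment
        v_hat = state["v"][i] / (1 - beta2 ** state["t"])  # bias-corrected second moment
        out.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return out

# Example: 200 Adam steps on loss2 with beta1=0.81 (gradient of loss2 is (x + 1, 20y + 2))
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
x, y = 5.0, 5.0
for _ in range(200):
    x, y = adam_step([x, y], [x + 1, 20 * y + 2], state, beta1=0.81)
print(f"end point: ({x:.3f}, {y:.3f})")
```

Because m_hat is divided by sqrt(v_hat), each coordinate's step is roughly bounded by lr, which is consistent with the slow movement observed above.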

So far, Adam has been moving rather slowly. What about increasing its learning rate? Let beta1=0.81, beta2=0.999, epochs=200.

lr=0.01 lr=0.03

We can see that increasing the learning rate is not a better choice.

3. Increase the number of epochs. Let lr=0.01, beta1=0.81, beta2=0.999.

epochs=200 epochs=400 epochs=800 epochs=1000

We made it; my guess was right. But this also suggests that, no matter how we tune the parameters, Adam converges more slowly than SGD on simple problems. Perhaps there is a better way to make Adam work well in such straightforward cases, or perhaps not; I don't know yet.

Exploring Adam on the himmelblau function

The experiment results above show that Adam does not converge on the himmelblau function, so the first thing I want to try is lowering the learning rate.

1. Lower the learning rate. Let beta1=0.9, beta2=0.999, epochs=200.

lr=1e-4 lr=3e-4 lr=1e-5 lr=3e-5

The learning rate has an important influence on this optimization process; 1e-4 is the better choice.
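
A minimal sketch of this learning-rate sweep, assuming analytic himmelblau gradients and a hand-rolled Adam loop (not the repository's code):

```python
import math

def himmelblau_grad(x, y):
    a = x ** 2 + y - 11
    b = x + y ** 2 - 7
    return 4 * x * a + 2 * b, 2 * a + 4 * y * b

for lr in (1e-4, 3e-4, 1e-5, 3e-5):
    x, y = 7.5, 7.5
    m, v = [0.0, 0.0], [0.0, 0.0]
    for t in range(1, 201):                          # epochs = 200
        g = himmelblau_grad(x, y)
        p = [x, y]
        for i in range(2):
            m[i] = 0.9 * m[i] + 0.1 * g[i]           # beta1 = 0.9
            v[i] = 0.999 * v[i] + 0.001 * g[i] ** 2  # beta2 = 0.999
            m_hat = m[i] / (1 - 0.9 ** t)
            v_hat = v[i] / (1 - 0.999 ** t)
            p[i] -= lr * m_hat / (math.sqrt(v_hat) + 1e-8)
        x, y = p
    loss = (x ** 2 + y - 11) ** 2 + (x + y ** 2 - 7) ** 2
    print(f"lr={lr:g}: end point=({x:.3f}, {y:.3f}), loss={loss:.3f}")
```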

Exploring Adam on the rastrigin function

"The Rastrigin function has many local optima, making it a challenging optimization problem. Now, let's see if Adam can optimize this problem effectively."

1. Increase the number of epochs. The experiments above suggest that Adam can reach the optimum if we give it more epochs. Let's give it a try. Let lr=0.01, epochs=300, init_x=5, init_y=5, beta1=0.9, beta2=0.999.

epochs=300 epochs=600

As we can see, increasing the number of epochs is not helpful. Another signal from the experimental results is that Adam may not have enough momentum to reach the optimum. Let's tune the beta1 parameter.

2. Increase beta1

Let: lr=0.01, epochs=300, init_x=5, init_y=5, beta2=0.999.

beta1=0.91 beta1=0.92 beta1=0.93 beta1=0.94
beta1=0.95 beta1=0.96 beta1=0.97 beta1=0.98

It seems that beta1=0.98 is the better choice. Let's try a finer-grained sweep.

beta1=0.98 beta1=0.983 beta1=0.985 beta1=0.99

The experiments above confirm the hypothesis that Adam did not have enough momentum to reach the optimum. Another noteworthy detail is that even a slight change of beta1 can make a significant difference.
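
Similarly, a minimal sketch of the beta1 sweep on rastrigin, again with analytic gradients and a hand-rolled Adam loop (not the repository's code):

```python
import math

def rastrigin_grad(x, y):
    return (2 * x + 20 * math.pi * math.sin(2 * math.pi * x),
            2 * y + 20 * math.pi * math.sin(2 * math.pi * y))

for beta1 in (0.98, 0.983, 0.985, 0.99):
    x, y = 5.0, 5.0
    m, v = [0.0, 0.0], [0.0, 0.0]
    for t in range(1, 301):                              # epochs = 300
        g = rastrigin_grad(x, y)
        p = [x, y]
        for i in range(2):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = 0.999 * v[i] + 0.001 * g[i] ** 2      # beta2 = 0.999
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - 0.999 ** t)
            p[i] -= 0.01 * m_hat / (math.sqrt(v_hat) + 1e-8)  # lr = 0.01
        x, y = p
    loss = (20 + x ** 2 - 10 * math.cos(2 * math.pi * x)
            + y ** 2 - 10 * math.cos(2 * math.pi * y))
    print(f"beta1={beta1}: end point=({x:.2f}, {y:.2f}), loss={loss:.2f}")
```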

Different numbers of steps for SGD on rastrigin

Different initial values on the complex function

Different learning rates for SGD on mccormick

Usage

1. Clone the repository

```bash
git clone git@github.com:AllenWrong/From-Scratch.git
cd From-Scratch/learning-rate
```

2. Run the following command for a first try

```bash
python demo.py \
  --opt SGD \
  --loss_fn loss1 \
  --lr 1e-2 \
  --epochs 200 \
  --r_min -10 \
  --r_max 10 \
  --init_x 5 \
  --init_y 5
```

The output images are saved in the ./imgs directory.

3. Run the following command to plot a function contour

```bash
python show_contour.py \
  --fn loss1 \
  --rmin -10 \
  --rmax 10
```

Citation

```bibtex
@misc{Explore-the-Behavior-of-Optimizer,
  author = {Zhongchao, Guan},
  title = {Explore the Behavior of Optimizer},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/AllenWrong/From-Scratch/learning-rate}},
}
```

Acknowledgments

  • Some content was created with the assistance of ChatGPT.

Contact Me

If you are interested in this project or want to learn more about the From-Scratch series, follow me on GitHub.

If you have ideas you'd like to bring to life, please email me.

License

License: MIT