# Explore the Behavior of Optimizer

More complex loss functions and interesting ideas are welcome.

TODO:
- Stabilize the Adam optimizer; several numerical problems remain.
- Add the AdamW optimizer.
- Add the Nesterov optimizer.
- Generate GIFs.
- Write documentation.
In this repository, I use the following functions to explore the behavior of different optimizers.
| loss1 | loss2 | booth | rastrigin |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
| ackley | complex | himmelblau | mccormick |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
Functions:
- loss1, loss2, complex: custom functions defined in this repository
- booth, rastrigin, ackley, himmelblau, mccormick: standard 2-D benchmark functions
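For reference, the named benchmarks have well-known closed forms; here is a sketch in plain Python using the standard textbook definitions (the repository's own implementations may differ slightly, e.g. in scaling, and loss1/loss2/complex are repo-specific so they are omitted):

```python
import math

def booth(x, y):
    # Global minimum 0 at (1, 3).
    return (x + 2*y - 7)**2 + (2*x + y - 5)**2

def rastrigin(x, y):
    # Highly multimodal; global minimum 0 at (0, 0).
    return 20 + x*x - 10*math.cos(2*math.pi*x) + y*y - 10*math.cos(2*math.pi*y)

def ackley(x, y):
    # Nearly flat outer region with a deep well at the origin; minimum 0 at (0, 0).
    return (-20*math.exp(-0.2*math.sqrt(0.5*(x*x + y*y)))
            - math.exp(0.5*(math.cos(2*math.pi*x) + math.cos(2*math.pi*y)))
            + math.e + 20)

def himmelblau(x, y):
    # Four global minima with value 0, e.g. at (3, 2).
    return (x*x + y - 11)**2 + (x + y*y - 7)**2

def mccormick(x, y):
    # Global minimum about -1.9133 at (-0.54719, -1.54719).
    return math.sin(x + y) + (x - y)**2 - 1.5*x + 2.5*y + 1
```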
Optimizer config:
- loss1, loss2, booth: lr=0.01, epochs=200, init_x=5, init_y=5; Momentum beta=0.9; Adam beta1=0.9, beta2=0.999.
- rastrigin: lr=0.01, epochs=300, init_x=5, init_y=5; Momentum beta=0.9; Adam beta1=0.9, beta2=0.999.
- ackley: lr=0.01, epochs=400, init_x=7.5, init_y=7.5; Momentum beta=0.9; Adam beta1=0.9, beta2=0.999.
- complex: lr=0.001, epochs=200, init_x=4.3, init_y=-1.5; Momentum beta=0.9; Adam beta1=0.9, beta2=0.999.
- himmelblau: lr=0.001, epochs=200, init_x=7.5, init_y=7.5; Momentum beta=0.9; Adam beta1=0.9, beta2=0.999.
- mccormick: for SGD, lr=0.1, epochs=200, init_x=7.5, init_y=7.5; for Momentum, lr=0.01, epochs=200, init_x=7.5, init_y=7.5, beta=0.9.
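For quick reference, the same settings can be collected in one place; this is a hypothetical Python mapping for illustration only (the repository actually passes these values as CLI flags to demo.py):

```python
# Shared Momentum/Adam hyperparameters used by most experiments above.
COMMON = dict(momentum_beta=0.9, beta1=0.9, beta2=0.999)

CONFIGS = {
    "loss1":      dict(lr=0.01,  epochs=200, init=(5.0, 5.0),  **COMMON),
    "loss2":      dict(lr=0.01,  epochs=200, init=(5.0, 5.0),  **COMMON),
    "booth":      dict(lr=0.01,  epochs=200, init=(5.0, 5.0),  **COMMON),
    "rastrigin":  dict(lr=0.01,  epochs=300, init=(5.0, 5.0),  **COMMON),
    "ackley":     dict(lr=0.01,  epochs=400, init=(7.5, 7.5),  **COMMON),
    "complex":    dict(lr=0.001, epochs=200, init=(4.3, -1.5), **COMMON),
    "himmelblau": dict(lr=0.001, epochs=200, init=(7.5, 7.5),  **COMMON),
    # mccormick uses per-optimizer settings:
    "mccormick": {
        "SGD":      dict(lr=0.1,  epochs=200, init=(7.5, 7.5)),
        "Momentum": dict(lr=0.01, epochs=200, init=(7.5, 7.5), beta=0.9),
    },
}
```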
| | SGD | Momentum | Adam |
|---|---|---|---|
| loss1 | ![]() | ![]() | ![]() |
| loss2 | ![]() | ![]() | ![]() |
| booth | ![]() | ![]() | ![]() |
| rastrigin | ![]() | ![]() | ![]() |
| ackley | ![]() | ![]() | Fail |
| complex | ![]() | ![]() | ![]() |
| himmelblau | ![]() | ![]() | ![]() |
| mccormick | ![]() | ![]() | TBD |
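The three optimizers compared above follow the standard update rules. Here is a minimal from-scratch sketch, run on the booth function with an analytic gradient; the repository's implementation may differ in details (e.g. Adam's epsilon, or the exact momentum formulation):

```python
import math

def booth(x, y):
    # Global minimum 0 at (1, 3).
    return (x + 2*y - 7)**2 + (2*x + y - 5)**2

def booth_grad(x, y):
    a, b = x + 2*y - 7, 2*x + y - 5
    return 2*a + 4*b, 4*a + 2*b

def run(opt, lr=0.01, epochs=200, x=5.0, y=5.0,
        beta=0.9, beta1=0.9, beta2=0.999, eps=1e-8):
    mx = my = vx = vy = 0.0
    for t in range(1, epochs + 1):
        gx, gy = booth_grad(x, y)
        if opt == "SGD":
            x -= lr * gx; y -= lr * gy
        elif opt == "Momentum":
            # Classical (heavy-ball) momentum.
            mx = beta * mx + gx; my = beta * my + gy
            x -= lr * mx; y -= lr * my
        elif opt == "Adam":
            mx = beta1 * mx + (1 - beta1) * gx
            my = beta1 * my + (1 - beta1) * gy
            vx = beta2 * vx + (1 - beta2) * gx * gx
            vy = beta2 * vy + (1 - beta2) * gy * gy
            # Bias-corrected moment estimates.
            x -= lr * (mx / (1 - beta1**t)) / (math.sqrt(vx / (1 - beta2**t)) + eps)
            y -= lr * (my / (1 - beta1**t)) / (math.sqrt(vy / (1 - beta2**t)) + eps)
    return x, y

print(run("SGD"))   # ends near the optimum (1, 3)
print(run("Adam"))  # still far from the optimum after 200 steps, as in the table
```

Note that Adam's per-step displacement is roughly bounded by lr, which is why it covers so little ground in 200 epochs here.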
As you can see, the Adam optimizer does not work well on these functions. But does this mean Adam is unsuitable for simple functions? Let's tune its beta parameters in the following sections.
loss2 is a simple function, yet Adam performs poorly on it. Let's tune Adam's beta parameters to see whether it can do better. My guess is that it can.
1. Tune only beta1 and see what happens
Let: lr=1e-2, beta2=0.999, epochs=200
| beta1=0.5 | beta1=0.6 | beta1=0.7 | beta1=0.8 | beta1=0.9 |
|---|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() | ![]() |
beta1=0.8 works well. Let's perform a fine-grained tuning.
| beta1=0.8 | beta1=0.81 | beta1=0.805 | beta1=0.85 |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
beta1=0.81 works well. It appears that beta1 controls how strongly the existing trend persists: with a large beta1, the optimizer keeps following the old trend; with a small beta1, it adjusts more readily to the current gradient.
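This interpretation matches how beta1 enters Adam's first-moment update, m_t = beta1 * m_{t-1} + (1 - beta1) * g_t: when the gradient keeps flipping sign, a larger beta1 averages the flips away and keeps the update pointing along the long-run trend. A small illustrative demo:

```python
def first_moment(grads, beta1):
    # Exponential moving average of the gradient, as in Adam's m_t update.
    m = 0.0
    for g in grads:
        m = beta1 * m + (1 - beta1) * g
    return m

# A gradient that oscillates between +1 and -1 (e.g. bouncing across a valley).
flips = [(-1.0)**t for t in range(100)]

m_small = first_moment(flips, beta1=0.5)
m_large = first_moment(flips, beta1=0.9)

# The larger beta1 smooths the oscillation away far more strongly,
# so the update sticks to the long-run trend (here, roughly zero).
print(abs(m_small), abs(m_large))
```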
2. Tune beta2 to see whether there are any surprises
Let: lr=1e-2, beta1=0.81, epochs=200
| beta2=0.8 | beta2=0.9 | beta2=0.99 | beta2=0.999 | beta2=0.9999 |
|---|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() | ![]() |
My explanation is that beta2 influences the optimizer's effective step size, determining how far it moves along the trend. It seems that beta2=0.999 is a good choice.
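One way to see this: beta2 sets how quickly the second moment, v_t = beta2 * v_{t-1} + (1 - beta2) * g_t^2, tracks the gradient scale, and the effective step is lr * m_hat / (sqrt(v_hat) + eps). A hedged sketch of how fast sqrt(v_hat) adapts when the gradient magnitude suddenly jumps:

```python
import math

def second_moment(grads, beta2):
    # Adam's v_t update, with bias correction applied at the end.
    v, t = 0.0, 0
    for g in grads:
        t += 1
        v = beta2 * v + (1 - beta2) * g * g
    return v / (1 - beta2**t)

# Gradient magnitude jumps from 0.1 to 10 after step 50.
grads = [0.1] * 50 + [10.0] * 10

rms_fast = math.sqrt(second_moment(grads, beta2=0.9))    # adapts quickly
rms_slow = math.sqrt(second_moment(grads, beta2=0.999))  # lags behind

# With beta2=0.999 the denominator still partly "remembers" the small
# gradients, so lr / sqrt(v_hat) stays larger and the step stays longer.
print(rms_fast, rms_slow)
```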
Until now, as we can see, Adam has moved quite slowly. What about increasing its learning rate? Let beta1=0.81, beta2=0.999, epochs=200.
| lr=0.01 | lr=0.03 |
|---|---|
| ![]() | ![]() |
We can see that increasing the learning rate is not a better choice.
3. Increase epochs.
Let lr=0.01, beta1=0.81, beta2=0.999.
| epochs=200 | epochs=400 | epochs=800 | epochs=1000 |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
We made it; my guess was right. But this also shows that, no matter how we tune the parameters, Adam converges more slowly than SGD on simple problems. Perhaps there is a better way to make Adam work well in such straightforward cases, or perhaps not; I don't know for now.
The experimental results above show that Adam does not converge on the himmelblau function, so the first thing I want to try is lowering the learning rate.
1. Lower the learning rate
Let beta1=0.9, beta2=0.999, epochs=200.
| lr=1e-4 | lr=3e-4 | lr=1e-5 | lr=3e-5 |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
The learning rate has an important influence on this optimization process. 1e-4 is the better choice.
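A sweep like the one above can be reproduced with a small script. Here is a hedged sketch using a from-scratch Adam on the standard himmelblau form (the repository drives these runs through demo.py instead, so trajectories may differ):

```python
import math

def himmelblau(x, y):
    # Standard form; four global minima with value 0.
    return (x*x + y - 11)**2 + (x + y*y - 7)**2

def himmelblau_grad(x, y):
    a, b = x*x + y - 11, x + y*y - 7
    return 4*x*a + 2*b, 2*a + 4*y*b

def adam(lr, epochs=200, x=7.5, y=7.5, beta1=0.9, beta2=0.999, eps=1e-8):
    mx = my = vx = vy = 0.0
    for t in range(1, epochs + 1):
        gx, gy = himmelblau_grad(x, y)
        mx = beta1*mx + (1-beta1)*gx; my = beta1*my + (1-beta1)*gy
        vx = beta2*vx + (1-beta2)*gx*gx; vy = beta2*vy + (1-beta2)*gy*gy
        x -= lr * (mx / (1 - beta1**t)) / (math.sqrt(vx / (1 - beta2**t)) + eps)
        y -= lr * (my / (1 - beta1**t)) / (math.sqrt(vy / (1 - beta2**t)) + eps)
    return x, y

for lr in (1e-4, 3e-4, 1e-5, 3e-5):
    fx, fy = adam(lr)
    print(f"lr={lr:g}: final loss {himmelblau(fx, fy):.4f}")
```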
The Rastrigin function has many local optima, making it a challenging optimization problem. Now, let's see whether Adam can optimize it effectively.
1. Increase epochs

The earlier experiments showed that Adam can reach the optimal point if we increase the number of epochs, so let's give that a try here. Let: lr=0.01, epochs=300, init_x=5, init_y=5, beta1=0.9, beta2=0.999.
| epochs=300 | epochs=600 |
|---|---|
| ![]() | ![]() |
As we can see, increasing the epochs does not help. Another signal from the experimental results is that Adam may not have enough momentum to reach the optimum. Let's tune the beta1 parameter.
2. Increase beta1
Let: lr=0.01, epochs=300, init_x=5, init_y=5, beta2=0.999.
| beta1=0.91 | beta1=0.92 | beta1=0.93 | beta1=0.94 |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
| beta1=0.95 | beta1=0.96 | beta1=0.97 | beta1=0.98 |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
It seems that beta1=0.98 is the best choice so far. Let's give it a fine-grained try.
| beta1=0.98 | beta1=0.983 | beta1=0.985 | beta1=0.99 |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
The experiments above confirm the hypothesis that Adam did not have enough momentum to reach the optimum. Another noteworthy detail is that even a slight change in beta1 can make a significant difference.
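The beta1 sensitivity can be probed with the same kind of script. Here is a hedged sketch using a from-scratch Adam on the standard rastrigin form (the repository's exact trajectories and results may differ):

```python
import math

def rastrigin(x, y):
    # Standard form; global minimum 0 at (0, 0), many local minima elsewhere.
    return 20 + x*x - 10*math.cos(2*math.pi*x) + y*y - 10*math.cos(2*math.pi*y)

def rastrigin_grad(x, y):
    return (2*x + 20*math.pi*math.sin(2*math.pi*x),
            2*y + 20*math.pi*math.sin(2*math.pi*y))

def adam(beta1, lr=0.01, epochs=300, x=5.0, y=5.0, beta2=0.999, eps=1e-8):
    mx = my = vx = vy = 0.0
    for t in range(1, epochs + 1):
        gx, gy = rastrigin_grad(x, y)
        mx = beta1*mx + (1-beta1)*gx; my = beta1*my + (1-beta1)*gy
        vx = beta2*vx + (1-beta2)*gx*gx; vy = beta2*vy + (1-beta2)*gy*gy
        x -= lr * (mx / (1 - beta1**t)) / (math.sqrt(vx / (1 - beta2**t)) + eps)
        y -= lr * (my / (1 - beta1**t)) / (math.sqrt(vy / (1 - beta2**t)) + eps)
    return x, y

# Sweep beta1 and report where each run ends up.
for beta1 in (0.9, 0.95, 0.98, 0.983):
    fx, fy = adam(beta1)
    print(f"beta1={beta1}: end point ({fx:.3f}, {fy:.3f}), "
          f"loss {rastrigin(fx, fy):.3f}")
```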
Usage:

1. Clone the repository:

```shell
git clone git@github.com:AllenWrong/From-Scratch.git
cd From-Scratch/learning-rate
```

2. Run the following command for a first try:

```shell
python demo.py \
    --opt SGD \
    --loss_fn loss1 \
    --lr 1e-2 \
    --epochs 200 \
    --r_min -10 \
    --r_max 10 \
    --init_x 5 \
    --init_y 5
```

The output images are saved in the ./imgs directory.

3. Run the following command to plot a function contour:

```shell
python show_contour.py \
    --fn loss1 \
    --rmin -10 \
    --rmax 10
```

Citation:

```bibtex
@misc{explore-the-behavior-of-optimizer,
  author       = {Zhongchao, Guan},
  title        = {Explore the Behavior of Optimizer},
  year         = {2023},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/AllenWrong/From-Scratch/learning-rate}},
}
```
- Some content was created with the assistance of ChatGPT.
If you are interested in this project or want to learn more about the from-scratch series, follow me on GitHub.
If you have ideas you'd like to bring to life, please email me.
- 📧Email me: gg884691896@gmail.com
- Follow me on LinkedIn.