Add EMA with nesterov - Muon. #476

enriquezaf · 2026-01-26T01:08:08Z

The current version tends to blow up the gradients. It uses a non-standard EMA that looks like a half-finished attempt at using two different betas.

Current

step 1: buf = buf * momentum
step 2: buf = grad + buf * momentum
step 3: grad = grad + momentum * (grad + momentum * buf)

buf.mul_(momentum)       # buf  <- momentum * buf
buf.add_(grad)           # buf  <- grad + momentum * buf
grad.add_(buf*momentum)  # grad <- grad + momentum * (grad + momentum * buf), buf being the value before step 1

# if momentum = 0.9
# 1: buf * 0.9
# 2: grad + 0.9 * buf
# 3: grad + 0.9 * (grad + 0.9 * buf) -> grad + 0.9 * grad + 0.81 * buf
# -> grad = 1.9 * grad + 0.81 * buf
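
To make the scaling concrete, here is a small scalar sketch of the three in-place ops above. It uses plain Python floats instead of tensors, and the function name `current_step` is made up for illustration. Under the assumption of a constant gradient of 1.0, the buffer settles near 1 / (1 - momentum) and the resulting update is about 10x the raw gradient.

```python
import math

def current_step(buf, grad, momentum=0.9):
    """Scalar stand-in for the current update; returns (new_buf, update)."""
    buf = buf * momentum            # buf.mul_(momentum)
    buf = buf + grad                # buf.add_(grad)
    update = grad + buf * momentum  # grad.add_(buf * momentum)
    return buf, update

# Single step: matches update = 1.9 * grad + 0.81 * buf_old for momentum = 0.9.
one_step_buf, one_step_update = current_step(buf=3.0, grad=2.0)
assert math.isclose(one_step_update, 1.9 * 2.0 + 0.81 * 3.0)

# Constant gradient of 1.0: the buffer is an un-normalized running sum.
steady_buf, steady_update = 0.0, 0.0
for _ in range(200):
    steady_buf, steady_update = current_step(steady_buf, 1.0)

print(steady_buf, steady_update)  # both approach 10 for momentum = 0.9
```

The coefficients on `grad` and `buf` do not sum to 1, so the magnitude of the update depends on `momentum` rather than staying on the scale of the gradient.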

New

# if momentum = 0.9
# grad = buf * 0.9 + 0.1 * grad

This matches the default in heavyball.
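
For comparison, a scalar sketch of an EMA buffer with a Nesterov-style blend. The diff itself is not shown here, so this assumes the common two-blend form (EMA into the buffer, then the same convex blend of the fresh gradient and the buffer); the function name `ema_nesterov_step` is hypothetical, and on tensors the first line would typically be written with `buf.lerp_(grad, 1 - momentum)`.

```python
def ema_nesterov_step(buf, grad, momentum=0.9):
    """Scalar sketch of an EMA buffer plus Nesterov-style blend (assumed form)."""
    # EMA of the gradient: the weights momentum and (1 - momentum) sum to 1.
    buf = momentum * buf + (1.0 - momentum) * grad
    # Nesterov look-ahead: blend the fresh gradient with the updated buffer.
    update = momentum * buf + (1.0 - momentum) * grad
    return buf, update

# Constant gradient of 1.0: buffer and update stay on the gradient's scale.
ema_buf, ema_update = 0.0, 0.0
for _ in range(500):
    ema_buf, ema_update = ema_nesterov_step(ema_buf, 1.0)

print(ema_buf, ema_update)  # both approach 1.0, not 10
```

Because both lines are convex combinations, the update is bounded by the largest gradient seen so far, which is the property the un-normalized version lacks.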

Add EMA with nesterov, matching heavyball

7f7679d
