I’ve been reading more about training lately (mostly for Wan2.1) and noticed this optimizer as an option in ai-toolkit as well as in diffusion-pipe.

Aside from reading through the source code, does anyone know of any documentation on how it is supposed to work, or on recommended usage and parameters? I couldn’t find anything in a cursory search.