Can MoE be used to get this to 4B and still fit into 5090 VRAM?

#11
by usernameSRSalreadyexists - opened

(I am not an expert so this question may seem a bit silly.)

Is it possible to use the split (high noise and low noise) approach that Wan 2.2 uses with your architecture so that it could be a 4B model instead of only 2B?

Since the 5090 is the only relatively affordable option and is still limited to 32 GB, it seems that the best way to go is a model that takes full advantage of that VRAM without having to swap model weights to system RAM. Text encoding can apparently be done on the CPU without much slowdown.

If the text encoder is moved to system RAM and run on the CPU, could the model be even larger than 4B if MoE is used?
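
For a rough sense of scale, here is a back-of-envelope sketch of the weight memory involved. It is only an estimate under stated assumptions: bf16 weights at 2 bytes per parameter, two hypothetical 2B experts (one high-noise, one low-noise), and 32 GB treated as 32 GiB. It counts weights only and ignores activations, the VAE, and the text encoder, so real usage would be higher. The function name `weight_gib` is made up for illustration.

```python
def weight_gib(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in GiB.

    bytes_per_param=2 assumes bf16/fp16 weights (an assumption, not a
    statement about how the actual model is shipped).
    """
    return params_billion * 1e9 * bytes_per_param / 2**30


VRAM_GIB = 32  # RTX 5090, treating the advertised 32 GB as 32 GiB

# Hypothetical split: one 2B high-noise expert plus one 2B low-noise expert.
one_expert = weight_gib(2.0)      # only the expert for the current step resident
both_experts = 2 * one_expert     # both experts kept in VRAM at once (4B total)

print(f"one 2B expert resident:  {one_expert:.2f} GiB")
print(f"both experts resident:   {both_experts:.2f} GiB")
print(f"headroom with both:      {VRAM_GIB - both_experts:.2f} GiB (weights only)")
```

By this estimate the weights of a 4B split model would take up well under half of the card, so the practical limit would more likely come from activations at the target resolution and length than from the weights themselves.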

Motif Technologies org

Hi @usernameSRSalreadyexists,

Good question! The current 2B model doesn't support MoE, but we are planning to explore MoE for our next model. Stay tuned!
