Releasing FP8 & F16 Models

by TatsuyaXAI - opened Aug 18, 2025

Aug 18, 2025

First of all, thank you for the open-source models. Qwen is bringing huge growth to the open-source development of LLMs and now image generation.

I hope FP8 and FP16 versions will also be released in the future for lower-end GPUs with only 8–16 GB of VRAM.

It would be great to have multiple models, such as one focused on realism and another on animation, similar to the fine-tuned models of SDXL and SD 1.5.

Since these large models are mostly practical for enterprises but very difficult for personal or retail users, smaller optimized versions would be a big help.

Again, thank you for the superb model.

KevinZonda

Aug 19, 2025

edited Aug 19, 2025

It can be converted directly through Diffusers, right?

https://huggingface.co/Qwen/Qwen-Image-Edit/discussions/6#68a39afdf4aa9e784e43afc0

C0nsumption

Aug 19, 2025

It can be converted directly through Diffusers, right?

https://huggingface.co/Qwen/Qwen-Image-Edit/discussions/6#68a39afdf4aa9e784e43afc0

In the process of finding out right now.
Will let you know.
The downloads are killing me, softly.

NielsGx

Aug 19, 2025

Why would you use FP16 instead of BF16, though?
If your GPU doesn't support BF16, I don't think you could even run this.

Wait for an FP8 scaled model from Kijai (smart scaling is way better than naively truncated FP8).
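
To illustrate the scaled-versus-naive FP8 point, here is a minimal pure-Python sketch. It is not Kijai's method (the thread does not describe it); it only assumes a simplified model of the FP8 E4M3 format (3 mantissa bits, maximum value 448, clamping instead of producing NaN on overflow) and a single per-tensor scale. The weight values are made up for illustration.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 ("fn") format


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value, clamping to +/-448 (simplified model)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) == e - 1.
    _, e = math.frexp(mag)
    exp = max(e - 1, -6)      # below 2**-6 the format is subnormal
    step = 2.0 ** (exp - 3)   # 3 mantissa bits -> 8 steps per power of two
    return sign * min(round(mag / step) * step, E4M3_MAX)


def naive_fp8(weights):
    """Cast each weight directly, with no rescaling."""
    return [quantize_e4m3(w) for w in weights]


def scaled_fp8(weights):
    """Stretch the tensor to fill the FP8 range, quantize, then undo the scale."""
    scale = max(abs(w) for w in weights) / E4M3_MAX
    if scale == 0.0:
        return list(weights)
    return [quantize_e4m3(w / scale) * scale for w in weights]


def max_abs_error(original, approx):
    return max(abs(a - b) for a, b in zip(original, approx))


weights = [0.0004, -0.0011, 0.0019, 0.0007]  # made-up small weights
naive_err = max_abs_error(weights, naive_fp8(weights))
scaled_err = max_abs_error(weights, scaled_fp8(weights))
print(f"naive max error:  {naive_err:.2e}")
print(f"scaled max error: {scaled_err:.2e}")
```

In this toy case the naive cast flushes the smallest weights to zero because they fall below the FP8 subnormal step, while the scaled version keeps them. Real scaled checkpoints may use per-channel or per-block scales rather than one scale per tensor.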

C0nsumption

Aug 19, 2025

Both the bitsandbytes code and torchao code are now functional.
They can be found here:

bitsandbytes: ~17GB VRAM
https://huggingface.co/Qwen/Qwen-Image-Edit/discussions/6#68a3f2b63a24e2df78974f5d

torchao: ~23GB VRAM
https://huggingface.co/Qwen/Qwen-Image-Edit/discussions/6#68a4013ec45c7fbadef91472
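
For a rough sense of why precision matters for VRAM, the weight footprint can be estimated as parameter count times bytes per parameter. The sketch below assumes a parameter count of 20 billion purely for illustration (the thread does not state one), and it ignores the text encoder, VAE, activations, and any CPU offloading, so it should not be expected to match the figures reported above.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


ASSUMED_PARAMS = 20e9  # hypothetical parameter count, not taken from the thread

for name, bits in [("BF16/FP16", 16), ("FP8", 8), ("4-bit", 4)]:
    print(f"{name:>9}: {weight_memory_gib(ASSUMED_PARAMS, bits):.1f} GiB")
```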

BasketOfPups

Aug 19, 2025

NielsGx: There's a "fast FP16 accumulation" mode that makes FP16 faster on some cards (NVIDIA, as far as I know). It shows up as "fp16_fast", I believe, in some ComfyUI nodes. So that'd be >A< reason.

Found the reference, from the Kijai Wan 2.1 T2V workflow: "fp_16_fast enables 'Full FP16 Accumulation in FP16 GEMMs' feature available in the very latest pytorch nightly, this is around 20% speed boost."

So that's >A< reason, if you've got VRAM to burn.

heitzek

Feb 24

Why would you use FP16 instead of BF16, though?
If your GPU doesn't support BF16, I don't think you could even run this.

Wait for an FP8 scaled model from Kijai (smart scaling is way better than naively truncated FP8).

Well, there are plenty of cheap Tesla V100s with 16 or 32 GB of HBM that don't support BF16, and they do support proper NVLink.
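
The FP16-versus-BF16 trade-off discussed in this thread comes down to range versus precision: FP16 has 5 exponent bits and 10 mantissa bits, BF16 has 8 and 7. The following is a small standard-library sketch of that difference, not a statement about how this particular model behaves in FP16. The helper names are hypothetical, and BF16 is emulated by rounding a float32 bit pattern to its upper 16 bits.

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE half precision; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")


def to_bf16(x: float) -> float:
    """Emulate BF16 by keeping the top 16 bits of the float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000                   # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


for value in (70000.0, 1.001):
    print(f"{value}: fp16={to_fp16(value)}  bf16={to_bf16(value)}")
```

A value such as 70000 overflows FP16 (whose largest finite value is 65504) but stays finite in BF16, while a value such as 1.001 keeps more digits in FP16 than in BF16. Whether the narrower range actually causes problems on FP16-only cards depends on the model's activations, which this sketch does not measure.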
