NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
Paper
β’
2412.02030
β’
Published
β’
19
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou, Yi-Zhe Song
nitrosd-realism_comfyui.safetensors and nitrosd-vibrant_comfyui.safetensors, as well as a workflow are now released.NitroFusion single-step Text-to-Image demo hosted on π€ Hugging Face Space
nitrosd-realism_unet.safetensors: Produces photorealistic images with fine details. nitrosd-vibrant_unet.safetensors: Offers vibrant, saturated color characteristics.First, we need to implement the scheduler with timestep shift for multi-step inference:
from diffusers import LCMScheduler
class TimestepShiftLCMScheduler(LCMScheduler):
def __init__(self, *args, shifted_timestep=250, **kwargs):
super().__init__(*args, **kwargs)
self.register_to_config(shifted_timestep=shifted_timestep)
def set_timesteps(self, *args, **kwargs):
super().set_timesteps(*args, **kwargs)
self.origin_timesteps = self.timesteps.clone()
self.shifted_timesteps = (self.timesteps * self.config.shifted_timestep / self.config.num_train_timesteps).long()
self.timesteps = self.shifted_timesteps
def step(self, model_output, timestep, sample, generator=None, return_dict=True):
if self.step_index is None:
self._init_step_index(timestep)
self.timesteps = self.origin_timesteps
output = super().step(model_output, timestep, sample, generator, return_dict)
self.timesteps = self.shifted_timesteps
return output
We can then utilize the diffuser pipeline:
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
# Load model.
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ChenDY/NitroFusion"
# NitroSD-Realism
ckpt = "nitrosd-realism_unet.safetensors"
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
scheduler = TimestepShiftLCMScheduler.from_pretrained(base_model_id, subfolder="scheduler", shifted_timestep=250)
scheduler.config.original_inference_steps = 4
# # NitroSD-Vibrant
# ckpt = "nitrosd-vibrant_unet.safetensors"
# unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
# unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
# scheduler = TimestepShiftLCMScheduler.from_pretrained(base_model_id, subfolder="scheduler", shifted_timestep=500)
# scheduler.config.original_inference_steps = 4
pipe = DiffusionPipeline.from_pretrained(
base_model_id,
unet=unet,
scheduler=scheduler,
torch_dtype=torch.float16,
variant="fp16",
).to("cuda")
prompt = "a photo of a cat"
image = pipe(
prompt=prompt,
num_inference_steps=1, # NotroSD-Realism and -Vibrant both support 1 - 4 inference steps.
guidance_scale=0,
).images[0]
nitrosd-realism_comfyui.safetensors and nitrosd-vibrant_comfyui.safetensors, and place them in the ComfyUI/models/checkpoints.ComfyUI/custom_nodes.NitroSD-Realism is released under cc-by-nc-4.0, following its base model DMD2.
NitroSD-Vibrant is released under openrail++.