arxiv:2605.22668

SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers

Published on May 21 · Submitted by Javad Rajabi on May 22

Abstract

SEGA improves high-resolution text-to-image generation by adaptively scaling attention across RoPE components based on spatial-frequency structure during denoising steps.

AI-generated summary

Diffusion transformers (DiTs) have emerged as a dominant architecture for text-to-image generation, yet their performance drops when generating at resolutions beyond their training range. Existing training-free approaches mitigate this by modifying inference-time attention behavior, often through Rotary Position Embeddings (RoPE) extrapolation combined with attention scaling. However, these strategies apply a uniform and content-agnostic scaling across RoPE components with distinct frequency characteristics, inducing a trade-off between preserving global structure and recovering fine detail. We introduce SEGA, a training-free method that dynamically scales attention across RoPE components according to the latent's spatial-frequency structure at each denoising step. This adaptive scaling improves both structural coherence and fine-detail fidelity. Experiments show that SEGA consistently improves high-resolution synthesis across multiple target resolutions, outperforming state-of-the-art training-free baselines.
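To make the idea of per-component scaling concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: the abstract does not specify how spectral energy is measured or mapped to scales, so `band_energies`, `component_scales`, the `alpha` parameter, and the alignment of spectral bands with RoPE components are all hypothetical choices made for illustration. The only standard parts are the RoPE frequency schedule and the fact that a RoPE attention logit is a sum of per-component terms, which is what allows each component to be scaled separately.

```python
import cmath
import math


def rope_frequencies(head_dim, base=10000.0):
    """Standard RoPE angular frequencies, one per 2-D component (highest first)."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def rotate(x, y, angle):
    """Rotate the 2-D point (x, y) counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return x * c - y * s, x * s + y * c


def component_logits(q, k, pos_q, pos_k, freqs):
    """Contribution of each RoPE component to the unscaled logit q . k."""
    parts = []
    for i, w in enumerate(freqs):
        qx, qy = rotate(q[2 * i], q[2 * i + 1], pos_q * w)
        kx, ky = rotate(k[2 * i], k[2 * i + 1], pos_k * w)
        parts.append(qx * kx + qy * ky)
    return parts


def band_energies(signal, n_bands):
    """ASSUMPTION: normalized DFT power of a 1-D signal in equal-width bands,
    ordered from lowest to highest frequency (DC term excluded)."""
    n = len(signal)
    half = n // 2
    bands = [0.0] * n_bands
    for f in range(1, half + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                    for t, x in enumerate(signal))
        bands[min(n_bands - 1, (f - 1) * n_bands // half)] += abs(coeff) ** 2
    total = sum(bands) or 1.0  # guard against an all-zero spectrum
    return [b / total for b in bands]


def component_scales(energies, alpha=0.5):
    """ASSUMPTION: hypothetical linear map from band energy to a scale near 1.
    Energies run low -> high frequency, RoPE components run high -> low,
    so the list is reversed to align them."""
    mean = 1.0 / len(energies)
    return [1.0 + alpha * (e - mean) for e in reversed(energies)]


def scaled_logit(q, k, pos_q, pos_k, freqs, scales):
    """Attention logit with a separate scale per RoPE component.
    Setting every scale to the same value recovers uniform attention scaling."""
    parts = component_logits(q, k, pos_q, pos_k, freqs)
    return sum(s * p for s, p in zip(scales, parts)) / math.sqrt(len(q))


# Toy usage: a smooth "latent row" concentrates energy in the lowest band.
head_dim = 8
freqs = rope_frequencies(head_dim)
latent_row = [math.cos(2 * math.pi * t / 16) for t in range(16)]
scales = component_scales(band_energies(latent_row, len(freqs)))
q = [0.5, -1.0, 0.3, 0.8, -0.2, 0.1, 0.9, -0.4]
k = [1.0, 0.2, -0.5, 0.4, 0.7, -0.3, 0.1, 0.6]
print(scaled_logit(q, k, pos_q=5, pos_k=3, freqs=freqs, scales=scales))
```

In this toy setup the scales are recomputed from the current latent, which mirrors the abstract's description of content-dependent scaling at each denoising step. A real implementation would operate on 2-D latents and full attention tensors.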

Community

Paper author · Paper submitter

TLDR: SEGA is a training-free method that uses spectral guidance to modify attention behavior by scaling individual RoPE components, improving high-resolution generation in diffusion transformers.

[Figure: teaser]

Hi. Great work!
Can this method be applied to distilled models? And is it possible to adapt it for i2i?
Thank you!

Looks interesting. When will we be able to try it out?


