Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model

Choi, Joo Young; Park, Jaesung R.; Park, Inkyu; Cho, Jaewoong; No, Albert; Ryu, Ernest K.

arXiv:2405.03958 (cs)

[Submitted on 7 May 2024]

Abstract: Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and on the class or caption embedding input corresponding to the desired conditional generation. Such conditioning applies scale-and-shift operations to the convolutional layers but does not directly affect the attention layers. While these standard architectural choices are certainly effective, not conditioning the attention layers feels arbitrary and potentially suboptimal. In this work, we show that simply adding LoRA conditioning to the attention layers, without changing or tuning the other parts of the U-Net architecture, improves the image generation quality. For example, a drop-in addition of LoRA conditioning to the EDM diffusion model yields FID scores of 1.91/1.75 for unconditional and class-conditional CIFAR-10 generation, improving upon the baseline of 1.97/1.79.
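The abstract does not spell out the form of the LoRA conditioning, so the following is only a minimal NumPy sketch of one plausible reading: an attention projection weight W receives a low-rank update built from several LoRA bases, mixed by weights computed from the conditioning embedding. The function names, the linear map `P`, the number of bases, and all shapes are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def condition_weights(cond_emb, P):
    """Map a conditioning embedding (d_cond,) to one mixing weight per LoRA basis.

    A plain linear map is a hypothetical stand-in for whatever small
    network produces the weights in a real model.
    """
    return cond_emb @ P  # shape: (num_bases,)


def lora_conditioned_projection(x, W, A, B, weights):
    """Compute x @ (W + sum_m weights[m] * B[m] @ A[m]).T

    x: (tokens, d_in)        input tokens to a q, k, or v projection
    W: (d_out, d_in)         base projection weight, left unchanged
    A: (num_bases, rank, d_in)   LoRA down-projections
    B: (num_bases, d_out, rank)  LoRA up-projections
    weights: (num_bases,)    conditioning-dependent mixing weights
    """
    # Weighted sum of the low-rank products, giving a (d_out, d_in) update.
    delta = np.einsum("m,mor,mri->oi", weights, B, A)
    return x @ (W + delta).T


rng = np.random.default_rng(0)
d_in, d_out, rank, num_bases, tokens, d_cond = 8, 8, 2, 3, 4, 5

x = rng.normal(size=(tokens, d_in))
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(num_bases, rank, d_in))
P = rng.normal(size=(d_cond, num_bases))
cond_emb = rng.normal(size=(d_cond,))  # stand-in for a time or class embedding

w = condition_weights(cond_emb, P)

# With B initialised to zero (the usual LoRA initialisation), the layer
# starts out identical to the unconditioned projection.
B_zero = np.zeros((num_bases, d_out, rank))
y_init = lora_conditioned_projection(x, W, A, B_zero, w)

# Once B is nonzero, the output depends on the conditioning embedding.
B_nonzero = rng.normal(size=B_zero.shape)
y_cond = lora_conditioned_projection(x, W, A, B_nonzero, w)
```

Because the update is added to an existing weight and vanishes when B is zero, a layer of this kind can be dropped into an attention block without altering the rest of the network, which is the sense of "drop-in" this sketch is meant to illustrate.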

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2405.03958 [cs.CV]
	(or arXiv:2405.03958v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.03958

Submission history

From: Joo Young Choi
[v1] Tue, 7 May 2024 02:45:28 UTC (17,192 KB)
