Diffusion models have revolutionized image synthesis and editing, enabling the creation and modification of images with impressive quality and flexibility. A new approach called Stable Flow leverages the strengths of Diffusion Transformers (DiTs) and flow matching to enable consistent image editing without any additional training.
In contrast to the UNet architectures used in many traditional diffusion models, newer models rely on DiTs. These offer advantages in training and sampling but often show limited diversity in generation. Stable Flow turns this property to its advantage: by selectively injecting attention features from a reference generation, it achieves targeted, consistent image edits.
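To illustrate the idea, the following minimal PyTorch sketch shows what selective attention-feature injection can look like: a reference generation caches the output of chosen attention layers, and a second, edited generation replays those cached features at the same layers. The class name, the toy layer indices in `vital`, and the simplified residual blocks are illustrative assumptions, not the paper's implementation; Stable Flow itself operates on the attention blocks of a large flow-matching DiT such as FLUX.

```python
import torch
import torch.nn as nn

class InjectableAttention(nn.Module):
    """Toy self-attention layer that can cache and replay its output."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cache = None    # features recorded during the reference pass
        self.inject = False  # when True, replay the cached reference features

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        if self.inject and self.cache is not None:
            return self.cache          # edit pass reuses reference features
        self.cache = out.detach()      # reference pass records its features
        return out

torch.manual_seed(0)
layers = nn.ModuleList([InjectableAttention(64) for _ in range(8)])
vital = {1, 5}  # layer indices a vital-layer probe might return (assumed)

def run(tokens, injecting):
    for i, layer in enumerate(layers):
        layer.inject = injecting and i in vital
        tokens = tokens + layer(tokens)  # residual connection, as in DiTs
    return tokens

reference = run(torch.randn(1, 16, 64), injecting=False)  # pass 1: cache
edited = run(torch.randn(1, 16, 64), injecting=True)      # pass 2: inject
```

Injecting only at a few carefully chosen layers, rather than everywhere, is what keeps the edited result faithful to the reference while still allowing the prompt to change the image.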
One challenge in using DiTs lies in their lack of a coarse-to-fine structure, which makes it difficult to determine the optimal layers for injecting these features. Stable Flow addresses this problem by automatically identifying so-called "vital layers" within the DiT: each layer is bypassed in turn, and the resulting change in the generated image is measured. The layers whose removal changes the image most are crucial for image formation, and injecting features into them enables a variety of controlled edits, from non-rigid modifications to adding objects.
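A rough sketch of such a probe, under the bypass-and-measure assumption described above: skip one residual block at a time and score how far the output drifts from the full model. The function name, the toy linear layers, and the L2 distance are illustrative stand-ins; the paper scores the deviation on the decoded images with a perceptual measure rather than on raw tokens.

```python
import torch
import torch.nn as nn

def find_vital_layers(layers, tokens, distance, top_k=3):
    """Rank layers by how much bypassing each one perturbs the output."""
    def forward(skip=None):
        x = tokens
        for i, layer in enumerate(layers):
            if i == skip:
                continue             # bypass: keep only the residual path
            x = x + layer(x)         # residual DiT-style block
        return x

    full = forward()                 # output with every layer active
    scores = [(i, distance(forward(skip=i), full).item())
              for i in range(len(layers))]
    scores.sort(key=lambda s: s[1], reverse=True)
    return [i for i, _ in scores[:top_k]]

# Toy usage: plain linear layers and an L2 distance stand in for the
# real DiT blocks and the perceptual image metric.
torch.manual_seed(0)
toy_layers = [nn.Linear(64, 64) for _ in range(8)]
vital = find_vital_layers(
    toy_layers,
    tokens=torch.randn(1, 16, 64),
    distance=lambda a, b: (a - b).pow(2).mean(),
)
print(vital)
```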
To enable the editing of real images, Stable Flow introduces an improved image-inversion method for flow models. It maps a real image into the model's latent space, where it can be edited; the result is then transformed back into a realistic image.
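A minimal sketch of what such an inversion can look like for a flow-matching model: plain Euler integration of the learned velocity field run in reverse, mapping an image latent back to noise. The time convention, step count, and the `nudge` factor (loosely modeled on Stable Flow's latent-nudging idea) are assumptions for illustration, and the toy velocity field stands in for the trained DiT's prediction.

```python
import torch

def invert_flow(velocity, image_latent, steps=50, nudge=1.15):
    """Invert a flow-matching model by integrating the ODE backwards.

    Convention assumed here: generation moves from noise at t=1 to the
    image at t=0 via x <- x - v(x, t) * dt, so inversion walks the latent
    back from t=0 to t=1 with the opposite step. The nudge factor
    slightly scales the start latent; the value 1.15 is an assumption.
    """
    x = image_latent * nudge
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt                    # t grows from 0 (image) to 1 (noise)
        x = x + velocity(x, t) * dt   # reverse Euler step toward noise
    return x

# Toy usage with a hypothetical velocity field.
noise_latent = invert_flow(lambda x, t: -x, torch.randn(1, 4, 32, 32))
```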
Stable Flow opens up new possibilities across a range of image-editing tasks. The method was evaluated through qualitative and quantitative comparisons as well as a user study, and the results demonstrate its effectiveness across diverse applications.
Mindverse, a German company, offers an all-in-one platform for AI-powered content creation, including text, images, and research. As an AI partner, Mindverse develops customized solutions such as chatbots, voicebots, AI search engines, and knowledge systems. Mindverse's expertise in AI technologies enables companies to develop innovative solutions and optimally utilize the possibilities of artificial intelligence.
Stable Flow represents an important step toward more intuitive and efficient image editing. The ability to perform complex edits without additional training simplifies workflows and opens up new creative possibilities. The further development of such technologies promises to fundamentally change image editing in the future.
Bibliography:
Chung, Y. J., et al. "Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Generative Models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
Huang, Y., et al. "Diffusion Model-Based Image Editing: A Survey." arXiv preprint arXiv:2402.17525, 2024.
Patashnik, O., et al. "Localizing Object-level Shape Variations with Text-to-Image Diffusion Models." arXiv preprint arXiv:2303.11306, 2023.
Voynov, A., et al. "P+: Extended Textual Conditioning in Text-to-Image Generation." arXiv preprint arXiv:2405.05945, 2024.
Yamaguchi, S., et al. "Controllable Image Editing with Sparse Representations." arXiv preprint arXiv:2412.08123, 2024.
Lumina-T2X: Transforming Text into Stunning Visuals. netinfo.click/books/prog/Lumina-T2X%20Transforming%20Text%20into.pdf
Song, Y., et al. "Fast Personalized Text-to-Image Syntheses With Attention Injection." arXiv preprint arXiv:2403.08864, 2024.
Alzubaidi, L., et al. "Review of deep learning: concepts, CNN architectures, challenges, applications, future directions." Journal of Big Data 8.1 (2021): 1-74.
Avrahami, O., et al. "Stable Flow: Vital Layers for Training-Free Image Editing." arXiv preprint arXiv:2411.14430, 2024.
https://www.linkedin.com/posts/ssw_techtrends-ai-promptengineering-activity-7215144022661599233-sTTm