October 11, 2024

Rectified Diffusion: Simplifying and Expanding the Scope of Rectified Flow

## Rectified Diffusion: Enhancing Diffusion Models for Faster Visual Generation

Diffusion models have significantly advanced visual generation, but generation is slow because solving the generative ODE is computationally intensive. Rectified Flow, a widely recognized solution, speeds up generation by straightening the ODE path. Its key components are: 1) using the diffusion form of flow-matching, 2) employing "v-prediction," and 3) performing rectification (also known as reflow). The paper argues that the success of rectification stems primarily from using a pre-trained diffusion model to obtain matched noise-sample pairs and then retraining with those matched pairs. On this view, components 1) and 2) are superfluous.

## Importance of the "First-Order Approximate ODE Path"

The authors further emphasize that straightness is not an essential training objective for rectification; it is only the special case that arises for flow-matching models. The more critical training objective is to achieve a "First-Order Approximate ODE Path," which is inherently curved for models like DDPM and Sub-VP. Building on this insight, they propose "Rectified Diffusion," which generalizes the design space and application scope of rectification to the broader category of diffusion models instead of limiting it to flow-matching models.

## Validation and Advantages of "Rectified Diffusion"

The method is validated on Stable Diffusion v1-5 and Stable Diffusion XL. According to the authors, it not only simplifies the training procedure of prior works based on rectified flow (e.g., InstaFlow) but also achieves superior performance at a lower training cost.

## Background: Diffusion Models and Challenges

Diffusion models have become a cornerstone of artificial intelligence, particularly for image and video generation, where they are valued for the high quality of their results. This quality comes at the cost of speed. Generating an image from noise requires numerically solving a "generative ODE" (ordinary differential equation), which typically takes many evaluations of a large neural network. These computations demand substantial processing power and translate into longer waiting times during generation.

## Rectified Flow: A Step Towards Efficiency

"Rectified Flow" was introduced to address this problem. The method accelerates generation by straightening the path of the ODE. Imagine the path as a winding road: rectification straightens the road so that the journey takes fewer steps. The three main components of the approach are a diffusion-based flow-matching formulation, "v-prediction," and the rectification procedure itself.

## "Rectified Diffusion": Expanding the Boundaries

The paper argues that the power of rectification does not come from all three components. It comes mainly from the use of a pre-trained diffusion model, which supplies matched pairs of noise and images that are then used for retraining. This leads to the proposition that two of the three components, namely the diffusion form of flow-matching and "v-prediction," are not strictly necessary.

## Beyond Straightness: The Concept of "First-Order Approximate ODE Path"

Furthermore, a perfectly straight path is not the essential goal of rectification. The goal is instead a "First-Order Approximate ODE Path." For popular models such as DDPM and Sub-VP, this path is inherently curved, so demanding straightness would exclude them. This concept underlies "Rectified Diffusion," a method that broadens the scope of rectification. Instead of being confined to flow-matching models, "Rectified Diffusion" covers a wider range of diffusion models.
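The difference between a straight path and a curved first-order path can be illustrated with a small numerical sketch. It assumes, as a reading of the idea above rather than a statement of the paper's implementation, that a first-order path for one matched noise-sample pair has the form x_t = alpha_t * x0 + sigma_t * eps with the same x0 and eps at every t. The function names, the cosine/sine schedule, and the random vectors standing in for a generated sample and its matched noise are all hypothetical; no pre-trained model or retraining is involved.

```python
import numpy as np


def flow_matching_schedule(t):
    """Flow-matching form: alpha_t = 1 - t, sigma_t = t."""
    return 1.0 - t, t


def vp_schedule(t):
    """Illustrative variance-preserving form with alpha_t^2 + sigma_t^2 = 1.

    Assumption for this sketch only; it is not the exact DDPM schedule.
    """
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)


def first_order_path(x0, eps, schedule, ts):
    """Points x_t = alpha_t * x0 + sigma_t * eps for one fixed (eps, x0) pair."""
    points = []
    for t in ts:
        alpha_t, sigma_t = schedule(t)
        points.append(alpha_t * x0 + sigma_t * eps)
    return np.stack(points)


def max_deviation_from_chord(path):
    """Largest distance from any path point to the line through the endpoints."""
    start, end = path[0], path[-1]
    chord = end - start
    rel = path - start
    # Orthogonal projection of each point onto the chord direction.
    coeff = rel @ chord / (chord @ chord)
    residual = rel - np.outer(coeff, chord)
    return float(np.max(np.linalg.norm(residual, axis=1)))


def recover_x0(x_t, eps, alpha_t, sigma_t):
    """Invert x_t = alpha_t * x0 + sigma_t * eps when eps is known exactly."""
    return (x_t - sigma_t * eps) / alpha_t


rng = np.random.default_rng(0)
x0 = rng.normal(size=4)   # stand-in for a sample produced by a pre-trained model
eps = rng.normal(size=4)  # stand-in for the noise that produced it
ts = np.linspace(0.0, 1.0, 11)

straight = first_order_path(x0, eps, flow_matching_schedule, ts)
curved = first_order_path(x0, eps, vp_schedule, ts)

print("flow-matching deviation:", max_deviation_from_chord(straight))
print("VP-style deviation:     ", max_deviation_from_chord(curved))

# On either path, a single step recovers x0 from any intermediate point,
# provided the matched noise is predicted exactly.
t = 0.7
alpha_t, sigma_t = vp_schedule(t)
x_t = alpha_t * x0 + sigma_t * eps
print("one-step recovery exact:", np.allclose(recover_x0(x_t, eps, alpha_t, sigma_t), x0))
```

In this sketch the flow-matching path lies on the straight line between sample and noise, while the variance-preserving path bends away from it, yet both allow one-step recovery of the sample when the matched noise is known. That is the sense in which straightness appears as a special case rather than the objective.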