April 22, 2025

DRAGON: A Novel Approach to Optimizing Generative AI Models

The development of generative AI models, particularly for media creation, is progressing rapidly. A central part of this development is optimizing these models so that they produce high-quality results that meet the desired requirements. However, established alignment methods such as Reinforcement Learning from Human Feedback (RLHF) or Direct Preference Optimization (DPO) reach their limits here, since they optimize instance-level rewards and rely on human preference data. A promising new approach called DRAGON (Distributional RewArds for Generative OptimizatioN) offers a flexible alternative.

How DRAGON Works

DRAGON differs from conventional methods through its ability to evaluate both individual examples and entire distributions of examples. This makes it possible to optimize a wide range of reward functions, from instance-wise scores to comparisons between distributions. A particular advantage of DRAGON is that novel reward functions can be constructed simply by selecting an encoder and a set of reference examples that form an exemplar distribution. With cross-modality encoders such as CLAP, these reference examples can even come from a different modality, for example text rather than audio.
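To make this concrete, the sketch below shows one way an encoder-plus-exemplar-set reward could be assembled. The encoder, the mean-matching score, and the data are stand-ins chosen for illustration and are not taken from the paper.

```python
# Minimal sketch (not the authors' implementation): a distributional reward built
# from an arbitrary encoder and a set of reference examples.
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a pretrained encoder such as CLAP: a fixed random projection.
PROJ = rng.standard_normal((512, 128))

def encode(batch: np.ndarray) -> np.ndarray:
    """Hypothetical encoder: maps raw examples to unit-norm embeddings."""
    z = batch @ PROJ
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def distributional_reward(generated: np.ndarray, references: np.ndarray) -> float:
    """Score a set of generations by how closely their embedding statistics
    match the reference (exemplar) distribution; here, the negative distance
    between embedding means, one simple choice among many."""
    g, r = encode(generated), encode(references)
    return float(-np.linalg.norm(g.mean(axis=0) - r.mean(axis=0)))

# Usage: a batch of generated samples scored against a small exemplar set.
generated = rng.standard_normal((16, 512))   # placeholder features of generations
references = rng.standard_normal((8, 512))   # exemplar set (could stem from text via CLAP)
print(distributional_reward(generated, references))
```

Swapping in a different encoder or a different exemplar set changes the optimization target without having to train a new reward model.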

In the optimization process, DRAGON collects generated examples online and on-policy and evaluates them to create a positive and a negative demonstration set. The contrast between these two sets is then used to maximize the reward function. This approach enables targeted adaptation of the model to the desired properties.
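The demonstration-set step can be illustrated with a short, self-contained sketch. The `generate` and `reward` functions and the median split below are placeholder assumptions; the actual DRAGON objective that contrasts the two sets is more involved.

```python
# Illustrative sketch of building positive and negative demonstration sets
# from on-policy samples (placeholders throughout, not the DRAGON codebase).
import numpy as np

rng = np.random.default_rng(1)

def generate(n: int) -> np.ndarray:
    """Placeholder for on-policy sampling from the current generative model."""
    return rng.standard_normal((n, 512))

def reward(sample: np.ndarray) -> float:
    """Placeholder instance-wise or distributional reward (see sketch above)."""
    return float(sample.mean())

samples = generate(64)                       # collect generations online, on-policy
scores = np.array([reward(s) for s in samples])
threshold = np.median(scores)

positive_set = samples[scores >= threshold]  # high-reward demonstrations
negative_set = samples[scores < threshold]   # low-reward demonstrations

# The contrast between the two sets then drives the update that pushes the
# model toward the positive set and away from the negative one.
print(len(positive_set), len(negative_set))
```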

Evaluation and Results

The effectiveness of DRAGON was evaluated with a text-to-music diffusion model in the audio domain. Twenty different reward functions were used, including a model of musical aesthetics, the CLAP score, the Vendi diversity score, and the Fréchet Audio Distance (FAD). The results show an average win rate of 81.45% across all 20 reward functions. Particularly noteworthy: optimizing against exemplar distributions improved the generated results to a degree comparable with model-based rewards. With a suitable exemplar set, DRAGON even achieved a human-confirmed quality win rate of 60.95%, without ever being trained on human preference annotations.
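For orientation, the sketch below computes the core of the Fréchet Audio Distance, namely the Fréchet distance between Gaussians fitted to two embedding sets. The embeddings here are synthetic placeholders; in practice they would come from a pretrained audio encoder.

```python
# Sketch of the Fréchet distance underlying FAD, one of the reward functions
# listed above, computed from two sets of (assumed pre-extracted) embeddings.
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets."""
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical error
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

# Usage: lower values mean the generated distribution is closer to the reference.
rng = np.random.default_rng(2)
generated_emb = rng.standard_normal((200, 32))
reference_emb = rng.standard_normal((200, 32))
print(frechet_distance(generated_emb, reference_emb))
```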

Applications and Outlook

The flexibility of DRAGON opens up diverse possibilities for optimizing generative AI models. Because a wide variety of reward functions and exemplar distributions can be used, models can be tailored to specific requirements. This is particularly relevant for areas such as music, image, and text generation, where subjective perception plays an important role. DRAGON thus represents an important step toward more efficient and targeted development of generative AI models and could fundamentally change how we interact with creative AI systems. The ability to optimize complex reward functions opens new possibilities for building AI systems that deliver high-quality results in line with human expectations. Future research could focus on broadening the scope of DRAGON and developing further innovative reward functions.

Bibliography:

Bai, Y., Casebeer, J., Sojoudi, S., & Bryan, N. J. (2025). DRAGON: Distributional Rewards Optimize Diffusion Generative Models. arXiv preprint arXiv:2504.15217.
Kim, Y. J., & Oh, S. J. (2024). Diffusion model alignment using direct preference optimization with a pretrained reward model. arXiv preprint arXiv:2409.06493.
Lu, C., Zhou, L., Bao, F., Zhang, J., & Zhu, J. (2025). Preference-based reinforcement learning with human feedback for text-to-image diffusion model. arXiv preprint arXiv:2502.12198.
Mou, W., Zhou, N., Niu, Y., & Zhou, M. (2023). Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models. arXiv preprint arXiv:2311.15449.
Nguyen, T. L., Nguyen, Q. H., Le, T., Vu, T., & Phung, D. (2024). Diffusion Policies for Reinforcement Learning. arXiv preprint arXiv:2408.03270.
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10684-10695).
Wolleb, A., Ertel, C., & Blattmann, A. (2024). Conditioning Text-to-Image Diffusion Models on Signed Distance Functions. In European Conference on Computer Vision (pp. 586-602). Springer, Cham.
Wang, Z., Zhou, D., Li, X., & Liu, Q. (2024). Diffusion models for reinforcement learning. arXiv preprint arXiv:2403.02279.
Lukyanko, A. (2024). Paper review: Diffusion model alignment using direct preference optimization with a pretrained reward model.
Raileanu, R., & Fergus, R. (2024). Diffusion Policies as an Alternative to Actor-Critic Methods for Continuous Control. Advances in Neural Information Processing Systems, 37.
Janner, M., Li, Y., & Levine, S. (2024). Planning with diffusion for flexible behavior synthesis. International Conference on Learning Representations.