The world of video production is on the cusp of an exciting innovation: researchers from Adobe Research and the University of Michigan have developed an AI system called MultiFoley that could revolutionize how synchronized sound effects are created. MultiFoley automatically generates so-called Foley sounds, the custom sound effects added to films and videos in post-production, and keeps them precisely in time with the picture.
What makes MultiFoley special is the versatility of its inputs: users can specify the desired sound with a text prompt, a reference audio clip, or a video example. In demonstrations, the system turned a cat's meow into a lion's roar and reproduced piano notes using typewriter sounds, in each case synchronized to the video.
Another advantage is the high audio quality: MultiFoley generates audio at a sampling rate of 48 kHz. The researchers achieved this by training the model on a combination of internet videos and professional sound-effect libraries. According to the researchers, MultiFoley is the first system to combine multiple input methods (text, audio, and video references) in a single model.
A dedicated alignment mechanism keeps the generated audio in sync with the video. Visual features are extracted at 8 frames per second and then upsampled to match the 40 Hz frame rate of the model's audio representation (distinct from the 48 kHz sampling rate of the output audio). The researchers report an average synchronization accuracy of 0.8 seconds, a significant improvement over previous systems, which typically exhibited a delay of more than one second.
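The rate conversion described above can be illustrated with a minimal Python sketch. The article does not say how MultiFoley performs the upsampling; this example assumes simple nearest-neighbour repetition, and the function name `upsample_nearest` and the toy feature vectors are hypothetical. Because 40 Hz is exactly five times 8 fps, each visual frame is repeated five times, so a 2-second clip grows from 16 to 80 frames. If the 40 Hz frames describe the 48 kHz output audio, each one spans 48,000 / 40 = 1,200 samples.

```python
from typing import List, Sequence

VIDEO_FPS = 8               # rate at which visual features are extracted (per the article)
AUDIO_FRAME_RATE = 40       # frame rate of the audio representation (per the article)
AUDIO_SAMPLE_RATE = 48_000  # sampling rate of the generated audio (48 kHz)


def upsample_nearest(frames: Sequence[Sequence[float]],
                     src_rate: int = VIDEO_FPS,
                     dst_rate: int = AUDIO_FRAME_RATE) -> List[List[float]]:
    """Repeat each feature vector so the sequence runs at dst_rate.

    Hypothetical helper: nearest-neighbour repetition is an illustrative
    assumption, not MultiFoley's documented method. It only works when
    dst_rate is an integer multiple of src_rate.
    """
    if dst_rate % src_rate != 0:
        raise ValueError("dst_rate must be an integer multiple of src_rate")
    factor = dst_rate // src_rate  # 40 / 8 = 5 audio frames per video frame
    return [list(frame) for frame in frames for _ in range(factor)]


if __name__ == "__main__":
    clip_seconds = 2
    feature_dim = 3  # hypothetical; real visual embeddings are much larger
    # Toy features: video frame t is the vector [t, t, t]
    visual = [[float(t)] * feature_dim for t in range(clip_seconds * VIDEO_FPS)]
    aligned = upsample_nearest(visual)
    print(len(visual), "->", len(aligned))  # 16 -> 80
    print(AUDIO_SAMPLE_RATE // AUDIO_FRAME_RATE, "samples per frame")  # 1200
```

With this scheme, audio frame i simply reuses video frame i // 5; a learned or interpolating upsampler would smooth the transitions between video frames instead.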
In tests against existing systems, MultiFoley came out ahead both in audio-video synchronization and in how well the generated sounds matched their text descriptions. In a user study, 85.8 percent of participants rated MultiFoley's semantic consistency higher than that of the next-best system, and 94.5 percent preferred its synchronization.
Despite this promise, the researchers point out some limitations. The training dataset was relatively small, which restricts the range of sound effects the system can produce. Generating multiple simultaneous sounds also remains a challenge.
The team plans to release the source code and models soon. Although Adobe has not announced any plans to integrate MultiFoley into its products, the technology would fit well alongside the existing AI features in its video editing software Premiere Pro. Both individual creators and production companies could benefit, since the system could significantly simplify the sound design process.
The development of MultiFoley highlights the transformative potential of AI in the creative industries. By automating complex tasks such as sound effect creation, tools like this let creatives focus on artistic decisions and produce innovative content more efficiently.