Soundify is a free AI tool that generates realistic soundscapes for visual media.
Given just an image, it automatically generates a realistic soundtrack that matches the scene. Imagine a film scene in which you hear not only the visible elements but also subtle off-camera sounds, such as distant thunder or rustling leaves.
How it works:
Soundify first performs multi-level scene understanding using AI models such as CLIP, Recognize Anything, and BLIP. It extracts visible objects, environment cues like text and sounds, location, time, weather, and an overall descriptive caption. This scene context is formatted into a detailed prompt and fed to ChatGPT.
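To make the prompt-formatting step concrete, here is a minimal Python sketch of how extracted scene context could be turned into a text prompt. The SceneContext class, its field names, the build_prompt function, and the prompt layout are all hypothetical and chosen for illustration; they are not Soundify's actual code, and the real prompt is likely more detailed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneContext:
    """Hypothetical container for what the vision models extract."""
    caption: str
    objects: List[str] = field(default_factory=list)
    location: str = "unknown"
    time_of_day: str = "unknown"
    weather: str = "unknown"
    visible_text: List[str] = field(default_factory=list)


def build_prompt(ctx: SceneContext) -> str:
    """Format the scene context as plain text and end with the question."""
    lines = [
        f"Caption: {ctx.caption}",
        f"Visible objects: {', '.join(ctx.objects) or 'none'}",
        f"Location: {ctx.location}",
        f"Time: {ctx.time_of_day}",
        f"Weather: {ctx.weather}",
        f"Text in scene: {', '.join(ctx.visible_text) or 'none'}",
        "",
        "What do I hear?",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    ctx = SceneContext(
        caption="A forest path in the rain",
        objects=["tree", "puddle"],
        location="forest",
        weather="rain",
    )
    print(build_prompt(ctx))
```

The resulting string would then be sent to the language model; the call itself is omitted here because the source does not say how Soundify talks to ChatGPT.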
When prompted “What do I hear?”, ChatGPT suggests diverse sounds based on its reasoning capabilities. The user selects the desired sounds, which are passed to AudioGen to generate audio. Volume levels are predicted by checking whether the sound's subject is visible and, if so, how large it appears in the image. Quieter background sounds are then blended with louder foreground sounds.
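The volume and blending step can be sketched as follows. This is only an illustration under stated assumptions: the estimate_gain and mix functions, the 0.2 gain for off-camera sounds, and the linear size-to-gain mapping are invented for this example, since the source does not give Soundify's actual formula. Audio is represented as plain lists of float samples in the range [-1, 1].

```python
from typing import List, Tuple


def estimate_gain(visible: bool, area_fraction: float = 0.0) -> float:
    """Return a gain in [0, 1] for one sound (hypothetical heuristic).

    Off-camera subjects get a low fixed gain; visible subjects get
    louder the more of the frame they occupy.
    """
    if not visible:
        return 0.2  # assumed level for quiet background sounds
    area = min(max(area_fraction, 0.0), 1.0)  # clamp to [0, 1]
    return 0.4 + 0.6 * area


def mix(tracks: List[Tuple[List[float], float]]) -> List[float]:
    """Sum (samples, gain) tracks sample by sample and clamp to [-1, 1]."""
    length = max((len(samples) for samples, _ in tracks), default=0)
    out = [0.0] * length
    for samples, gain in tracks:
        for i, value in enumerate(samples):
            out[i] += gain * value
    return [max(-1.0, min(1.0, v)) for v in out]


if __name__ == "__main__":
    thunder = ([0.5, 0.5, 0.5], estimate_gain(visible=False))
    dog_bark = ([1.0, -1.0], estimate_gain(visible=True, area_fraction=0.5))
    print(mix([thunder, dog_bark]))
```

Clamping the summed signal is the simplest way to avoid values outside the valid range; a real mixer might normalize or apply a limiter instead.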
How to use it:
1. Go to the Soundify website.
2. Choose and upload an image representing the scene you want to sonify.
3. The AI will analyze the scene and predict sounds that fit. Click on a sound to generate it.