Generate Sounds For Images Using AI – Soundify

Turn static images into immersive scenes with AI-generated sounds. Soundify analyzes visuals and produces complete soundtracks with just one click.

Soundify is a free AI tool that generates realistic soundscapes for visual media.

Given only an image, it automatically generates a realistic soundtrack that matches the scene. Imagine a film scene in which you hear not only the visible elements but also subtle off-camera sounds, such as distant thunder or rustling leaves.

How it works:

Soundify first performs multi-level scene understanding using AI models such as CLIP, Recognize Anything, and BLIP. It extracts visible objects, environmental cues such as text and sounds, location, time, weather, and an overall descriptive caption. This scene context is then formatted into a detailed prompt and fed to ChatGPT.
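The prompt-building step described above could look roughly like the following Python sketch. It is an illustration only, not Soundify's actual code: the `SceneContext` class, its field names, and the prompt layout are all assumptions, and the vision models and the ChatGPT call are left out entirely.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneContext:
    """Hypothetical container for the cues extracted from one image."""
    caption: str
    objects: List[str] = field(default_factory=list)
    location: str = "unknown"
    time_of_day: str = "unknown"
    weather: str = "unknown"
    visible_text: List[str] = field(default_factory=list)


def build_prompt(ctx: SceneContext) -> str:
    """Format the scene context as plain text, ending with the question."""
    lines = [
        f"Caption: {ctx.caption}",
        f"Objects: {', '.join(ctx.objects) or 'none detected'}",
        f"Location: {ctx.location}",
        f"Time: {ctx.time_of_day}",
        f"Weather: {ctx.weather}",
        f"Visible text: {', '.join(ctx.visible_text) or 'none'}",
        "",
        "What do I hear?",
    ]
    return "\n".join(lines)


# Example values are invented for illustration.
ctx = SceneContext(
    caption="A street market at dusk in the rain",
    objects=["umbrella", "bicycle"],
    location="city street",
    time_of_day="dusk",
    weather="rain",
    visible_text=["OPEN"],
)
prompt = build_prompt(ctx)
print(prompt)
```

Keeping the context as labeled lines makes it easy to see which cue the language model was given when it suggests a sound.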

When prompted “What do I hear?”, ChatGPT uses its reasoning capabilities to suggest a diverse set of sounds. The user selects the desired sounds, which are passed to AudioGen to generate audio. Volume levels are predicted from whether the source of each sound is visible and how large it appears in the image. Quieter background sounds are then blended with louder foreground sounds.
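The volume and blending step can be sketched in a few lines of Python. This is a toy heuristic under stated assumptions, not the method Soundify uses: the gain constants, the `predict_gain` and `mix` helper names, and the representation of audio as lists of float samples in [-1, 1] are all invented for illustration.

```python
from typing import List


def predict_gain(visible: bool, area_fraction: float) -> float:
    """Assumed heuristic: off-screen sources are quiet, larger visible ones louder."""
    if not visible:
        return 0.2  # background level for sources not seen in the image
    area = min(max(area_fraction, 0.0), 1.0)  # clamp to [0, 1]
    return 0.4 + 0.6 * area  # visible sources range from 0.4 up to 1.0


def mix(tracks: List[List[float]], gains: List[float]) -> List[float]:
    """Sum gain-scaled tracks sample by sample, rescaling if the peak exceeds 1."""
    if not tracks:
        return []
    length = max(len(t) for t in tracks)
    mixed = [0.0] * length
    for track, gain in zip(tracks, gains):
        for i, sample in enumerate(track):
            mixed[i] += gain * sample
    peak = max(abs(s) for s in mixed) if mixed else 0.0
    if peak > 1.0:
        mixed = [s / peak for s in mixed]  # avoid clipping
    return mixed


# Invented example: a large visible car and off-screen thunder.
car = [0.5, -0.5, 0.5, -0.5]
thunder = [1.0, 1.0, 1.0, 1.0]
gains = [predict_gain(True, 0.5), predict_gain(False, 0.0)]
blended = mix([car, thunder], gains)
print(gains, blended)
```

A real pipeline would operate on audio arrays from the generator and would likely use a perceptual loudness measure rather than a linear gain, but the structure (one gain per sound, then a sum) is the same.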

Paper