OpenAI Unveils Multimodal Interaction in ChatGPT

OpenAI today unveiled new voice and image features for ChatGPT. Users can now hold spoken conversations with the AI assistant and show it photos to analyze.

OpenAI said it is rolling out these capabilities to Plus and Enterprise users over the next two weeks. Voice is coming to the iOS and Android apps, while image input will be available on all platforms.

“We are beginning to roll out new voice and image capabilities in ChatGPT. They offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you’re talking about,” the company said.

The voice feature lets users speak to ChatGPT and hear it reply in one of five human-like voices. OpenAI created the voices with professional voice actors and generates the speech with a new text-to-speech model. On mobile, users can opt into voice conversations in Settings.

The image capability lets users show ChatGPT photos of objects, graphs, or everyday situations and ask about them. In the mobile app, a drawing tool lets users mark the part of an image they want ChatGPT to focus on. For example, ChatGPT can look at the contents of a fridge and suggest recipes, or interpret a graph.

OpenAI said image understanding is powered by multimodal versions of its GPT-3.5 and GPT-4 models, which apply their language reasoning skills to a wide range of images, including photographs, screenshots, and documents that combine text and images.

OpenAI emphasized its commitment to responsible AI development. The company is deploying the features gradually so it can refine its safety measures over time and prepare users for more powerful systems. It has also limited ChatGPT's ability to analyze and make direct statements about people in images, citing accuracy and privacy concerns.

Why This Matters

These updates mark ChatGPT's shift from text-only to multimodal interaction. Beyond the technical upgrade, speaking to the assistant or showing it a photo can make it easier to use for a wider range of everyday tasks. OpenAI's gradual rollout and its limits on analyzing people are intended to keep the deployment of these features responsible and user-centric.