OpenAI, a startup backed by Microsoft, announced new features for its AI chatbot ChatGPT on Monday. ChatGPT can now speak and see, which means users can get answers from it in five different voices and ask it questions about images.

Voice conversations with ChatGPT

ChatGPT will be able to answer users' questions in five different voices, which can be selected according to user preferences.

The standout feature is the ability to hold voice conversations with ChatGPT. Users can now have real-time conversations with their AI assistant, whether they are on the go, looking for a bedtime story for the family, or trying to settle a dinner table debate.

Photo: ChatGPT

To get started with voice, go to Settings in the mobile app, tap “New Features,” and select “Voice Conversations.” Once you’ve done that, tap the “Headphones” icon in the right-hand corner of your home screen to start a voice conversation. You’ll have five different voices to choose from, all of which were created with professional voice actors to give a human-like sound. The feature also relies on Whisper, OpenAI’s open-source speech recognition system, which automatically transcribes your spoken words into text.

Image conversations

The new image-sharing feature lets users share one or more images with ChatGPT and use the chatbot's replies for troubleshooting, content exploration, and complex data analysis.

“ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms).” https://t.co/uNZjgbR5Bm (OpenAI, @OpenAI, September 25, 2023)

To use this feature, tap the photo button to select an image. On iOS and Android, tap the plus (+) button first. You can also add multiple images or use the drawing tool to guide your AI assistant. Image understanding is powered by multimodal GPT-3.5 and GPT-4 models, which can understand and respond to visual content such as photos, screenshots, and documents containing both text and images.

OpenAI says it is aware of the risks associated with these cutting-edge technologies. For voice, the company is limiting the technology to a specific use case, voice chat, and developed it in partnership with voice actors to help ensure authenticity and safety.

It is also reported that Spotify is planning to use this technology for its own Voice Translation feature, which allows podcasters to translate their content into different languages using only their own voices.