OpenAI Improves AI for Accurate Voice Recognition and Generation

Alexander, 22.03.2025


OpenAI has announced updates to two of its leading models: Whisper, which handles transcription, and Voice Engine, which provides voice synthesis. The improved versions show higher speech recognition accuracy and more natural voice reproduction, bringing artificial intelligence closer to the level of live conversation, TechCrunch reports.

These improvements make voice technology more accessible and accurate, expanding what automatic transcription, voice assistants, and voiceover systems for video and audiobooks can do.

Whisper is now faster and more efficient, and it is much better at recognizing complex accents and at handling background noise and even damaged audio recordings. This extends its use to producing high-quality transcriptions of interviews, conferences, and other conversational formats.

The Voice Engine model has also been improved: it now reproduces a human voice even more accurately from a short sample. This opens up new possibilities for voice assistants, text narration, and personalized voice content.

As a reminder, on March 19 OpenAI presented a new version of its artificial intelligence, o1-pro, which is meant to provide “consistently better responses” but will be the company’s most expensive model. The o1-pro model in the API is an improved version of o1 that uses more computing power to process queries more deeply and solve the most complex tasks.

Using o1-pro costs $150 per 1 million input tokens and $600 per 1 million output tokens. That is twice the input price of GPT-4.5 and 10 times the price of the standard o1 version.
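
To make the pricing concrete, here is a minimal sketch of the per-request arithmetic. It uses only the o1-pro prices quoted above. The function name `request_cost` and the sample token counts are hypothetical, and the standard o1 prices are not quoted in the article: they are derived from the stated 10x ratio.

```python
# Prices quoted in the article, in USD per 1 million tokens.
O1_PRO_INPUT_PER_M = 150.0
O1_PRO_OUTPUT_PER_M = 600.0


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = O1_PRO_INPUT_PER_M,
                 output_per_m: float = O1_PRO_OUTPUT_PER_M) -> float:
    """Return the USD cost of one request at the given per-million-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Hypothetical request: 2,000 input tokens and 500 output tokens.
o1_pro_cost = request_cost(2_000, 500)

# Assumption: standard o1 prices derived from the article's "10 times" ratio.
o1_cost = request_cost(2_000, 500,
                       input_per_m=O1_PRO_INPUT_PER_M / 10,
                       output_per_m=O1_PRO_OUTPUT_PER_M / 10)

print(f"o1-pro: ${o1_pro_cost:.2f}")
print(f"o1:     ${o1_cost:.2f}")
```

At these rates the sample request costs $0.60 on o1-pro, split evenly between input and output, against $0.06 on standard o1. Because output tokens cost four times as much as input tokens, long answers dominate the bill.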