Samantha Wells
- Oct 17, 2021
- 4 min read

Revolutionary and realistic AI Voice Engines with AMAI technology

What is AMAI?

AMAI is a startup that produces ultra-realistic AI Voice Engines. It has developed an AI voice that 97% of users could not distinguish from real human speech. Its technology can be incorporated into a number of markets, such as call centers, banking, digital assistants, podcasts, and audiobooks.

Current industry landscape

The synthetic voice, recognition, and voice acting market is expected to grow from $8.3 billion in 2021 to $22 billion in 2026, and the text-to-speech market is projected to reach $7.06 billion by 2028. As voice digital assistants become more popular, companies need to produce more audio tracks with human input. Brands want to maintain a consistent sound across millions of customer interactions, but they don't want to use shared voices. AI-assisted voice cloning addresses this: it helps automate processes by replacing people, and it can improve the sound quality of the voices. Even so, it is still difficult to maintain a realistic voice over the long running time of an audiobook or podcast.

Companies that operate online stores also struggle to share up-to-date information among call-center employees: many customers complain about receiving irrelevant or false information (for banks, this problem is even more pressing). The human factor is crucial here: one of the general arguments for AI is that it removes the possibility of human error.

Artificial intelligence does, however, pose an ethical dilemma. AI will never replace human actors. In fact, the development of AI voice models and the recording of these AI voices by studios benefit not only the studios but also the actors, since these voices provide an additional source of royalties.

AMAI's competitors include Nuance, Acapela, CereProc, WellSaid Labs, Murf, and Sonantic.

How AMAI was started

Together with his business partner, Pavel Osokin brainstormed what every company, without exception, needs. In their minds, there was no question that the answer was sales. They saw the future in technologies that would replace people while increasing companies' efficiency. So they began to build a robot that could write and respond to emails, messenger chats, and so on. At some point, however, they realized that the robot should also call, not just write. After a short search for quality voices, they hired their first employee to prototype their own voice. The resulting voice technology was well liked by its first users, and the founders concentrated their efforts on it.

How AMAI works

AMAI offers alternative voiceover products to those offered by Google, Amazon, and IBM, focusing on enhancing the quality of synthetic voice and developing solutions that fundamentally address privacy concerns. AMAI's products are specifically designed for enterprise businesses to help them optimize operations and reduce their reliance on human resources.

AMAI's voice synthesis is based on deep learning and runs on both central processing units (CPUs) and graphics processing units (GPUs). The product's core is an ensemble of deep learning models adapted to different languages. AMAI has three principal models. The first receives the text input, fills in the emphasis marks (for the languages that have them), and normalizes the text. From there, the text moves to the multi-speaker model, which creates a mel spectrogram with variable configurations of energy, rate, and tone. The last model converts the mel spectrogram into audio. The models function in streaming mode, which enables real-time synthesis on CPU (response time < 500 ms) and GPU (response time < 200 ms), regardless of the volume of text submitted for synthesis.
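To make the three-stage flow described above more concrete, here is a minimal Python sketch. It is not AMAI's code: every function name, the sentence-level chunking, and the constants (80 mel bins, 5 frames per character, 256 samples per frame) are assumptions chosen for illustration, and the two "models" are placeholders that only produce silence of a plausible length. What it does show is the shape of the pipeline and why streaming keeps the response time independent of text length: audio for the first chunk is yielded before the rest of the text has been processed.

```python
import re
from typing import Iterator, List


def normalize_text(text: str) -> str:
    """Stage 1 (placeholder): expand '%' and collapse whitespace.

    A real front end would also add emphasis marks and expand numbers.
    """
    text = text.replace("%", " percent")
    return re.sub(r"\s+", " ", text).strip()


def split_into_chunks(text: str) -> List[str]:
    """Split on sentence-ending punctuation (an assumed chunking strategy)."""
    return [c for c in re.split(r"(?<=[.!?])\s+", text) if c]


def text_to_mel(chunk: str, rate: float = 1.0, n_mels: int = 80) -> List[List[float]]:
    """Stage 2 (placeholder for the multi-speaker acoustic model).

    Returns a silent mel spectrogram whose length shrinks as `rate` grows.
    """
    n_frames = max(1, round(len(chunk) * 5 / rate))
    return [[0.0] * n_mels for _ in range(n_frames)]


def mel_to_audio(mel: List[List[float]], hop_length: int = 256) -> List[float]:
    """Stage 3 (placeholder vocoder): hop_length silent samples per frame."""
    return [0.0] * (len(mel) * hop_length)


def synthesize_stream(text: str, rate: float = 1.0) -> Iterator[List[float]]:
    """Yield audio chunk by chunk so playback can start before synthesis ends."""
    for chunk in split_into_chunks(normalize_text(text)):
        yield mel_to_audio(text_to_mel(chunk, rate=rate))


if __name__ == "__main__":
    for i, audio in enumerate(synthesize_stream("Hello world. How are you?")):
        print(f"chunk {i}: {len(audio)} samples")
```

Because `synthesize_stream` is a generator, the caller receives the first chunk after only one sentence has passed through all three stages, which is the property that makes a fixed response time plausible for long inputs.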

According to an internal study, 97% of people in a Turing test are unable to distinguish AMAI's artificial speech from that of a living person. AMAI's AI voiceover of audiobooks, podcasts, and other texts reduces voice artist costs by 30%. And with AMAI's voice editor, creating an audio recording with emotions, stops, and pauses is now as simple as sharing a photo on Instagram and does not require any special skills.

Conversational AI with a voice editor allows you to select the mood, accent, and speech speed. More than 5,000 purchase requests have already been submitted ahead of the public launch. It is the first voice editor that doesn't require coding skills.

How AMAI is changing lives

AMAI created the first online voice editor that does not require coding knowledge. Its AI voiceover technology for audiobooks, podcasts, and other texts reduces voice artist costs by 30%. It can partly replace humans and give them a superpower.

A message from the founder

'Soon, we will be on almost every computer and phone. Our synthesized voice will be able to speak in all languages and translate unknown words automatically. Eventually, you will be able to watch videos in Farsi, talk on vacation in Bahasa, listen to Morgan Freeman in pure French, and even talk to your microwave about problems at work and ask your refrigerator to order bread. We already voice audiobooks, podcasts, and articles, and help call centers. Investors have very little time left to catch the departing train, because we are certain that voice is the next Internet revolution', notes Pavel Osokin, founder & CEO at AMAI.

Check them out: https://amai.io/

Follow them on social:

LinkedIn: https://www.linkedin.com/company/amaiio/

Twitter: https://twitter.com/Amai_io