Audio

You have probably used speech-to-text or text-to-speech on your smartphone or smart speaker before. This technology has been around for a while, but over the past few years it has become much more accurate and versatile. Generative AI opens up new possibilities, from creating music and sound effects to generating voices that express emotion.

Let’s explore what you can include in your prompts:

  1. Genres and styles: Choose the genre and style you want, such as ‘a classical symphony’ or ‘a rap song’.
  2. Instruments and tone: Decide which instruments you want to hear and the mood you are going for. Do you prefer a warm acoustic guitar or an electronic synthesizer?
  3. Text and emotion: For a voice recording, give clear instructions about the text and the emotion you want to convey, whether it is an enthusiastic commercial or a calm narration.

Text-to-speech: new possibilities

New and improved voice models can generate voice recordings whose intonation and emotion vary with the content. AI can do more than just read aloud: it can whisper or even act out entire scenes. With just a few seconds of a voice recording, you can clone a voice, and you can even create unique voices simply by describing them. With text-to-speech models, you usually do not write prompts. Instead, you select a voice and enter the text you want it to speak.

Listen to the clips below and notice what makes each voice unique.

Even among AI voices, certain groups in our society are significantly underrepresented. For instance, very few available female voices speak with a Flemish accent.

Other generative AI applications

Music

Generative AI is fundamentally changing the way we make music. New models can create not only instrumental pieces but also full songs with vocals. You can craft an impressive track within minutes, although you might miss out on the joy of playing an instrument and experimenting with melodies.

Give it a try yourself. Find a free trial of an AI music generator and create a song that reminds you of a happy moment or helps you remember a tricky concept. You can use the prompt below or write your own.

Generate a pop song with an upbeat melody and a vocal chorus. The theme of the song is friendship.

Need some inspiration? Listen to the tracks below. The first song was automatically generated from a simple prompt, while the second one uses an existing text.

Speech-to-speech

With speech-to-speech technology, you can convert speech into another language or another voice in real time. It retains the speaker’s intonation, emotion, and nuance, so the conversation sounds natural and human-like. This makes the technology well suited to live translation, smart voice assistants that communicate effortlessly with users, and much more.

Check out this OpenAI demo in which a user speaks to GPT-4o. He asks the LLM to count, first faster, then at a normal pace, and then slower. In this example, the model works entirely speech-to-speech: it listens to spoken input and answers with speech.