Transform text into remarkably natural speech with Gemini's groundbreaking native audio generation technology

Google AI Studio just dropped a game-changing feature for anyone working with audio. Its new speech generation tool lets you create very natural-sounding voice content, whether you need a single narrator or a full conversation with multiple speakers. Perfect for podcasts, voiceovers, audiobooks, or any creative project, it makes high-quality speech synthesis easier and more versatile than ever.

This tutorial walks you through converting text into lifelike narration. How you use the feature depends on your idea and your project. We'll show you how to access the speech generation tool, choose an audio mode, write a script, and customize voices.

By the end of this tutorial, you’ll be able to:

  • Access the speech generation tool
  • Select an audio mode
  • Write your script and customize voices
  • Generate the audio

Let’s dive in right away!

Step 1 - Access the speech generation tool

Go to Google AI Studio and sign in with your Google account or create a new one if you haven’t already done so. 

Once you're on the main dashboard, find the “Generate Media” section in the left-hand menu and click it. 

Choose “Gemini speech generation” from the list of options that shows up.

When you open the speech generation interface, you’ll see the script builder on the left and the settings panel on the right. By default, it uses Gemini 2.5 Flash Preview TTS; however, for even better quality, you can switch to Gemini 2.5 Pro TTS using the dropdown menu.

Step 2 - Select an audio mode

Before starting the project, ensure that you have configured the voice settings. In the right-side panel, select the audio mode that best suits your project requirements. There are two options:

Single-speaker audio: Perfect for things like narrations, audiobooks, or voice-overs. You just drop your full script into one clean text box and pick one voice to read it all.

Multi-speaker audio: Great for dialogues, interviews, or anything with back-and-forth conversation. You’ll get separate text boxes for each speaker, so it’s easy to create natural, realistic chats.

Click the preferred mode in the settings panel to activate it. 

In multi-speaker mode, you can also name each speaker in the settings panel, which makes them easier to tell apart in the script builder.

Step 3 - Write your script and customize voices

In single-speaker mode, just type your script into the main box. To set the vibe, add a style note such as “Read aloud with a dramatic flair.” Then pick a voice from the dropdown and you’re good to go.

In multi-speaker mode, you’ll get separate blocks for each person in the conversation. Add style notes at the top, then type what each speaker says. You can tweak each speaker’s name and voice by clicking their settings. Need more back-and-forth? Just hit “Add dialog” to keep the conversation going.
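If you draft long dialogues outside AI Studio, it can help to keep the style note and the speaker turns in one structured place before pasting them into the speaker blocks. Here is a minimal sketch of that idea in Python. The names `Turn` and `build_script` are hypothetical, and the "Name: line" layout is only a drafting convention assumed for this example, not a format AI Studio requires.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # the speaker name you set in the settings panel
    text: str     # what that speaker says in this turn


def build_script(style_note: str, turns: list[Turn]) -> str:
    """Lay out a dialogue as plain text: style note first, then one line per turn."""
    lines = []
    if style_note.strip():
        lines.append(style_note.strip())
        lines.append("")  # blank line between the note and the dialogue
    for turn in turns:
        lines.append(f"{turn.speaker}: {turn.text.strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    draft = build_script(
        "Make the hosts sound relaxed and curious.",
        [
            Turn("Host", "Welcome back to the show."),
            Turn("Guest", "Thanks for having me."),
        ],
    )
    print(draft)
```

Keeping the turns in a list makes it easy to reorder or extend the conversation before you copy each line into its own block.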

Step 4 - Generate the audio

The Gemini speech generation tool also lets you specify whether your script is a movie scene or a podcast. The AI will tweak its settings based on the type you choose.

Once your script’s all set, just hit the blue “Run” button at the bottom. The AI will take it from there and create the audio based on everything you’ve set up.

Once it’s done processing, you can preview the audio right in the interface. If something feels off, tweak your script, switch up the voice, or adjust the style notes, then rerun it. When you’re satisfied with the sound, simply download the file and use it in your video, podcast, or any other project you’re working on.

The generated audio downloads as a .wav file. Gemini also lets you set the playback speed before downloading. Click the three-dot icon in the generated audio player, then either select the playback speed and adjust it, or download the .wav file.
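After downloading, you may want to check the file's length and format before dropping it into a video or podcast editor. The sketch below reads those properties with Python's standard `wave` module. The function name `wav_summary` is hypothetical, and the sketch assumes the download is an uncompressed PCM .wav file, which is the only kind the `wave` module reads.

```python
import wave


def wav_summary(path: str) -> dict:
    """Return basic properties of a PCM WAV file using only the standard library."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()   # total number of audio frames
        rate = wav.getframerate()   # frames per second
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            # duration is frame count divided by frames per second
            "duration_s": frames / rate if rate else 0.0,
        }
```

For example, calling `wav_summary("speech.wav")` on a downloaded file (the file name here is a placeholder) returns a dictionary with the channel count, sample rate, sample width, and duration in seconds.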

That’s a wrap for this tutorial! Gemini’s speech generation feature gives you tons of creative options—whether you’re adding voice to videos, making an audiobook, or anything else. We’ve walked through the basics of how to use the tool to turn your script into high-quality audio. Now it’s your turn to put it to work however you like.
