There are many reasons someone might use AI to read text aloud. Whether for convenience, to assist with comprehension, or for use in educational and professional environments, this kind of tool can be incredibly helpful in a wide range of settings. With that in mind, I thought it would be interesting to investigate play.ht, an AI-powered text-to-voice generator that creates realistic audio. The program uses an online voice generator along with synthetic voices, and it offers a wide range of AI voices, speech styles, pronunciations, and features for users to choose from.
The above image depicts the homepage I was brought to after creating my free account on play.ht. This page contains many resources, including access to current projects, audio tools, and voiceover samples. From here, I decided to go straight into creating my audio project. I had the option of choosing between the "standard and premium" or "ultra realistic" voices. I tried the realistic voices first, where I simply entered my desired text and chose a voice. The below image displays a few of the many voices I could select. They vary by gender, accent, age, and style, among other attributes. I was shocked by the wide range of options and learned that the program can create speech in 142 languages and accents.
After entering my text and selecting the voice, I easily generated the audio, which I have inserted below. Instead of simply attaching an audio file, I was able to import it and connect it with a video. I think this is an especially important feature, as there are many situations where audio needs to be paired with video. Having this tool directly in the platform is very useful because users do not have to use a third-party application to link the files.
After reviewing the final audio and video, I noticed that the AI mispronounced "AI." It pronounced it correctly the first time but as "A" the second time. I was surprised and confused by this, since it had already said the word correctly once. This made me think of a notice I received when initially creating the video, which states, "Each sample is unique. You can ‘Re-Generate Previews’ to generate multiple samples and select the one you prefer." I believe this helps explain why the program mispronounced "AI" the second time. If each sample is unique, the same word may not sound the same every time it is spoken.
After reviewing the audio and further investigating the tools available, I have to say I am pleasantly surprised by the program. I have only really used text-to-voice in Word and Google Translate, so I was not aware of how complex this kind of AI can get. I learned that play.ht is being used by well-known organizations including Harvard University and Product Hunt. It is pretty neat that I can use the same tool as popular entities like those mentioned. Overall, I was happy to investigate and learn more about this AI. I am looking forward to seeing how it continues to grow and evolve over time.

