To introduce more people to Mastodon, I made some introduction videos. The idea came to me during a summer sale on DLSite, where I discovered that, besides the usual games, manga, and other Japanese-related media, you can buy voice banks and voice synthesizer software.
Of course, I had already played around with text-to-speech solutions, including the built-in options on macOS, which is how I was introduced to TTS during my childhood. Voice synthesis has evolved since then. While the new voices Apple added in Mac OS X Lion were an improvement, they still don’t sound natural. The Japanese TTS voice in Mac OS X cannot do pitch accents, which are essential for speaking Japanese.
I have also played with cloud TTS services like Microsoft Azure Speech and IBM Watson TTS. While they can be helpful for flashcards, they have a few problems: some words are read incorrectly, and of course, they are pay-per-use. You must keep paying to generate TTS, which can get expensive when you start making longer content. Also, they sound a bit too normal.
Enter AI Voice and Cevio AI. If you have heard of the latter, Cevio AI is like Vocaloid in that you can use voice banks to create songs. The difference is that Cevio also lets you create speech audio. However, the voice packs for talking are separate from the singing voices. While you can make Vocaloid talk too, making it sound natural is not easy.
Using it, well, is a different story. I learned quite a bit while using Cevio AI. Of course, you can’t just copy and paste paragraphs of text. I noticed that when you do this, sentences bleed into one another without a proper pause. Also, correcting the pronunciation of words that are not in the built-in dictionary takes time. If you skip that step, they will sound awkward.
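To avoid pasting whole paragraphs, one option is to break a script into one sentence per line before entering it. Here is a minimal sketch in Python, assuming that sentence-by-sentence input is what gives you proper pauses. The `split_sentences` helper is my own hypothetical name and has nothing to do with Cevio AI itself; it only prepares plain text.

```python
import re


def split_sentences(paragraph: str) -> list[str]:
    """Split a paragraph into sentences (hypothetical helper, not a Cevio AI feature)."""
    # Break after '.', '!' or '?' whenever whitespace follows.
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    # Drop empty strings, e.g. when the input is blank.
    return [part for part in parts if part]


script = "Mastodon is a social network. Anyone can run a server! Want to join?"

# One sentence per line, ready to be entered one at a time.
print("\n".join(split_sentences(script)))
```

This simple pattern also splits after abbreviations such as "e.g.", so the result still needs a quick read-through before use.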
I got the hang of it after spending a few hours using it. Currently, Tsurumaki Maki is the only English voice available for Cevio AI, but more English-capable voices are coming. AI Voice also has an English voice bank, Kotonoha Akane and Aoi. Still, it doesn’t have the emotional parameters that Cevio AI has.
I plan to create content like short reviews, videos and such with this voice synthesis software. Either way, seeing further advancements in this space should be interesting.