Archive for May, 2011

Text To Speech

I was thrilled a couple of years ago when I was approached by Cepstral, one of the premier architects of high-quality, natural-sounding voice synthesis products, to be one of their text-to-speech voices, and I was even thrilled by their very public “proposal”. They did a presentation at Astricon one year, and while they were discussing their range of available voices, a slide appeared on the screen which read: “Coming soon: The Allison Voice!”

Geez, give a girl some notice. At least we’re not capturing the event on a jumbotron.

Text to Speech (TTS) synthesis is basically the artificial production of human speech. Most people’s first thought will gravitate immediately to Stephen Hawking, whose Text to Speech voice has become a part of his persona; legend has it that Cepstral, who designed his initial TTS utility, has offered him numerous “upgrades” and more current and evolved versions throughout the years for him to experiment with. He has turned them all down. His early, rudimentary “voice” works well; it is recognizable, and most significantly, it has practically become a part of who he is. Text to Speech products immeasurably enhance the lives of those unable to speak, and it’s imperative that the user and voice connect on a visceral level.

A Text to Speech system converts normal language text into speech by concatenating pieces of recorded speech which are stored in a database. Phonemes are the individual sounds of a language, and graphemes are the letters or letter groups that represent them in writing; the system matches the typed word to the stored sound “fragments” to which it should correspond. The storage of entire words and even sentences allows for high-quality output, but is laborious and time-intensive to record.
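For the technically curious, here is a rough sketch of that concatenation idea in Python. It is only an illustration under loud assumptions: the names `UNIT_DB`, `select_units` and `synthesize` are made up for this example, the "recordings" are placeholder numbers rather than audio, and single letters stand in for the smaller sound fragments a real engine would use. It is not how Cepstral's engine is actually built.

```python
# Toy unit database: keys are text units, values stand in for recorded audio.
# Every entry is a hypothetical placeholder, not a real recording.
UNIT_DB = {
    "hello": [1, 2, 3],        # a whole-word recording
    "h": [10], "i": [11],      # single letters standing in for sound fragments
    "a": [12], "l": [13],
}


def select_units(text, db=UNIT_DB):
    """Prefer a stored whole word; otherwise fall back to smaller fragments."""
    units = []
    for raw_word in text.lower().split():
        # Drop punctuation so "Hello," still matches the stored "hello".
        word = "".join(ch for ch in raw_word if ch.isalpha())
        if word in db:
            units.append(word)
        else:
            for ch in word:
                if ch not in db:
                    raise KeyError(f"no recorded unit for {ch!r}")
                units.append(ch)
    return units


def synthesize(text, db=UNIT_DB):
    """Concatenate the stored samples for each selected unit, in order."""
    samples = []
    for unit in select_units(text, db):
        samples.extend(db[unit])
    return samples
```

Calling `select_units("Hello, Al!")` would pick the whole-word recording for "hello" and piece "Al" together from fragments, which is roughly why recording more whole words buys smoother output.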

Tell me about it.

Cepstral’s goal, when they proposed the idea of working together, was to build a very robust TTS engine — possibly the most robust they’d ever designed. Due to the prevalence of my voice not only on the Asterisk Open Source PBX but with many other telephony platforms, they saw the advantages in recording volumes more “sounds” than usually required to build a typical TTS system, so as to create as seamless as possible an interface which would dovetail well with pre-installed stock prompts and custom-recorded prompts alike — all voiced by me. As a way of achieving that, a script arrived which had the breadth (and thickness) of a typical major-city white pages telephone book. No problem!

In this script, I found thousands upon thousands of single words, and just as many pages of random and often nonsensical sentences (“During the period, the company continued to benefit from favorable tax effects”, or “But oh what a hit it could be”, as examples). From larger sentences, phonemes can be farmed (think of the single sounds and combinations of sounds which could be extracted from the sentence: “Julie put on her red coat and made it to the train station by nine”) and stored for retrieval when the system perceives that the “fragment” is needed. It’s not flawless, though: at a subsequent Astricon after the Allison TTS Voice was launched, one of the Digium staffers was very eager to unveil the Cepstral Allison Voice; he typed in “Hi! I’m Allison Smith!” and out of the computer I spoke: “Hi! I’m Allison Smeeeeth!” I find it hard to believe we didn’t capture the “ih” sound that the “I” in “Smith” makes, but there you have it. (One of the most difficult sounds to capture in a TTS application is, oddly enough, the word “of”, which is widely used in the English language; it’s one of the few words where “f” is pronounced “v”. Naturally, this creates problems for TTS utilities.)
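The trouble with “of” can be sketched in a few lines of Python. This is a guess at the general approach, not a description of any real product: the tiny letter-to-sound table, the `EXCEPTIONS` lexicon, the `to_phonemes` function and the phoneme labels are all assumptions invented for illustration. The idea is that simple letter rules get “of” wrong, so the word needs its own entry that is checked first.

```python
# Hypothetical exception lexicon: words the plain letter rules would get wrong.
EXCEPTIONS = {"of": ["AH", "V"]}

# Deliberately tiny, naive letter-to-sound table (illustrative labels only).
LETTER_TO_SOUND = {"o": "AO", "f": "F", "i": "IH", "r": "R"}


def to_phonemes(word):
    """Look the word up in the exception list first, then fall back to letter rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])  # copy, so callers cannot alter the lexicon
    return [LETTER_TO_SOUND[ch] for ch in word]
```

Without the exception entry, “of” would come out with the same “F” sound as “if”, which is the kind of slip a listener notices immediately.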

I devoted about three hours a day for several weeks to getting the project recorded, and managed to soldier through it, not only voicing all words and sentences, but editing them into individual sound files. Apparently it was worth it: the Cepstral Allison TTS voice is the number one selling voice for Cepstral, and is offered as a very useful add-on for purchasers of the Asterisk PBX. The uses of TTS for the speaking-disabled allow for clear, real-time communication for those with challenges; other applications in the area of transcription of the written word to audio format are immeasurably vast and key to its growth and evolution. While it will never “replace” me (I’ve had a few clients who have tried doing longer paragraphs and one client who even tried to “forge together” an entire on-hold system using strictly my TTS voice, unsuccessfully), the text-to-speech utility is ideal for filling in gaps, smithing together proper and place names, or simply bridging together prompts which need integration. While the Allison TTS voice, just by the volume of material which built it, is a formidable and extensive TTS utility, it will always be identifiable as “mechanized” and is never apt to be mistaken for an organic recording.

Check out the Cepstral Allison Voice at: www.cepstral.com/demos

…type anything in, and I’ll say it. Yes, anything. My husband is prone to typing in things like: “You are correct 100% of the time!” or “There are no chores for you today!”; hearing them in a slightly robotic, manufactured style is better than not hearing them at all…

Thanks for reading! Next blog, I’ll dig deeper into the voices which tell you when to turn — the occasionally vexing world of GPS voices!

Allison Smith is a professional telephone voice, who can be heard voicing systems for telephone systems and private companies throughout the world, including platforms for Verizon, Qwest, Cingular, Sprint, Bell Canada, Hawai’ian Telcom, and Asterisk.  Her website is www.theivrvoice.com.

The Voices Behind the Consoles

There are certain TV shows and films which feature unseen, automated, mechanical “voices” which never materialize into human form; they usually occupy a mechanized “framework” instead of a body, and are characters who become all the more fascinating and alluring to us *because* of their mystique of being “unseen”.

One of the most legendary “Voices Behind the Consoles” is HAL 9000, the computer (and major antagonist) in Arthur C. Clarke’s saga, immortalized in the 1968 film 2001: A Space Odyssey and its 1984 sequel, 2010. HAL (Heuristically programmed ALgorithmic computer) was an artificial intelligence which interacted with the crew, usually represented only by a red television camera “eye”. Speaking in a soft, conversational style, HAL was portrayed with understated slyness and a surprising level of depth by Canadian actor Douglas Rain.

HAL was capable of not only speech, but speech recognition, facial recognition, natural language processing, lip-reading (discovered when Bowman and Poole, crew members who doubted HAL’s reliability, discussed replacing him in what they thought was a private conversation in one of the EVA pods) and playing chess: skills which have come to be almost “expected” in automated forms. What placed HAL way ahead of his time were skills far outside the realm of what is within an automaton’s typical reach: (apparent) art appreciation, reasoning, and even more staggering, interpreting and reproducing human behavior, aspects which are still considered too arcane and subjective to be accomplished with consistency by a computer.

With memorable quotes like “It can only be attributable to human error” (regarding the supposed failure of the parabolic antenna on the ship, which HAL *himself* falsified), some classic HAL sound clips best illustrate the wry, unflappable style of HAL, and what made him one of AFI’s greatest film villains of all time (click the links below):

error

feelit

decision

Capable of malice and diabolical revenge (severing Poole’s oxygen and setting him adrift, and shutting down life functions for the crew members in suspended animation, to name just a few), HAL was an ever-present, ominous… *force*, made all the more compelling by being seen only as an unflinching, staring red “eye”.

Considerably less ominous was the voice of the computer interface in Star Trek, voiced by Majel Barrett-Roddenberry, wife of Star Trek creator Gene Roddenberry. Largely uncredited as the voice of the computer, Majel Roddenberry also played the role of Nurse Christine Chapel in the original series. Roddenberry’s (no pun intended) stellar delivery of the prompts, with just the right flavor of detachment, efficiency, and all-business demeanor, made for a solid and unwaveringly steady computer voice: reassuring, mixed with just the right amount of clinical properness, and completely devoid of the trickery and malice of HAL (click the links below):

allnesc

autoshut

cali

Always present, continuously watching, and never registering any emotional investment in the outcome of situations the crew may have found themselves in, Majel Roddenberry’s computer voice provided wonderful continuity throughout the episodes and well into the movie franchise. For the 11th movie in the series, she completed voicing her computer lines mere weeks before she passed away from leukemia in 2008.

Whether they become the conscience of the spacecraft, or merely keep everyone on an even keel with gentle adjustments or admonishments, the characters of the automated computer genre appeal so widely because they never fully make an entrance. Without being over-the-top droll (as in KITT, the Knight Rider car’s on-board voice) or outright smarmy (as in the unseen “Charlie” intercom voice in Charlie’s Angels), HAL and Majel Roddenberry’s computer voice provided intriguing characters who were omnipresent, all-knowing, and infinitely more engaging for their invisibility.

Next blog, in about two weeks’ time, I’ll discuss the infinite possibilities and amazing advancements made in the area of text-to-speech.

Thanks for reading! Feel free to leave a comment!
