There was an active and vehement move away from the robotic, automaton sound of early voice recordings on telephones (recordings which seem to have been deliberately done that way, in order to eliminate any confusion as to whether the caller had reached an actual, live human being or a “machine”) and towards a relaxed, natural, conversational cadence: a tone which says, “Yes, you’ve definitely reached a self-serve system, but think of me as just another fellow human being. I hate these things, too!” The thinking behind it is: if the caller feels the voice behind the system is welcoming without using up too much of their time; reassuring without being obsequious; and (all the better) if the voice can sound like the caller’s best friend or neighbor, the caller will “engage” the system, follow the instructions accurately, not hang up in frustration, and not have a whole new veneer of annoyance on top of the issue they’re calling in about by the time they *do* make it to an actual rep. And even if they have managed to self-serve their way to a solution (made a reservation, checked their Visa balance) and never had to actually speak to a live operator, their opinion of that company or the transaction can still be made or broken by that automated voice alone.
Solid thinking. And personally, having voiced telephone prompts for companies internationally and for a wide variety of industries, I applauded, and still applaud, that trend. Rather than having to “put on” a voice which isn’t actually natural for me to speak in, I’m allowed, nay, encouraged, to sound like an actual, real person (what luck! I happen to *be* one…). Real people hesitate slightly when they’re trying to think of just the right word; there’s a certain…pause…which seems normal in everyday conversational rhythms; and there’s almost a “stumbling” effect which many clients want me to do when I’m voicing, so that I’ll sound like a real person. Actual speech is full of slurs, imperfections, and natural flaws which we all try to avoid in everyday conversation, and it’s those natural “artifacts” which are big in IVR right now. (On the extreme end of that scale was a company that produced on-hold messages and encouraged me, if at all possible, to come up with a yawn or sneeze in the middle of the script, just to reinforce that a “real” person took the time to voice it. I talked them out of that. That’s just a little too “real”.)
However, consider this: humans, being quintessentially social, are infinitely comfortable in taking on the mannerisms, rhythms, and traits of the other humans with whom they’re interacting. Watch a pair of humans introducing themselves to one another, and the intricate ballet which ensues. They will, without even thinking about it, mirror the other’s mannerisms and the “rate” or speed at which they converse, driven by an innate need to “match” their conversational partner. It’s the reason why it’s so hard *not* to absorb an accent as you speak with a native of Scotland, for example. It’s why dating coaches actively encourage their clients to make a point of deliberately matching their prospective mate’s every mannerism, move for move: the simpatico that mirroring creates is not only beneficial to our harmony with others, it’s automatic and almost impossible *not* to engage in.
How that relates to IVR is simple: whether or not you’re aware of it, you mirror the “tone” set by an IVR you call into. In many ways, that voice dictates the formality or informality of the transaction. It tells you everything you need to know about the company and even gives you an idea of the level of service and attentiveness you can expect when your issue or problem is eventually dealt with. And the degree of “precision” apparent in the voice is likely the degree of precision with which *you* will respond.
Think, for example, of an IVR voice saying, in a no-frills, somewhat flat-toned delivery: “Please tell me, clearly and slowly, the city to where you’d like to travel. Please press pound when finished.” If you’re the caller, hoping to book a flight to Cleveland, you’ll probably take that instruction quite seriously and deliberately slow your roll as you enunciate, much more slowly and clearly than you’d normally be inclined to: “I’d like to travel to Cleveland, please.” (Even mirroring the two “pleases” which were in their command.) Or you might even just intone: “Cleveland.” In stark contrast would be the “modern” style of IVR: “Great. I can help you book your trip.” (Playfully) “Why don’t you tell me where you wanna go..?”
Naturally, you’re going to reply (playfully) “I wanna go tuh Cleveland.”
Fun, yes? And while this style of recording is accessible, young, modern, and warm, it didn’t take long for data to surface which found fault in that casual, almost *too* relaxed call and reply: speech recognition software struggles to fit the biometrics of “informalspeak”, and shows a less than perfect hit-rate when callers match a “lazy” IVR’s cue. Also, where a caller would be more likely to just say “Cleveland” to a “traditional” clipped, more severe IVR, a chatty, off-the-cuff IVR tends to make callers respond in kind, or elaborate more than they would under the parameters of a “stiff” automated system. With less accuracy comes confusion, more time burned up, and a greater chance that the customer will either pull the plug on the call or be so annoyed with the ongoing attempts to repeat their selection that they’ll be stoked with a refreshed supply of vitriol for the poor CSR to whom the call eventually gets transferred.
While I’m a fan of a more relaxed, conversational tone — both as a caller and as a voice of IVR systems — the dangers of “under-enunciating” are vast and very real. I like to strike a balance between the friendly and natural, and also being a clear enunciator (while I keep my diction as clear as possible when I’m working, anyone who has spoken to me over the phone after a long day in the booth can testify that I slur like Tom Brokaw). To be relaxed and conversational, and yet authoritative enough to make sure people “hit” the speech recognition utility is always my goal.
Perhaps — to maintain the integrity and accuracy of speech recognition utilities — a certain amount of formality is required in an IVR. It could be argued that there’s no getting away from a steady, even-toned delivery, if it means a clean, well-running match-up of vocal input whose ultimate goal is getting callers to the right department.
I’m very excited about my next blog, where I interview the legendary Emily Yellin, arguably the world’s foremost expert in customer relation metrics. We had a great chat about what companies desperately need to know about designing effective telephone systems, and I’ll bring you that interview in about two weeks’ time.
As always, thanks for reading. If you have any comments or insights about what you’ve read, feel free to leave a comment!
Allison Smith is a professional telephone voice who can be heard voicing systems for telephone companies and private companies throughout the world, including platforms for Verizon, Qwest, Cingular, Sprint, Bell Canada, Hawai’ian Telcom, and Asterisk. Her website is www.theivrvoice.com.