
Does Anyone Love the Call Center Voice Queue?

Blog Post created by jdrosen on Oct 10, 2009

I don't.

 

Sitting on hold in a call center queue is one of the low points of modern telecom services. As a user, I am typically subjected to irritating music that I wish I could turn off. I get precious little information on how much longer I have to wait, and I can only hope I've reached the right queue. Nothing is worse than waiting fifteen minutes just to find out that you actually needed to call a different number.

 

The experience once you get connected isn't much better. First, you have to run through the same old identification process. You need to give your name, your account number, confirm your address, and so on. Then, you need to take down the name of the contact center agent, just in case you get disconnected. Once you manage to explain your problem, you have to dig up a pen, write down your case number for future reference, and hope you copied it right. Then, you get transferred to a survey asking about your experience.

 

This experience is not only frustrating for end users, it's expensive for enterprises. Every minute an agent spends on the phone with you is money out of the enterprise's pocket, and shaving even a few seconds off the experience can add up to millions in savings over a year.
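To see how seconds become millions, here's a back-of-envelope calculation. The call volume and agent cost below are illustrative assumptions of my own, not figures from any real contact center:

```python
# Back-of-envelope check on the "seconds add up to millions" claim.
# All inputs are illustrative assumptions.

calls_per_year = 50_000_000          # a large enterprise contact center
seconds_saved_per_call = 10
loaded_cost_per_agent_minute = 1.00  # dollars, fully loaded

savings = calls_per_year * (seconds_saved_per_call / 60) * loaded_cost_per_agent_minute
print(f"${savings:,.0f} per year")   # roughly $8.3 million
```

Even at a dollar a minute, ten seconds saved on each of fifty million calls is over eight million dollars a year.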

 

If we go back to my list of complaints, a common theme emerges - using speech as the interface worsens the experience. A call center call is ultimately a very data-intensive activity. The agent needs to collect information from you (your name, your account number, confirmation of your address, a description of your problem, a survey on your experience), and you need to collect information from them (the agent's name, your case number, a callback number). Spoken conversation is generally a bad way to convey data. First, it is slow: humans can take in information far faster by reading than by speaking and listening. Second, it is inaccurate: it is easy to make mistakes in both speaking and hearing. This is particularly true when conveying proper nouns (such as your home address or the name of the product you are calling about), and the problem gets worse when accents are in play. Third, speech output isn't amenable to easy extraction of electronic data. When the call center agent tells me my case number, they are conveying it in spoken form, but that is not how I really want it. I want it in electronic form, so that I can easily store it and retrieve it later.

 

Of course, contact center interactions work this way because they were fundamentally shaped by the user interface of the telephone and the telephone network behind it. That network only allowed interactions based on speech combined with keypad input, and so the entire paradigm of how we interact with contact centers was built around those limitations. These limitations are no small part of why the web, email, and IM make up an increasing share of contact center interactions. However, those experiences remain highly segmented from the voice experience. Sure, I can often click-to-call from a company's website to reach the contact center, but once there, I am right back to the problems of a speech-only interface. To truly transform the contact center experience, we need to integrate the data and voice interactions so that they are seamless. This is best understood by walking through an example.

 

Let's say I work for a small law firm, and I'm having problems with my laptop. I pick up my new Cisco IP phone and dial the manufacturer's 800 number. After a few rings, the call is answered. I start to hear audio prompts, but at the same time, my phone screen automatically changes to show a textual version of the menu of options I am hearing. I also have the option to press a button to turn off the audio prompts, which I do. I quickly power through the options, tapping the buttons presented on the phone's touch screen. Once I hit the final button to get in the queue, my phone display changes again. It provides an up-to-date readout of my expected wait time and how many people are ahead of me. I can press a button to listen to music if I want, and even change the music through a menu of choices. For now, I keep the music off, which lets me work on my legal brief while I wait.
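One way to picture the data channel behind this scenario is a stream of structured messages the contact center pushes to the phone alongside the audio. This is purely a sketch; the message types and field names below are my own invention, not any real product's protocol:

```python
# Hypothetical sketch: structured messages a contact center might push to an
# IP phone alongside the call. All field names are invented for illustration.

import json

menu_message = {
    "type": "menu",
    "prompt_text": "For hardware support, tap 1. For software support, tap 2.",
    "options": [
        {"key": "1", "label": "Hardware support"},
        {"key": "2", "label": "Software support"},
    ],
    "audio_prompts_mutable": True,  # phone may offer a "turn off audio" button
}

queue_status = {
    "type": "queue-status",
    "position": 4,                   # callers ahead of me
    "estimated_wait_seconds": 180,
}

# The phone renders these as touch-screen buttons and a live wait-time
# readout, refreshing whenever a new message arrives.
print(json.dumps(menu_message, indent=2))
print(json.dumps(queue_status, indent=2))
```

The key point is that the menu and queue state exist as data the phone can render and act on, rather than only as audio the caller must sit through.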

 

Once I reach the top of the queue, the phone plays an audible tone to let me know it's my turn to talk to an agent. My phone display changes once more, and now shows the name of the agent I'm connected to. "Hi," says Andrea. Andrea already knows my name and account number, since this information was provided to her automatically when I called. However, she wants to confirm my street address. She says, "Is this address still correct for you, Mr. Rosenberg?" My phone display updates again and shows my current street address. Below it are two buttons, one labeled "correct" and one labeled "wrong". I tap "correct", and my address disappears. "Thanks, I'm glad your information is right. What can I help you with?" she says. I tell her that my laptop isn't powering up. She asks what model I have. I'm not sure, but I know it's a widescreen one that starts with a T. "No problem," she says. "Is it one of these?" Once again, my phone display updates. It now shows pictures of three different laptops, each a widescreen T-series model. I tap the one I recognize as my own. "Great, that's a T61," she says. We talk through my problem, and Andrea provides visual information on my phone at various points in the conversation. At one point, we even add video so that I can show Andrea the cellular modem card plugged into my laptop. In the end, my problem isn't resolved, and someone will need to get back to me. "Here are your case number and a callback number; they'll appear in your call log now." My screen updates to show the case number, Andrea's name and agent ID, and a callback number for this department. The information on screen fades away, and my phone automatically adds it to my call log. This way, I don't need to write anything down; I can just search my call history, and the details for this call will have the information right there. Finally, Andrea asks me to rate the experience. My phone display updates one last time with a radio button that lets me select from one to five. I tap five. She thanks me, and we hang up.
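The "it just lands in your call log" step is worth making concrete. One way to sketch it (the record layout, case number, and phone number below are invented examples, not a real schema) is a small structured record attached to the call's history entry, which the phone can later search:

```python
# Hypothetical sketch of wrap-up data attached to a call-log entry.
# All values are invented examples; no real schema is implied.

call_log_entry = {
    "number": "1-800-555-0100",        # example callback number
    "case_number": "SR-20091010-042",  # example case id
    "agent": {"name": "Andrea", "agent_id": "A-1234"},
}

def find_case(call_log, case_number):
    """Search the call history for the entry carrying a given case number."""
    for entry in call_log:
        if entry.get("case_number") == case_number:
            return entry
    return None

log = [call_log_entry]
found = find_case(log, "SR-20091010-042")
print(found["agent"]["name"], found["number"])
```

Because the case number arrives as data rather than as speech, retrieving it later is a search, not an exercise in deciphering my own handwriting.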

 

This is the kind of contact center experience I want. It has seamlessly intertwined voice, video and data interactions, utilizing a truly multi-modal interface that uses the right modality for the right type of information. I saved time, and the contact center shaved many minutes off the interaction, saving them millions.

 

Realizing this kind of multi-modal interface requires us to break past the boundaries of the existing telephone network, not just within a company but between companies. Contact center interactions are almost always between businesses, or between a consumer and a business. If we can break the landlock of the public switched telephone network and move to pure IP interconnection, we can use that interconnection to blend voice, video, and data and deliver the kind of experience I outlined above.
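To make "pure IP interconnection" concrete: in SIP terms, a single session can negotiate all three modalities at once. As one rough sketch (addresses and identifiers are placeholders, and MSRP is just one candidate for the data stream), an SDP offer carrying audio, video, and messaging might look like:

```
v=0
o=caller 2890844526 2890844526 IN IP4 192.0.2.10
s=Contact center call
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 49172 RTP/AVP 96
a=rtpmap:96 H264/90000
m=message 49174 TCP/MSRP *
a=accept-types:text/plain application/octet-stream
a=path:msrp://192.0.2.10:49174/s1;tcp
```

The voice and video ride over RTP as usual, while the messaging stream carries the structured data - menus, addresses, case numbers, surveys - that the phone renders visually. None of this survives a trip through the PSTN, which is exactly the point.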

 

In this future, users won't hate contact center voice queues - because there won't be any.
