A Brief History of Conversing with Companies

William Meisel

April 21, 2025

The history of attempts to allow talking to computers over the telephone illustrates the sometimes tortuous route that advanced technologies take before they address user needs in a way that drives wide adoption. A look at some history can provide insights into how using human language to connect to computers will evolve in the future, as well as into its role in the growing power of artificial intelligence. This post focuses on the history of connecting with companies using the telephone.

Auto-attendants

“Auto-attendant” systems were an early way companies used telephone speech recognition to route calls received on a general company line. For example, the Computer Telephony Conference & Exposition held in Los Angeles March 4-6, 1997, more than 28 years ago, highlighted voice-driven auto-attendant systems provided by a number of companies. These automated attendants answered calls to a company and let callers connect to a person or department by saying the person’s name; the directory generally included the names of all employees in the company.

In comparison with touch-tone automated systems, the caller did not need to know an extension or spell the name on a keypad. Peter Foster, president of Voice Control Systems, said at the conference that auto-attendant systems would be the “killer application” of the next two years. He noted that many other large-vocabulary systems being shown, e.g., systems providing stock quotes, were intrinsically limited in the number of possible installations, since the number of stock brokers was limited, while auto-attendants could serve almost any company. A point he didn’t address was the high cost of using such speech-driven systems in 1997, when computer power was more than 16,000 times more expensive than it is today.

A large number of voice-driven auto-attendants for outside callers were launched in 1998. Companies offering the application included Applied Language Technologies, Carnegie International, Locus Speech, Lucent Technologies, Lyrix, Northern Telecom, Parlance, Philips Speech Processing, Phonetic Systems, Preferred Voice, Registry Magic, Vocalis, Voice Control Systems, and Voice Quest. [The number of unfamiliar names is typical of early speech recognition applications, where inspiration drove many attempts at commercial success despite the difficulty.]

Voice auto-attendant technology apparently worked when adopted, but the rate of adoption was disappointing, considering that the potential market included almost every company. The technology was costly enough in 1998 that many companies evidently didn’t see a significant monetary advantage in adopting it.

In 1999, the auto-attendant category continued to expand. Ronald Larkin, president and CEO of Voice Control Systems, said at a conference in January 1999, “Auto-attendant is one area that should have explosive growth.” In March, Kathy Frostad, director of telecommunications solutions marketing for Nuance Communications, said, “Speech-enabled auto-attendants should be widely adopted in 1999, especially in large and distributed enterprises.” Preferred Voice offered a voice-activated auto-attendant called Emma TR (Telephone Receptionist). Registry Magic’s Virtual Operator offered a turnkey speech recognition auto-attendant that attached to a business’s telephone system and performed the tasks of a live operator. Phonetic Systems announced enhancements to its high-capacity PhoneticOperator voice directory and voice-activated auto-attendant; the company also began shipping a new release for smaller directories. IBM announced the ViaVoice Directory Dialer, a voice-activated directory service and auto-attendant. Locus Dialogue offered Liaison, its voice-activated auto-attendant. Nortel demonstrated its Voice-Activated Business Directory, a PC-based automated attendant. Sound Advantage announced the release of SANDi, an automated telephone receptionist using speech recognition licensed from AT&T. Bell Atlantic introduced its Connect@once voice-activated auto-attendant using speech recognition technology from Nuance. BBN Technologies developed a voice attendant product that was spun off into Parlance Corporation.

McKesson, one of the world’s largest healthcare information technology companies, announced a speech-enabled auto-attendant for stores that sold pharmaceutical and healthcare products. SpeechWorks announced SpeechSite, a voice attendant system that used call direction as a basic feature but could be extended to broader interaction with callers. Mitel announced a voice-activated auto-attendant for its high-end PBX. Lucent announced that it would be selling a voice directory and voice attendant product developed by Phonetic Systems. Philips Speech Processing was selling Pure ReQuest!, a voice-activated automated attendant. Syntellect released its Vista Speech-Enabled Call Distributor, a voice-activated auto-attendant using speech recognition licensed from Nuance.

Auto-attendants that routed outside calls by employee name, some handling thousands of names, were one of those technologies that became outdated in that form, despite all the activity and predictions of a large market. It’s hard to find such a system handling outside calls today with access to all employees’ names, in part because employees don’t want to be reachable by anyone who simply knows their name (driven in part by an increase in “spam” calls). Today, someone calling an individual in a company would most likely need to know that individual’s extension.

Interactive Voice Response

Instead, most people phoning a company today would call that company’s customer service line, reaching an Interactive Voice Response (IVR) system. More than three decades ago, in September 1993, AT&T announced that its CONVERSANT Voice Information System Release 4.0 would include telephone speech recognition called FlexWord. Callers making collect or calling-card calls on the AT&T network had the option of getting help from an automated speech recognition system that responded to as many as 2,000 words or phrases; the system could be adapted for customer applications using specialized vocabularies and different international languages. It allowed users to order merchandise, get airline schedules, and conduct other transactions by voice commands instead of by pushing phone buttons.

Early smaller-vocabulary IVR applications that provided an alternative to touch-tone systems in customer service surprisingly had little impact. The problem appeared to be the incremental cost of adding speech recognition when a workable touch-tone or human-operator solution already existed, given the price of such systems at the time. While examples of how speech recognition was used sometimes implied a great deal of flexibility in the interaction, the reality in 1998 was more limited. The tools could often handle a very natural response if it was anticipated, but fail on a similar response that was valid but not anticipated. The applications often required that all the allowed alternative commands and sentences be explicitly elaborated in a “grammar.”
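To make the limitation concrete, here is a minimal sketch, in Python rather than any vendor’s actual grammar format, of how a grammar-based application behaved: every acceptable utterance (the phrases below are invented for illustration) had to be listed in advance, so a perfectly reasonable request the designer didn’t anticipate simply failed.

    # Hand-written "grammar": the only utterances the application accepts.
    # These phrases are hypothetical examples, not a real deployed grammar.
    GRAMMAR = {
        "account balance": ["account balance", "check my balance",
                            "what is my balance"],
        "transfer funds": ["transfer funds", "move money between accounts"],
        "agent": ["speak to an agent", "operator"],
    }

    def interpret(utterance):
        """Return the matched command, or None when the phrasing is
        valid English but simply wasn't anticipated by the grammar."""
        text = utterance.lower().strip()
        for command, phrases in GRAMMAR.items():
            if text in phrases:
                return command
        return None

    print(interpret("check my balance"))          # "account balance"
    print(interpret("how much money do I have"))  # None: valid, but
                                                  # unanticipated

Real systems used standardized grammar formats and allowed some variation, but the underlying constraint was the same: the designer had to enumerate in advance what callers might say.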

In December 1999, AT&T announced a commercial test of an ambitious “natural-language” application of telephone speech recognition: a trial with Prudential Insurance of an advanced system developed by AT&T Labs. AT&T Labs’ “How May I Help You” started the interaction in a customer service call, as the tradename suggested, by asking simply, “How may I help you?” A caller could say phrases such as, “I want to change my beneficiary,” or “I had an accident. I want to speak to an agent right away.” The Prudential system could use dialog to confirm or clarify the caller’s objective, or to ask for any information it needed. The trial was most likely more a test of the technology than a solution that could be deployed economically.
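The contrast with a fixed grammar can be sketched as follows. This toy Python example assumes nothing about AT&T’s actual implementation, which used statistical models trained on real calls; it simply illustrates the flow of scoring a free-form utterance against each call type and confirming with the caller when confidence is low, rather than rejecting anything not listed in advance. The intents and keywords are invented.

    # Toy intent scorer: route free-form speech to the best-matching
    # call type instead of demanding an exact grammar match.
    INTENT_KEYWORDS = {
        "change beneficiary": {"change", "beneficiary"},
        "report accident": {"accident", "claim", "injured"},
        "reach agent": {"agent", "person", "representative"},
    }

    def route(utterance):
        words = set(utterance.lower().replace(".", "").split())
        scores = {intent: len(words & keywords)
                  for intent, keywords in INTENT_KEYWORDS.items()}
        intent, score = max(scores.items(), key=lambda item: item[1])
        if score == 0:
            return None, "Sorry, could you tell me more about that?"
        if score == 1:
            # Low confidence: use dialog to confirm before acting.
            return intent, "Did you want help with: %s?" % intent
        return intent, "One moment while I connect you for: %s." % intent

    print(route("I want to change my beneficiary"))
    print(route("I had an accident. I want to speak to an agent right away"))

A production system would replace the keyword sets with a classifier trained on transcribed calls, but the classify-confirm-clarify flow is the one described above.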

In a more current example, in 2018 hotel chain Hyatt adopted technology from IVR system provider Interactions to expand its automated customer service solution. Hyatt’s eight contact centers around the world handled more than seven million calls each year. Customers called in to book new reservations, inquire about charges, make cancellations, request directions, or get information about amenities and services. The company wanted to upgrade its automated contact center services to provide a better customer experience and reduce agent time spent on tasks that could be automated.

Hyatt chose an Interactions Intelligent Virtual Assistant, an automated conversational solution. With this new platform, Hyatt transferred callers to the Virtual Assistant for automated reservation confirmation and a post-call survey. The Assistant automated frequent routine calls to confirm or cancel a reservation. It also automated the entire process of collecting guest feedback by phone upon checkout and automated portions of a new reservation process, including collection of routine guest information. According to an Interactions note, Hyatt was seeing a year-over-year return on investment of more than 125%. On average, Hyatt’s costs were reduced by 33%, and Hyatt saved 94% on fully automated interactions, such as frequent reservation confirmation calls. By focusing more on complex, high-value tasks, reservation associates improved their sales efficiency and reported higher job satisfaction. (Interactions is still in the IVR business.)

In July 2018, Diversified Consultants, Inc. (DCI), a telecom collection agency, announced it was deploying Interactions’ Intelligent Virtual Assistants in its payment collection operations. DCI turned to Interactions to provide a convenient and polite approach to collecting overdue payments and shepherding long-term relationships with customers. The implication was that the automated system was more consistent than human agents for this sensitive operation.

Given the abilities IVR systems with speech recognition demonstrated so long ago, one would expect easy-to-use systems to predominate today, since computer costs have dropped by a factor of more than 9,000 since 1999. And, indeed, there are customer service systems that greet callers with an open-ended prompt like “How may I help you?” The result can be a quick answer or a transfer to the appropriate specialized agent.

Perhaps surprisingly, however, as I am sure the reader can testify, many customer service lines still use touch-tone menus or speech recognition that asks a long series of questions before providing an answer or transferring to an agent.

Why? In part, websites are a cheaper alternative, and many customer service lines begin by suggesting callers go to the company’s website for answers. Companies may offer text-based chatbots on those websites. The text-based solution is cheaper, and human agents can often interact with several customers at once by text when required.

Digital assistants

A growing trend in the use of speech recognition is company “digital assistants” that go beyond customer service to establish an ongoing connection with customers.

For example, Bank of America has a digital assistant called Erica. The company announced in 2024 that Erica had responded to 800 million inquiries from over 42 million clients and provided personalized insights and guidance over 1.2 billion times. The company said that more than 98% of clients got the answers they needed from Erica within 44 seconds. A customer can type as well as talk to Erica in a mobile app, allowing more privacy and politeness when the assistant is addressed in public. Samsung’s Bixby digital assistant, available on its smartphones, also supports both talking and typing.

The talk-or-type option is a current trend. One can now type as well as talk to Apple’s Siri on Apple smartphones and to Google Assistant on Android smartphones, although neither company has emphasized the option. This is a critical trend: an assistant one can only talk to is often unusable in a public setting, while the talk-or-type option means it is always available. The amount of time individuals spend texting in public highlights the importance of the typing option. Being always available is a critical change that will drive the increased use of digital assistants. With the improvement in question answering provided by generative AI like ChatGPT, the digital assistant even becomes an alternative to web search.

And the steady increase in available computer power will allow speech recognition accuracy, and particularly the understanding of typed or spoken human language, to keep improving rapidly. It will also allow a digital assistant to maintain a dialog rather than simply answer a question, and to become more personalized. The assistant can also be proactive, e.g., reminding the user of a scheduled meeting. The ultra-powerful computer systems being built to drive ever-larger neural networks can also be used for ultra-intelligent digital assistants, and the talk-or-type model will allow such an assistant to be always available on a mobile device.

The long term

There are certainly downsides to this trend. An authoritarian government can use it to monitor its citizens and control what information they receive. There is a danger of our spending more screen time and less direct face-to-face time.

Whatever the downsides, the reality is that the power of the new generation of digital assistants will almost require their use. The Internet certainly has its downsides (e.g., disinformation), but there is little prospect of its momentum slowing. The educational potential of assistants tailored for children will likely make them companions at an early age.

The major impact of speech recognition on the user interface has been predicted for decades, but the talk-or-type option makes the assistant more than an occasional partner. The next generation of digital assistants could be considered an essential tool that augments our intelligence and becomes almost part of being human.

William Meisel recently published The Lost History of “Talking to Computers”. See www.speechrecognitionhistory.com.