Introduction
In 2017, while working at a media company, I was tasked with enhancing the company’s ChatBot. Initially, the bot was very basic and unrefined. When I tested it with questions like, "Why is my bill so expensive?" the responses were generic and unhelpful. The ChatBot simply replied, "Here is your bill," missing the essence of the question entirely. It became evident that the system was unable to answer customer queries in a meaningful way, undermining its effectiveness.
At that time, I was part of the Customer Experience team, where our goal was to reduce customer service costs by minimizing call center volume and encouraging customers to use chat support instead. With the ChatBot failing to respond accurately, however, users were still opting to call rather than chat.
It became clear to me that for the ChatBot to succeed, it had to do more than just answer questions; it needed to provide meaningful, accurate, and contextual responses. The challenge was not just getting customers to use the chat feature but making sure that when they did, they felt confident their concerns would be properly addressed.
This realization became the problem statement that hadn't been fully acknowledged: the ChatBot would only add value if it could respond accurately to customer inquiries. Without that, the whole strategy of cutting customer service costs through automation was flawed.
Understanding the technology and the challenge
To address the issue, I consulted with the solution architect who worked on the ChatBot to understand how the underlying AI worked. At the time, we were using machine learning and natural language processing (NLP) technologies from AWS. The architect explained that the bot was like a baby—it needed to be trained with correct information to understand what was right or wrong. This was a pivotal moment for me, as I began to realize the root of the issue—how the AI was being trained, or rather, how it wasn’t.
I discovered that while the ChatBot was utilizing NLP, it was a very rudimentary implementation. The system was processing customer input but lacked the necessary sophistication in tagging and categorizing data, which led to incorrect or incomplete responses. For example, when I asked, "Why is my bill so expensive?" the bot detected the keyword "bill" and mistakenly assumed I was asking, "Where is my bill?" This simplistic approach to intent recognition meant the ChatBot struggled to distinguish between more nuanced inquiries, failing to provide relevant answers.
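To make the failure mode concrete, here is a minimal sketch of keyword-based intent matching. This is not the actual AWS implementation we ran; the intents and keywords are invented for illustration, but the logic shows why "bill" alone was enough to trigger the wrong answer.

```python
# Hypothetical sketch of naive keyword-based intent matching.
# Intents and keywords are invented for illustration.

INTENT_KEYWORDS = {
    "show_bill": ["bill", "invoice", "statement"],
    "reset_password": ["password", "login"],
}

def match_intent(message: str) -> str:
    """Return the first intent whose keyword appears in the message."""
    text = message.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "fallback"

# "Why is my bill so expensive?" contains "bill", so the bot routes it
# to show_bill even though the customer is disputing the amount.
print(match_intent("Why is my bill so expensive?"))  # -> show_bill
```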
This conversation with the architect was a major breakthrough in my understanding of how to effectively train an AI to serve customers better. The bot required more than just raw data—it needed robust contextual training to better understand the intent behind customer queries and respond accurately. Simply processing inputs wasn’t enough; the system needed to interpret context and nuance, allowing it to handle a wider range of inquiries in a meaningful way.
The need for a different approach
After spending a month manually tagging data to train the ChatBot, it became clear to me that this approach wasn’t going to solve the problem fast enough. The real issue we were trying to address was ensuring the ChatBot could accurately answer customer inquiries, which would encourage users to transition from calling customer service to using the chat feature. The ultimate goal was to shift customer behavior, making them comfortable with chat as a reliable support option.
At that point, the ChatBot was far from perfect—it couldn’t accurately respond to a majority of questions, which was a major barrier to achieving our objective. The solution wasn’t just improving the ChatBot’s responses; we needed to make the system hybrid. This meant incorporating a live agent fallback system that would step in when the bot couldn’t handle certain queries.
Although the framework for a live agent fallback was already in place, the execution was flawed. The system was designed so that only after the ChatBot failed three times—responding with something like, "I’m sorry, I cannot understand what you’re saying"—would the query be rerouted to a live agent. This approach wasn’t sufficient, as customers were getting frustrated by the bot’s repetitive failures before they even had the chance to interact with a human.
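As I understood it, the rule boiled down to a simple failure counter. Here is a rough sketch of that escalation logic, with the canned lookup table and wording invented for illustration:

```python
# Rough sketch of the original escalation rule: the bot had to fail
# three times before the conversation was routed to a live agent.

MAX_FAILURES = 3

class ChatSession:
    def __init__(self):
        self.failures = 0

    def handle(self, message: str) -> str:
        reply = self.try_answer(message)
        if reply is not None:
            self.failures = 0
            return reply
        self.failures += 1
        if self.failures >= MAX_FAILURES:
            return "Let me connect you with a live agent."
        return "I'm sorry, I cannot understand what you're saying."

    def try_answer(self, message: str):
        # Stand-in for the real intent lookup: only exact matches from
        # a tiny canned table get an answer; everything else fails.
        canned = {"where is my bill?": "Here is your bill."}
        return canned.get(message.strip().lower())
```

A customer asking anything outside the canned table would hit the apology twice before finally reaching a human on the third failure.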
The process was tough, and the balance between automating responses and providing human support was delicate. We needed to rethink how quickly the bot should hand off to a live agent to improve the overall customer experience.
Implementing a decision tree model
I approached my VP with a request: give me three months to figure out what this ChatBot could actually achieve. It quickly became clear that machine learning or AI wasn't the answer, especially since advanced NLP tooling and large language models like those OpenAI would later release didn't exist yet. Moreover, the company lacked a centralized knowledge base that both the ChatBot and live agents could draw from, leaving customer service staff relying on personal experience without proper documentation.
The solution I ultimately proposed was to abandon AI for this project and focus on creating a more structured approach. We developed a decision tree-based system for the ChatBot. Instead of allowing free-text input, which the bot struggled to handle, users were presented with predefined options. This simplified the process and helped guide them to the right solution more efficiently. The system was similar to an Interactive Voice Response (IVR) system, but adapted for chat, with more human-sounding language.
Everything was hard-coded, from the questions to the responses, and pre-populated options were provided for users. Though it wasn’t the ideal solution, this decision tree model worked. It addressed the immediate need to provide customers with quick and accurate responses, while also helping reduce reliance on phone support by offering a more reliable chat alternative.
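For illustration, here is a minimal sketch of how such a hard-coded tree can be represented. The prompts and branches are invented, but the structure mirrors what we built: every node carries a scripted prompt and a fixed set of options, so the bot never has to parse free text.

```python
# Hypothetical sketch of a hard-coded decision tree. Each node has a
# scripted prompt and predefined options pointing at child nodes;
# leaf nodes (empty options) end the flow.

DECISION_TREE = {
    "root": {
        "prompt": "Hi! What can I help you with today?",
        "options": {"Billing questions": "billing"},
    },
    "billing": {
        "prompt": "What would you like to know about your bill?",
        "options": {
            "View my current bill": "show_bill",
            "My bill looks too high": "bill_dispute",
        },
    },
    "show_bill": {"prompt": "Here is your latest bill.", "options": {}},
    "bill_dispute": {
        "prompt": "Let's review your recent charges together.",
        "options": {},
    },
}

def show_node(node_id: str) -> None:
    """Print a node's scripted prompt and its selectable options."""
    node = DECISION_TREE[node_id]
    print(node["prompt"])
    for label in node["options"]:
        print(f"  - {label}")

show_node("root")
```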
This also reduced errors and allowed the ChatBot to function more reliably. Since the decision tree captured most straightforward queries, the free-text input we still received tended to involve the complex or unique questions the bot struggled with most, which made it valuable for NLP training. Analyzing these interactions surfaced recurring themes the bot didn't yet address, helping us improve the decision tree, create new fallback rules, and build up data for training more sophisticated NLP models down the line.
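Even simple frequency analysis went a long way here. As a hedged example, assuming the fallback handler logged every unanswered free-text query, a script along these lines could surface the recurring themes worth adding to the tree:

```python
# Hypothetical sketch: mine the free-text fallback log for recurring
# themes so new decision-tree branches can be prioritized.

from collections import Counter

STOP_WORDS = {"why", "is", "my", "the", "a", "i", "so", "how", "do", "this"}

def top_themes(unanswered_queries: list[str], n: int = 5):
    """Count the most frequent meaningful words across failed queries."""
    words = Counter()
    for query in unanswered_queries:
        tokens = (t.strip(".,?!") for t in query.lower().split())
        words.update(t for t in tokens if t and t not in STOP_WORDS)
    return words.most_common(n)

log = [
    "Why is my bill so expensive?",
    "My bill doubled this month",
    "How do I cancel my subscription?",
]
print(top_themes(log))  # [('bill', 2), ('expensive', 1), ...]
```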
Operational considerations and scaling
Once we completed the first version of the ChatBot revamp, we had to carefully consider the operational side of things. With more users expected to shift from phone calls to chat, we anticipated an increase in demand for live agents as the fallback option when the ChatBot couldn’t handle certain inquiries. This meant preparing the customer service team for a potential surge in live chat volume.
Previously, few customers even reached the point of speaking to a live agent through chat, mostly because they didn’t know how to access that option. Now, with a more functional ChatBot in place, we had to prepare the customer service department for this shift. We communicated with the team, letting them know to expect an increase in chat interactions and live agent requests.
Our strategy included gradually moving some agents from phone support to live chat, where one agent could manage two chat conversations simultaneously. This was more efficient than handling phone calls, where an agent could only attend to one customer at a time.
To support this transition, we launched a campaign encouraging customers to use the chat option instead of waiting on hold for phone support. While customers waited for an agent, the IVR played messages like, "If you're waiting too long on the phone, why not try our chat service?"
This strategy aimed to gradually shift customer behavior from relying on calls to adopting chat as the primary method of communication, thereby enhancing efficiency and reducing wait times.
Long-term strategy and reflection
The outcome of our efforts was twofold:
- Transition from Call to Chat: We successfully began moving customers from phone calls to chat, which not only reduced the strain on call center resources but also cut operational costs.
- Efficient Query Handling: By structuring customer inquiries through decision trees, we minimized the need for live agent intervention, ensuring that only more complex questions were escalated.
While this approach worked, looking back, there's no denying that the absence of more sophisticated AI tools, such as large language models (LLMs), significantly limited what we could achieve. AI and NLP solutions were still in their infancy then. If LLMs like GPT-3 or GPT-4 had been available, the entire process could have been streamlined and dramatically improved.
Untapped potential: The opportunity LLMs could have unleashed
If large language models had existed back then, we could have leveraged them to create a much more dynamic, intelligent, and context-aware ChatBot. Rather than painstakingly building out a decision tree with predefined responses, we could have simply connected the ChatBot to an LLM via an API (see the sketch after this list). Here's what we could have achieved:
- Delivered more nuanced and context-aware responses.
- Reduced some aspects of development time.
- Achieved greater flexibility in handling queries.
- Improved customer satisfaction through more natural-sounding conversations.
- Potentially reached some of our efficiency targets more quickly.
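As a concrete (if anachronistic) sketch, here is roughly what that API integration looks like today using OpenAI's Python client. The model name, system prompt, and billing context are illustrative, not something we ever shipped:

```python
# Hypothetical sketch: answering a customer question via an LLM API
# instead of a hand-built decision tree. Model name and prompts are
# illustrative. Requires OPENAI_API_KEY in the environment.

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {
            "role": "system",
            "content": "You are a billing support assistant for a media "
                       "company. Be concise, and escalate to a human "
                       "agent when you are unsure.",
        },
        {"role": "user", "content": "Why is my bill so expensive?"},
    ],
)

print(response.choices[0].message.content)
```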
More contextual responses
One of the key limitations we faced was the ChatBot’s inability to understand the nuances of customer queries. For instance, when a customer asked, "Why is my bill so expensive?" the bot would only detect the keyword “bill” and respond incorrectly with, "Here is your bill." With an LLM, the bot would have been able to grasp the context and intent of the question. It could have recognized that the customer wasn’t just asking for the bill but rather questioning its amount, providing a more detailed and appropriate response.
Rapid deployment and scalability
Instead of building and tagging data manually, which took weeks of effort, we could have used an LLM to assist in generating training data and responses. While this wouldn't eliminate the need for human oversight and curation, it could have accelerated some aspects of the development process. The model's broad knowledge base could have helped us handle a wider range of queries more flexibly, though we'd still need to ensure accuracy and alignment with our specific business needs.
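A hedged sketch of that workflow: ask the model to draft candidate utterances for an intent, then route everything through human review before it touches the training set. The function and prompt below are hypothetical:

```python
# Hypothetical sketch: use an LLM to draft training utterances for an
# intent. Drafts still require human curation before use.

from openai import OpenAI

client = OpenAI()

def draft_utterances(intent_description: str, n: int = 10) -> list[str]:
    """Ask the model for n candidate phrasings of a customer intent."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": f"Write {n} distinct ways a customer might phrase "
                       f"this request, one per line: {intent_description}",
        }],
    )
    return response.choices[0].message.content.splitlines()

drafts = draft_utterances("dispute an unexpectedly high bill")
# A human reviews `drafts` before anything enters the training data.
```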
Improved natural language understanding (NLU)
The LLM would have given the bot superior natural language understanding, allowing it to handle free-text inputs without the need for rigid decision trees. Customers could ask questions in their own words, and the ChatBot could respond intelligently. The reliance on predefined options would have diminished, creating a more conversational and human-like interaction that would improve user satisfaction and engagement.
Potential for continuous improvement
With LLMs, we might have implemented a system where the ChatBot's responses could be more easily updated based on new information or changing needs. Unlike our static decision tree system that required manual updates, an LLM-based approach could potentially adapt more quickly to new patterns and topics, though human oversight would still be crucial to maintain accuracy and appropriateness.
More efficient knowledge integration
One of the major challenges we faced was integrating a comprehensive knowledge base. An LLM could have helped us more efficiently process and utilize our existing knowledge sources, potentially making it easier to provide relevant answers to users. However, we'd still need to carefully curate and validate the information to ensure accuracy and compliance with our policies.
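A minimal retrieval-style sketch, assuming a small curated knowledge base: pull the most relevant article for a question, then constrain the model's answer to that text. The articles and scoring here are invented stand-ins for a real search layer:

```python
# Hypothetical sketch of grounding answers in a curated knowledge base:
# retrieve the most relevant article, then have the LLM answer from it.

KNOWLEDGE_BASE = {
    "billing_overview": "Bills are issued on the 1st of each month...",
    "late_fees": "A late fee applies to payments made after the due date...",
}

def retrieve(question: str) -> str:
    """Naive retrieval: score articles by words shared with the question."""
    question_words = set(question.lower().split())
    scores = {
        doc_id: len(question_words & set(text.lower().split()))
        for doc_id, text in KNOWLEDGE_BASE.items()
    }
    best = max(scores, key=scores.get)
    return KNOWLEDGE_BASE[best]

question = "Why was I charged a late fee?"
context = retrieve(question)
prompt = f"Answer using only this policy text:\n{context}\n\nQuestion: {question}"
# `prompt` would then be sent to the LLM, as in the earlier API sketch.
```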
Smarter live agent handoff
In our manual decision tree model, users often had to navigate several layers before reaching a live agent. An LLM-powered system could potentially recognize more quickly when it was unable to adequately answer a question, allowing for a smoother transfer to a human agent. The system could also provide a more coherent summary of the conversation for the live agent, potentially improving the overall support experience.
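A sketch of what that decision point might look like, with the confidence score and summarizer as stand-ins for real model calls:

```python
# Hypothetical sketch: escalate as soon as the model's confidence in
# its answer drops, and brief the agent so the customer does not have
# to repeat themselves. Scoring and summarization are stand-ins.

CONFIDENCE_THRESHOLD = 0.6

def summarize(transcript: list[str]) -> str:
    # Stand-in for an LLM summarization call over the chat history.
    return "Customer is disputing a high bill; bot could not resolve it."

def next_step(confidence: float, transcript: list[str]) -> dict:
    """Decide whether to keep chatting or hand off with a briefing."""
    if confidence < CONFIDENCE_THRESHOLD:
        return {"action": "handoff", "agent_briefing": summarize(transcript)}
    return {"action": "continue"}

print(next_step(0.35, ["Why is my bill so expensive?", "Here is your bill."]))
```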