In Data We Trust: Simulate, Design and Measure Business Bots

In our previous blog on insights into our data collection initiative, we shared our understanding of the complexities of real-life interactions and how they differ from bot-human interactions. Now that we understand those complexities, the next step is to use this knowledge to build effective business bots. However, it takes more than simply understanding the process. We also need to know which metrics matter when defining success and identifying baseline improvements.

Simulate Business Scenarios

If we separate out personal assistant bots and business bots, almost every bot used by businesses has a specific purpose. Those purposes can range from reducing process bottlenecks to assisting a human agent in their customer interactions. Furthermore, it's also important to be able to differentiate between lines of business, business functions and processes. As a business incorporates more use cases into its bots, the differentiation between use cases can put the bot to its ultimate test.

For Genesys customers and partners, at the start of a bot creation exercise, the industry is a known entity. Therefore, to simulate such a case, we chose an industry that has been an early adopter of emerging technologies: banking.

Design the Business Bot

We used use cases across different, yet common, categories in banking. These include bill payment, currency exchange, account login issues, applications for accounts and loans, queries regarding transactions, and branch banking. In each category, we looked at use cases that covered a wide breadth of banking functions, regardless of the language-related confusability they pose for the underlying natural language processing (NLP) technology. The end result was 16 use cases spread across six categories. In subsequent blogs in this series, we'll talk about bot authoring, handling intent confusability, empathy and more.

With 1,600 different scenarios covering the 16 use cases, we were ready to get started.

Going Into the Field

Conducting the data collection exercise correctly is almost as important as the specifics of the exercise we just discussed. Here are a few additional considerations.

  1. Time

This is the maximum time a participant can give to the experiment without getting bogged down or losing interest in the exercise. For our experiment, we estimated two minutes per scenario (voice or chat) and set a threshold of 20 minutes per user. Therefore, we decided to present 10 scenarios per user, including two repeat scenarios to check user behavior with increased familiarity with the bot.

  2. Communication

Participation in an experiment is a voluntary exercise. It’s important to acknowledge and appreciate the effort that a participant puts into the experiment as well as how valuable it is to you. Also, let’s face it: Interacting with a bot that you didn’t design is fun, and the communication to the participant should reflect that.

  3. A/B testing and randomization

With 1,600 scenarios spread across 16 use cases and six use case categories, and a requirement to present eight unique and two repeated scenarios per participant, it's important to determine the logic for distributing use cases to each participant. In our case, we ensured that all six use case categories were covered in the eight scenarios presented to the participant, and that there was room in the logic for A/B testing in subsequent phases.

  4. Phased approach to data collection

Data collection exercises at this scale must be done in phases. This provides room to make design changes during the process, based on feedback from test subjects and partial analysis of the metrics of interest. While a detailed plan for every future phase might not be necessary when starting out, it's important to ensure that the user experience is consistent across phases.
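To make the distribution logic above concrete, here is a minimal sketch of how scenarios could be assigned to a participant: eight unique scenarios that cover all six categories, plus two repeats. The function name, data layout and use of Python's `random` module are our own illustrative assumptions, not the actual implementation.

```python
import random

def assign_scenarios(scenarios_by_category, n_unique=8, n_repeats=2, seed=None):
    """Pick unique scenarios covering every category, then append repeats.

    scenarios_by_category: dict mapping category name -> list of scenario ids.
    Returns a list of n_unique + n_repeats scenario ids in presentation order.
    """
    rng = random.Random(seed)
    categories = list(scenarios_by_category)
    # One scenario per category guarantees full category coverage.
    picks = [rng.choice(scenarios_by_category[c]) for c in categories]
    # Fill the remaining unique slots from the whole pool, avoiding duplicates.
    pool = [s for ids in scenarios_by_category.values() for s in ids
            if s not in picks]
    picks += rng.sample(pool, n_unique - len(categories))
    rng.shuffle(picks)
    # Two repeats let us observe behavior with increased bot familiarity.
    repeats = rng.sample(picks, n_repeats)
    return picks + repeats
```

Fixing the seed per participant keeps the assignment reproducible for later analysis, while still looking random to the user.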

Measure and Introspect

Running a data-collection exercise, especially in phases, warrants a definition of success metrics. These metrics need to be defined with the different aspects of the exercise in mind.

  • The first set of metrics defines the success of the exercise itself, including the number of participants, the scenarios represented in the data set and so on. If the exercise progresses in phases, it's also important to track these metrics as the phases progress.
  • The second set of metrics covers bot performance: how well the NLP technology recognizes intents and entities.
  • The third set captures user experience. You should understand: What is the sentiment of user feedback? Is the user taking less time as phases and improvements accumulate? Is the number of turns taken to achieve a task increasing or decreasing with each interaction?
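The user-experience metrics above can be summarized per phase from interaction logs, which makes phase-over-phase comparisons straightforward. The field names and structure below are hypothetical, assuming each logged interaction records its duration and turn count:

```python
from statistics import mean

def phase_metrics(interactions):
    """Summarize user-experience metrics for one data-collection phase.

    interactions: list of dicts with hypothetical fields
      'duration_s' (seconds to complete the scenario) and
      'turns' (user/bot exchanges taken to finish the task).
    """
    return {
        "sessions": len(interactions),
        "avg_duration_s": mean(i["duration_s"] for i in interactions),
        "avg_turns": mean(i["turns"] for i in interactions),
    }
```

Computing this per phase shows whether task time and turn counts drop as the bot improves between phases.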

In subsequent blogs in this series, we’ll discuss these metrics in detail as well as the collection of metrics, modeling bots and our learnings. Watch for more.

Catch up with the previous blogs in this series: