Analyzing and Improving a Watson Assistant Solution Part 1: Analytics Personas and Existing Solutions

Andrew R. Freed
IBM Data Science in Practice
4 min read · Mar 9, 2020


Training for peak performance! Photo by Jonathan Borba on Unsplash

Your Watson Assistant solution is in production. Congratulations! Now it’s time to analyze the solution’s performance and implement improvements. This blog post will help you get the most out of your virtual assistant. I will cover the various personas interested in analysis, the types of analyses that help them, and how to develop these analyses.

Key personas

Analytics means different things to different people. For text-based virtual assistants there are two primary personas, each with a different goal:

Executive: This persona needs metrics that tie back to key performance indicators (KPIs). These metrics are generally extracted by summarizing entire conversations and comparing groups of conversations. An executive might ask “how many conversations does the assistant receive and what percentage end in escalation?”

Intent Analyst: This persona trains the Watson Assistant classifier to identify intent from user utterances. This analyst is concerned with determining the runtime performance of the classifier, identifying low-performing intents, and improving the intent training data. They are also concerned with identifying user statements the assistant does not understand and handling those with improved intents and/or entities.

Voice assistants include a third analyst:

Speech Analyst: This persona trains Watson Speech to Text models to convert user audio to text. They are interested in determining the runtime performance of their speech to text models, identifying speech transcription errors, and improving the speech to text models.

These personas work together and often look at similar data. For instance, a dialog entry generating unexpected responses may require the speech and intent analysts to collaborate on a remediation.
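As a concrete sketch of the executive view, the escalation question above can be answered by grouping log events by conversation and counting escalations. The field names below (`conversation_id`, `nodes_visited`) are modeled on the Watson Assistant log format but are assumptions; adjust them to your deployment’s schema:

```python
from collections import defaultdict

def escalation_rate(log_events):
    """Percentage of conversations that include an escalation.

    `log_events` is a list of per-message log entries. The field
    names mirror the Watson Assistant log format, but the exact
    schema may differ in your deployment.
    """
    # Group messages into conversations by their shared id.
    conversations = defaultdict(list)
    for event in log_events:
        conversations[event["conversation_id"]].append(event)
    if not conversations:
        return 0.0
    # A conversation escalates if any of its messages visited the
    # escalation node (the node name is a configurable example).
    escalated = sum(
        1 for events in conversations.values()
        if any("Escalate to Agent" in event.get("nodes_visited", [])
               for event in events)
    )
    return 100.0 * escalated / len(conversations)
```

The same grouping step is the basis for most conversation-level KPI metrics; only the per-conversation condition changes.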

Key analysis types

Virtual assistant analysis can be broken down by the scope of data that is analyzed.

Conversation analysis uses a wide view of an entire conversation. A conversation is a group of consecutive messages.

Message analysis looks at a single request-response pair, the request coming from the user and the response from the system.

When doing message analysis you need to consider the context of the request-response pair. Specific messages will interest different audiences. For instance, when the dialog asks a targeted question like “What is your member number?” the response will generally not include an intent. If a conversational flow has many messages, the intent analyst may only be interested in the first message (where the intent is typically gathered).
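A minimal sketch of that message-level filtering, assuming each log entry carries a `conversation_id` and a `request_timestamp` (names modeled on the Watson Assistant log format):

```python
def first_messages(log_events):
    """Return the first message of each conversation.

    The first message is usually where the intent is gathered,
    so it is the message the intent analyst cares about most.
    Field names are assumptions; adjust to your log schema.
    """
    firsts = {}
    # Walk the events in time order; setdefault keeps only the
    # earliest message seen for each conversation id.
    for event in sorted(log_events, key=lambda e: e["request_timestamp"]):
        firsts.setdefault(event["conversation_id"], event)
    return list(firsts.values())
```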

Summary of pre-built resources for Analyzing Watson Assistant solutions

A variety of resources exist for analyzing Watson Assistant solutions, each with a particular focus.

The Measure Notebook defines two metrics, effectiveness and coverage:

· Effectiveness is a conversation-focused metric for business analysts. Configure the notebook to detect an ineffective conversation by specifying conditions (e.g., an #opt-out intent is detected, or the dialog visits an “Escalate to Agent” node).

· Coverage is a message-focused metric for intent analysts. Configure the notebook to detect statements the assistant didn’t understand by specifying conditions (e.g., intent confidence is low, or the dialog visits an “I didn’t understand” node).

This notebook can plot these metrics in aggregate as well as their trends over time. The notebook is self-contained as it can both gather Watson Assistant logs and analyze them.
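The two configurable conditions can be sketched as simple predicates. The intent names, node names, and confidence threshold below are illustrative examples, not the Measure Notebook’s actual code, and the message schema is an assumption:

```python
def is_ineffective(conversation, opt_out_intent="opt-out",
                   escalation_node="Escalate to Agent"):
    """Conversation-level check in the spirit of the effectiveness
    metric: did the user opt out or get escalated to an agent?
    `conversation` is a list of per-message dicts."""
    for message in conversation:
        if opt_out_intent in message.get("intents", []):
            return True
        if escalation_node in message.get("nodes_visited", []):
            return True
    return False

def is_not_covered(message, threshold=0.4,
                   fallback_node="I didn't understand"):
    """Message-level check in the spirit of the coverage metric:
    low top-intent confidence, or a visit to the fallback node."""
    if message.get("confidence", 1.0) < threshold:
        return True
    return fallback_node in message.get("nodes_visited", [])
```

Plotting the share of conversations or messages that satisfy these predicates per week gives the trend views the notebook produces.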

The Effectiveness Notebook generates blind test accuracy metrics from new ground truth that you must provide. This ground truth includes user utterances and the expected intents and entities that Watson Assistant should find. It generates summary statistics useful for the intent analyst.

The Dialog Skill Analysis notebook provides two analyses for the intent analyst.

· Training data analysis is a pre-deployment analysis you can run on your intent training data. It identifies patterns Watson Assistant will learn from your training data and identifies potential sources of confusion or error via several analyses including a chi-squared test.

· Test set analysis runs a blind test by taking new ground truth data you create (ideally with utterances extracted from runtime logs). It generates accuracy metrics including F1 score and makes suggestions on how to revise the training data for improved performance at runtime.

The WA-Testing-Tool has several analytic modes:

· k-folds test is a pre-deployment analysis for your intent training data. It identifies potential intent confusions by repeatedly removing a subset of the training data and treating it as blind data. Metrics include F1 score per intent and a confusion matrix.

· Blind test takes new ground truth data that you create (ideally with utterances extracted from runtime logs) and generates F1 scores and a confusion matrix.

· Both modes are offered as Python source code and as a Jupyter notebook.
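To illustrate the k-folds idea independent of any tool, here is a minimal sketch of the splitting step: the training data is partitioned into k folds, and each fold in turn is held out as blind data while the remaining folds are used for training. This is not the WA-Testing-Tool’s code, just the underlying technique:

```python
def k_fold_splits(utterances, k=3):
    """Partition labeled utterances into k train/test splits.

    `utterances` is a list of (text, intent) pairs. Each split
    holds out one fold as blind data; per-intent F1 scores and a
    confusion matrix would then be computed over the held-out fold.
    """
    # Assign every k-th item to the same fold (round-robin).
    folds = [utterances[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [pair for j, fold in enumerate(folds) if j != i
                 for pair in fold]
        splits.append((train, test))
    return splits
```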

More on ground truth

Several of the tools mentioned run blind tests on ground truth data you create; however, none of them generate ground truth automatically, because ground truth requires subject matter expert (SME) knowledge to create. The general process for creating blind-test ground truth is to build a two-column spreadsheet:

· Column 1: Gather user utterances from your deployed system.

· Column 2: Have SMEs “label” the utterances by identifying the intent in each utterance. (Note that you can “bootstrap” column 2 by pre-populating it with the intent suggested by your Watson Assistant.)

Later in this blog series I will demonstrate a recipe for gathering user utterances suitable for a blind test. The remaining posts discuss how to read and interpret Watson Assistant logs, and some common analytics you may wish to run on them.

Thanks to the following reviewers of this post: Eric Wayne, Aishwarya Hariharan, Audrey Holloman, Mohammad Gorji-Sefidmazgi, and Daniel Zyska.

For more help in analyzing and improving your Watson Assistant reach out to IBM Data and AI Expert Labs and Learning.

Technical lead in IBM Watson. Author: Conversational AI (manning.com, 2021). All views are only my own.