Quickly improve your voice agent with a speech model

Andrew R. Freed
IBM Data Science in Practice
Apr 9, 2020 · 5 min read


Take your Voice Agent to the next level with this guide! Photo by Thomas Le on Unsplash

Co-authored with Marco Noel.

You can very quickly create a voice solution using Watson Assistant and Voice Agent. Out of the box, the Voice Agent uses a standard speech to text model. With a little additional effort you can train a custom speech model that understands your domain even better. This guide will help you bootstrap that speech model from your existing Watson Assistant skill.

Step 1: Export your Watson Assistant skill

From the Watson Assistant editor, navigate to the Export option to download a JSON copy of your skill. Save this JSON file to your hard drive.

Exporting a Watson Assistant skill

Step 2: Prepare scripts to read the skill JSON file

Create the following two script files on your hard drive. These scripts read the training data from your Watson Assistant intents and entities and repeat each example 10 times, producing weighted speech training corpora.

Script one: extractJSONIntents.py

# Save this to extractJSONIntents.py
import json
import sys

counter = 10  # how many times to repeat each example in the corpus

file = sys.argv[1]
with open(file, "r") as read_file:
    data = json.load(read_file)

# Print every intent example `counter` times
for list_intents in data['intents']:
    for list_utterances in list_intents['examples']:
        for _ in range(counter):
            print(list_utterances['text'])

Script two: extractJSONEntities.py

# Save this to extractJSONEntities.py
import json
import sys

counter = 10  # how many times to repeat each value in the corpus

file = sys.argv[1]
with open(file, "r") as read_file:
    data = json.load(read_file)

# Print every entity value (and its synonyms, if any) `counter` times
for list_entities in data['entities']:
    for entity_name in list_entities['values']:
        for _ in range(counter):
            print(entity_name['value'])
            for item in entity_name.get('synonyms', []):
                print(item)

Step 3: Create the speech language model corpus files

Execute the scripts from the previous step and save the results to corpus files.

python3 extractJSONIntents.py path_to_your_workspace.json > intent_lm.txt
python3 extractJSONEntities.py path_to_your_workspace.json > entity_lm.txt

(After creating these files you may wish to edit them further using a text editor to add or remove certain examples.)

Step 4: Create a language model

For this step you need the apikey and service URL of your Speech to Text instance. They are available on the Manage page from your Speech to Text service. You also need to know the name of your base model. For most US-English telephony projects this is en-US_ShortForm_NarrowbandModel. Telephony projects in other languages should use the appropriate NarrowbandModel for their language.

Modify the following command: replace {apikey} and {url} with your values, and replace {name} and {description} with something meaningful. Be sure to update the base_model_name as appropriate.

curl -X POST -u "apikey:{apikey}" --header "Content-Type: application/json" --data "{\"name\": \"{name}\", \"base_model_name\": \"en-US_ShortForm_NarrowbandModel\", \"description\": \"{description}\"}" "{url}/v1/customizations"

The response will be something like:

{"customization_id": "0c257322-a5b4-470a-175b-d8bd123436cf"}

Please note you only need to do this once. Be sure also to configure the Voice Agent to use this custom model. The custom language model configuration is in the “advanced” section.

Configuring Voice Agent with a custom language model

Step 5: Train the language model

Any time you make a change to the speech corpus files you will need to update the language model. You will typically update the corpus files as you add examples to the Watson Assistant or as you want to improve statements that don’t transcribe well. Based on which file you updated, run the appropriate corpus update line below. Note that allow_overwrite=true is used so you can use the same command on your first, tenth, or hundredth update.

Update all of these commands with the {customization_id} created above, along with the same {apikey} and {url} as before.

curl -X POST -u "apikey:{apikey}" --data-binary @intent_lm.txt "{url}/v1/customizations/{customization_id}/corpora/intent_lm?allow_overwrite=true"
curl -X POST -u "apikey:{apikey}" --data-binary @entity_lm.txt "{url}/v1/customizations/{customization_id}/corpora/entity_lm?allow_overwrite=true"

Note: Wait a few seconds between commands to let Speech to Text process the files.

After you update a corpus file you need to train the model for it to take effect.

curl -X POST -u "apikey:{apikey}" "{url}/v1/customizations/{customization_id}/train"

The model trains very quickly (a matter of seconds). You can check the model status with the following command; when you see “available”, the training is done.

curl -X GET -u "apikey:{apikey}" "{url}/v1/customizations/{customization_id}"

After you customize these template commands with your apikey, url, and customization_id, save the commands into a file for later reuse.
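The whole update-train-poll cycle can also be scripted. Here is a minimal sketch with the ibm-watson Python SDK, assuming the corpus files from Step 3 are in the current directory and the same {apikey}, {url}, and {customization_id} values:

# Sketch: re-upload a corpus, retrain, and wait for "available"
import time
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

stt = SpeechToTextV1(authenticator=IAMAuthenticator("{apikey}"))
stt.set_service_url("{url}")
customization_id = "{customization_id}"

# allow_overwrite lets you rerun this on your first or hundredth update
with open("intent_lm.txt", "rb") as corpus:
    stt.add_corpus(customization_id, "intent_lm", corpus, allow_overwrite=True)

time.sleep(5)  # give Speech to Text a few seconds to process the file
stt.train_language_model(customization_id)

# Poll until training completes
while stt.get_language_model(customization_id).get_result()["status"] != "available":
    time.sleep(2)
print("Model is available")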

Step 6: Use custom words to force certain transcriptions

Speech to Text “custom words” are like a forced search-and-replace run on the speech to text transcription. If you find domain-specific words that do not transcribe well you can force the transcription you want.

For instance, at runtime the speech engine may transcribe “What is my V. P. N. I. D.?” but you would rather receive “What is my VPN ID?” This is a great job for custom words.

Create a file words.json with content like the following:

{
  "words": [
    {"word": "RSA", "sounds_like": ["R. S. A.", "R. S. a"], "display_as": "RSA"},
    {"word": "VPN", "sounds_like": ["V. P. N."], "display_as": "VPN"},
    {"word": "ID", "sounds_like": ["I. D.", "eye D."], "display_as": "ID"}
  ]
}

Load these custom words into Speech to Text with this command template:

curl -X POST -u "apikey:{apikey}" --header "Content-Type: application/json" --data-binary @words.json "{url}/v1/customizations/{customization_id}/words"

Train the model again with the same command template:

curl -X POST -u "apikey:{apikey}" "{url}/v1/customizations/{customization_id}/train"
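If you are scripting, a rough SDK equivalent of the last two commands (assuming the same credentials and IDs as before) is:

# Sketch: add custom words via the SDK's CustomWord helper, then retrain
from ibm_watson import SpeechToTextV1
from ibm_watson.speech_to_text_v1 import CustomWord
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

stt = SpeechToTextV1(authenticator=IAMAuthenticator("{apikey}"))
stt.set_service_url("{url}")
customization_id = "{customization_id}"

stt.add_words(customization_id, [
    CustomWord(word="VPN", sounds_like=["V. P. N."], display_as="VPN"),
    CustomWord(word="ID", sounds_like=["I. D.", "eye D."], display_as="ID"),
])
stt.train_language_model(customization_id)  # retrain so the words take effect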

Step 7: Iterate!

As your assistant evolves you can iterate with these steps. The language model only needs to be configured once but you can add new examples from your Watson Assistant skill as many times as you like. The power is yours!

Thanks to Marco Noel for the scripts and co-authoring this article.

For more help improving your Voice Agent, Watson Assistant, or Speech to Text, reach out to IBM Data and AI Expert Labs and Learning.
