Conversational Search with Generative AI

Andrew R. Freed
7 min read · Jun 14, 2023


Large Language Models open a lot of possibilities! Photo by Joshua Hoehne on Unsplash.

One of the most exciting patterns enabled by Generative AI and Large Language Models (LLMs) is Conversational Search. In this post I’ll share my thoughts on three key points:

· Why would you want it & what is it

· How does it work

· What are the limitations & variations from this pattern

Why would you want it & what is it

Do you want to be able to answer questions from a knowledge base you control? Do you have documents that answer questions in your domain? Is it impossible to enumerate all the questions you want to be able to answer from your knowledge base? If you said yes to any of these questions, Conversational Search may be right for you.

Simply put, Conversational Search enables arbitrary questions and answers over your documents. It uses your documents to find content relevant to a given question, and it uses large language models to generate text from that relevant content. Neither the question nor the generated answer exists verbatim in your knowledge base, yet the generated answer is still grounded in your knowledge base. Users ask questions conversationally, with the answers grounded in a knowledge base search, hence we call this Conversational Search.

Some people call this approach Retrieval-Augmented Generation (RAG). The system does the following: given this context (retrieved by a search query), answer this question (augment the search result with generated text).
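To make the "context plus question" framing concrete, here is a minimal sketch of the retrieve-then-generate loop in Python. The `search` and `generate` functions are placeholders I made up for illustration; they stand in for whatever knowledge base API and LLM endpoint your platform exposes, not any specific product's interface.

```python
# Minimal retrieval-augmented generation sketch (illustrative only).

def search(query: str, top_k: int = 3) -> list[str]:
    """Placeholder: call your knowledge base search API, return top passages."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Placeholder: call your LLM endpoint with the assembled prompt."""
    raise NotImplementedError

def answer(question: str) -> str:
    passages = search(question)                  # retrieve relevant context
    context = "\n\n".join(passages)
    prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)                      # generate a grounded answer
```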

This is an improvement over traditional search, which gives a passage, not an answer. For example, I trained a Conversational Search system over articles written by Marco Noel and me on the topic of speech customization. Here's an example in Figure 1:

Answers to “What is a custom word” from Traditional Search and Conversational Search.
Figure 1 Comparing responses to the same question in Traditional Search and Conversational Search.

The traditional search returned a passage from a document. The passage is a keyword match from a document that is very relevant to the question. In traditional search the user could click into the document and eventually find the answer. In conversational search the system returns an answer directly. Conversational search's answer is fantastic! It is sourced from the knowledge base, and Figure 2 below, from a developer tool, shows which documents were used to construct the answer. The user asking the question can also get a link to the document(s) to verify the answer.

Developer tool showing the source of every part of the answer to the question “what is a custom word?”
Figure 2 Example conversational search response, highlighting which documents were used for every word in the answer.

LLMs are known to hallucinate, producing confident-but-unjustified answers. Many LLMs are "trained on the Internet". If you took an average snippet from the Internet, would you trust it? However, Conversational Search can mitigate hallucinations in two ways: first, by requiring that a significant portion of the answer is sourced directly from the knowledge base, and second, by responding "I don't know" when the knowledge base cannot be used. Other mitigations are possible and may be required depending on your LLM. As shown in Figure 2 above, conversational search can provide evidence that helps you decide whether to trust its answers.

Using generative AI over your document collection is easier than scripting an FAQ or a static chatbot over the questions you can enumerate. It generates answers, not passages, so it is more useful for your users than search. Finally, it can provide provenance for the generated answers, giving you a response you can verify and trust.

How does it work

In a traditional search the flow is “user query” -> “knowledge base search” -> “return relevant passages”.

In conversational search the flow is “user query” -> “knowledge base search plus LLM” -> “return answer (with evidence)”.

We've already seen how pure "knowledge base search" works. Pure LLM is when you send your query to an interface like ChatGPT, which answers questions based on its own knowledge base (its training data), not your knowledge base. The magic happens in the combination of "knowledge base search plus LLM". The key steps are shown in Figure 3 below. Let's assume the user asks the question "How can I do X?"

A user query goes to a search repository for context, the context is fed to the LLM, which uses it to generate an answer.
Figure 3 High-level conversational search flow

Conversational search starts with a traditional search, but feeds search results into a large language model. The LLM uses a cleverly engineered prompt that includes context from the knowledge base. This prompt instructs the LLM to generate an answer but to constrain the answer to the knowledge base. The prompt shown is illustrative — your chat platform provider may add additional instructions such as telling the LLM when to say “I don’t know”.
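Because the real prompt lives inside the chat platform, the template below is only my guess at what such a prompt might look like; the wording, including the "I don't know" instruction, is illustrative rather than any vendor's actual prompt.

```python
# An illustrative grounded prompt template (assumed wording, not a real
# platform's prompt). Note the explicit "I don't know" instruction.
GROUNDED_PROMPT = """Answer the user's question using only the context below.
If the context does not contain the answer, reply "I don't know."

Context:
{context}

Question: {question}
Answer:"""

example = GROUNDED_PROMPT.format(
    context="<top passages returned by the search step>",
    question="How can I do X?",
)
```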

The beauty of conversational search is that most of the work falls on the chat platform provider.

The chat solution builder only needs to:

· Provide relevant documents (or a link to the knowledge base API).

· Configure a connection to the LLM.

The chat platform provider does much more heavy lifting:

· Hosts and manages the LLM.

· Prompt engineering to build the most performant prompt. LLMs have strict limits on prompt size, which forces tradeoffs: how many search results should be included? How detailed do the other prompt instructions need to be? (A rough sketch of this tradeoff follows the list.)

· Possibly tuning the knowledge base API calls in concert with the prompt engineering to retrieve relevant and concise passages.
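To illustrate the prompt-size tradeoff mentioned in the list, here is a rough sketch that packs retrieved passages into an assumed token budget. The 3,000-token budget and the four-characters-per-token estimate are my own illustrative assumptions, not real platform limits.

```python
# Rough sketch: include as many retrieved passages as fit the prompt budget.
MAX_CONTEXT_TOKENS = 3000            # assumed budget, not a real platform limit

def estimate_tokens(text: str) -> int:
    return len(text) // 4            # crude heuristic; real tokenizers differ

def select_passages(passages: list[str]) -> list[str]:
    chosen, used = [], 0
    for passage in passages:         # assumed sorted by relevance, best first
        cost = estimate_tokens(passage)
        if used + cost > MAX_CONTEXT_TOKENS:
            break                    # stop before overflowing the prompt
        chosen.append(passage)
        used += cost
    return chosen
```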

I set up a demo conversational search for myself in less than an hour, which included building my knowledge base!

In Figure 3 we saw how conversational search answers a user’s initial question. This pattern can also work in a back-and-forth conversation where the user asks follow-up questions. Follow-up questions are demonstrated in Figure 4.

Figure 4 Conversational search using previous chat history

Again, the conversational platform does most of the work.
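One plausible way to handle follow-ups, sketched below, is to fold recent turns into both the retrieval query and the prompt. This is an assumption about the technique in general, not a description of any particular platform, and it reuses the placeholder `search` and `generate` functions from the earlier sketch.

```python
# Sketch: answer a follow-up question using the previous chat history.

def answer_followup(history: list[tuple[str, str]], question: str) -> str:
    # Recent turns help disambiguate follow-ups like "How do I change it?"
    recent_questions = " ".join(q for q, _ in history[-2:])
    passages = search(f"{recent_questions} {question}")
    transcript = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    prompt = (
        "Conversation so far:\n" + transcript + "\n\n"
        "Context:\n" + "\n\n".join(passages) + "\n\n"
        f"Question: {question}\n"
        "Answer using only the context above:"
    )
    return generate(prompt)
```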

Conversational search is exciting because every enterprise has a knowledge base, and conversational search makes it easy to unlock it! I expect that conversational search will be the new baseline expectation for knowledge bases within a few years. Users will love the answers supported by evidence; knowledge base owners will love the increased value derived from their content.

What are the limitations & variations from this pattern

Conversational search is still in its infancy. There are a few challenges to wide-scale implementation and there are several variations on the basic theme.

Challenge #1: It’s expensive. The large language models behind conversational search are computationally expensive, run on specialized hardware, and require specialized skills to tune.

The large AI players host most of the LLMs, and I expect most enterprises to buy rather than build. API calls to LLMs are generally much more expensive than calls to most knowledge base search APIs.

One variation is to use conversational search as a fallback option. You build a traditional/static Q&A chatbot for your most common questions (the "short tail" or the "80" in "80/20") and fall back to conversational search for any questions not pre-programmed (the "long tail"). You can think of the static Q&A as a cache. Within this pattern, you can use the LLM to generate the answers for the common questions once and store them in the static chatbot.
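A sketch of that cache-plus-fallback routing is below; the FAQ entries and the exact-string matching are simplifications for illustration, and `answer` is the conversational search function from the earlier sketch.

```python
# Sketch of the short tail / long tail split: serve common questions from a
# static answer store and fall back to conversational search for the rest.
FAQ_CACHE = {
    "how do i reset my password": "Use the 'Forgot password' link on the sign-in page.",
}

def route(question: str) -> str:
    cached = FAQ_CACHE.get(question.strip().lower())
    if cached is not None:
        return cached          # cheap, pre-approved static answer
    return answer(question)    # long tail: fall back to conversational search
```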

Challenge #2: Who hosts the model?

As stated above, the models are expensive. Many model providers offset this cost by keeping the requests sent to their model for use as future training data. Is your enterprise OK with sending its questions and knowledge base content to a third party? I expect enterprises to demand options to use an LLM that still protects their confidential data.

Challenge #3: This pattern is Q&A only, not transactional (yet)

Conversational Search can generate excellent answers to user questions. I've seen great answers to "How do I open an account", "How do I reset my password", and more. Wouldn't it be great to just open the account, or reset the password, for the user? This is such an obvious next step that I think we will soon see it coming from the platform providers. Enterprises will want a solution that can integrate with their APIs to do task completion.

Variation #1: Search APIs

The fundamental conversational search architecture can use almost any search API. Traditional search APIs, which receive a text query and return documents and/or passages, are ubiquitous. Conversational Search is at the mercy of the search API to return highly relevant content to feed into the LLM prompt. LLM prompt space is at a premium, and a small improvement in search relevancy can lead to better answers. One variation is to use vector embeddings rather than the traditional inverted index behind most search APIs.

Vector embeddings encode semantic relationships between the words, leading to a deeper understanding of both the user’s question as well as the documents in the knowledge base. Traditional search methods are closer to keyword-based. My traditional search in Figure 1 returned a result that was keyword-relevant but not enough to answer the question. Existing search APIs are easy; vector embeddings can give better answers.
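Here is a sketch of the embedding-based alternative; the `embed` function is a placeholder for any sentence-embedding model, and only the cosine-similarity ranking is shown.

```python
# Sketch of embedding-based retrieval: rank documents by cosine similarity
# between the query vector and each document vector.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: return a unit-length embedding vector for `text`."""
    raise NotImplementedError

def semantic_search(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    doc_vectors = np.stack([embed(d) for d in documents])
    query_vector = embed(query)
    scores = doc_vectors @ query_vector        # cosine similarity (unit vectors)
    best = np.argsort(scores)[::-1][:top_k]    # highest-scoring documents first
    return [documents[i] for i in best]
```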

Variation #2: Pass more context to the LLM

The example LLM prompts in this post have been generic and applicable to any conversation. The prompts could be further augmented with additional context such as a user profile. The LLM could be instructed "the user is a Gold member", "the user is an adult", or "the user is located in <X>". I expect chat platform providers to offer a way to do this, though it will take some work to make the integration easy for solution builders.
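A sketch of how such context might be attached is below; the profile facts and the wording are my own illustrative assumptions.

```python
# Sketch: prepend user-profile facts to the grounded prompt before sending it
# to the LLM. The facts shown are illustrative assumptions.

def personalize(prompt: str, profile_facts: list[str]) -> str:
    facts = "\n".join(f"- {fact}" for fact in profile_facts)
    return f"Facts about the user:\n{facts}\n\n{prompt}"

personalized = personalize(
    example,   # the grounded prompt assembled in the earlier sketch
    ["The user is a Gold member", "The user is an adult"],
)
```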

Figure 5 Imagine an LLM personalized for each of these users. Photo by Jacek Dylag on Unsplash.

Conclusion

Conversational Search is an exciting pattern that is easy to set up and has lots of promise. I hope this post has been useful to your understanding of the pattern. Writing this post helped me solidify mine!


Andrew R. Freed

Technical lead in IBM Watson. Author: Conversational AI (manning.com, 2021). All views are only my own.