Chatbots and Watson Assistant: Understanding The Moving Parts
Over the past few months I’ve been digging more and more into Watson Assistant, formerly Watson Conversation. I’ve also been active in the Slack channels around Watson and Watson Assistant. I’ve become very conscious of a continuing theme around the product, based on a fundamental misunderstanding. It actually reminds me of some sessions I did for XPages some years ago, because the key is the same: referring to “something” when in actual fact it’s “a bunch of things”. And understanding which of those bunch of things your query refers to is critical to getting a solution and, indeed, architecting your application.
In this case, the term “chatbot” has become too closely aligned to Watson Assistant. What’s further complicated the understanding is when you need to integrate external data into the chatbot. The critical point is understanding what Watson Assistant is.
What Is Watson Assistant?
Watson Assistant is a serverless architecture that uses natural language processing and more structured processing to build a dialog workflow that an external system can interact with, tracking the session for a specific conversation, fronted by a browser-based tool with a basic testing client baked in. Let’s break down the key parts:
- Serverless means you don’t install it somewhere; it runs as a service, scaling accordingly. There is obviously a database behind it where your questions, intents, entities and contexts get stored, along with the more complex variants of phrases and so on populated by the training. But you don’t know or care what technology is used, where it is or how it works, and you can’t drill down into it.
- Natural language processing is based around the intents – the phrases and variants you define. Those phrases can be vague. Case sensitivity doesn’t matter, whether or not you include “the” or “a” won’t matter, grammar won’t matter. The natural language processing engine will handle that. The phrases will also include entities (see below), but those will get abstracted from the phrase. So if you have a phrase referencing “database” and an entity for that with variants like “application”, “SQL”, “NoSQL”, “Domino”, “DB2”, “MySQL” and so on, the Watson Assistant engine will cross-reference the two for you during training.
- More Structured Processing means the entities. These are keyed values, matched with or without fuzziness, grouped by synonyms or by patterns defined with regular expressions. There may be specific terms in a phrase for which you want to add alternatives, or they can be hard-coded lists of options, or ways of recognising a unique reference. You define the list explicitly here, so it’s not suitable for, say, customer names held in an external database. But it is suitable for a customer number that’s always five digits, a non-US date format or an email address.
- Dialog flow means you’re building workflow rules to respond to text passed to Watson Assistant. Within that workflow you will also define whether people can jump out of the flow, jump back in, you may require specific responses, all in an ordered, top-to-bottom flow. Each flow step can manipulate the context and can provide output, with rules of what to do next. If there is output, it will include textual variants, with rules of how to apply those. It may output additional JSON objects which external systems may use.
- External systems will be able to interact with the flow. You do not define which systems. Anything that has the relevant credentials can do so. It can be a system with or without an actual user interface. Because regardless of whether the developer or developers provide a user interface, those systems interact via REST service calls.
- Tracking the session is done via the context JSON object, which can be passed into and is passed out of every interaction with Watson Assistant.
- The browser based tool is the IBM Cloud interface you use to create the intents, entities, and dialog flow. You can export your “workspace” from and import it into Watson Assistant, as well as programmatically populating content.
- Basic testing client is the sidebar, as you’re creating the dialog flow, that allows you to test it. It’s basic because it displays just text. It’s for testing which nodes of the flow are reached, how they are reached, and what context variables get changed. It’s not for testing the output text, nor should it be used for that.
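To make the REST interaction concrete, here is a minimal sketch in Python of the kind of JSON payload an external system exchanges with Watson Assistant’s message endpoint. The workspace ID, text and response values are all hypothetical placeholders, and no real request is sent; the point is how the context object round-trips between calls.

```python
# Sketch of the JSON an external system exchanges with Watson Assistant's
# message REST endpoint. The workspace ID and values are hypothetical
# placeholders; no real request is made here.
import json

workspace_id = "YOUR_WORKSPACE_ID"  # hypothetical placeholder

# Context returned by the previous call (empty on the first turn).
context = {}

# Body of the POST: the person's dialog content plus the current context.
request_body = {
    "input": {"text": "What databases do you support?"},
    "context": context,
}
print(json.dumps(request_body, indent=2))

# A simulated response: the updated context comes back with every reply.
# The caller must store it and pass it in on the next turn, which is how
# the session for a specific conversation is tracked.
simulated_response = {
    "intents": [{"intent": "database_question", "confidence": 0.97}],
    "entities": [{"entity": "database", "value": "database"}],
    "output": {"text": ["We support SQL and NoSQL databases."]},
    "context": {"conversation_id": "abc-123", "topic": "databases"},
}
context = simulated_response["context"]  # round-trip for the next call
```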
What Is A Chatbot?
A chatbot comprises a user interface backed by an application: a person provides dialog content, which is passed with any metadata through one or more services, and a response comes back for the application to return to the person. Again, let’s break that down:
- The chatbot comprises a user interface a person interacts with. Where that is and, even more importantly, how the dialog content is provided will vary. What I mean is that the dialog content could be provided via typing, speaking or some other method of passing unstructured, unmoderated content. Watson Workspace is one interface. A simple text box is another. Speaking to an interface that can convert speech to text is another. And there are probably other methods of interface not yet anticipated by developers, which will come in the future.
- It is backed by an application. The interface may be embedded in another application using off-the-shelf functionality, as in the WordPress Watson Assistant plugin. In this case, the potential for wrapping other services, like custom display conversion or database lookup, will be limited. If advanced functionality is required, this may be server-based (a Java or Node.js application) or serverless (e.g. using IBM Cloud App Connect) to receive and modify output. This “orchestration layer” will be responsible for calling all services as well as converting the raw output of each, as required, for the next service. (Thank you Mitchell Mason for naming this part of the chatbot for me.)
- A person provides dialog content which is passed with any metadata, namely the context, which may be updated either before or after the user provides the dialog content.
- Through one or more services, one of which will be Watson Assistant. The content could be passed directly via a few lines of code to Watson Assistant. But Watson Assistant will not necessarily be the only service. If the chatbot interface receives speech, obviously it will need passing through a speech-to-text service before being passed to the dialog flow. If database integration is required, some kind of custom service will be required to extract content and perform the database query. As I said, this will be done by the orchestration layer. An example may be using Node-RED to perform additional database queries and modify context variables, updating Watson Assistant as required.
- Receives a response for the application to return to the person. Again, and most importantly, the way it uses the response will vary depending on the chatbot interface and more. If the chatbot speaks, it will need to return speech to the user. If it is a command line interface, it will need plain text. If it is a web page, it may need to return HTML. Consequently, this is again managed by the orchestration layer, which will need to manipulate the output from relevant services for subsequent services and, potentially, for the user interface.
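The steps above can be sketched as an orchestration layer in miniature. Every service call below is a stub standing in for a real REST call (speech-to-text, Watson Assistant, text-to-speech); the function names and return shapes are illustrative assumptions of mine, not any real SDK.

```python
# Sketch of an orchestration layer: it sits between the user interface and
# the services, converting each service's raw output into the next service's
# input. Every call below is a stub standing in for a real REST request.

def speech_to_text(audio):
    # Stub: a real implementation would call a speech-to-text service.
    return audio["transcript"]

def call_watson_assistant(text, context):
    # Stub: a real implementation would POST to the message endpoint.
    return {"output": {"text": ["Hello! How can I help?"]}, "context": context}

def to_speech(text):
    # Stub: a real implementation would call a text-to-speech service.
    return {"audio_for": text}

def handle_turn(audio, context):
    """One pass through the chain: speech in, speech out."""
    text = speech_to_text(audio)                     # convert input medium
    response = call_watson_assistant(text, context)  # dialog flow
    reply = " ".join(response["output"]["text"])     # flatten text variants
    return to_speech(reply), response["context"]     # convert output medium

speech, context = handle_turn({"transcript": "hi"}, {})
```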
How it passes back options will also vary. For speech or a command line, it may expect a number (think of automated processing when you call a call centre). If it’s a web page, it makes more sense to display buttons, and different chatbots will expect button content in different ways. A WordPress chatbot expects buttons in a specific format. A custom web page depends on how that page is coded; it could just expect the buttons as HTML. What they then need to do when the button is clicked will also vary, depending on the specific chatbot. The WordPress plugin just passes the button text directly to the input field and submits it. That could result in long button text, which a custom implementation could avoid.
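As a sketch of that variation, here is how the same list of options might be rendered for a command line versus a web page. The `options` key is an illustrative convention of my own, not a fixed Watson Assistant output format; the rendering lives in the orchestration layer or the interface, not in the dialog flow.

```python
# The same options from a dialog node, rendered per interface. The "options"
# key is an illustrative convention, not a fixed Watson Assistant format.

response_output = {
    "text": ["Which database are you asking about?"],
    "options": ["Domino", "DB2", "MySQL"],
}

def render_for_cli(output):
    # Command line: number the options so the user can reply with a digit.
    lines = [output["text"][0]]
    lines += [f"{i}. {opt}" for i, opt in enumerate(output["options"], 1)]
    return "\n".join(lines)

def render_for_web(output):
    # Web page: turn each option into an HTML button.
    buttons = "".join(f"<button>{opt}</button>" for opt in output["options"])
    return f"<p>{output['text'][0]}</p>{buttons}"

cli_view = render_for_cli(response_output)
web_view = render_for_web(response_output)
```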
And obviously how links to websites are presented will vary considerably, depending on the response medium.
This also assumes whatever is passed from Watson Assistant is ready to pass on. The chatbot implementation may not even need to receive text, it may take differing actions depending on the value passed to a context variable. It may need to use a context variable to query a database and change the response depending on whether content can be found in the database or not. It may need to modify the context before then passing a message back to the user.
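A minimal sketch of that database step, assuming a hypothetical `customer_number` context variable and using an in-memory dict in place of a real database:

```python
# The orchestration layer reads a context variable set by Watson Assistant,
# queries a database (an in-memory dict here, standing in for a real one),
# and updates the context before the next turn. Variable names are
# hypothetical illustrations.

customer_db = {"12345": "Acme Ltd", "67890": "Globex"}

def enrich_context(context):
    """Look up the customer number Watson Assistant put in the context."""
    number = context.get("customer_number")
    record = customer_db.get(number)
    # Record whether the lookup succeeded, so the dialog flow can branch on it.
    context["customer_found"] = record is not None
    if record:
        context["customer_name"] = record
    return context

ctx = enrich_context({"customer_number": "12345"})
```

The updated context is then passed back to Watson Assistant on the next call, so a dialog node can respond differently depending on `customer_found`.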
Putting The Pieces Together
So the routing here is person (1) to chatbot (2) through Watson Assistant (3) back to chatbot (4) to person (5). There may be many more steps between 2 and 3, or 3 and 4. After 4 it may have many more steps before it reaches 5. Indeed it may need to go back to 3 again, passing pre-arranged text to jump to a specific part of the dialog flow.
Consequently, asking “how do I display a button with Watson Assistant” or “how do I query a database with Watson Assistant” is not something to ask of Watson Assistant – or any other dialog flow service. As I have said, to display choices, what you need to pass from Watson Assistant will vary depending on where you’re surfacing the chatbot. And the programming around the chatbot is what will need to query the database, and that will vary depending on the implementation. It may be the chatbot application itself, it may be another workflow tool like Node-RED or it may be another serverless program like AppConnect in IBM Cloud.
Hopefully what’s also become apparent is that the context object in Watson Assistant dialog flow can only be changed within Watson Assistant based on the step of the dialog flow, recognition of intents or regex parsing of the text. If it’s dependent on anything else, like querying a database, the context needs to be updated outside Watson Assistant and must be passed back so Watson Assistant can use it when the next dialog content is passed to it. And if the value to be passed to the database query isn’t everything the person provided in that dialog content, it needs to be extracted from the dialog content provided by the user based on some regex query or other developer-driven logic. And this may require the chatbot asking the user for confirmation.
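That extraction step can be as simple as a regular expression. Here is a sketch using the earlier five-digit customer number example; the pattern and function name are my own illustration:

```python
# Extracting a five-digit customer number (the earlier entity example) from
# free-form dialog content, so the orchestration layer can pass it to a
# database query. Returns None when nothing matches, which is the cue for
# the chatbot to ask the user for confirmation or clarification.
import re

def extract_customer_number(text):
    match = re.search(r"\b\d{5}\b", text)
    return match.group(0) if match else None

number = extract_customer_number("My customer number is 12345, I think")
```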
And obviously there is a massive gulf between the effort required for a simple “hello world” command-line chatbot that only returns a small number of fixed responses and a talking chatbot that handles hundreds of scenarios, interacts with databases that are not easily accessible, needs to handle multiple languages and vary the tone of its response.