How many COVID-19 cases near me? A chatbot solution to scale one of the most asked questions.

In the peak of the CV-19 pandemic I found myself, and my loved ones, asking the same question everyday — “How many cases are there in my town or neighbourhood?”

I set myself the goal of building a solution to answer the question “How many cases are there in my town or neighbourhood?” Clue: I wanted to make it easy and scalable for my friends and family.

Google Dialogflow is a natural language understanding platform used to build conversational user interfaces, the perfect tool for making a chatbot to answer our question. In April 2020 Google released a guide for building a COVID-19 Rapid Response Virtual Agent in Dialogflow.

This post will address how we can connect Dialogflow to a database to answer those dynamic questions. That is, transforming a chatbot question into a database query. This example asks to “list COVID-19 cases for the postcode 2026”, which is for Bondi in Sydney.

Datasets

A quick bit about me — I currently live and work in Sydney, originally from London. So the data I’m interested in looking at is for Australia and the UK.

Both datasets provide a downloadable CSV which can be uploaded into BigQuery. Allowing BigQuery to be our data warehouse, providing the data for querying.

Dialogflow setup

This bit assumes you have had a quick play with Dialogflow, and have an understanding of setting up intents.

Webhook

Firstly, let’s connect our Dialogflow agent to the database. We do this by setting up a Fulfilment, specifying the webhook.

We let Dialogflow know the URL of an endpoint, or Cloud Function, to use. I’m using an App Engine endpoint.

Context

Next we need to start constructing the condition of our BigQuery WHERE statement, before any webhook request can be made.

In this case we need to know the postcode of the person asking. We’ll do this by using a Context to ask, and remember the answers to, relevant questions.

Create an entity to recognise the respective UK and Australia postcodes e.g “2026” or “TW2”. Using a “postcode” context, so that when the “geo-postcode” entity is recognised the context is remembered and then future questions can call the webhook.

  1. User: “How many cases are there near me?’’.
  2. Bot: 🤔 Hmmm, I don’t have the “postcode” context set yet so I’ll use this follow up question — “What is your postcode”.
  3. User: “2026”.
  4. Bot: 👌 Got it, that looks like a “geo-postcode” entity, I’ll remember that and start the webhook — “Looking up # cases nearby…”.

App Engine

Let’s create an App Engine endpoint. I’m using App Engine for this use case as it allows me to do a number of tasks (e.g hosting and storage) in addition to the BigQuery API, plus then opens up flexibility to call other types of databases if needed.

Note: The webhook could be set up as a Cloud Function and call BigQuery directly (for more info there check out this great blog on How to integrate Dialogflow with BigQuery). Cloud functions are serverless and scalable like App Engine.

The endpoint JSON payload needs to be in a format Dialogflow will recognise (WebhookResponse). In Java — my language of choice — I created a JSONObject in this format to return a Google Assistant response (so that later on I can ask the question using the Google Assistant)…

JSONObject jsonResp = new JSONObject();JSONObject textToSpeech = new JSONObject();
textToSpeech.put("textToSpeech", response);
JSONObject simpleResponse = new JSONObject();
simpleResponse.put("simpleResponse", textToSpeech);
JSONArray itemsArray = new JSONArray();
itemsArray.put(simpleResponse);
JSONObject items = new JSONObject();
items.put("items", itemsArray);
JSONObject google = new JSONObject();
google.put("richResponse", items);
google.put("expectUserResponse", false);
JSONObject payload = new JSONObject();
payload.put("google", google);
jsonResp.put("payload", payload);
jsonResp.put("fulfilmentText", response);

A note on timeout

(At the time of writing this) Dialogflow and Google Assistant have a 10-second timeout limit to complete a fulfilment. Now… App Engine scales to support demand, but if there are no instances running — say you’ve built a chatbot only shared with friends and family and it’s not used much — it can take a few seconds to initiate an instance to handle the request. Mix in the time to run the query, and we have potential timeout issues.

Tips…

  • Set a minimum number of instances to keep running at all times, so the application can serve the webhook request with little latency. Although, this will cost more as you’ll always have an instance running.
  • Set some default text in the Dialogflow Intent Text Response, just in case there is a timeout e.g something like “Drew never specified a min number of instances running, so there was a timeout! This request has started an instance, so it’ll be fine to retry again now.”.
  • Use App Engine’s Memcache to remember — quickly return — results already queried, and avoid asking BigQuery the same thing again. Although, remember to clear it each time new data is added.

BigQuery SQL

At this point we have a chatbot built in Dialogflow, able to run BigQuery API queries via App Engine. Collecting the postcode in Dialogflow, and passing it to App Engine in the webhook. So what does the SQL look like?

SELECT
COUNT(*) AS total,
DATE_DIFF(CURRENT_DATE(), MAX(notification_date), WEEK),
DATE_DIFF(CURRENT_DATE(), MAX(notification_date), DAY)
FROM
`project.dataset.covid_19_cases_by_notification_date_and_postcode`
WHERE postcode = [postcode]

Still reading? In that case let’s take things up a level, and take advantage of BigQuery’s Geography Functions to answer the question “Where can I get tested for COVID-19?”.

The Australia Data.NSW website has a NSW COVID-19 clinics dataset available as CSV, perfect, let’s use that. A clinic’s address has a lat/long, so we can use ST_DISTANCE to select the closest clinic to the lat/long centre point of a postcode.

SELECT
ST_DISTANCE(
(
SELECT ST_GEOGPOINT(postcodes.long, postcodes.lat)
FROM `once-multiply-me.data.australian_postcodes` AS postcodes
WHERE postcodes.postcode = [postcode]
LIMIT 1
),
ST_GEOGPOINT(clinics.Location_Lng, clinics.Location_Lat)
) AS distance,
Facility, Address, Opening_hours
FROM `once-multiply-me.data.covid_19_clinics_nsw` AS clinics
ORDER BY distance
LIMIT 1

Disclaimer: This was a personal project and unrelated to my work at Google.

Thanks for reading, let me know how you get on.

Working @Google across SYD & LDN. Developer. Innovative. Problem solver. Passion for making a difference through what I do. Proud Dad of two amazing girls.