Predicting the next 5 minutes of a Cricket Game 🏏 Project Monty

Drew Jarrett
Aug 23, 2019

I’m Drew — for those of you who don’t know me — hi.

I’ve been working at Google for some time now. My current role is as a Customer Solutions Engineer, helping our Advertisers get the best out of their websites, apps, and data. All from my desk in beautiful Sydney.

In this post we’re going to have some fun 👍 and predict what the dismissal will be in the next 5 minutes of a live Cricket game. Specifically, I’ll walk through a proof of concept I built for Fox Sports and Mindshare here in Australia.

Keen to hear the technical detail? Read on. I’m using the Google Cloud Platform technologies: Dataflow, BigQuery, AutoML Tables, and App Engine.

Hang on… why did we do this?!

Cricket in Australia is popular and receives a lot of attention during the summer season. Fox Sports (who broadcast the game) and Mindshare (their partner agency) approached us with the challenge of making it more fun and engaging for their users.

The initial brief was simply to increase views / tune-ins for the Cricket. After some brainstorming we decided to focus on those “Magic Moments” during a game, and on ways to let people know they are happening.

Once we had our hands on the data it became apparent the “Magic Moment” would be a dismissal. Then we got to thinking… wouldn’t it be cool if we could predict when a dismissal would happen? Imagine knowing if the current batter is about to be bowled out, caught out, run out, etc. This could be used to inform media decisions and Fox Cricket app notifications to let people know it’s time to tune in.

The results were pretty cool, including some fun social activity.

The first correct prediction called Monty Panesar being caught out in the next 5–10 minutes, so of course the ML model was given the nickname “Monty”!

Check out these videos to get a quick overview of the project (before we dive into the technical bits).

Project Flux Capacitor (Using Google Cloud Dataflow and BigQuery)

We had one year’s worth of Cricket data to play with (the 2018 editions of the games we were looking to predict in 2019). There was a mix of different game types. The data was ball by ball, and each ball had some interesting variables surrounding it, i.e. what the batter, bowler, fielders, and even the ball itself were doing each time the ball was bowled. The data was provided as CSV, which was loaded into Google Cloud BigQuery easily enough.

The different game types would be a bit problematic later on in the process due to the variety of formats (oh, I wish I knew more about Cricket), so we mainly focused on Test Matches, these being the most likely to predict given the length of play.

To make sense of the data for propensity modelling I applied a windowing strategy, windowing the data (a Cricket game) into 1-minute intervals. At each point in time (every minute of the game) we knew the future dismissal, based on a prediction window looking 5 minutes into the future, and what had happened in the past, based on a 1-hour historical window of aggregated ball features.

I then had data ready to train an ML model, structured so that at every minute (from the start to the end of a game) the data tells the model what will happen 5 minutes into the future, based on what has happened in the past.
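To make that structure concrete, here’s a hypothetical sketch of one training row; the field names are illustrative assumptions, not the real schema:

// Hypothetical sketch of one training example produced by the windowing step.
// All field names are illustrative assumptions, not the actual schema.
public class TrainingRow {
  long gameId;                  // which match this minute belongs to
  long minuteOfGame;            // the point in time this row represents

  // Features aggregated from the 1-hour historical window:
  int ballsLastHour;
  int runsLastHour;
  int wicketsLastHour;
  String currentBatter;
  String currentBowler;

  // Label taken from the 5-minute prediction window:
  String dismissalInNext5Mins;  // e.g. "BOWLED", "CAUGHT", "RUN_OUT", or "NONE"
}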

I used Google Cloud Dataflow to do this bit; it plugs into BigQuery pretty seamlessly and has a handy SlidingWindows transform to use.

Now, who can resist a fun project name? And from that, “Project Flux Capacitor” was born.

To go into some more of the technical Dataflow detail here…

I set up a Dataflow pipeline, “featurePipeline”, that reads the data in from BigQuery and sets sliding windows of 5 minutes (our prediction window) every 1 minute (our sliding interval).

// Note: the source table isn’t shown in the post; “inputTableRef” below is a placeholder.
PCollection<KV<Long, Iterable<FoxtelWindowingFeatures>>> featurePipeline = pipeline
    .apply("ReadBigQuery", BigQueryIO.readTableRows().withTemplateCompatibility().from(inputTableRef))
    .apply("ExtractFeaturesFromTable", ParDo.of(new ExtractFeaturesFromTableFn()))
    .apply("SetGlobalWindowsTimestampCombiner", Window.<KV<Long, FoxtelWindowingFeatures>>into(new GlobalWindows())
        .withTimestampCombiner(TimestampCombiner.EARLIEST))
    .apply("Group", GroupByKey.<Long, FoxtelWindowingFeatures>create())
    .apply("SeparateOutIntoPredictionWindowsFn", Window.into(SlidingWindows.of(Duration.standardMinutes(5))
        .every(Duration.standardMinutes(1))));
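The ExtractFeaturesFromTableFn above isn’t shown in full in the post; as a rough idea, here’s a minimal sketch of what it could look like, assuming a hypothetical game_id column and hypothetical FoxtelWindowingFeatures helpers:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Instant;

// A minimal sketch, not the real implementation: turn each BigQuery row into
// a keyed, timestamped feature object. Column and helper names are assumptions.
static class ExtractFeaturesFromTableFn extends DoFn<TableRow, KV<Long, FoxtelWindowingFeatures>> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    TableRow row = c.element();
    Long key = Long.valueOf((String) row.get("game_id"));                        // hypothetical key column
    FoxtelWindowingFeatures features = FoxtelWindowingFeatures.fromTableRow(row); // hypothetical parser
    // Emit with the ball's event time so the sliding windows line up with minutes of play.
    c.outputWithTimestamp(KV.of(key, features), new Instant(features.getBallTimeMillis()));
  }
}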

Each ball’s information is stored in a FoxtelWindowingFeatures object. Now that the sliding windows are set, we can update the object for every minute with the prediction-window results (what happens in the next 5 minutes) instead of the actual results of that ball. I named this pipeline “featurePredictionPipeline”.

PCollection<KV<Long, Iterable<FoxtelWindowingFeatures>>> featurePredictionPipeline = featurePipeline
    .apply("PredictionWindowFn", Combine.perKey(new PredictionWindowFn()))
    .apply("SetGlobalWindowsTimestampCombiner", Window.<KV<Long, FoxtelWindowingFeatures>>into(new GlobalWindows())
        .withTimestampCombiner(TimestampCombiner.EARLIEST))
    .apply("Group", GroupByKey.<Long, FoxtelWindowingFeatures>create());

// Materialise the prediction results as a side input for the join further down.
PCollectionView<Map<Long, Iterable<FoxtelWindowingFeatures>>> featurePredictionView =
    featurePredictionPipeline.apply(View.asMap());
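The post doesn’t show PredictionWindowFn itself, but conceptually it only has to answer one question per window: did a wicket fall, and how? A hypothetical sketch of that labelling logic (method and label names are assumptions):

// A hypothetical sketch of the labelling logic PredictionWindowFn needs:
// given every ball in one 5-minute sliding window, record which dismissal
// (if any) that window contains. Method and label names are assumptions.
static String dismissalLabelFor(Iterable<FoxtelWindowingFeatures> ballsInWindow) {
  for (FoxtelWindowingFeatures ball : ballsInWindow) {
    if (ball.isDismissal()) {
      return ball.getDismissalType(); // e.g. "BOWLED", "CAUGHT", "RUN_OUT"
    }
  }
  return "NONE"; // no wicket fell in this window
}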

At the same time we also create a “featureLookbackPipeline” pipeline to aggregate (combine) all past ball activity for every minute.

PCollection<KV<Long, Iterable<FoxtelWindowingFeatures>>> featureLookbackPipeline = featurePipeline
    .apply("LookbackWindowFn", Combine.perKey(new LookbackWindowFn()))
    .apply("SetGlobalWindowsTimestampCombiner", Window.<KV<Long, FoxtelWindowingFeatures>>into(new GlobalWindows())
        .withTimestampCombiner(TimestampCombiner.EARLIEST))
    .apply("Group", GroupByKey.<Long, FoxtelWindowingFeatures>create());

Then we can combine the two pipelines and write the result back to BigQuery.

featureLookbackPipeline
    .apply("Combine", ParDo.of(new CombineFn(featurePredictionView)).withSideInputs(featurePredictionView))
    .apply("CreateTableFn", ParDo.of(new CreateTableFn()))
    .apply("Write into BigQuery", BigQueryIO.writeTableRows()
        .to(tableRef)
        .withSchema(theSchema)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));

AutoML Tables

Recently (April 2019) Google released a tool called AutoML Tables, which is geared at building regression and classification models from structured (tabular) data. It does the data analysis, feature engineering, and hyperparameter tuning for us. Given our client was new to Machine Learning and experimentation, and AutoML would do all the heavy lifting, this seemed like a great opportunity to take it for a test drive.

So, AutoML Tables plus our Project Flux Capacitor features and, voila, we now had an ML model (named “Monty”!) ready to make predictions. As there are several types of dismissal (bowled out, caught out, run out, etc.) AutoML Tables returns a classification model, scoring each dismissal type.
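Getting a prediction out of a deployed AutoML Tables model looks roughly like this with the v1beta1 Java client. A minimal sketch, assuming placeholder project/model IDs and an illustrative feature row:

import com.google.cloud.automl.v1beta1.AnnotationPayload;
import com.google.cloud.automl.v1beta1.ExamplePayload;
import com.google.cloud.automl.v1beta1.ModelName;
import com.google.cloud.automl.v1beta1.PredictRequest;
import com.google.cloud.automl.v1beta1.PredictResponse;
import com.google.cloud.automl.v1beta1.PredictionServiceClient;
import com.google.cloud.automl.v1beta1.Row;
import com.google.protobuf.Value;

// Send one row of features to the deployed model and print the score for
// each dismissal class. Project, region, and model IDs are placeholders.
try (PredictionServiceClient client = PredictionServiceClient.create()) {
  ModelName modelName = ModelName.of("my-project", "us-central1", "TBL0000000000");
  Row row = Row.newBuilder()
      .addValues(Value.newBuilder().setNumberValue(42).build())            // e.g. balls_last_hour
      .addValues(Value.newBuilder().setStringValue("M Panesar").build())   // e.g. current batter
      .build();
  PredictRequest request = PredictRequest.newBuilder()
      .setName(modelName.toString())
      .setPayload(ExamplePayload.newBuilder().setRow(row).build())
      .build();
  PredictResponse response = client.predict(request);
  for (AnnotationPayload payload : response.getPayloadList()) {
    System.out.printf("%s -> %.3f%n",
        payload.getTables().getValue().getStringValue(), // predicted dismissal type
        payload.getTables().getScore());                 // model confidence
  }
}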

Live predictions

The final piece of the puzzle was making live predictions during a game.

The live data was POST’ed to us as XML. Google Cloud App Engine was perfect for this use case: I set up a few endpoints to handle the real-time data POSTs and prediction queries.
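As a rough idea, the ball-ingestion endpoint could be a plain servlet like this; the parser, writer, and trigger helpers here are hypothetical stand-ins for the real ones:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// A minimal sketch of the ball-data endpoint. BallEvent, BallXmlParser,
// BigQueryWriter, and PredictionTrigger are hypothetical helpers.
public class BallFeedServlet extends HttpServlet {
  @Override
  public void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    // Parse the broadcaster's XML into our ball object...
    BallEvent ball = BallXmlParser.parse(req.getInputStream());
    // ...persist the raw ball to BigQuery for future re-training...
    BigQueryWriter.insertBall(ball);
    // ...and kick off the aggregation + AutoML prediction for this ball.
    PredictionTrigger.onNewBall(ball);
    resp.setStatus(HttpServletResponse.SC_NO_CONTENT);
  }
}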

The architecture looked like this.

The process went as follows…

  1. Every time a ball was bowled during a live game, the information surrounding that ball was POST’ed to an App Engine endpoint.
  2. App Engine would process the XML and, in addition to making a prediction on that ball, save the result into BigQuery. BigQuery is perfect for this; it lets us collect more and more data for re-training in the future.
  3. To make a prediction, Dataflow was used to aggregate the latest live ball features (as per the historical window), which were then sent to AutoML Tables.
  4. The latest prediction made by AutoML Tables was saved to BigQuery. Why not? Let’s collect the data for future analysis and see how many predictions we got right! And, more importantly, it was also saved into App Engine’s Memcache for fast access (see the sketch after this list).
  5. When a real-time query request was made to the endpoint, App Engine returned the latest prediction from memory (nice and fast).
  6. The prediction returned the various types of dismissal, along with a confidence level for each happening in the next 5 minutes.
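Steps 4 and 5 hinge on App Engine’s Memcache. A minimal sketch of that caching, assuming a single key and an illustrative JSON payload:

import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

// Step 4: cache the newest AutoML scores. The key and JSON format are assumptions.
MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
String latestPredictionJson = "{\"CAUGHT\":0.81,\"BOWLED\":0.12,\"NONE\":0.07}"; // illustrative payload
cache.put("latest_prediction", latestPredictionJson);

// Step 5: a real-time query request is answered straight from memory.
String cached = (String) cache.get("latest_prediction");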

During the live Cricket games, real-time query requests were made from the Fox Cricket app (to send push notifications) and from an AdWords Script (to dynamically start and stop a “Wicket Warning” ad campaign). Both would trigger when the dismissal confidence was high enough.

To sum up — we can now predict the future?!

Well… not just yet. This was certainly a fun experiment, and there are exciting times ahead, but there’s still a long way to go. I can’t wait to see how (and if) Fox Sports and other sports broadcasters scale this technology.

The model’s (Monty’s) accuracy came in at 87.2% (which is really good); the score is measured by how accurately it predicts a held-out test data set. However, there is so much random influence in future games that the prediction confidence level was always taken with a pinch of salt.

When the prediction was right, as you would expect, it was followed by a “Wow / Magic Moment”. When the prediction was wrong, it showcased the current batter as someone beating the statistical odds when the pressure was on.

Thanks for reading,

Drew Jarrett
