Optimizing a website for Australia-wide Sausage Sizzles
On May 18th 2019 Australia had a federal election. To support this we developed an Interactive Map displaying the 6800+ polling places around the country, with data provided by the AEC. Plus — with big thanks to the Democracy Sausage team — we added Sausage Sizzle information hot off the barbie (BBQ).
First, let’s cover an important detail: what is a Sausage Sizzle? It’s a community event selling barbecued sausages, or Hot Dogs. In this case the events were held at election polling places, and there were a lot of them (6800+ polling places)! To make you feel hungry, here is a photo of my one…
On the day of the election the site was embedded across all the major Publishers in Australia, as well as linked to from google.com.au and YouTube. Needless to say, we got a lot of traffic: peaks of 700 requests per second on App Engine!
The site was built on Google App Engine and Cloud Firestore, using Google Maps and Google Places API. In this post we’ll discuss a few techniques we used to optimize user experience and reduce load on our servers as much as possible.
Quick introductions. Hi.
I’m Drew. I’ve been working at Google for some time now. My current role is as a Customer Solutions Engineer, helping our Advertisers get the best out of their websites, apps, and data. All from my desk in beautiful Sydney.
I’m Leon. I’ve been working at Google for 3 years less than Drew. My current role is a Software Engineer on Google Maps, primarily on Android and backend development. This was my first venture into App Engine and serious website development, as all my previous work was on desktop, Android, and backends.
Some of these are common sense if you’ve been playing in this world for a while, but hopefully a point or two may stick or help you in your next project. Please let us know in the comments.
App Engine scales instances as the demand comes in, so we already knew it would easily cope with the demand. That said, there were a few things we looked at.
✔ Estimating quota
Yes, those of us working at Google are also subject to quota limits. It was important for us to estimate the amount of quota we would spend on the day, to ensure we wouldn’t hit any limits.
By doing a ‘soft launch’ first we could get some small traffic numbers (e.g. 10 QPS), see what that did to the quota of the various services, and then extrapolate what quota usage would look like at higher QPS over a set time. Another useful tool was the Google Cloud Pricing Calculator, which gave an indication of spend across the various platforms.
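The extrapolation itself is just a ratio. A back-of-the-envelope sketch (every number below is illustrative, not one of our real figures):

```python
# Illustrative quota extrapolation; all numbers are made up for the example.
soft_launch_qps = 10                 # traffic observed during the soft launch
observed_api_calls_per_hour = 9_000  # quota consumed at that traffic level
expected_peak_qps = 700              # traffic expected on election day

scale = expected_peak_qps / soft_launch_qps
estimated_api_calls_per_hour = observed_api_calls_per_hour * scale
print(estimated_api_calls_per_hour)  # 630000.0
```

Comparing that estimate against each service’s published quota tells you whether you need to request an increase ahead of time.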
✔ Minimum number of idle instances
We knew the load was going to come in quick. Setting a minimum number of idle instances — on a powerful instance class — allowed us to have a reasonable number of instances running, reducing latency and coping with the initial high demand. We configured our app.yaml with…
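Something along these lines (the runtime and values here are illustrative, not the configuration we actually shipped):

```yaml
runtime: python27
threadsafe: true

# A larger instance class: the "powerful" part mentioned above.
instance_class: F4
automatic_scaling:
  # Keep a pool of warm instances ready for the initial spike.
  min_idle_instances: 25
  # Don't let requests queue for long before spinning up more instances.
  max_pending_latency: 100ms
```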
We were fortunate to have past results from similar sites to compare with, so we had an idea of what levels of QPS we would receive when the traffic hit. If you don’t have past results, you could estimate from the CTR of the various media sources driving traffic to your page over time.
✔ Using Memcache (distributed RAM cache)
The polling place data from the AEC was all stored in Memcache. This made lookups nice and quick, and importantly meant we weren’t billed for any data storage reads / writes. Backend services were set up to read from the relevant data sources periodically (every minute) and update the cache if anything had changed.
It ensured identical queries — such as retrieving electorate boundaries at various zoom levels, or retrieving polling places for electorates specified via URL parameters — were only computed once. Prior to computing a response to a particular request, Memcache was checked for the presence of a previously computed response.
Memcache was keyed by the raw URL (together with query parameters) and suffixed with a version stamp, YYYYMMDDVV, where VV is a monotonically increasing version number. Having a version stamp appended to the key allowed the cache to be invalidated for new requests — after significant changes to the server and/or back end data were made — without impacting the currently serving server and Memcache (i.e. it avoided flushing the cache en masse).
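A sketch of that keying scheme — here `InMemoryCache` is a stand-in for the real memcache client, and the version value is illustrative:

```python
# Versioned cache keys: bump CACHE_VERSION to invalidate without a mass flush.
CACHE_VERSION = "2019051801"  # YYYYMMDDVV: date plus a bump-able counter


class InMemoryCache:
    """Stand-in for a memcache-style client with get/set."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def cache_key(raw_url):
    # Keyed by the raw URL (including query string), suffixed with the version.
    return "%s|%s" % (raw_url, CACHE_VERSION)


def get_or_compute(cache, raw_url, compute):
    key = cache_key(raw_url)
    value = cache.get(key)
    if value is None:
        value = compute()  # only computed on a cache miss
        cache.set(key, value)
    return value
```

Identical requests after the first hit the cache, and deploying a data or server change only requires bumping `CACHE_VERSION`.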
✔ Serve static content
Serve as much as possible from static (images, CSS, even some JS…) to reduce the load on App Engine. Just remember the caveat that if you need to do an update (like we needed to for styles) you’ll need to cater for users who may have an older version cached.
As an example, we introduced a new element on the website. Any users pulling the older cached CSS from static wouldn’t have any styles for this element. So we hid the element itself with an inline display: none, and in the CSS overrode this with display: block !important. Users pulling the older cached CSS wouldn’t see the — what would be badly styled — element, and once the updated CSS was served from static the element would show.
The App Engine Dashboard has a nice visual of Requests by type, Memcache operations, etc. over time, giving insight into how these optimizations are working. For more customization we used Stackdriver, allowing us to view particular detail on specific dates.
✔ Pre-process calculations
Let the server side do all the heavy lifting. We pre-grouped polling places by electorate and lat/long (check out the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) data clustering algorithm).
The endpoint that returns polling places can be passed the Google Map viewport information when called, and use this to only return the places relevant to what the user needs to see (as we mentioned in the Memcache section). It can be called again when the user zooms / scrolls to a different Google Map viewport. This saves so much time (and potential API quota) compared to trying to calculate this stuff front end!
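Server side, that filtering amounts to a bounding-box check. A minimal sketch (the field names are assumptions, and it ignores viewports that cross the antimeridian):

```python
def in_viewport(place, sw_lat, sw_lng, ne_lat, ne_lng):
    # True if the place falls inside the viewport's bounding box,
    # given the box's south-west and north-east corners.
    return (sw_lat <= place["lat"] <= ne_lat and
            sw_lng <= place["lng"] <= ne_lng)


def places_for_viewport(places, sw_lat, sw_lng, ne_lat, ne_lng):
    # Return only the polling places the user can currently see.
    return [p for p in places
            if in_viewport(p, sw_lat, sw_lng, ne_lat, ne_lng)]
```

Combined with the Memcache keying above, repeated requests for the same viewport never recompute the filter.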
Unfortunately it’s not a perfect world and sometimes an endpoint can hang, e.g. during a big spike in latency. To prevent the website hanging on a response from a hung endpoint we set timeouts in relevant places. So if the endpoint wasn’t fast enough we would fail it and either 1) do nothing (if the information isn’t critical to the user) or 2) retry a set number of times with exponential back-off.
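The retry side of that can be sketched as a small wrapper (the function names, attempt count, and delays are illustrative):

```python
import random
import time


def fetch_with_retry(fetch, attempts=3, base_delay_s=0.5):
    # Try the endpoint a set number of times, backing off exponentially
    # (with jitter) between attempts; re-raise once attempts are exhausted.
    for attempt in range(attempts):
        try:
            return fetch()
        except TimeoutError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay_s * (2 ** attempt) * random.uniform(1.0, 1.5))
```

The jitter stops a crowd of clients from retrying in lockstep, which would just re-create the spike that caused the timeout.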
✔ Memory checks / optimizations
Check what your site is doing to Memory (RAM), to catch and avoid possible memory leaks. Chrome has a Task Manager view that shows how much memory an individual browser tab (i.e. your site) uses, updating as you use the site.
Google Places API
This service was used for search and autocomplete, allowing a user to search for a location / postcode and zoom in on that area. We found this was the most costly and quota-restrictive service, so we considered a few things to reduce usage.
✔ Watch out!
“A Places Details request also generates Data SKUs (Basic, Contact, and/or Atmosphere), depending on the fields that are specified in the request. If you DO NOT specify fields in the Place Details request, ALL Data SKUs are triggered (Basic, Contact, and Atmosphere), and you are charged for the Places Details request plus the cost of all the data.”
✔ Set a limit on the letters typed before calling the service
As the user was searching there was little point wasting an API call on 1–2 letters, so we set the minimum at 3, though during heavy traffic we were tempted to increase it to 4. In this particular case each result was subject to its own API lookup, so we also limited the number of results displayed.
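The guards themselves are tiny. A sketch with our threshold of 3 (the result cap of 5 is an assumption for the example):

```python
MIN_QUERY_LENGTH = 3  # don't spend an API call on 1-2 letters
MAX_RESULTS = 5       # cap results shown, since each costs its own lookup


def should_call_autocomplete(query):
    # Only fire the Places API call once the query is long enough.
    return len(query.strip()) >= MIN_QUERY_LENGTH


def limit_results(results):
    # Trim the result list before doing any per-result lookups.
    return results[:MAX_RESULTS]
```

Under heavy load, raising `MIN_QUERY_LENGTH` is a one-line throttle on API spend.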
Also — to manage billing — make sure you are using Place Autocomplete session tokens correctly (they caught us out a little). They ensure you are only billed once per session, with a session starting when the user begins typing a query and finishing once a place is selected.
✔ Delay calls
Add a short delay (a few hundred milliseconds) after each key press before firing any API calls, so the lookup only happens once the user has finished typing. It’ll slightly reduce the autocomplete effect but save a lot on API requests.
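Our site did this in the browser’s JavaScript, but the debounce idea is language-agnostic; here is a sketch of the technique in Python using threading.Timer (class and delay are illustrative):

```python
import threading


class Debouncer:
    # Delays calling fn until no new call() has arrived for delay_s seconds,
    # so a burst of key presses collapses into a single API request.
    def __init__(self, delay_s, fn):
        self.delay_s = delay_s
        self.fn = fn
        self._timer = None

    def call(self, *args):
        if self._timer is not None:
            self._timer.cancel()  # a newer key press supersedes the old one
        self._timer = threading.Timer(self.delay_s, self.fn, args)
        self._timer.start()
```

Each keystroke resets the timer, so only the final query in a typing burst triggers the lookup.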
As before, cache these requests for similar searches.
Google Maps API
Google Maps API is already heavily optimized out of the box, so there aren’t many options to tweak. Fortunately, the cost of serving map loads wasn’t too bad. To save on additional API functions / quota we linked out to the Google Maps app for directions (instead of implementing this functionality ourselves).
Remember to load test
To finish, a reminder to always load test when you are expecting large volumes of traffic. A load test simulates heavy traffic and shows whether your application can handle, and perform under, the expected load.
In addition to load testing, we used a lovely tool in Chrome called Lighthouse to optimize performance, along with its many other useful audits.
Thanks for reading, and for making the web faster,