How I would have designed the JKJAVMY AstraZeneca appointment database

Designing for traffic spikes is a challenge

The litmus test for application design is how it handles traffic spikes, especially in transactional use cases. A transactional use case (like the AZ appointments) is one where reading and writing data depends on other criteria (here, slot availability), and a write must fail if no slots are available.

A litany of failures

There were (many) other failings in the design of the https://www.vaksincovid.gov.my/ appointment systems, among them:

  • No visual feedback to the user when appointment slots fail to load, so the user has no idea what to do next.
  • Loading appointment slots and submitting the form produced several errors, again with no visual feedback and no indication of what to do next. Some of the errors returned were “500 Service Unavailable/Timeout” (seen above). Other times, the error was “429 Too Many Requests”, which means that the user should NOT immediately retry what they are doing, or they risk being blocked for even longer. This omission is especially fatal given the lack of visual feedback.
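A sketch of the client-side behaviour the site arguably needed: back off on 5xx errors, honour `Retry-After` on a 429 instead of letting users hammer the button, and stop on errors that retrying cannot fix. The function name, the set of retryable status codes and the backoff constants are assumptions for illustration. The returned delay is also what a UI could display ("retrying in N seconds"), which would address the missing visual feedback.

```python
import random
from typing import Mapping, Optional

# Assumption: these 5xx codes are treated as transient and worth retrying.
RETRYABLE_5XX = {500, 502, 503, 504}


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None to stop."""
    if status == 429:
        # 429 Too Many Requests: respect the server's Retry-After (seconds form)
        # if sent; otherwise fall through to backoff below.
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.strip().isdigit():
            return float(retry_after)
    elif status not in RETRYABLE_5XX:
        # Any other status (e.g. 400 or 404): retrying the same request will not help.
        return None
    # Exponential backoff with "full jitter": wait a random time in
    # [0, min(cap, base * 2**attempt)] so clients do not retry in lockstep.
    return random.uniform(0, min(cap, base * 2 ** attempt))


if __name__ == "__main__":
    for attempt in range(3):
        delay = retry_delay(503, {}, attempt)
        # A real UI would show this message instead of failing silently.
        print(f"Server busy, retrying in {delay:.1f}s")
    print(retry_delay(429, {"Retry-After": "30"}, 0))  # 30.0
```

A plain `dict` is case-sensitive, so this expects the exact header name; HTTP client libraries such as `requests` expose headers through a case-insensitive mapping, which avoids that issue.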

It’s all about the database

However, experience and intuition tell me the overarching failure was in the design of the database.

  • Most monolithic databases cannot scale automatically; my intuition is that this is why the system needed to be taken down from 10am to 12 noon.
  • Once you have allocated the largest database size available (about 100 CPU units on Amazon Web Services), you are out of options. You can only watch helplessly as your tables and connections lock up, and your users play roulette over whether both their request for appointment slots AND their form submission go through.

Partitioning is the answer

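The good (or bad, depending on how accountable you hold JKJAVMY) news is that architecting internet-scale databases has been a solved problem for a while now. The answer is simply data partitioning: splitting the data into independent pieces so that traffic for one piece does not lock the tables and connections serving another, and so that capacity grows by adding partitions rather than by buying an ever-bigger machine.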
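Partitioning starts with a function that maps each record's partition key to a partition. A minimal sketch, assuming a key built from PPV and day; the `partition_for` and `partition_key` names and the key format are hypothetical:

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a partition number in [0, num_partitions)."""
    # Python's built-in hash() of a str changes between processes
    # (hash randomisation), so use a stable digest instead.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_key(ppv: str, day: str) -> str:
    # Hypothetical key format: every booking for one PPV on one day
    # lands in the same partition.
    return f"{ppv}|{day}"


if __name__ == "__main__":
    for ppv in ("PPV-A", "PPV-B", "PPV-C"):
        key = partition_key(ppv, "2021-05-26")
        print(key, "->", partition_for(key, 16))
```

Plain modulo hashing moves most keys whenever `num_partitions` changes; consistent hashing, or the managed partitioning in databases such as Amazon DynamoDB, avoids that.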

A monolithic start
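The visual design of the booking page gives us a hint of how to prepare the data for high traffic, because appointments are picked per PPV (Pusat Pemberian Vaksin, or vaccination centre) and per day.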
A data model representing partitioning by PPV
A highly scalable example where data is partitioned by PPV and day.
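The PPV-and-day model can be sketched in-process, with each `(ppv, day)` partition owning its own counter and lock, standing in for separate database shards. This is an analogy under stated assumptions (the `Partition` and `SlotStore` names are hypothetical), not a production design: a booking locks only its own partition, so a rush on one popular PPV does not block bookings at every other PPV.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Partition:
    remaining: int
    booked: set[str] = field(default_factory=set)
    lock: threading.Lock = field(default_factory=threading.Lock)


class SlotStore:
    def __init__(self) -> None:
        # One independent partition per (ppv, day); in a real system each
        # would live on its own shard or node.
        self._partitions: dict[tuple[str, str], Partition] = {}

    def add_partition(self, ppv: str, day: str, capacity: int) -> None:
        # Assumed to happen up front (when a PPV's schedule is published),
        # before booking traffic arrives.
        self._partitions[(ppv, day)] = Partition(remaining=capacity)

    def book(self, ic_number: str, ppv: str, day: str) -> bool:
        part = self._partitions.get((ppv, day))
        if part is None:
            return False
        # Only this (ppv, day) partition is locked; others proceed in parallel.
        with part.lock:
            if part.remaining == 0 or ic_number in part.booked:
                return False
            part.remaining -= 1
            part.booked.add(ic_number)
            return True


if __name__ == "__main__":
    store = SlotStore()
    store.add_partition("PPV-A", "2021-05-26", capacity=100)
    results: list[bool] = []

    def worker(i: int) -> None:
        results.append(store.book(f"IC-{i}", "PPV-A", "2021-05-26"))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(250)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sum(results))  # 100: the partition is never overbooked
```

One trade-off of partitioning shows up here: the duplicate-booking check is only per partition. Enforcing "one appointment per person" across all PPVs would need a separate lookup partitioned by IC number, checked as part of the booking.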

Please do better

There are definitely other things that could have been done, such as a queue system or a waiting area, or not having a booking system at all. I hope JKJAVMY improves the process, as lives are at stake.

Timothy Teoh

Full-stack software architect and technology leader from Kuala Lumpur, Malaysia