About this app

The bed usage dashboard provides key insights into current hospital bed usage attributable to various causes. Currently only Flu cases are displayed, but the modular design of the data-preprocessing workflow and the Shiny application mean this app can be extended to cover other forms of illness reported through the Daily SitRep data.

Why does this exist?

We made this as a technical exercise in creating an easy-to-maintain Shiny application following best practices, while utilising our existing deployment infrastructure to avoid any additional hosting costs.

You can see other examples of our work in our dashboard gallery.

Where did the idea come from?

This is not the first app to track bed usage from seasonal illnesses. The original inspiration came from a ’flu-tracker Shiny application (source code) created by the DS Team within NUCT at NHS ML. Since then, the author of the original application has announced his retirement, and it is now looking for a new maintainer and host. We wish Andy McCann all the best in his retirement and thank him for inspiring this version of the application.

So what was the aim of this application?

We wanted to take the original ’flu-tracker application as inspiration and make a new version from scratch, along the way demonstrating best practices for making an efficient, low-maintenance Shiny application. Our main aims were to:

  • Use modern {bslib} layouts that enable easy branding through _brand.yml files.
  • Avoid inputs that can cause keyboard-navigation issues, to better meet WCAG 2.1 AA standards.
  • Create a maintainable, modular codebase that can easily be extended to cover more illnesses than just the ’flu dataset.
  • Implement pipelines to automatically prepare and quality-check data from the original sources to:
    • Reduce application loading times and computational load,
    • Provide immediate alerts when incoming data does not meet the expected standards,
    • Provide resilience against the original dataset becoming unavailable in the future.

We also self-imposed a requirement to use only our existing infrastructure, without the need to set up any extra resources or incur any additional costs. This recreates the scenario many Data Science teams face most days: aiming to produce and deliver their work using only what’s already available to them, while trying to reduce the burden of requesting more help from their IT departments.

  • All development was performed exclusively on our Posit Workbench server.
  • Code storage was handled by our existing Git provider.
  • The hosting of the application and scheduled pre-processing pipeline are performed on our Posit Connect server.

If you’re interested in providing your Data Science team with the tools they need to work freely, we can help set up Posit infrastructure for your enterprise.

Why bother with any data preparation?

This Shiny application is designed to do as little as it needs to:

  • Load in some prepared data
  • Filter according to the user’s wishes
  • Display the results
  • Produce data downloads

This allows for a fast, lightweight, and easy-to-maintain Shiny application without any loss in functionality.
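
As a rough sketch, assuming the prepared data is published as a pin (as described later on), the app’s entire data step can be this small; the pin name and the filter shown are illustrative, not the app’s actual code:

```r
library(pins)
library(dplyr)

# Connect to the same Posit Connect server that hosts the pinned data
board <- board_connect()

# Pin name is illustrative; the real pin is whatever the pipeline published
flu <- pin_read(board, "flu_beds_daily")

# Inside the server, filtering to the user's selection is a one-liner
flu_filtered <- filter(flu, date >= as.Date("2024-10-01"))
```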

What you see in this Shiny app are cleaned and optimised datasets that ~we~ an automated pipeline prepared earlier. What you don’t see is that preparation, and it goes a little like this.

How we prepare the data

We start with our “single source of truth”: a UEC Daily SitRep file hosted on NHS England’s Urgent and Emergency Care Daily Situation Reports page. This comes as an Excel (.xlsx) document where each dataset is on a separate sheet.

We use the {rvest} package to access each UEC Daily Situation Reports page, then search for the correct hyperlink on the page. The URL to the data file changes on a monthly basis, and the filename changes depending on the year, but there is one consistent element we can look for in the target URL that points to the correct file: The link to the .xlsx file always contains the string UEC-Daily-SitRep.
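
A minimal sketch of that link discovery with {rvest}; the page URL shown is illustrative:

```r
library(rvest)

# Each year's SitRep page follows the same pattern; this URL is illustrative
sitrep_page <- "https://www.england.nhs.uk/statistics/statistical-work-areas/uec-sitrep/"

hrefs <- read_html(sitrep_page) |>
  html_elements("a") |>
  html_attr("href")

# The one stable marker: the target .xlsx link always contains "UEC-Daily-SitRep"
xlsx_url <- hrefs[grepl("UEC-Daily-SitRep", hrefs)][1]
```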

We expect the historical data to not change, and it’s possible that the current year’s data has not been updated since we last performed the pipeline. So once the file is downloaded, we compare the MD5 sum of the new data against the MD5 sum of the data we last used in a computation. If they match, we know there was no change to the data, and we skip any further steps to reduce unnecessary workload on our server.
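
A sketch of how that comparison might look; the file paths are illustrative, and the old hash would have been recorded by the previous run:

```r
new_hash <- unname(tools::md5sum("downloads/uec-daily-sitrep.xlsx"))
old_hash <- readLines("cache/last-md5.txt")

if (identical(new_hash, old_hash)) {
  message("Source data unchanged since the last run; skipping reprocessing.")
  knitr::knit_exit()  # end the Quarto document early
}
```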

The data appears as a wide-format table in the spreadsheet, listed below several lines of metadata. It’s important to check that the pre-processing script imports the correct rows and columns from the sheet to create a valid table with the correct heading row.
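
A sketch of the import with {readxl}; the sheet name and the number of skipped metadata rows are assumptions and can change between publication years:

```r
library(readxl)

flu_raw <- read_xlsx(
  "downloads/uec-daily-sitrep.xlsx",
  sheet = "Flu",
  skip = 13  # metadata lines sit above the real heading row
)
```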

We use {pointblank} to check the format of the imported table. If the table format changes unexpectedly, or data does not appear in the formats we expect, {pointblank}’s agent will flag the issues and can terminate our data-preprocessing pipeline early, while still generating the report so we can see what problems it found in the data.
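
A sketch of what such an agent might look like; the column names and rules are assumptions, not the app’s actual checks:

```r
library(pointblank)

agent <- create_agent(tbl = flu_raw) |>
  col_exists(columns = c("Name", "Code")) |>
  col_vals_not_null(columns = "Code") |>
  interrogate()

# The HTML report is produced either way, so failures remain inspectable
export_report(agent, filename = "import-checks.html")

if (!all_passed(agent)) {
  stop("Imported table did not match the expected format.")
}
```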

We can then perform all our data pre-processing—showing previews of how the data looks at various stages along the way—to get from the wide-format data from the spreadsheet, into the long-format data that is optimal for our application.
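
A sketch of the reshape with {tidyr}; the identifier columns and the date conversion are assumptions:

```r
library(tidyr)
library(dplyr)

flu_long <- flu_raw |>
  pivot_longer(
    cols = !c(Name, Code),   # every remaining column holds one day of data
    names_to = "date",
    values_to = "beds_occupied"
  ) |>
  mutate(
    # Excel stores dates as serial numbers counted from 1899-12-30
    date = as.Date(as.numeric(date), origin = "1899-12-30")
  )
```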

After the reformatting is complete, we use {pointblank} again to check the final results are as expected. For example, we can check that every location reports one, and only one, value for each day. We could also ask {pointblank} to warn (not error) if it encounters an unusually large jump in values from one day to the next at a location. By warning, we allow the pipeline to complete but with a flag for someone to manually review if this is a valid data point.
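
A sketch of those post-reshape checks; the fixed ceiling is an illustrative stand-in for a true day-on-day jump detector:

```r
create_agent(tbl = flu_long) |>
  rows_distinct(columns = c("Code", "date")) |>  # one value per location per day
  col_vals_lte(
    columns = "beds_occupied",
    value = 5000,
    actions = action_levels(warn_at = 1)  # warn, don't stop the pipeline
  ) |>
  interrogate()
```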

Scaling up is fairly straightforward. With the routine working for one year, it becomes simple to map it across multiple years using the {purrr} package. What’s especially nice is that downloading and processing each year is an embarrassingly parallel job, so we can run it in parallel using {furrr}.
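
A sketch of that parallel mapping; process_year() is a hypothetical wrapper around the download/check/reshape routine above:

```r
library(furrr)
plan(multisession)  # spin up parallel workers on the same machine

years <- 2022:2025  # illustrative range of published SitRep years

# process_year() is hypothetical: it returns one cleaned data frame per year
flu_all <- future_map(years, process_year) |>
  purrr::list_rbind()
```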

The final stage of our Quarto document is to store the cleaned and verified results on our Posit Connect server using the {pins} package. The files are stored in parquet format, since parquet files are compact and fast to read into both R and Python.
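
A sketch of the publishing step; the pin name is illustrative:

```r
library(pins)

board <- board_connect()  # authenticates via CONNECT_SERVER / CONNECT_API_KEY
pin_write(board, flu_all, name = "flu_beds_daily", type = "parquet")
```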

By containing all these steps in a Quarto project and uploading the source .qmd file to Posit Connect, we can instruct Posit Connect to build the project on a regular schedule. Since the spreadsheet containing the raw data is only updated once a week and we don’t require the update to happen immediately, we’ve set this processing pipeline to run once a week, on a Thursday at 03:42 am GMT. Why Thursday at stupid-o’clock? Well, that’s when our server is likely to be quietest, with very few people browsing the Shiny applications the server also hosts. By scheduling these non-critical jobs for when the server is quiet, we reduce the peak-capacity requirements on our server, reducing overall costs and making the most of existing infrastructure.

But if you need more frequent updates, you can set it to happen as often as you need. Even repeating every minute of every day is an option!

Once Posit Connect completes the pipeline, it automatically sends an email to the maintainers containing a link to the latest processing and quality report. And if something starts to go wrong, Posit Connect keeps a history of previous versions of the report, allowing you to step back through each publication date to see how and when things changed.