How to… Consolidate Quarterly or Monthly Data Using Power Query

Do you get regular data files that you need to consolidate/compile together for a report or dashboard? This method will help you keep your reporting files lightweight and portable, while making updating their data as easy as pressing Refresh. The basics can be done in five easy steps. For our example, we’ll be using the quarterly NHS CYPED (Children and Young People Eating Disorders) data, specifically the ‘Provider and CCG’ files for 2021/22, if you’d like to follow along at home.

https://www.england.nhs.uk/statistics/statistical-work-areas/cyped-waiting-times/

Step 1.

Collect all your raw data files, separated by month/quarter/day, into a single folder.

Example: Three Quarters of Raw Data Files Stored in the Same Folder

Step 2.

In Excel, go to the ribbon (the taskbar at the top of the application), and enter the ‘Data’ tab. From there, select Get Data -> From File -> From Folder, and then navigate to the folder you’ve stored your data in from step 1.

Example: Location of the Get Data From Folder Process.

Step 3.

In the dialogue boxes that open, select ‘Combine & Transform Data’.

Example: The Power Query Wizard for Extracting Data From a Folder

Step 4.

Select a sample file to transform (the default option is the first file, which should be fine in most cases). Then, select the sheet that contains the raw data you need. Hit OK.

Example: Selecting the file and sheet we want to extract using Power Query.

Step 5.

In the now open Power Query editor, select ‘Transform Sample File’ from the left pane. You can then transform the data as required (if no transformation is required, just hit ‘Close & Load’).

Example: The Power Query Editor
Example: The Transform Sample File Query

Continuing the Example

We will now show how to transform data step-by step using NHS CYPED (Children and Young People Eating Disorder) data.

https://www.england.nhs.uk/statistics/statistical-work-areas/cyped-waiting-times/

In this example, we will need to transform the data in two steps. Once for the Urgent case data, and another for the Routine case data.

First, we will work on extracting the Urgent Case Data.

Step 1. Removing Rows

In the Transformation Sample File, remove the top 14 rows of data, as this contains information regarding the data set, not the data itself. To do this, use the button in the ribbon called ‘Remove Rows’ -> ‘Remove Top Rows’.

Example: Removing the Top Rows from the Transformation Sample File.

Step 2. Removing Unneeded Columns

Now select columns 2-7. These columns contain the required raw data for Urgent CYPED cases. Right click and select ‘Remove Other Columns’.

NB: We are not interested in column 8 or 9 of this data as they are summary statistics that are derived from columns 4-7. They can easily be calculated after we are done.

Example: Removing Unneeded Columns.

Step 3. Making Our Headers

Select ‘Use First Row as Headers’ from the ribbon to make the first row of data into our new table headings.

Example: Selecting the ‘Use First Row as Headers’ button from the ribbon.

Step 4. Unpivoting Columns

Select the ‘>0-1 Week’ to ’12 plus’ columns. Right click and select ‘Unpivot Columns’. This will transform the data into an easily manipulated and pivotable data format known as ‘Tidy Data’.

NB: We do this as these columns describe different categories of the same independent variable, ‘Time to Start Treatment’. Ideally, all columns will either be independent variables (what you are changing; e.g. categories, dates, names), or dependent variables (what you are measuring; e.g. counts, temperature, and other data).

Example: Unpivoting Columns

Rename the new columns to better describe the variables. Do this by double clicking in the heading for each column.

‘Attribute’ = ‘Time to Start Treatment’

‘Value’ = ‘Number of Patients’

Example: The result of unpivoting columns
Example: New columns given descriptive names.

Step 5. Remove Redundant and Erroneous steps.

Now, navigate to the ‘CYPED Data Files’ query in the left pane.

Example: Selecting the CYPED Data Files Query from the ‘Queries’ pane.

Remove the ‘Changed Type’ step by clicking on the ‘X’ alongside it within the ‘Applied Steps’ pane on the right of the Power Query editor.

This will have been causing problems as it was trying to change types of columns that we have since removed.

Example: Finding the ‘X’ to remove the ‘Changed Type’ step

Step 6. Adding Date Column

Select the ‘Removed Other Columns1’ step.

Example: The data held in the ‘Removed Other Columns1’ step.

Here, we can add another column to describe the quarter of data each file refers to.

Select the ‘Source.Name’ column, then in the ribbon, go to the ‘Add Column’ tab and select ‘Column from Examples’ -> ‘From Selection’.

Example: Where to find the From Selection add column button.

You will then be met with an ‘Insert Step’ dialogue box. Insert the step. This shouldn’t break anything as this new column will not be interacted with by any future step).

Example: Insert Step Dialogue Box

Fill in the ‘Custom’ column on the right of the main pane, denoting the date of the data for each file.

Example: Entering the date for each data file in the ‘Custom’ column.

You can then rename the ‘Custom’ column to ‘Quarter’ by double clicking its header.

Example: The newly renamed ‘Quarter’ column.

Reorder the columns so that the ‘Quarter’ column is on the left by dragging and dropping its header.

You will be asked to insert a step again, click insert.

Remove the ‘Source.Name’ column as it’s no longer needed. Right click its heading and select ‘Remove’. Its descriptive property has been inherited by the new ‘Quarter’ column. Insert the step when prompted.

Example: Removing the ‘Source.Name’ column after moving the ‘Quarter’ column left.

Step 7. Viewing the Finished Query, and Renaming the Query

Example: The newly transformed, collated, Urgent case data.

Clicking on the final step of the ‘Applied Steps’ pane will show us the completed, transformed data set.

In the ‘Queries’ pane on the left, rename the query to better reflect the data it contains, e.g. ‘Urgent CYPED Data’.

Example: Renaming the Query.

Step 8. Loading the Data into Excel

Go to ‘Close & Load’ in the ribbon’s home tab. This will load the data into Excel.

Example: The ‘Close & Load’ option in the ribbon.
Example: The loaded data within the Excel spreadsheet.

Step 9. Extracting the Routine CYPED data.

Repeat the above steps, but during step 2 (removing unneeded columns), select columns 2-3 and columns 11-14 (containing our routine data).

Follow the remaining steps, but on step 7 (viewing the final query and renaming it), rename the query to ‘Routine CYPED Data’ instead of ‘Urgent CYPED Data’.

Example: Completed Routine CYPED Data query

Step 10. Merging both queries

10a. Add a custom column to both queries at the ‘Removed Columns’ step.

Example: Where to Find the ‘Custom Column’ button.

Name the column ‘Patient Type’ and assign it a string by entering:

For the routine data: = "Routine"

For the urgent data: = "Urgent"

Example: New Column ‘Patient Type’ with string ‘Routine’ for the routine data query.

10b. Reorder the column by dragging and dropping it to the left.

Example: Reordered columns

10c. Once 10a and 10b completed for both queries, go to ‘Append Queries’ -> ‘Append Queries as New’ in the home tab of the ribbon. Select the two tables and hit OK.

Rename the new query as ‘All CYPED Data’.

Example: Renaming the new ‘Append1’ query.

Step 11. Load data into Excel

Example: All the data is available in Excel, each table on a different sheet.

If any new data is made available, you will need to add it to your data folder and hit ‘Refresh All’ to load the data into the consolidated data file.

Example: Where to locate the ‘Refresh All’ button.

Lightweight DND 5th Edition Item Tracker (Excel)

Taking inspiration from this old blog post from Rotten Pulp, here’s an (optional) item tracker that you can use to keep a more immersive record of what your character has equipped and where.

5th Edition Item Tracker

The tool itself covers most of the rules you’ll need to be aware of for this new ruleset. I took some guidance from the blog post mentioned above and tweaked some of the rules to better suit 5th edition.

You have up to six containers, which you name yourself, like your rucksack, your quiver, belt etc. These containers each have three slots, which can carry up to three items, based on their size. Heavy weapons like great swords take up three slots, light weapons like daggers and rapiers take up one slot, and all other weapons like longswords take up two slots. This method allows you and your DM to keep track of what and where your characters are carrying their equipment. Some interesting things can happen as a result of this.

Imagine your character is trying to cross a ravine, and your DM asks for an acrobatics check to see if you cross it, and they set the DC at 15. You roll an 8, and with your acrobatics bonus… you get 14. You fail the roll. Now the DM could just kill you, throwing you down the ravine. However, with this inventory system, the DM could tell you that you successfully make the jump, but you slip. Before falling down the ravine, you grab onto a loose vine that barely holds your weight. In order to pull yourself up, you need to shed one of your containers. Your character then chooses which to detach from his body, and the items (instead of your dwarf, Brungar, dying) fall to the dark depths, and you lose your arrows, 1000 copper pieces, and rations for the weeks ahead.

The DM can also award special items, like a smaller magical shield that only takes up one slot, or a shrinking longsword for storage, that make better use of the container system.

The item tracker for a character wearing heavy armour

Feel free to modify the tracker as you see fit to best fit your campaign and DM’ing style.

You can download the tool here:

Lucy in the sky with asteroids — Miscellaneous Details

Well done to the data team at The Economist, having to visualise a 12-year journey of a space probe as it makes a flying visit to eight different asteroids. They’ve made a very complicated journey look quite straightforward and elegant. A probe intended to study the Trojan asteroids takes off – The EconomistLucy, as this […]

Lucy in the sky with asteroids — Miscellaneous Details

Lightweight Excel DND Dice Roller

To compliment my lightweight offline Excel DND character sheet, I’ve created a dice roller to be used when playing the game. It can be used completely offline, and should run on basically all versions of Excel (provided macros can be enabled). Please feel free to use this and share as required:

DND Dice Roller

For more Excel savvy users, if you’d like to modify the sheet, the password is 1234. The lock is just in place to make it difficult to break the formula in the Result column. There are two active macros; one is the Roll function, and the other a manual calculator attached to the ‘Roll’ button.

Micro How to… Format Data as a Table

How

Example Data, Unformatted

Step 1. Highlight the data table

Data Table Highlighted

Step 2. On the Ribbon, open the Insert tab, and select ‘Table’ (Shortcut CTRL + T).

Insert Table Button

Step 3. Hit Ok on the Create Table dialogue box.

Create Table Dialogue Box

You should now have a table that looks like the following:

Data Formatted as a Table

Why should you format data as a table?

Data formatted as a table allows you to easily reference required data using simple named ranges for the table. It also lets you easily create pivot tables, graphs and access the data from advanced features like Power Query.

Easier Referencing

In the above example, the Revenue column is calculated using the price reference table on the right (in cells I1:J5). The formula in cell F2 is as follows:

=VLOOKUP($D2,$I$2:$J$5,2,FALSE)*E2

If we change the price reference to a table as well, as in the below example using the steps from above:

Price reference data formatted as a table

We can instead reference the data table itself, using named ranges automatically created by Excel. This makes the old VLOOKUP formula look like this instead:

=VLOOKUP([@ItemID],Table2,2,FALSE)*[@Quantity]

The @ sign tells Excel to look at the current row of the data table, so in plain English we are asking Excel to look for the Item ID from the current row in Table 2 (the price reference table), return the data from the 2nd column, and then multiply it by the quantity in the current row.

This makes it far easier to understand your formula, for yourself and others using your workbooks, and also means that if the data in the price reference table ever grows (e.g. you start selling more, different things that require new Item IDs to be added), the VLOOKUP will not need to be updated as it will always look at the entirety of the data, instead of the fixed reference of $I$2:$J$5 in the original formula.

Pivot Tables

Select any cell within the data table, go to the Insert tab in the Ribbon, and press the PivotTable button. Excel will automatically try to make a PivotTable for the entirety of the data table.

Create PivotTables easily

Lightweight Excel DND Character Sheet

Last year, after starting an online Dungeons and Dragons 5th Edition campaign with friends to get through the lockdown, I made an Excel Character Sheet so that my players could keep track of their in-game character stats and items. Please feel free to use this and share as required:

Need a dice roller to go along with your Excel character sheet? Take a look at my offline Excel dice roller, free to download and share: http://staticinterest.com/2021/10/26/lightweight-excel-dnd-dice-roller/

Want to try out a new way of tracking your in-game items? Try out my lightweight item tracker tool, free to download and share: https://staticinterest.com/2022/01/25/lightweight-dnd-5th-edition-item-tracker-excel/

Patient Pathway Analysis Tool – Update

This post is a discussion of the ongoing development of the Patient Pathway Analysis tool I was working on over the winter of 2018. I’ll be covering the techniques I’ve used, some of the challenges I’ve had while producing the tool, and techniques I’d like to use in future.

As mentioned in my previous blog post regarding the Patient Pathway Tool, it utilises Chi Squared tests (on contingency tables) to determine whether a CCG’s patients are being treated differently to their similar 10 peers. This is done across a selection of indicators from the SUS inpatient spells and episodes tables. Once an indicator has been determined to have been carried out significantly differently compared with the CCG’s peers, I’ve worked on providing the tools to investigate what happened to the patients for each category of the indicator across their inpatient stay.

This was achieved by using the interactive elements of Jupyter Notebooks and the visualisation package Altair. The user can select a category from a given indicator and see what occurred to that group of patients across all other indicators from the data set. This allows an analyst to determine the possible reasons for any given difference – for example some differences may be explained by the specific diagnosis or group of procedures that are being carried out on a set of patients, showing that the difference in patient cohort might have led to the differences determined by the tool. Or that one of the CCG’s peers are admitting patients for a set of procedures that might normally be done as outpatient procedures, causing differences throughout each indicator.

This allows the analyst to then remove any group of patients that might be considered outliers and examine whether any differences have occurred in the new, more homogenous subset of patients.
As an example, looking at Head and Neck Cancer patients for a CCG – the tool identified a difference in the proportion of patients having an operation between the CCG and its peer group (the best 5 CCGs for length of stay (LoS) for the Head and Neck Cancer patients).

Operation Status Patient Pathway Example

The tool then allows us to drill down on the group of patients coded as having an operation status of 8 (Not Applicable), to see what occurred with this group of patients across the rest of their inpatient care. We identified that the patients were most commonly having the ‘X654’ procedure, the code for a type of Radiotherapy. Patients having this procedure from the best 5 CCGs were almost entirely from one large CCG – skewing the proportions a large amount. This cohort of patients could then be removed to explore whether any differences occurred across a more similar set of patients.

Drilldown Patient Pathway Example
Procedures with small numbers have been removed

A large part of this project that I generally didn’t account for has been managing the queries to the SUS database. Trimming the query down to fewer fields (for example dropping procedure codes past the 10th due to the small number of patients this affects), creating more efficient joins and creating stored procedures the tool calls were all important parts of reducing the load times from 5 minutes (!) to around 30 seconds. This has allowed small changes to be made to the query (adding/removing diagnosis codes etc.) without having to worry about the massive amount of time that would be wasted waiting for the data to be passed to the tool.

As the tool increased in complexity it was also important to create modules of functions to be called upon within the notebook environment – to abstract the complexity and reduce the cognitive load on the user when making small changes to the tool. It required breaking down the actions carried out in each cell into functions; working out where similar functions existed and then slightly adjusting those similar functions or their outputs to work for all the required cells. I used Visual Studio Code as my editor of choice for writing the .py files that housed the functions, but any text editor or IDE would suffice. Here are some examples of the functions that were created; functions for producing each of the types of visualisation used in the tool; functions for calling SQL queries; functions for transforming the data into contingency tables and functions for enabling suppression of small numbers from outputs.


def ClusteredBarChart(data: pd.DataFrame, indicatorCol: str, descCol: str, x_Sort: list, indicatorTitle=''):

''' The standard clustered bar chart showing the chosen CCG against the best 5 or similar 10.'''

chart = alt.Chart(data[data['Include'] == 1]).mark_bar(stroke='transparent').encode(

x = alt.X('variable:N', axis = alt.Axis(title=''), sort=x_Sort),

y = alt.Y('value:Q', axis = alt.Axis(title='% of Patients', format='.0%')),

color= alt.Color('variable:N', legend = alt.Legend(title = ''), sort=x_Sort),

column=alt.Column(indicatorCol, type='nominal', title=indicatorTitle),

tooltip=[alt.Tooltip(field='value',

type='quantitative',

title='% of Patients',

format='.2%'), 'Counts', descCol]

).configure_view(

stroke ='transparent'

)

return chart

An example of a function used to produce visualisations in the tool, making use of the Altair package.

I’ve looked into other techniques that we may be able to use in the future; my favourite of the techniques so far has been clustering using K-prototypes, a technique combining the K-means and K-modes methods for clustering data. K-prototypes would be needed due to the mix of categorical (e.g. operation status and procedures carried out) and continuous variables (e.g. the patient’s distance from provider and their length of stay). The package needed to do this type of analysis in python is kmodes. Unfortunately, I’ve moved roles recently so won’t be able to work on this myself – but hopefully this will be able to be implemented by my colleagues when they get the time to work out the code base!

Data Science Project – Patient Pathway Analysis

This is a short piece on a project I’m developing at NHS England, using Python to analyse patients’ inpatient care and determining where a Clinical Commission Group’s (CCG) patients are being treated significantly differently to those patients in its peer group CCGs (similar 10 CCGs as determined by NHS Rightcare Methodologies). Continue reading “Data Science Project – Patient Pathway Analysis”

HOW TO… Excel: Create a hierarchical series of drop down boxes

As a result of prodding around the excel subreddit I found a query from user innocuous_gorilla about creating drop downs that are dependent upon each other. Due to working at NHS England where we have a massive array of organisation types and hierarchies, I quickly learnt how to do this! For example, in NHS England we have different regions, sub-regions and commissioning groups which tend to be massive long lists if un-filtered, so it makes it hard to select one in a drop down where all options available. Using the following method, you can make a set of drop downs that all depend on a master drop down(s), so choosing a region or sub-region filters the commissioning group drop down box to only those groups that belong to the chosen parent region.

Continue reading “HOW TO… Excel: Create a hierarchical series of drop down boxes”