Do you get regular data files that you need to consolidate/compile together for a report or dashboard? This method will help you keep your reporting files lightweight and portable, while making updating their data as easy as pressing Refresh. The basics can be done in five easy steps. For our example, we’ll be using the quarterly NHS CYPED (Children and Young People Eating Disorders) data, specifically the ‘Provider and CCG’ files for 2021/22, if you’d like to follow along at home.
Collect all your raw data files, separated by month/quarter/day, into a single folder.
In Excel, go to the ribbon (the taskbar at the top of the application), and enter the ‘Data’ tab. From there, select Get Data -> From File -> From Folder, and then navigate to the folder you’ve stored your data in from step 1.
In the dialogue boxes that open, select ‘Combine & Transform Data’.
Select a sample file to transform (the default option is the first file, which should be fine in most cases). Then, select the sheet that contains the raw data you need. Hit OK.
In the now open Power Query editor, select ‘Transform Sample File’ from the left pane. You can then transform the data as required (if no transformation is required, just hit ‘Close & Load’).
Continuing the Example
We will now show how to transform data step-by step using NHS CYPED (Children and Young People Eating Disorder) data.
In this example, we will need to transform the data in two steps. Once for the Urgent case data, and another for the Routine case data.
First, we will work on extracting the Urgent Case Data.
Step 1. Removing Rows
In the Transformation Sample File, remove the top 14 rows of data, as this contains information regarding the data set, not the data itself. To do this, use the button in the ribbon called ‘Remove Rows’ -> ‘Remove Top Rows’.
Step 2. Removing Unneeded Columns
Now select columns 2-7. These columns contain the required raw data for Urgent CYPED cases. Right click and select ‘Remove Other Columns’.
NB: We are not interested in column 8 or 9 of this data as they are summary statistics that are derived from columns 4-7. They can easily be calculated after we are done.
Step 3. Making Our Headers
Select ‘Use First Row as Headers’ from the ribbon to make the first row of data into our new table headings.
Step 4. Unpivoting Columns
Select the ‘>0-1 Week’ to ’12 plus’ columns. Right click and select ‘Unpivot Columns’. This will transform the data into an easily manipulated and pivotable data format known as ‘Tidy Data’.
NB: We do this as these columns describe different categories of the same independent variable, ‘Time to Start Treatment’. Ideally, all columns will either be independent variables (what you are changing; e.g. categories, dates, names), or dependent variables (what you are measuring; e.g. counts, temperature, and other data).
Rename the new columns to better describe the variables. Do this by double clicking in the heading for each column.
‘Attribute’ = ‘Time to Start Treatment’
‘Value’ = ‘Number of Patients’
Step 5. Remove Redundant and Erroneous steps.
Now, navigate to the ‘CYPED Data Files’ query in the left pane.
Remove the ‘Changed Type’ step by clicking on the ‘X’ alongside it within the ‘Applied Steps’ pane on the right of the Power Query editor.
This will have been causing problems as it was trying to change types of columns that we have since removed.
Step 6. Adding Date Column
Select the ‘Removed Other Columns1’ step.
Here, we can add another column to describe the quarter of data each file refers to.
Select the ‘Source.Name’ column, then in the ribbon, go to the ‘Add Column’ tab and select ‘Column from Examples’ -> ‘From Selection’.
You will then be met with an ‘Insert Step’ dialogue box. Insert the step. This shouldn’t break anything as this new column will not be interacted with by any future step).
Fill in the ‘Custom’ column on the right of the main pane, denoting the date of the data for each file.
You can then rename the ‘Custom’ column to ‘Quarter’ by double clicking its header.
Reorder the columns so that the ‘Quarter’ column is on the left by dragging and dropping its header.
You will be asked to insert a step again, click insert.
Remove the ‘Source.Name’ column as it’s no longer needed. Right click its heading and select ‘Remove’. Its descriptive property has been inherited by the new ‘Quarter’ column. Insert the step when prompted.
Step 7. Viewing the Finished Query, and Renaming the Query
Clicking on the final step of the ‘Applied Steps’ pane will show us the completed, transformed data set.
In the ‘Queries’ pane on the left, rename the query to better reflect the data it contains, e.g. ‘Urgent CYPED Data’.
Step 8. Loading the Data into Excel
Go to ‘Close & Load’ in the ribbon’s home tab. This will load the data into Excel.
Step 9. Extracting the Routine CYPED data.
Repeat the above steps, but during step 2 (removing unneeded columns), select columns 2-3 and columns 11-14 (containing our routine data).
Follow the remaining steps, but on step 7 (viewing the final query and renaming it), rename the query to ‘Routine CYPED Data’ instead of ‘Urgent CYPED Data’.
Step 10. Merging both queries
10a. Add a custom column to both queries at the ‘Removed Columns’ step.
Name the column ‘Patient Type’ and assign it a string by entering:
For the routine data:
For the urgent data:
10b. Reorder the column by dragging and dropping it to the left.
10c. Once 10a and 10b completed for both queries, go to ‘Append Queries’ -> ‘Append Queries as New’ in the home tab of the ribbon. Select the two tables and hit OK.
Rename the new query as ‘All CYPED Data’.
Step 11. Load data into Excel
If any new data is made available, you will need to add it to your data folder and hit ‘Refresh All’ to load the data into the consolidated data file.