This post is a discussion of the ongoing development of the Patient Pathway Analysis tool I was working on over the winter of 2018. I’ll be covering the techniques I’ve used, some of the challenges I’ve had while producing the tool, and techniques I’d like to use in future.
As mentioned in my previous blog post about the Patient Pathway Tool, it uses Chi-squared tests (on contingency tables) to determine whether a CCG’s patients are being treated differently from its 10 most similar peers. This is done across a selection of indicators from the SUS inpatient spells and episodes tables. Once an indicator has been flagged as significantly different from the CCG’s peers, I’ve worked on providing the tools to investigate what happened to the patients in each category of that indicator across their inpatient stay.
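To illustrate the statistical test the tool relies on, here is a minimal sketch of a Chi-squared test on a contingency table using scipy. The counts are made up for illustration and are not real SUS data; the post doesn't specify the exact table layout, so a simple 2x2 (CCG vs. pooled peers, by indicator category) is assumed.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: the chosen CCG vs. its pooled peer group; columns: two
# categories of an indicator (e.g. operation carried out / not).
# Counts are hypothetical, for illustration only.
table = np.array([
    [120,  80],   # chosen CCG
    [900, 300],   # pooled peers
])

# chi2_contingency returns the test statistic, the p-value, the
# degrees of freedom, and the expected counts under independence.
chi2, p_value, dof, expected = chi2_contingency(table)

if p_value < 0.05:
    print("Indicator flagged as significantly different")
```

A p-value below the chosen threshold is what would flag an indicator for further drill-down in the tool.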
This was achieved using the interactive elements of Jupyter Notebooks and the visualisation package Altair. The user can select a category from a given indicator and see what happened to that group of patients across all the other indicators in the data set. This allows an analyst to explore the possible reasons for any given difference. For example, some differences may be explained by the specific diagnosis or group of procedures being carried out on a set of patients, suggesting that a difference in patient cohort led to the differences the tool detected. Alternatively, one of the CCG’s peers may be admitting patients for a set of procedures that would normally be done as outpatient procedures, causing differences throughout each indicator.
The analyst can then remove any group of patients that might be considered outliers and examine whether any differences remain in the new, more homogeneous subset of patients.
As an example, looking at Head and Neck Cancer patients for one CCG, the tool identified a difference in the proportion of patients having an operation between the CCG and its peer group (the five CCGs with the best length of stay (LoS) for Head and Neck Cancer patients).
The tool then allows us to drill down into the group of patients coded with an operation status of 8 (Not Applicable), to see what happened to this group across the rest of their inpatient care. We identified that these patients were most commonly having the ‘X654’ procedure, the code for a type of Radiotherapy. Patients having this procedure in the best 5 CCGs came almost entirely from one large CCG, skewing the proportions considerably. This cohort of patients could then be removed to explore whether any differences remained across a more similar set of patients.
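The remove-and-recompute step can be sketched in pandas as below. The column names (`ccg`, `procedure`) and the tiny example frame are illustrative stand-ins for the tool's actual SUS extract, not its real schema.

```python
import pandas as pd

# A toy stand-in for the inpatient spells extract.
spells = pd.DataFrame({
    "ccg":       ["A", "A", "B", "B", "B", "C"],
    "procedure": ["X654", "H011", "X654", "X654", "H011", "H011"],
})

# Drop the cohort that was skewing the comparison ...
subset = spells[spells["procedure"] != "X654"]

# ... and recompute the CCG proportions on the more
# homogeneous subset of patients.
proportions = subset["ccg"].value_counts(normalize=True)
```

The same filtered frame would then be fed back through the contingency-table tests to see whether the difference persists.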

Procedures with small numbers have been removed
A large part of this project that I hadn’t initially accounted for has been managing the queries to the SUS database. Trimming the query down to fewer fields (for example dropping procedure codes past the 10th, since this affects only a small number of patients), creating more efficient joins, and creating stored procedures for the tool to call were all important in reducing load times from 5 minutes (!) to around 30 seconds. This has allowed small changes to be made to the query (adding/removing diagnosis codes etc.) without having to worry about the large amount of time that would otherwise be wasted waiting for the data to be passed to the tool.
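As a small illustration of the field-trimming idea, the sketch below uses an in-memory SQLite database as a stand-in for the SUS database (the table and column names are invented for the example) and pulls only the columns the tool actually needs, rather than everything.

```python
import sqlite3
import pandas as pd

# In-memory stand-in for the SUS inpatient spells table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inpatient_spells "
    "(spell_id INTEGER, ccg TEXT, los INTEGER, procedure_11 TEXT)"
)
conn.executemany(
    "INSERT INTO inpatient_spells VALUES (?, ?, ?, ?)",
    [(1, "A", 3, None), (2, "B", 5, "X654")],
)

# Select only the fields the tool uses, rather than SELECT * --
# dropping rarely-populated columns (like high-numbered procedure
# codes) shrinks the payload passed back to the notebook.
query = "SELECT spell_id, ccg, los FROM inpatient_spells"
trimmed = pd.read_sql(query, conn)
```

In the real tool the equivalent query lives in a stored procedure, so the notebook only passes parameters rather than raw SQL.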
As the tool grew in complexity, it also became important to create modules of functions to be called from within the notebook environment, abstracting away the complexity and reducing the cognitive load on the user when making small changes to the tool. This required breaking down the actions carried out in each cell into functions, working out where similar functions already existed, and then slightly adjusting those functions or their outputs to work for all the required cells. I used Visual Studio Code as my editor of choice for writing the .py files that housed the functions, but any text editor or IDE would suffice. Examples of the functions created include: functions for producing each type of visualisation used in the tool; functions for calling SQL queries; functions for transforming the data into contingency tables; and functions for suppressing small numbers in outputs.
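Two of those helper-function types can be sketched briefly: building a contingency table and suppressing small numbers. The suppression threshold of 5 and the column names are illustrative assumptions, not the tool's actual code.

```python
import pandas as pd

def contingency_table(df: pd.DataFrame, group_col: str, cat_col: str) -> pd.DataFrame:
    """Cross-tabulate patient counts by group and indicator category."""
    return pd.crosstab(df[group_col], df[cat_col])

def suppress_small_numbers(table: pd.DataFrame, threshold: int = 5) -> pd.DataFrame:
    """Blank out (NaN) any cell below the suppression threshold."""
    return table.mask(table < threshold)

# Toy data: 11 patients across two CCGs and two operation statuses.
df = pd.DataFrame({
    "ccg":    ["A"] * 8 + ["B"] * 3,
    "status": ["op"] * 6 + ["no-op"] * 2 + ["op"] * 3,
})
table = contingency_table(df, "ccg", "status")
safe = suppress_small_numbers(table)
```

Keeping these as importable functions means a notebook cell reduces to one or two readable calls instead of a wall of pandas.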
import altair as alt
import pandas as pd

def ClusteredBarChart(data: pd.DataFrame, indicatorCol: str, descCol: str,
                      x_Sort: list, indicatorTitle=''):
    '''The standard clustered bar chart showing the chosen CCG against the
    best 5 or similar 10.'''
    chart = alt.Chart(data[data['Include'] == 1]).mark_bar(stroke='transparent').encode(
        x=alt.X('variable:N', axis=alt.Axis(title=''), sort=x_Sort),
        y=alt.Y('value:Q', axis=alt.Axis(title='% of Patients', format='.0%')),
        color=alt.Color('variable:N', legend=alt.Legend(title=''), sort=x_Sort),
        column=alt.Column(indicatorCol, type='nominal', title=indicatorTitle),
        tooltip=[alt.Tooltip(field='value', type='quantitative',
                             title='% of Patients', format='.2%'),
                 'Counts', descCol]
    ).configure_view(
        stroke='transparent'
    )
    return chart
An example of a function used to produce visualisations in the tool, making use of the Altair package.
I’ve looked into other techniques that we may be able to use in the future; my favourite so far has been clustering using K-prototypes, a technique combining the K-means and K-modes methods for clustering data. K-prototypes would be needed due to the mix of categorical variables (e.g. operation status and procedures carried out) and continuous variables (e.g. the patient’s distance from provider and their length of stay). The package needed to do this type of analysis in Python is kmodes. Unfortunately, I’ve recently moved roles so won’t be able to work on this myself, but hopefully my colleagues will be able to implement it when they get the time to work through the code base!
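To give a flavour of why K-prototypes suits mixed data, here is a minimal sketch of the dissimilarity measure at its heart: squared Euclidean distance on the numeric columns plus a weighted count of mismatches on the categorical ones. The gamma weight and the example records (distance from provider, LoS, operation status, main procedure) are illustrative; the kmodes package implements the full clustering algorithm around this idea.

```python
import numpy as np

def kprototypes_distance(num_a, num_b, cat_a, cat_b, gamma=1.0):
    """Combined numeric + categorical dissimilarity used by K-prototypes."""
    # Squared Euclidean distance on the continuous variables ...
    numeric_part = np.sum((np.asarray(num_a) - np.asarray(num_b)) ** 2)
    # ... plus a simple mismatch count on the categorical variables,
    # weighted by gamma to balance the two parts.
    categorical_part = sum(a != b for a, b in zip(cat_a, cat_b))
    return numeric_part + gamma * categorical_part

# Two hypothetical patients: numeric (distance_km, los_days) and
# categorical (operation_status, main_procedure) features.
d = kprototypes_distance([5.0, 3.0], [6.0, 3.0], ["8", "X654"], ["1", "X654"])
```

K-means alone would have to one-hot encode the categorical fields, which distorts distances; blending the two dissimilarities is what makes K-prototypes a natural fit for this data set.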