Data Science Project – Patient Pathway Analysis

This is a short piece on a project I’m developing at NHS England, using Python to analyse patients’ inpatient care and determine where a Clinical Commissioning Group’s (CCG) patients are being treated significantly differently to patients in its peer group CCGs (the 10 most similar CCGs, as determined by NHS RightCare methodologies). While not traditional ‘data science’ (no machine learning or AI here!), the volume of data being looked at and the relatively automated nature of the process, using a traditionally ‘data science-y’ tool like Python, have led my team and me to label it as such.

This is going to be a relatively high-level overview of the work I’m doing; the code is still being developed and I can’t share outputs or the data because they contain patient-level records. The tool I’ve produced has a data link to the Secondary Uses Service (SUS) dataset, which contains data covering patients’ accident and emergency (A&E) attendances, inpatient admissions and outpatient care. Using SQL to query the database, I’m able to pull a large bespoke set of data to analyse for a CCG and its similar 10 CCGs for a specific set of diagnoses, programme budget codes, and age groups. As the data is mostly categorical, I’ve decided to assess differences between a CCG and its peer group using chi-squared tests on contingency tables.
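To make the chi-squared idea concrete, here is a minimal sketch of a Pearson chi-squared test on a contingency table, written with the standard library only. It is not the project’s code: the function names and the counts are made up for illustration, it applies no continuity correction, and it assumes no row or column total is zero. In practice a library routine such as scipy.stats.chi2_contingency would normally be used instead (note that it applies Yates’ correction to 2x2 tables by default).

```python
import math


def chi2_sf(stat, dof):
    """Upper-tail probability of the chi-squared distribution (integer dof >= 1)."""
    x = stat / 2
    # Start from a closed form: Q(1, x) = exp(-x), Q(1/2, x) = erfc(sqrt(x))
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    # Recurrence: Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    while a < dof / 2:
        q += x ** a * math.exp(-x) / math.gamma(a + 1)
        a += 1
    return q


def chi_squared_test(table):
    """Pearson chi-squared test of independence on a list-of-rows table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and category were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, chi2_sf(stat, dof)


# Made-up counts: rows = (CCG, peer group), columns = (elective, non-elective)
observed = [[120, 380], [1500, 3500]]
stat, dof, p_value = chi_squared_test(observed)
print(f"chi2={stat:.2f}, dof={dof}, p={p_value:.4f}")
```

A small p-value suggests that the split across categories differs between the CCG and its peers by more than chance alone would explain; it says nothing about which group has the better outcomes.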

The tool highlights where a CCG’s patients are being treated differently to its peer group using these tests, and attempts to show where these differences may be leading to poorer outcomes by comparing each category of a given variable against an outcome measure. Currently this is only length of stay, but I’m hoping to expand it to readmissions, costs and others.
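The comparison of categories against an outcome measure can be sketched roughly as follows. This is an illustration under assumptions rather than the tool’s actual logic: the record layout, the group labels and the numbers are all hypothetical, and the mean is used as the summary purely for simplicity.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical episode records: (group, category, length of stay in days)
episodes = [
    ("ccg", "elective", 2),
    ("ccg", "elective", 4),
    ("ccg", "non-elective", 9),
    ("peers", "elective", 2),
    ("peers", "elective", 3),
    ("peers", "non-elective", 6),
    ("peers", "non-elective", 8),
]


def mean_outcome_by_category(records):
    """Return {category: {group: mean outcome}} for the given records."""
    values = defaultdict(lambda: defaultdict(list))
    for group, category, outcome in records:
        values[category][group].append(outcome)
    return {
        category: {group: mean(vals) for group, vals in groups.items()}
        for category, groups in values.items()
    }


for category, by_group in mean_outcome_by_category(episodes).items():
    print(category, by_group)
```

Length of stay is usually skewed, so a median or another robust summary may be a better choice than the mean on real data.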

This is done at scale, and the tool tries to look at most variables available within the inpatient episodes data set (a care spell is made up of one or more finished consultant episodes), such as: the treatment function code (the speciality of the consultant seeing the patient); the admission method (elective or non-elective, for example via A&E, ambulance or GP admission); the primary procedure performed on the patient; and the intended management of the patient (whether or not the patient is due to stay overnight), amongst others.
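Running the same comparison across many variables mostly comes down to building one contingency table per variable. A minimal sketch of that step is below; the field names (group, admission_method, intended_management) and the records are hypothetical stand-ins, not the real SUS column names.

```python
from collections import Counter

# Hypothetical variables to compare; real field names will differ
VARIABLES = ["admission_method", "intended_management"]

# Hypothetical episode-level records, one dict per episode
episodes = [
    {"group": "ccg", "admission_method": "elective", "intended_management": "day case"},
    {"group": "ccg", "admission_method": "a&e", "intended_management": "overnight"},
    {"group": "peers", "admission_method": "elective", "intended_management": "day case"},
    {"group": "peers", "admission_method": "elective", "intended_management": "overnight"},
    {"group": "peers", "admission_method": "a&e", "intended_management": "overnight"},
]


def contingency_table(records, variable, groups=("ccg", "peers")):
    """Count episodes per (group, category); rows follow `groups`, columns are sorted categories."""
    counts = Counter((rec["group"], rec[variable]) for rec in records)
    categories = sorted({rec[variable] for rec in records})
    table = [[counts[(group, cat)] for cat in categories] for group in groups]
    return categories, table


for variable in VARIABLES:
    categories, table = contingency_table(episodes, variable)
    print(variable, categories, table)
```

Each resulting table can then be passed to a chi-squared test. With many variables tested at once, some allowance for multiple comparisons is worth considering.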

I’m also attempting to analyse patients’ flow through procedures, visualising this with a Sankey diagram (or, more precisely, an alluvial diagram). As procedures are recorded as sets of procedure codes, this helps highlight the main sets of procedures being done as a whole, where patients have multiple procedures in an episode of care, and where certain procedures lead to several others before a patient would normally finish their episode of care.
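The data behind a diagram like this is essentially a count of how many patients move from one procedure to the next at each step. Below is a minimal sketch of that counting, assuming each episode is available as an ordered list of procedure codes; the codes shown are placeholders rather than real ones, and the plotting itself is left to whichever charting library is in use.

```python
from collections import Counter

# Hypothetical ordered procedure codes, one list per episode of care
pathways = [
    ["PROC_A", "PROC_B", "PROC_C"],
    ["PROC_A", "PROC_B"],
    ["PROC_A", "PROC_C"],
    ["PROC_B"],
]


def procedure_links(sequences):
    """Count transitions between consecutive procedures, keeping the step number.

    Keys are ((step, source), (step + 1, target)), so the same procedure at
    different steps becomes a separate node, as in an alluvial diagram.
    """
    links = Counter()
    for steps in sequences:
        for i, (source, target) in enumerate(zip(steps, steps[1:])):
            links[((i, source), (i + 1, target))] += 1
    return links


for (source, target), value in sorted(procedure_links(pathways).items()):
    print(source, "->", target, value)
```

Episodes with a single procedure contribute no links here, so they would need counting separately if the diagram should show them.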

There’s still a lot of work to do. I’ve just shared the first set of drafts with my major internal ‘customers’, who are very keen on it, and I’ve now been inundated with requests to use the tool with specific CCGs and diagnoses and see what it churns out! There’ll most likely be more to update on this analysis down the line, so there’s potentially more to look forward to in future.