Improvement implies change, but change does not imply improvement.

We have all experienced the pain of disappointment when a change that promised much delivered no improvement, or even worse, a negative impact.

We have learned to become wary and skeptical about change.

We have learned a whole raft of tactics for deflecting and diffusing the enthusiasm of others.  And by doing so we don the black hat of the healthy skeptic and adopt the tell-tale mantra of “Yes, but …”.

So here is an onion diagram to use as a reference.  It comes from a recently published essay that compares and contrasts two schools of flow improvement: Eli Goldratt’s “Theory of Constraints” and a translation of Systems Engineering called 6M Design®.

The first five layers can be described as “denial”, the second four as “grudging acceptance” … and the last one is the sound of the final barrier coming down and revealing the raw emotion underpinning our reluctance to change. Fear.

The good news is that this diagram helps us to shape and steer change in a way that improves its chances of success, because if we can learn to peel back these layers by sharing information that soothes the fear of the unknown, then we can align and engage.  And that is essential for emotional momentum to build.

So when we meet resistance do we push or not?

Ask yourself: how would you prefer to be engaged? Pushed or not?

I am a big fan of pictures that tell a story … and this week I discovered someone who is creating great pictures … Hayley Lewis.

This is one of Hayley’s excellent sketch notes … the one that captures the essence of the Bruce Tuckman model of team development.

I share this particular sketch note because my experience of developing improvement-by-design teams is that it works just like this!

The tricky phase is the STORMING one because not all teams survive it!

About half sink in the storm – and that seems like an awful waste – and I believe it is avoidable.

This means that before starting the team development cycle, the leader needs to be aware of how to navigate themselves and the team through the storm phase … and that requires training, support and practice.

Which is why coaching from an independent, experienced, capable practitioner is a critical element of the improvement process.

Phil and Pete are having a coffee and a chat.  They both work in the NHS and have been friends for years.

They have different jobs. Phil is a commissioner and an accountant by training, Pete is a consultant and a doctor by training.

They are discussing a challenge that affects them both on a daily basis: unscheduled care.

Both Phil and Pete want to see significant and sustained improvements, and how to achieve them is often the focus of their coffee chats.

<Phil> We are agreed that we both want improvement, both from my perspective as a commissioner and from your perspective as a clinician. And we agree that we want to see improvements in patient safety, waiting, outcomes, experience for both patients and staff, and use of our limited NHS resources.

<Pete> Yes. Our common purpose, the “what” and “why”, has never been an issue.  Where we seem to get stuck is the “how”.  We have both tried many things but, despite our good intentions, it feels like things are getting worse!

<Phil> I agree. It may be that what we have implemented has had a positive impact and we would have been even worse off if we had done nothing. But I do not know. We clearly have much to learn and, while I believe we are making progress, we do not appear to be learning fast enough.  And I think this knowledge gap exposes another “how” issue: After we have intervened, how do we know that we have (a) improved, (b) not changed or (c) worsened?

<Pete> That is a very good question.  And all that I have to offer as an answer is to share what we do in medicine when we ask a similar question: “How do I know that treatment A is better than treatment B?”  It is the essence of medical research; the quest to find better treatments that deliver better outcomes and at lower cost.  The similarities are strong.

<Phil> OK. How do you do that? How do you know that “Treatment A is better than Treatment B” in a way that anyone will trust the answer?

<Pete> We use a science that is actually very recent on the scientific timeline; it was only firmly established in the first half of the 20th century. One reason for that is that it is a rather counter-intuitive science, so it requires tools that have been designed and demonstrated to work but whose inner workings most of us do not really understand. They are a bit like magic black boxes.

<Phil> H’mm. Please forgive me for sounding skeptical but that sounds like a big opportunity for making mistakes! If there are lots of these “magic black box” tools then how do you decide which one to use and how do you know you have used it correctly?

<Pete> Those are good questions! Very often we don’t know and in our collective confusion we generate a lot of unproductive discussion.  This is why we are often forced to accept the advice of experts but, I confess, very often we don’t understand what they are saying either! They seem like the medieval Magi.

<Phil> H’mm. So these experts are like ‘magicians’ – they claim to understand the inner workings of the black magic boxes but are unable, or unwilling, to explain in a language that a ‘muggle’ would understand?

<Pete> Very well put. That is just how it feels.

<Phil> So can you explain what you do understand about this magical process? That would be a start.

<Pete> OK, I will do my best.  The first thing we learn in medical research is that we need to be clear about what it is we are looking to improve, and we need to be able to measure it objectively and accurately.

<Phil> That  makes sense. Let us say we want to improve the patient’s subjective quality of the A&E experience and objectively we want to reduce the time they spend in A&E. We measure how long they wait. 

<Pete> The next thing is that we need to decide how much improvement we need. What would be worthwhile? So in the example you have offered we know that reducing the average time patients spend in A&E by just 30 minutes would have a significant effect on the quality of the patient and staff experience, and as a by-product it would also dramatically improve the 4-hour target performance.

<Phil> OK.  From the commissioning perspective there are lots of things we can do, such as commissioning alternative paths for specific groups of patients; in effect diverting some of the unscheduled demand away from A&E to a more appropriate service provider.  But these are the sorts of thing we have been experimenting with for years, and it brings us back to the question: How do we know that any change we implement has had the impact we intended? The system seems, well, complicated.

<Pete> In medical research we are very aware that the system we are changing is very complicated and that we do not have the power of omniscience.  We cannot know everything.  Realistically, all we can do is to focus on objective outcomes, collect small samples of the data ocean, and use those in an attempt to draw conclusions we can trust. We have to design our experiment with care!

<Phil> That makes sense. Surely we just need to measure the stuff that will tell us if our impact matches our intent. That sounds easy enough. What’s the problem?

<Pete> The problem we encounter is that when we measure “stuff” we observe patient-to-patient variation, and that is before we have made any changes.  Any impact that we may have is obscured by this “noise”.

<Phil> Ah, I see.  So if our intervention generates a small impact then it will be more difficult to see amidst this background noise. Like trying to see fine detail in a fuzzy picture.

<Pete> Yes, exactly like that.  And it raises the issue of “errors”.  In medical research we talk about two different types of error; we make the first type of error when our actual impact is zero but we conclude from our data that we have made a difference; and we make the second type of error when we have made an impact but we conclude from our data that we have not.

<Phil> OK. So does that imply that the more “noise” we observe in our measure-for-improvement before we make the change, the more likely we are to make one or other error?

<Pete> Precisely! So before we do the experiment we need to design it so that we reduce the probability of making both of these errors to an acceptably low level.  So that we can be assured that any conclusion we draw can be trusted.

<Phil> OK. So how exactly do you do that?

<Pete> We know that whenever there is “noise” and whenever we use samples then there will always be some risk of making one or other of the two types of error.  So we need to set a threshold for both. We have to state clearly how much confidence we need in our conclusion. For example, we often use the convention that we are willing to accept a 1 in 20 chance of making the Type I error.

<Phil> Let me check if I have heard you correctly. Suppose that, in reality, our change has no impact and we have set the risk threshold for a Type I error at 1 in 20, and suppose we repeat the same experiment 100 times – are you saying that we should expect about five of our experiments to show data that says our change has had the intended impact when in reality it has not?

<Pete> Yes. That is exactly it.

<Phil> OK.  But in practice we cannot repeat the experiment 100 times, so we just have to accept the 1 in 20 chance that we will make a Type I error, and we won’t know we have made it if we do. That feels a bit chancy. So why don’t we just set the threshold to 1 in 100 or 1 in 1000?

<Pete> We could, but doing that has a consequence.  If we reduce the risk of making a Type I error by setting our threshold lower then, for the same amount of data, we will increase the risk of making a Type II error.

<Phil> Ah! I see. The old swings-and-roundabouts problem. By the way, do these two errors have different names that would make them easier to remember and to explain?

<Pete> Yes. The Type I error is called a False Positive. It is like concluding that a patient has a specific diagnosis when in reality they do not.

<Phil> And the Type II error is called a False Negative?

<Pete> Yes.  And we want to avoid both of them, and to do that we have to specify a separate risk threshold for each error.  The convention is to call the threshold for the false positive the alpha level, and the threshold for the false negative the beta level.

<Phil> OK. So now we have three things we need to be clear on before we can do our experiment: the size of the change that we need, the risk of the false positive that we are willing to accept, and the risk of a false negative that we are willing to accept.  Is that all we need?

<Pete> In medical research we learn that we need six pieces of the experimental design jigsaw before we can proceed. We only have three pieces so far.

<Phil> What are the other three pieces then?

<Pete> We need to know the average value of the metric we are intending to improve, because that is our baseline from which improvement is measured.  Improvements are often framed as a percentage improvement over the baseline.  And we need to know the spread of the data around that average, the “noise” that we referred to earlier.

<Phil> Ah, yes!  I forgot about the noise.  But that is only five pieces of the jigsaw. What is the last piece?

<Pete> The size of the sample.

<Phil> Eh?  Can’t we just go with whatever data we can realistically get?

<Pete> Sadly, no.  The size of the sample is how we control the risk of a false negative error.  The more data we have the lower the risk. This is referred to as the power of the experimental design.

<Phil> OK. That feels familiar. I know that the more experience I have of something the better my judgement gets. Is this the same thing?

<Pete> Yes. Exactly the same thing.

<Phil> OK. So let me see if I have got this. To know if the impact of the intervention matches our intention we need to design our experiment carefully. We need all six pieces of the experimental design jigsaw and they must all fall inside our circle of control. We can measure the baseline average and spread; we can specify the impact we will accept as useful; we can specify the risks we are prepared to accept of making the false positive and false negative errors; and we can collect the required amount of data after we have made the intervention so that we can trust our conclusion.

<Pete> Perfect! That is how we are taught to design research studies so that we can trust our results, and so that others can trust them too.

<Phil> So how do we decide how big the post-implementation data sample needs to be? I can see we need to collect enough data to avoid a false negative but we have to be pragmatic too. There would appear to be little value in collecting more data than we need. It would cost more and could delay knowing the answer to our question.

<Pete> That is precisely the trap that many inexperienced medical researchers fall into. They set their sample size according to what is achievable and affordable, and then they hope for the best!

<Phil> Well, we do the same. We analyse the data we have and we hope for the best.  In the magical metaphor we are asking our data analysts to pull a white rabbit out of the hat.  It sounds rather irrational and unpredictable when described like that! Have medical researchers learned a way to avoid this trap?

<Pete> Yes, it is a tool called a power calculator.

<Phil> Ooooo … a power tool … I like the sound of that … that would be a cool tool to have in our commissioning bag of tricks. It would be like a magic wand. Do you have such a thing?

<Pete> Yes.

<Phil> And do you understand how the power tool magic works well enough to explain to a “muggle”?

<Pete> Not really. To do that means learning some rather unfamiliar language and some rather counter-intuitive concepts.

<Phil> Is that the magical stuff I hear lurks between the covers of a medical statistics textbook?

<Pete> Yes. Scary looking mathematical symbols and unfathomable spells!

<Phil> Oh dear!  Is there another way to gain a working understanding of this magic? Something a bit more pragmatic? A path that a ‘statistical muggle’ might be able to follow?

<Pete> Yes. It is called a simulator.

<Phil> You mean like a flight simulator that pilots use to learn how to control a jumbo jet before ever taking a real one out for a trip?

<Pete> Exactly like that.

<Phil> Do you have one?

<Pete> Yes. It was how I learned about this “stuff” … pragmatically.

<Phil> Can you show me?

<Pete> Of course.  But to do that we will need a bit more time, another coffee, and maybe a couple of those tasty looking Danish pastries.

<Phil> A wise investment I’d say.  I’ll get the coffee and pastries, if you fire up the engines of the simulator.
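For readers who would like a taste of what such a simulator might look like, here is a minimal sketch in Python. It is not the simulator Pete refers to; it is only an illustration built on stated assumptions. The baseline average of 240 minutes in A&E, the patient-to-patient spread of 90 minutes and the false negative risk (beta) of 1 in 5 are invented for the example; times are treated as normally distributed and the spread is treated as known. The 30 minute improvement and the 1 in 20 false positive risk (alpha) come from the conversation above. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

# Five of the six jigsaw pieces. The values marked "assumed" are invented
# for illustration; the others are taken from the conversation.
BASELINE_MINUTES = 240.0   # assumed baseline average time in A&E
SPREAD_MINUTES = 90.0      # assumed patient-to-patient spread (the "noise")
EFFECT_MINUTES = 30.0      # improvement worth detecting
ALPHA = 0.05               # accepted risk of a false positive (1 in 20)
BETA = 0.20                # assumed accepted risk of a false negative (1 in 5)


def required_sample_size(spread, effect, alpha, beta):
    """Sixth piece: patients needed in each group (before and after).

    Uses the textbook normal approximation for comparing two averages.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return math.ceil(2 * ((z_alpha + z_beta) * spread / effect) ** 2)


def run_experiment(rng, true_effect, n):
    """Simulate one before-and-after study; True means 'we saw a difference'."""
    before = [rng.gauss(BASELINE_MINUTES, SPREAD_MINUTES) for _ in range(n)]
    after = [rng.gauss(BASELINE_MINUTES - true_effect, SPREAD_MINUTES)
             for _ in range(n)]
    standard_error = SPREAD_MINUTES * math.sqrt(2 / n)
    z = (mean(before) - mean(after)) / standard_error
    return abs(z) > NormalDist().inv_cdf(1 - ALPHA / 2)


def detection_rate(true_effect, n, trials=2000, seed=1):
    """Fraction of repeated experiments that conclude there was a change."""
    rng = random.Random(seed)
    hits = sum(run_experiment(rng, true_effect, n) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    n = required_sample_size(SPREAD_MINUTES, EFFECT_MINUTES, ALPHA, BETA)
    print("patients needed per group:", n)
    print("false positive rate (no real impact):", detection_rate(0.0, n))
    print("power (real 30 minute impact):", detection_rate(EFFECT_MINUTES, n))
```

Calling `detection_rate(0.0, n)` repeats the no-impact experiment many times and should flag a change in roughly 1 in 20 of them; those are the false positives. Calling `detection_rate(30.0, n)` with larger and larger values of `n` shows the chance of detecting a real 30 minute improvement rising with the sample size, which is the power of the design.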

Have you heard the phrase “Pride comes before a fall”?

What does this mean? That the feeling of pride is the reason for the subsequent fall?

So by following that causal logic, if we do not allow ourselves to feel proud then we can avoid the fall?

And none of us like the feeling of falling and failing. We are fearful of that negative feeling, so with this simple trick we can avoid feeling bad. Yes?

But we all know the positive feeling of achievement – we feel pride when we have done good work, when our impact matches our intent.  Pride in our work.

Is that bad too?

Should we accept under-achievement and unexceptional mediocrity as the inevitable cost of avoiding the pain of possible failure?  Is that what we are being told to do here?

The phrase comes from the Bible, from the Book of Proverbs 16:18 to be precise.


And the problem here is that the phrase “pride comes before a fall” is not the whole proverb.

It has been simplified. Some bits have been omitted. And those omissions lead to ambiguity and the opportunity for obfuscation and re-interpretation.

In the fuller New International Version we see a missing bit … the “haughty spirit” bit.  That is another way of saying “over-confident” or “arrogant”.

But even this “authorised” version is still ambiguous and more questions spring to mind:

Q1. What sort of pride are we referring to? Just the confidence version? What about the pride that follows achievement?

Q2. How would we know if our feeling of confidence is actually justified?

Q3. Does a feeling of confidence always precede a fall? Is that how we diagnose over-confidence? Retrospectively? Are there instances when we feel confident but we do not fail? Are there instances when we do not feel confident and then fail?

Q4. Does confidence cause the fall or is it just a temporal association? Is there something more fundamental that causes both high-confidence and low-competence?

There is a well-known model called the Conscious-Competence model of learning which describes a sequence of four stages in acquiring a new skill, such as one we need to achieve our intended outcomes.

We all start in the “blissful ignorance” zone of unconscious incompetence.  Our unknowns are unknown to us.  They are blind spots.  So we feel unjustifiably confident.


In this model the first barrier to progress is “wrong intuition” which means that we actually have unconscious assumptions that are distorting our perception of reality.

What we perceive makes sense to us. It is clear and obvious. We feel confident. We believe our own rhetoric.

But our unconscious assumptions can trick us into interpreting information incorrectly.  And if we derive decisions from unverified assumptions and invalid analysis then we may do the wrong thing and not achieve our intended outcome.  We may unintentionally cause ourselves to fail and not be aware of it.  But we are proud and confident.

Then the gap between our intent and our impact becomes visible to all and painful to us. So we are tempted to avoid the social pain of public failure by retreating behind the “Yes, But” smokescreen of defensive reasoning. The “doom loop” as it is sometimes called. The Victim Vortex. “Don’t name, shame and blame me, I was doing my best. I did not intend that to happen. To err is human”.

The good news is that this learning model also signposts a possible way out; a door in the black curtain of ignorance.  It suggests that we can learn how to correct our analysis by using feedback from reality to verify our rhetorical assumptions.  Those assumptions which pass the “reality check” we keep, those which fail the “reality check” we redesign and retest until they pass.  Bit by bit our inner rhetoric comes to more closely match reality and the wisdom of our decisions will improve.

And what we then see is improvement.  Our impact moves closer towards our intent. And we can justifiably feel proud of that achievement. We do not need to be best-compared-with-the-rest; just being better-than-we-were-before is OK. That is learning.


And this is how it feels … this is the Learning Curve … or the Nerve Curve as we call it.

What it says is that to be able to assess confidence we must also measure competence. Outcomes. Impact.

And to achieve excellence we have to be prepared to actively look for any gap between intent and impact.  And we have to be prepared to see it as an opportunity rather than as a threat. And we will need to be able to seek feedback and other people’s perspectives. And we need to be open to asking for examples and explanations from those who have demonstrated competence.

It says that confidence is not a trustworthy surrogate for competence.

It says that we want the confidence that flows from competence because that is the foundation of trust.

Improvement flows at the speed of trust and seeing competence, confidence and trust growing is a joyous thing.

Pride and Joy are OK.

“Arrogance and incompetence come before a fall” would be a better proverb.


About 25 years ago a paper was published in the Harvard Business Review with the interesting title of “Teaching Smart People How To Learn”.

The uncomfortable message was that many people who are top of the intellectual rankings are actually very poor learners.

This sounds like a paradox.  How can people be high-achievers and yet be unable to learn?

Health care systems are stuffed full of super-smart, high-achieving professionals. The cream of the educational crop. The top 2%. They are called “doctors”.

And we have a problem with improvement in health care … a big problem … the safety, delivery, quality and affordability of the NHS are getting worse. Not better.

Improvement implies change and change implies learning, so if smart people struggle to learn then could that explain why health care systems find self-improvement so difficult?

This paragraph from the 1991 HBR paper feels uncomfortably familiar:


The author, Chris Argyris, refers to something called “single-loop learning” and if we translate this management-speak into the language of medicine it would come out as “treating the symptom and ignoring the disease“.  That is poor medicine.

Chris also suggests an antidote to this problem and gave it the label “double-loop learning” which if translated into medical speak becomes “diagnosis“.  And that is something that doctors can relate to because without a diagnosis, a justifiable treatment is difficult to formulate.

We need to diagnose the root cause(s) of the NHS disease.

The 1991 HBR paper refers back to an earlier 1977 HBR paper called Double Loop Learning in Organisations where we find the theory that underpins it.

The proposed hypothesis is that we all have cognitive models that we use to decide our actions (and in-actions), what I have referred to before as ChimpWare.  In it is a reference to a table published in a 1974 book and the message is that Single-Loop learning is a manifestation of a Model 1 theory-in-action.


And if we consider the task that doctors are expected to do then we can empathize with their dominant Model 1 approach.  Health care is a dangerous business.  Doctors can cause a lot of unintentional harm – both physical and psychological.  Doctors are dealing with a very, very complex system – a human body – that they only partially understand.  No two patients are exactly the same and illness is a dynamic process.  Everyone’s expectations are high. We have come a long way since the days of blood-letting and leeches!  Failure is not tolerated.

Doctors are intelligent and competitive … they had to be to win the education race.

Doctors must make tough decisions and have to have tough conversations … many, many times … and yet not be consumed in the process.  They often have to suppress emotions to be effective.

Doctors feel the need to protect patients from harm – both physical and emotional.

And collectively they do a very good job.  Doctors are respected and trusted professionals.

But …  to quote Chris Argyris …

“Model I blinds people to their weaknesses. For instance, the six corporate presidents were unable to realize how incapable they were of questioning their assumptions and breaking through to fresh understanding. They were under the illusion that they could learn, when in reality they just kept running around the same track.”

This blindness is self-reinforcing because …

“All parties withheld information that was potentially threatening to themselves or to others, and the act of cover-up itself was closed to discussion.”

How many times have we seen this in the NHS?

The Mid-Staffordshire Hospital debacle that led to the Francis Report is all the evidence we need.

So what is the way out of this double-bind?

Chris gives us some hints with his Model II theory-in-use.

  1. Valid information – Study.
  2. Free and informed choice – Plan.
  3. Constant monitoring of the implementation – Do.

The skill required is to question assumptions and break through to fresh understanding, and we can do that with a design-led approach because that is what designers do.

They bring their unconscious assumptions up to awareness and ask “Is that valid?” and “What if” questions.

It is called Improvement-by-Design.

And the good news is that this Model II approach works in health care, and we know that because the evidence is accumulating.


Many of the challenges that we face in delivering effective and affordable health care do not have well understood and generally accepted solutions.

If they did there would be no discussion or debate about what to do and the results would speak for themselves.

This lack of understanding is leading us to try to solve a complicated system design challenge in our heads.  Intuitively.

And trying to do it this way is fraught with frustration and risk because our intuition tricks us. It was this sort of challenge that led Professor Rubik to invent his famous 3D Magic Cube puzzle.

It is difficult enough to learn how to solve the Magic Cube puzzle by trial and error; it is even more difficult to attempt to do it inside our heads! Intuitively.

And we know the Rubik Cube puzzle is solvable, so all we need are some techniques, tools and training to improve our Rubik Cube solving capability.  We can all learn how to do it.

Returning to the challenge of safe and affordable health care, and to the specific problem of unscheduled care, A&E targets, delayed transfers of care (DTOC), finance, fragmentation and chronic frustration.

This is a systems engineering challenge so we need some systems engineering techniques, tools and training before attempting it.  Not after failing repeatedly.


One technique that a systems engineer will use is called a Vee Diagram such as the one shown above.  It shows the sequence of steps in the generic problem solving process and it has the same sequence that we use in medicine for solving problems that patients present to us …

Diagnose, Design and Deliver

which is also known as …

Study, Plan, Do.

Notice that there are three words in the diagram that start with the letter V … value, verify and validate.  These are probably the three most important words in the vocabulary of a systems engineer.

One tool that a systems engineer always uses is a model of the system under consideration.

Models come in many forms from conceptual to physical and are used in two main ways:

  1. To assist the understanding of the past (diagnosis)
  2. To predict the behaviour in the future (prognosis)

And the process of creating a system model, the sequence of steps, is shown in the Vee Diagram.  The systems engineer’s objective is a validated model that can be trusted to make good-enough predictions; ones that support making wiser decisions of which design options to implement, and which not to.

So if a systems engineer presented us with a conceptual model that is intended to assist our understanding, then we will require some evidence that all stages of the Vee Diagram process have been completed.  Evidence that provides assurance that the model predictions can be trusted.  And the scope over which they can be trusted.

Last month a report was published by the Nuffield Trust that is entitled “Understanding patient flow in hospitals”  and it asserts that traffic flow on a motorway is a valid conceptual model of patient flow through a hospital.  Here is a direct quote from the second paragraph in the Executive Summary:

Unfortunately, no evidence is provided in the report to support the validity of the statement and that omission should ring an alarm bell.

The observation that “the hospitals with the least free space struggle the most” is not a validation of the conceptual model.  Validation requires a concrete experiment.

To illustrate why observation is not validation let us consider a scenario where I have a headache and I take a paracetamol and my headache goes away.  I now have some evidence that shows a temporal association between what I did (take paracetamol) and what I got (a reduction in head pain).

But this is not a valid experiment because I have not considered the other seven possible combinations of headache before (Y/N), paracetamol (Y/N) and headache after (Y/N).
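The eight combinations are easy to lose track of in our heads, so here is a minimal sketch in Python that lists them. The variable names are hypothetical and the code only enumerates the possibilities; it says nothing about how often each one would occur in a real experiment.

```python
from itertools import product

# The three yes/no questions from the paracetamol example.
questions = ("headache before", "paracetamol taken", "headache after")

# Every possible combination of answers: 2 x 2 x 2 = 8.
combinations = list(product(("Y", "N"), repeat=len(questions)))

# The single combination in the anecdote: headache, paracetamol, no headache.
observed = ("Y", "Y", "N")

# The combinations the anecdote tells us nothing about.
not_considered = [combo for combo in combinations if combo != observed]

if __name__ == "__main__":
    print(" | ".join(questions))
    for combo in combinations:
        marker = "  <- observed" if combo == observed else ""
        print(" | ".join(combo) + marker)
    print("combinations not considered:", len(not_considered))
```

A well-designed experiment would collect counts for each row of this table, including the rows where no paracetamol was taken and the headache went away anyway.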

An association cannot be used to prove causation; not even a temporal association.

When I do not understand the cause, and I am without evidence from a well-designed experiment, then I might be tempted to intuitively jump to the (invalid) conclusion that “headaches are caused by lack of paracetamol!” and if untested this invalid judgement may persist and even become a belief.

Understanding causality requires an approach called counterfactual analysis; otherwise known as “What if?” And we can start that process with a thought experiment using our rhetorical model.  But we must remember that we must always validate the outcome with a real experiment. That is how good science works.

A famous thought experiment was conducted by Albert Einstein when he asked the question “If I were sitting on a light beam and moving at the speed of light what would I see?” This question led him to the Theory of Relativity which completely changed the way we now think about space and time.  Einstein’s model has been repeatedly validated by careful experiment, and has allowed engineers to design and deliver valuable tools such as the Global Positioning System which uses relativity theory to achieve high positional precision and accuracy.

So let us conduct a thought experiment to explore the ‘faster movement requires more space‘ statement in the case of patient flow in a hospital.

First, we need to define what we mean by the words we are using.

The phrase ‘faster movement’ is ambiguous.  Does it mean higher flow (more patients per day being admitted and discharged) or does it mean shorter length of stay (the interval between the admission and discharge events for individual patients)?

The phrase ‘more space’ is also ambiguous. In a hospital that implies physical space i.e. floor-space that may be occupied by corridors, chairs, cubicles, trolleys, and beds.  So are we actually referring to flow-space or storage-space?

What we have in this over-simplified statement is the conflation of two concepts: flow-capacity and space-capacity. They are different things. They have different units. And the result of conflating them is meaningless and confusing.

However, our stated goal is to improve understanding so let us consider one combination, and let us be careful to be more precise with our terminology: “higher flow always requires more beds“. Does it? Can we disprove this assertion with an example where higher flow required fewer beds (i.e. less space-capacity)?

The relationship between flow and space-capacity is well understood.

The starting point is Little’s Law which was proven mathematically in 1961 by J.D.C. Little and it states:

Average work in progress = Average lead time × Average flow.

In the hospital context, work in progress is the number of occupied beds, lead time is the length of stay and flow is admissions or discharges per time interval (which must be the same on average over a long period of time).

(NB. Engineers are rather pedantic about units so let us check that this makes sense: the unit of WIP is ‘patients’, the unit of lead time is ‘days’, and the unit of flow is ‘patients per day’ so ‘patients’ = ‘days’ * ‘patients / day’. Correct. Verified. Tick.)

So, is there a situation where flow can increase and WIP can decrease? Yes. When lead time decreases. Little’s Law says that is possible. We have disproved the assertion.

Let us take the other interpretation of ‘faster movement’, which is shorter length of stay: i.e. shorter length of stay always requires more beds.  Is this correct? No. If flow remains the same then Little’s Law states that we will require fewer beds. This assertion is disproved as well.
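Both disproofs can be checked with a few lines of arithmetic. The sketch below, in Python, applies Little’s Law to some invented numbers: a length of stay of 5 days and a flow of 20 admissions per day are assumptions chosen only to make the sums easy, and the function name is hypothetical.

```python
def average_occupied_beds(length_of_stay_days, admissions_per_day):
    """Little's Law: average WIP = average lead time x average flow."""
    return length_of_stay_days * admissions_per_day


# Assumed starting point: 5 day stay, 20 admissions per day.
baseline = average_occupied_beds(5.0, 20.0)

# Higher flow AND shorter stay: flow rises 10%, stay falls by one day.
higher_flow = average_occupied_beds(4.0, 22.0)

# Same flow, shorter stay.
shorter_stay = average_occupied_beds(4.0, 20.0)

if __name__ == "__main__":
    print("baseline beds occupied:", baseline)
    print("higher flow, shorter stay:", higher_flow)
    print("same flow, shorter stay:", shorter_stay)
```

In both of the changed scenarios the average number of occupied beds is lower than the baseline, so neither ‘higher flow’ nor ‘shorter length of stay’ necessarily requires more beds.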

And we need to remember that Little’s Law is proven to be valid for averages. Does that shed any light on the source of our confusion? Could the assertion about flow and beds actually be about the variation in flow over time and not about the average flow?

And this is also well understood. The original work on it was done almost exactly 100 years ago by Agner Krarup Erlang and the problem he looked at was the quality of customer service of the early telephone exchanges. Specifically, how likely was the caller to get the “all lines are busy, please try later” response.

What Erlang showed was that there is a mathematical relationship between the number of calls being made (the demand), the probability of a call being connected first time (the service quality) and the number of telephone circuits and switchboard operators available (the service cost).

So it appears that we already have a validated mathematical model that links flow, quality and cost that we might use if we substitute ‘patients’ for ‘calls’, ‘beds’ for ‘telephone circuits’, and ‘being connected’ for ‘being admitted’.
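To get a feel for that relationship, here is a minimal sketch of the Erlang B formula, which is the “all lines are busy” version of Erlang’s model. It rests on assumptions that a real hospital only partly meets: arrivals are random (Poisson), and a patient who finds every bed full is turned away rather than queued. The figures and the helper name `servers_for_target` are made up for illustration.

```python
def erlang_b(offered_load, servers):
    """Probability that an arrival finds all servers busy (Erlang B).

    offered_load is in erlangs: average arrival rate x average holding time.
    Uses the standard recurrence B(0) = 1, B(n) = a*B(n-1) / (n + a*B(n-1)).
    """
    blocking = 1.0
    for n in range(1, servers + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


def servers_for_target(offered_load, max_blocking):
    """Smallest number of servers whose blocking probability meets the target."""
    servers = 0
    while erlang_b(offered_load, servers) > max_blocking:
        servers += 1
    return servers


# Hypothetical figures: 20 admissions per day x 5 days average stay = 100 erlangs.
load = 20 * 5.0
for beds in (100, 110, 120):
    print(beds, "beds -> chance of no bed being free:", round(erlang_b(load, beds), 4))
print("beds needed for at most 1% refusals:", servers_for_target(load, 0.01))
```

Little’s Law puts the average occupancy in this example at 100 beds, yet the sketch needs more than 100 beds to keep refusals rare. The extra beds are there to absorb the variation, not the average.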

And this topic of patient flow, A&E performance and Erlang queues has been explored already … here.

So a telephone exchange is a more valid model of a hospital than a motorway.

We are now making progress in deepening our understanding.

The use of an invalid, untested, conceptual model is sloppy systems engineering.

So if the engineering is sloppy we would be unwise to fully trust the conclusions.

And I share this feedback in the spirit of black box thinking because I believe that there are some valuable lessons to be learned here – by us all.

An effective way to improve is to learn from others who have demonstrated the capability to achieve what we seek.  To learn from success.

Another effective way to improve is to learn from those who are not succeeding … to learn from failures … and that means … to learn from our own failings.

But from an early age we are socially programmed with a fear of failure.

The training starts at school where failure is not tolerated, nor is challenging the given dogma.  Paradoxically, the effect of our fear of failure is that our ability to inquire, experiment, learn, adapt, and to be resilient to change is severely impaired!

So further failure in the future becomes more likely, not less likely. Oops!

Fortunately, we can develop a healthier attitude to failure and we can learn how to harness the gap between intent and impact as a source of energy, creativity, innovation, experimentation, learning, improvement and growing success.

And health care provides us with ample opportunities to explore this unfamiliar terrain. The creative domain of the designer and engineer.

The scatter plot below is a snapshot of the A&E 4 hr target yield for all NHS Trusts in England for the month of July 2016.  The “constitutional” performance requirement is better than 95%.  The delivered whole system average is 85%.  The majority of Trusts are failing, and the Trust-to-Trust variation is rather wide. Oops!

This stark picture of the gap between intent (95%) and impact (85%) prompts some uncomfortable questions:

Q1: How can one Trust achieve 98% and yet another can do no better than 64%?

Q2: What can all Trusts learn from these high and low flying outliers?

[NB. I have not asked the question “Who should we blame for the failures?” because the name-shame-blame-game is also a predictable consequence of our fear-of-failure mindset.]

Let us dig a bit deeper into the information mine, and as we do that we need to be aware of a trap:

A snapshot-in-time tells us very little about how the system and the set of interconnected parts is behaving-over-time.

We need to examine the time-series charts of the outliers, just as we would ask for the temperature, blood pressure and heart rate charts of our patients.

Here are the last six years by month A&E 4 hr charts for a sample of the high-fliers. They are all slightly different and we get the impression that the lower two are struggling more than the upper two to stay aloft … especially in winter.

And here are the last six years by month A&E 4 hr charts for a sample of the low-fliers.  The Mark I Eyeball Test results are clear … these swans are falling out of the sky!

So we need to generate some testable hypotheses to explain these visible differences, and then we need to examine the available evidence to test them.

One hypothesis is “rising demand”.  It says that “the reason our A&E is failing is because demand on A&E is rising“.

Another hypothesis is “slow flow”.  It says that “the reason our A&E is failing is because of the slow flow through the hospital because of delayed transfers of care (DTOCs)“.

So, if these hypotheses account for the behaviour we are observing then we would predict that the “high fliers” are (a) diverting A&E arrivals elsewhere, and (b) reducing admissions to free up beds to hold the DTOCs.

Let us look at the freely available data for the highest flier … the green dot on the scattergram … code-named “RC9”.

The top chart is the A&E arrivals per month.

The middle chart is the A&E 4 hr target yield per month.

The bottom chart is the emergency admissions per month.

Both arrivals and admissions are increasing, while the A&E 4 hr target yield is rock steady!

And arranging the charts this way allows us to see the temporal patterns more easily (and the images are deliberately arranged to show the overall pattern-over-time).

Patterns like the change-for-the-better that appears in the middle of the winter of 2013 (i.e. when many other trusts were complaining that their sagging A&E performance was caused by “winter pressures”).

The objective evidence seems to disprove the “rising demand”, “slow flow” and “winter pressure” hypotheses!

So what can we learn from our failure to adequately explain the reality we are seeing?

The trust code-named “RC9” is Luton and Dunstable, and it is an average district general hospital, on the surface.  So to reveal some clues about what actually happened there, we need to read their Annual Report for 2013-14.  It is a public document and it can be downloaded here.

This is just a snippet …

… and there are lots more knowledge nuggets like this in there …

… it is a treasure trove of well-known examples of good system flow design.

The results speak for themselves!

Q: How many black swans does it take to disprove the hypothesis that “all swans are white”?

A: Just one.

“RC9” is a black swan. An outlier. A positive deviant. “RC9” has disproved the “impossibility” hypothesis.

And there is another flock of black swans living in the North East … in the Newcastle area … so the “Big cities are different” hypothesis does not hold water either.

The challenge here is a human one.  A human factor.  Our learned fear of failure.

Learning-how-to-fail is the way to avoid failing-how-to-learn.

And to read more about that radical idea I strongly recommend reading the recently published book called Black Box Thinking by Matthew Syed.

It starts with a powerful story about the impact of human factors in health care … and here is a short video of Martin Bromiley describing what happened.

The “black box” that both Martin and Matthew refer to is the one that is used in air accident investigations to learn from what happened, and to use that learning to design safer aviation systems.

Martin Bromiley has founded a charity to support the promotion of human factors in clinical training, the Clinical Human Factors Group.

So if we can muster the courage and humility to learn how to do this in health care for patient safety, then we can also learn how to do it for flow, quality and productivity.

Our black swan called “RC9” has demonstrated that this goal is attainable.

And the body of knowledge needed to do this already exists … it is called Health and Social Care Systems Engineering (HSCSE).

Postscript: And I am pleased to share that Luton & Dunstable features in the House of Commons Health Committee report entitled Winter Pressures in A&E Departments that was published on 3rd Nov 2016.

Here is part of what L&D shared to explain their deviant performance:


These points describe rather well the essential elements of a pull design, which is the antidote to the rather more prevalent pressure cooker design.

It has been a busy week.

And a common theme has cropped up which I have attempted to capture in the diagram below.

It relates to how the NHS measures itself and how it “drives” improvement.

The measures are called “failure metrics” – mortality, infections, pressure sores, waiting time breaches, falls, complaints, budget overspends.  The list is long.

The data for a specific trust are compared with an arbitrary minimum acceptable standard to decide where the organisation is on the Red-Amber-Green scale.

If we are in the red zone on the RAG chart … we get a kick.  If not we don’t.

The fear of being bullied and beaten raises the emotional temperature and the internal pressure … which drives movement to get away from the pain.  A nematode worm will behave this way. They are not stupid either.

As we approach the target line our RAG indicator turns “amber” … this is the “not statistically significant zone” … and now the stick is being waggled, ready in case the light goes red again.

So we muster our reserves of emotional energy and we PUSH until our RAG chart light goes green … but then we have to hold it there … which is exhausting.  One pain is replaced by another.

The next step is for the population of NHS nematodes to be compared with each other … they must be “bench-marked”, and some are doing better than others … as we might expect. We have done our “sadistics” training courses.

The bottom 5% or 10% line is used to set the “arbitrary minimum standard target” … and the top 10% are feted at national award ceremonies … and feast on the envy of the other 90 or 95% of “losers”.

The Cream of the Crop now have a big tick in their mission statement objectives box “To be in the Top 10% of Trusts in the UK“.  Hip hip huzzah.

And what has this system design actually achieved? The Cream of the Crap.


It is said that every system is perfectly designed to deliver what it delivers.

And a system that has been designed to only use failure and fear to push improvement can only ever achieve chronic mediocrity – either chaotic mediocrity or complacent mediocrity.

So, if we actually do want to tap into the vast zone of unfulfilled potential, and if we do actually want to escape the perpetual pain of the Cream of the Crap Trap forever … we need a better system design.

So we need some system engineers to help us do that.

And this week I met some … at the Royal Academy of Engineering in London … and it felt like finding a candle of hope in the darkness of despair.

I said it had been a busy week!

The most useful tool that a busy operational manager can have is a reliable and responsive early warning system (EWS).

One that alerts us when something is changing that, if missed or ignored, will cause a big headache in the future.

Rather like the radar system on an aircraft that beeps if something else is approaching … like another aircraft or the ground!

Operational managers are responsible for delivering stuff on time.  So they need a radar that tells them if they are going to deliver-on-time … or not.

And their on-time-delivery EWS needs to alert them soon enough that they have time to diagnose the ‘threat’, design effective plans to avoid it, decide which plan to use, and deliver it.

So what might an effective EWS for a busy operational manager look like?

  1. It needs to be reliable. No missed threats or false alarms.
  2. It needs to be visible. No tomes of text and tables of numbers.
  3. It needs to be simple. Easy to learn and quick to use.

And what is on offer at the moment?

The RAG Chart
This is a table that is coloured red, amber and green. Red means ‘failing’, green means ‘not failing’ and amber means ‘not sure’.  So this meets the specification of visible and simple, but is it reliable?

It appears not.  RAG charts do not appear to have helped to solve the problem.

A RAG chart is generated using historic data … so it tells us where we are now, not how we got here, where we are going or what else is heading our way.  It is a snapshot. One frame from the movie.  Better than complete blindness perhaps, but not much.

The SPC Chart
This is a statistical process control chart and is a more complicated beast.  It is a chart of how some measure of performance has changed over time in the past.  So like the RAG chart it is generated using historic data.  The advantage is that it is not just a snapshot of where we are now; it is a picture of the story of how we got to where we are, so it offers the promise of pointing to where we may be heading.  It meets the specification of visible, and while more complicated than a RAG chart, it is relatively easy to learn and quick to use.

Here is an example. It is the SPC chart of the monthly A&E 4-hour target yield performance of an acute NHS Trust.  The blue lines are the ‘required’ range (95% to 100%), the green line is the average and the red lines are a measure of variation over time.  What this chart says is: “This hospital’s A&E 4-hour target yield performance is currently acceptable, has been so since April 2012, and is improving over time.”

So that is much more helpful than a RAG chart (which in this case would have been green every month because the average was above the minimum acceptable level).

So why haven’t SPC charts replaced RAG charts in every NHS Trust Board Report?

Could there be a fly-in-the-ointment?

The answer is “Yes” … there is.

SPC charts are a quality audit tool.  They were designed nearly 100 years ago for monitoring the output quality of a process that is already delivering to specification (like the one above).  They are designed to alert the operator to early signals of deterioration, called ‘assignable cause signals’, and they prompt the operator to pay closer attention and to investigate plausible causes.
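For readers who have not met one before, here is a minimal sketch of one common Shewhart-style construction, the individuals (XmR) chart. The centre line is the average, and the limits sit 2.66 average moving ranges either side of it. This is an assumption about the general technique only; it is not a statement of how the chart above was built, and the monthly values are invented.

```python
from statistics import mean


def xmr_limits(values):
    """Return (lower limit, centre line, upper limit) for an individuals chart."""
    centre = mean(values)
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = 2.66 * mean(moving_ranges)  # 2.66 is the standard XmR constant
    return centre - spread, centre, centre + spread


# Hypothetical monthly 4-hour yields (%), for illustration only.
yields = [95.1, 96.3, 94.8, 97.0, 95.6, 96.2, 95.9, 96.8]
lower, centre, upper = xmr_limits(yields)

# Points outside the limits are candidate 'assignable cause signals'.
signals = [v for v in yields if v < lower or v > upper]
print(round(lower, 2), round(centre, 2), round(upper, 2), signals)
```

Stating the construction like this is what makes the limits meaningful to the person reading the chart.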

SPC charts are not designed for predicting if there is a flow problem looming over the horizon.  They are not designed for flow metrics that exhibit expected cyclical patterns.  They are not designed for monitoring metrics that have very skewed distributions (such as length of stay).  They are not designed for metrics where small shifts generate big cumulative effects.  They are not designed for metrics that change more slowly than the frequency of measurement.

And these are exactly the sorts of metrics that a busy operational manager needs to monitor, in reality, and in real-time.

Demand and activity both show strong cyclical patterns.

Lead-times (e.g. length of stay) are often very skewed by variation in case-mix and task-priority.

Waiting lists are like bank accounts … they show the cumulative sum of the difference between inflow and outflow.  That simple fact invalidates the use of the SPC chart.

Small shifts in demand, activity, income and expenditure can lead to big cumulative effects.
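The bank account analogy can be sketched in a few lines. The numbers are hypothetical: a list that starts at 200, with 102 referrals a week arriving against 100 treatments a week being delivered.

```python
from itertools import accumulate


def waiting_list(start, inflow, outflow):
    """Running list size: start plus the cumulative sum of (inflow - outflow)."""
    net = [i - o for i, o in zip(inflow, outflow)]
    return list(accumulate(net, initial=start))[1:]


# Hypothetical year: 102 referrals a week against 100 treatments a week.
weeks = 52
sizes = waiting_list(200, [102] * weeks, [100] * weeks)
print(sizes[0], sizes[-1])  # 202 after week 1, 304 after week 52
```

A 2% mismatch between demand and activity, which would be invisible on a weekly chart of either, grows the list by more than half in a year. And because each point on the list is built from the previous one, the points are not independent of each other.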

So if we abandon our RAG charts and we replace them with SPC charts … then we climb out of the RAG frying pan and fall into the SPC fire.

Oops!  No wonder the operational managers and financial controllers have not embraced SPC.

So is there an alternative that works better?  A more reliable EWS that busy operational managers and financial controllers can use?

Yes, there is, and here is a clue …

… but tread carefully …

… building one of these Flow-Productivity Early Warning Systems is not as obvious as it might first appear.  There are counter-intuitive traps for the unwary and the untrained.

You may need the assistance of a health care systems engineer (HCSE).

We form emotional attachments to places where we have lived and worked.  And it catches our attention when we see them in the news.

So this headline caught my eye, because I was a surgical SHO in Portsmouth in the closing years of the Second Millennium.  The good old days when we still did 1:2 on call rotas (i.e. up to 104 hours per week) and we were paid 70% LESS for the on call hours than the Mon-Fri 9-5 work.  We also had stable ‘firms’, superhuman senior registrars, a canteen that served hot food and strong coffee around the clock, and doctors mess parties that were … well … messy!  A lot has changed.  And not all for the better.

Here is the link to the fuller story about the emergency failures.

And from it we get the impression that this is a recent problem.  And with a bit of a smack and some name-shame-blame-game feedback from the CQC, then all will be restored to robust health. H’mm. I am not so sure that is the full story.

Here is the monthly aggregate A&E 4-hour target performance chart for Portsmouth from 2010 to date.

It says “this is not a new problem“.

It also says that the ‘patient’ has been deteriorating spasmodically over six years and is now critically-ill.

And giving a critically-ill hospital a “good telling off” is about as effective as telling a critically-ill patient to “pull themselves together“.  Inept management.

In A&E a critically-ill patient requires competent resuscitation using a tried-and-tested process of ABC.  Airway, Breathing, Circulation.

Also, the A&E 4-hour performance is only a symptom of the sickness in the whole urgent care system.  It is the reading on an emotometer inserted into the A&E orifice of the acute hospital!  Just one piece in a much bigger flow jigsaw.

It only tells us the degree of distress … not the diagnosis … nor the required treatment.

So what level of A&E health can we realistically expect to be able to achieve? What is possible in the current climate of austerity? Just how chilled-out can the A&E cucumber run?


This is the corresponding A&E emotometer chart for a different district general hospital somewhere else in NHS England.

Luton & Dunstable Hospital to be specific.

This A&E happiness chart looks a lot healthier and it seems to be getting even healthier over time too.  So this is possible.

Yes, but … if our hospital deteriorates enough to be put on the ‘critical list’ then we need to call in an Emergency Care Intensive Support Team (ECIST) to resuscitate us.

A very good idea.

And how do their critically-ill patients fare?

Here is the chart of one of them. The significant improvement following the ‘resuscitation’ is impressive to be sure!

But, disappointingly, it was not sustained and the patient ‘crashed’ again. Perhaps they were just too poorly? Perhaps the first resuscitation call was sent out too late? But at least they tried their best.

An experienced clinician might comment: Those are indeed plausible explanations, but before we conclude that they are the actual cause, can I check that we did not just treat the symptoms and miss the disease?

Q: So is it actually possible to resuscitate and repair a sick hospital?  Is it possible to restore it to sustained health, by diagnosing and treating the cause, and not just the symptoms?

Here is the corresponding A&E emotometer chart of yet another hospital.

It shows the same pattern of deteriorating health. And it shows a dramatic improvement.  It appears to have responded to some form of intervention.

And this time the significant improvement has sustained. The patient did not crash-and-burn again.

So what has happened here that explains this different picture?

This hospital had enough insight and humility to seek the assistance of someone who knew what to do and who had a proven track record of doing it.  Dr Kate Silvester to be specific.  A dual-trained doctor and manufacturing systems engineer.

Dr Kate is now a health care systems engineer (HCSE), and an experienced ‘hospital doctor’.

Dr Kate helped them to learn how to diagnose the root causes of their A&E 4-hr fever, and then she showed them how to design an effective treatment plan.

They did the re-design; they tested it; and they delivered their new design. Because they owned it, they understood it, and they trusted their own diagnosis-and-design competence.

And the evidence of their impact matching their intent speaks for itself.

A few weeks ago I raised the undiscussable issue that the NHS feels like it is on a downward trajectory … and that what might be needed are some better engines … and to design, test, build and install them we will need some health care system engineers (HCSEs) … and that we do not appear to have enough of those. None in fact.

The feedback shows that many people resonated with this sentiment.

This week I had the opportunity to peek inside the NHS Cockpit and look at the Dashboard … and this is what I saw on the A&E Performance panel.


This is the monthly aggregate A&E 4-hour performance for England (red), Scotland (purple), Wales (brown) and Northern Ireland (grey) for the last six years.

The trajectory looked alarmingly obvious to me – the NHS is on a predictable path to destruction – a controlled flight into terrain (CFIT).

The repeating up-and-down pattern is the annual cycle of seasons; better in the summer and worse in the winter.  This signal is driven by the celestial clock … the movement of the planets … which is beyond our power to influence.

The downward trajectory is the cumulative effect of our current design … which is the emergent effect of our collective beliefs, behaviours, policies and politics … which are completely within our gift to change.

If we chose to and if we knew how to – which we do not appear to.

Our collective ineptitude is not a topic for discussion. It is a taboo subject.

And I know that because if it were for discussion then this dashboard would be on public view on a website hosted by the NHS.

It isn’t.

It was created by George Donald, a member of the public, a disappointed patient, and a retired IT consultant.  And it was shared, free for all to see and use via Twitter (@GMDonald).

The information source is open, public, shared NHS data, but it takes a lot of work to winkle it out and present it like this.  So well done George … keep up the great work!

Now have a closer look at the Dashboard Display … look at the most recent data for England and Scotland.  What do you see?

Does it look like Scotland is pulling out of the dive and England is heading down even faster?

Hard to say for sure; there are lots of signals and noise all mixed up.

So we need to use some Systems Engineering tools to help us separate the signals from the noise; and for this a statistical process control (SPC) chart is useless.  We need a system behaviour chart (SBC) and its handy helper the deviation from aim (DFA) chart.

I will not bore you with the technical details but, suffice it to say, it is a tried-and-tested technique called the Method of Residuals.

Exhibit #1 is the DFA chart for Scotland.  The middle 4 years (2011-2014) are used to create a ‘predictive model’; the model projection is then compared with measured performance; and the difference is plotted as the DFA chart.

What this “says” is that the 2015/16 performance in Scotland is significantly better than projected, and the change of direction seemed to start in the first half of 2015.

This evidence seems to support the results of our Mark I Eyeball test.
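For the curious, the general idea of a deviation-from-aim chart can be sketched as follows. This is a deliberately crude stand-in and not the actual model behind the exhibits: the ‘predictive model’ here is just the average for each calendar month over a baseline period, the function names are hypothetical, and the data are invented.

```python
from collections import defaultdict
from statistics import mean


def monthly_profile(baseline):
    """baseline: list of (year, month, value). Returns {month: average value}."""
    by_month = defaultdict(list)
    for _year, month, value in baseline:
        by_month[month].append(value)
    return {month: mean(values) for month, values in by_month.items()}


def deviation_from_aim(observed, profile):
    """Residual for each observed point: measured value minus model projection."""
    return [(year, month, value - profile[month]) for year, month, value in observed]


# Invented data: January and July yields (%) for two baseline years.
baseline = [(2011, 1, 92.0), (2011, 7, 96.0), (2012, 1, 94.0), (2012, 7, 98.0)]
profile = monthly_profile(baseline)  # {1: 93.0, 7: 97.0}

# Later measurements are compared with the projection for the same month.
observed = [(2015, 1, 90.0), (2015, 7, 95.0)]
print(deviation_from_aim(observed, profile))  # [(2015, 1, -3.0), (2015, 7, -2.0)]
```

Because the seasonal cycle is in the model, it drops out of the residuals, and what is left is easier to read as better or worse than expected.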


Exhibit #2 – the DFA for England suggests the 2015/16 performance is significantly worse than projected, and this deterioration appears to have started later in 2015.

Oh dear! I do not believe that was the intention, but it appears to be the impact.

So what are England and Scotland doing differently?
What can we all learn from this?
What can we all do differently in the future?

Isn’t that a question that more people like you, me and George could reasonably ask of those whom we entrust to design, build and fly our NHS?

Isn’t that a reasonable question that could be asked by the 65 million people in the UK who might, at any time, be unlucky enough to require a trip to their local A&E department?

So, let us all grasp the nettle and get the Elephant in the Room into plain view and say in unison “The Emperor Has No Clothes!”

We are suffering from mass ineptitude and hubris, to use Dr Atul Gawande’s language, and we need a better collective strategy.

And there is hope.

Some innovative hospitals have had the courage to grasp the nettle. They have seen what is coming; they have fully accepted the responsibility for their own fate; they have stepped up to the challenge; they have looked-listened-and-learned from others, and they are proving what is possible.

They have a name. They are called positive deviants.

Have a look at this short video … it is jaw-dropping … it is humbling … it is inspiring … and it is challenging … because it shows what has been achieved already.

It shows what is possible. Now, and here in the UK.

Luton and Dunstable


It has been another interesting week.  A bitter-sweet mixture of disappointment and delight. And the central theme has been ‘transformation’.

The source of disappointment was the newsreel images of picket lines of banner-waving junior doctors standing in the cold watching ambulances deliver emergencies to hospitals now run by consultants.

So what about the thousands of elective appointments and operations that were cancelled to release the consultants? If the NHS was failing elective delivery time targets before it is going to be failing them even more now. And who will pay for the “waiting list initiatives” needed to just catch up? Depressing to watch.

The mercurial Roy Lilley summed up the general mood very well in his newsletter on Thursday, the day after the strike.


What he is saying is we do not have a health care system, we have a sick care system.  Which is the term coined by the acclaimed systems thinker, the late Russell Ackoff (see the video about half way down).

We aspire to a transformation-to-better but we only appear to be able to achieve a transformation-to-worse. That is depressing.

My source of delight was sharing the stories of those who are stepping up and are transforming themselves and their bits of the world; and how they are doing that by helping each other to learn “how to do it” – a small bite at a time.

Here is one excellent example: a diagnostic study looking at the root cause of the waiting time for school-age pupils to receive a health-protecting immunisation.

So what sort of transformation does the NHS need?

A transformation in the way it delivers care by elimination of the fragmentation that is the primary cause of the distrust, queues, waits, frustration, chaos and ever-increasing costs?

A transformation from purposeless and reactive; to purposeful and proactive?

A transformation from the disappointment that flows from the mismatch between intent and impact; to the delight that flows from discovering that there is a way forward; that there is a well understood science that underpins it; and a growing body of evidence that proves its effectiveness.  The Science of Improvement.

In a recent blog I shared the story of how it is possible to ‘melt queues’ or more specifically how it is possible to teach anyone, who wants to learn, how to melt queues.

It is possible to do this for an outpatient clinic in one day.

So imagine what could happen if just 1% of consultants decided to improve their outpatient clinics using this quick-and-easy-to-learn-and-apply method?  Those courageous and innovative consultants who are not prepared to drown in the Victim Vortex of despair and cynicism.  And what could happen if they shared their improvement stories with their less optimistic colleagues?  And what could happen if just a few of them followed the lead of the innovators?

Would that be a small transformation?  Or the start of a much bigger one? Or both?

Last week I shared a link to Dr Don Berwick’s thought provoking presentation at the Healthcare Safety Congress in Sweden.

Near the end of the talk Don recommended six books, and I was reassured that I already had read three of them. Naturally, I was curious to read the other three.

One of the unfamiliar books was “Overcoming Organizational Defenses” by the late Chris Argyris, a professor at Harvard.  I confess that I have tried to read some of his books before, but found them rather difficult to understand.  So I was intrigued that Don was recommending it as an ‘easy read’.  Maybe I am more of a dimwit than I previously believed!  So fear of failure took over my inner-chimp and I prevaricated. I flipped into denial. Who would willingly want to discover the true depth of their dimwittedness?

Later in the week, I was forwarded a copy of a recently published paper that was on a topic closely related to a key thread in Dr Don’s presentation:

understanding variation.

The paper was by researchers who had looked at the Board reports of 30 randomly selected NHS Trusts to examine how information on safety and quality was being shared and used.  They were looking for evidence that the Trust Boards understood the importance of variation and the need to separate ‘signal’ from ‘noise’ before making decisions on actions to improve safety and quality performance.  This was a point Don had stressed too, so there was a link.

The randomly selected Trust Board reports contained 1488 charts, of which only 88 demonstrated the contribution of chance effects (i.e. noise). Of these, 72 showed the Shewhart-style control charts that Don demonstrated. And of these, only 8 stated how the control limits were constructed (which is an essential requirement for the chart to be meaningful and useful).

That is a validity yield of 8 out of 1488, or 0.54%, which is for all practical purposes zero. Oh dear!

This chance combination of apparently independent events got me thinking.

Q1: What is the reason that NHS Trust Boards do not use these signal-and-noise separation techniques when it has been demonstrated, for at least 12 years to my knowledge, that they are very effective for facilitating improvement in healthcare? (e.g. Improving Healthcare with Control Charts by Raymond G. Carey was published in 2003).

Q2: Is there some form of “organizational defense” system in place that prevents NHS Trust Boards from learning useful ‘new’ knowledge?

So I surfed the Web to learn more about Chris Argyris and to explore in greater depth his concept of Single Loop and Double Loop learning.  I was feeling like a dimwit again because to me it is not a very descriptive title!  I suspect it is not to many others either.

I sensed that I needed to translate the concept into the language of healthcare and this is what emerged.

Single Loop learning is like treating the symptoms and ignoring the disease.

Double Loop learning is diagnosing the underlying disease and treating that.

So what are the symptoms?
The pain of NHS Trust  failure on all dimensions – safety, delivery, quality and productivity (i.e. affordability for a not-for-profit enterprise).

And what are the signs?
The tell-tale sign is more subtle. It’s what is not present that is important. A serious omission. The missing bits are valid time-series charts in the Trust Board reports that show clearly what is signal and what is noise. This diagnosis is critical because the strategies for addressing them are quite different – as Julian Simcox eloquently describes in his latest essay.  If we get this wrong and we act on our unwise decision, then we stand a very high chance of making the problem worse, and demoralizing ourselves and our whole workforce in the process! Does that sound familiar?

And what is the disease?
Undiscussables.  Emotive subjects that are too taboo to table in the Board Room.  And the issue of what is discussable is one of the undiscussables so we have a self-sustaining system.  Anyone who attempts to discuss an undiscussable is breaking an unspoken social code.  Another undiscussable is behaviour, and our social code is that we must not upset anyone so we cannot discuss ‘difficult’ issues.  But by avoiding the issue (the undiscussable disease) we fail to address the root cause and end up upsetting everyone.  We achieve exactly what we are striving to avoid, which is the technical definition of incompetence.  And Chris Argyris labelled this as ‘skilled incompetence’.

Does an apparent lack of awareness of what is already possible fully explain why NHS Trust Boards do not use the tried-and-tested tool called a system behaviour chart to help them diagnose, design and deliver effective improvements in safety, flow, quality and productivity?

Or are there other forces at play as well?

Some deeper undiscussables perhaps?

The Harvard Business Review is worth reading because many of its articles challenge deeply held assumptions, and then back up the challenge with the pragmatic experience of those who have succeeded in overcoming the limiting beliefs.

So the heading on the April 2016 copy that awaited me on my return from an Easter break caught my eye: YOU CAN’T FIX CULTURE.



The successful leaders of major corporate transformations are agreed … the cultural change follows the technical change … and then the emergent culture sustains the improvement.

The examples presented include the Ford Motor Company, Delta Airlines, Novartis – so these are not corporate small fry!

The evidence suggests that the belief of “we cannot improve until the culture changes” is the mantra of failure of both leadership and management.

A health care system is characterised by a culture of risk avoidance. And for good reason. It is all too easy to harm while trying to heal!  Primum non nocere is a core tenet – first do no harm.

But change and improvement imply taking risks – and those leaders of successful transformation know that the bigger risk by far is to become paralysed by fear and to do nothing.  Continual learning from many small successes and many small failures is preferable to crisis learning after a catastrophic failure!

The UK healthcare system is in a state of chronic chaos.  The evidence is there for anyone willing to look.  And waiting for the NHS culture to change, or pushing for culture change first appears to be a guaranteed recipe for further failure.

The HBR article suggests that it is better to stay focussed; to work within our circles of control and influence; to learn from others where knowledge is known, and where it is not – to use small, controlled experiments to explore new ground.

And I know this works because I have done it and I have seen it work.  Just by focussing on what is important to every member on the team; focussing on fixing what we could fix; not expecting or waiting for outside help; gathering and sharing the feedback from patients on a continuous basis; and maintaining patient and team safety while learning and experimenting … we have created a micro-culture of high safety, high efficiency, high trust and high productivity.  And we have shared the evidence via JOIS.

The micro-culture required to maintain the safety, flow, quality and productivity improvements emerged and evolved along with the improvements.

It was part of the effect, not the cause.

So the concept of ‘fix the system design flaws and the continual improvement culture will emerge’ seems to work at macro-system and at micro-system levels.

We just need to learn how to diagnose and treat healthcare system design flaws. And that is known knowledge.

So what is the next excuse?  Too busy?

The word pearl is a metaphor for something rare, beautiful, and valuable.

Pearls are formed inside the shell of certain mollusks as a defense mechanism against a potentially threatening irritant.

The mollusk creates a pearl sac to seal off the irritation.

And so it is with change and improvement.  The growth of precious pearls of improvement wisdom – the ones that develop slowly over time – is triggered by an irritant.

Someone asking an uncomfortable question perhaps, or presenting some information that implies that an uncomfortable question needs to be asked.

About seven years ago a question was asked “Would improving healthcare flow and quality result in lower costs?”

It is a good question because some believe that it would and some believe that it would not.  So an experiment to test the hypothesis was needed.

The Health Foundation stepped up to the challenge and funded a three year project to find the answer. The design of the experiment was simple. Take two oysters and introduce an irritant into them and see if pearls of wisdom appeared.

The two ‘oysters’ were Sheffield Hospital and Warwick Hospital and the irritant was Dr Kate Silvester who is a doctor and manufacturing system engineer and who has a bit-of-a-reputation for asking uncomfortable questions and backing them up with irrefutable information.

Two rare and precious pearls did indeed grow.

In Sheffield, it was proved that by improving the design of their elderly care process they improved the outcome for their frail, elderly patients.  More went back to their own homes and fewer left via the mortuary.  That was the quality and safety improvement. They also showed a shorter length of stay and a reduction in the number of beds needed to store the work in progress.  That was the flow and productivity improvement.

What was interesting to observe was how difficult it was to get these profoundly important findings published.  It appeared that a further irritant had been created for the academic peer review oyster!

The case study was eventually published in Age and Ageing 2014; 43: 472-77.

The pearl that grew around this seed is the Sheffield Microsystems Academy.

In Warwick, it was proved that the A&E 4 hour performance could be improved by focussing on improving the design of the processes within the hospital, downstream of A&E.  For example, a redesign of the phlebotomy and laboratory process to ensure that clinical decisions on a ward round are based on today’s blood results.

This specific case study was eventually published as well, but by a different path – one specifically designed for sharing improvement case studies – JOIS 2015; 22:1-30

And the pearls of wisdom that developed as a result of irritating many oysters in the Warwick bed are clearly described by Glen Burley, CEO of Warwick Hospital NHS Trust in this recent video.

Getting the results of all these oyster bed experiments published required irritating the Health Foundation oyster … but a pearl grew there too and emerged as the full Health Foundation report which can be downloaded here.

So if you want to grow a fistful of improvement and a bagful of pearls of wisdom … then you will need to introduce a bit of irritation … and Dr Kate Silvester is a proven source of grit for your oyster!

[Bzzzzzz] Bob’s phone vibrated to remind him it was time for the regular ISP remote coaching session with Leslie. He flipped the lid of his laptop just as Leslie joined the virtual meeting.

<Leslie> Hi Bob, and Happy New Year!

<Bob> Hello Leslie and I wish you well in 2016 too.  So, what shall we talk about today?

<Leslie> Well, given the time of year I suppose it should be the Winter Crisis.  The regularly repeating annual winter crisis. The one that feels more like the perpetual winter crisis.

<Bob> OK. What specifically would you like to explore?

<Leslie> Specifically? The habit of comparing this year with last year to answer the burning question “Are we doing better, the same or worse?”  Especially given the enormous effort and political attention that has been focused on the hot potato of A&E 4-hour performance.

<Bob> Aaaaah! That old chestnut! Two-Points-In-Time comparison.

<Leslie> Yes. I seem to recall you usually add the word ‘meaningless’ to that phrase.

<Bob> H’mm.  Yes.  It can certainly become that, but there is a perfectly good reason why we do this.

<Leslie> Indeed, it is because we see seasonal cycles in the data so we only want to compare the same parts of the seasonal cycle with each other. The apples and oranges thing.

<Bob> Yes, that is part of it. So what do you feel is the problem?

<Leslie> It feels like a lottery!  It feels like whether we appear to be better or worse is just the outcome of a random toss.

<Bob> Ah!  So we are back to the question “Is the variation I am looking at signal or noise?” 

<Leslie> Yes, exactly.

<Bob> And we need a scientifically robust way to answer it. One that we can all trust.

<Leslie> Yes.

<Bob> So how do you decide that now in your improvement work?  How do you do it when you have data that does not show a seasonal cycle?

<Leslie> I plot-the-dots and use an XmR chart to alert me to the presence of the signals I am interested in – especially a change of the mean.
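For readers who have not met it, here is a minimal sketch of the arithmetic behind an XmR chart. It assumes the conventional limits of the mean plus or minus 2.66 times the average moving range; the function names and the data are invented for illustration and are not taken from Leslie's own work.

```python
from statistics import fmean

def xmr_limits(values):
    """Return (centre, lower limit, upper limit) for an XmR chart."""
    centre = fmean(values)
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_mr = fmean(moving_ranges)
    # 2.66 is the conventional XmR scaling constant (3 / 1.128).
    return centre, centre - 2.66 * mean_mr, centre + 2.66 * mean_mr

def points_outside(values):
    """Indices of points beyond the limits: candidate signals, not proof."""
    _, lower, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical data: a steady series followed by one unusually high point.
series = [10, 12, 11, 13, 10, 12, 11, 13, 10, 12, 30]
centre, lower, upper = xmr_limits(series)
flagged = points_outside(series)
print(f"centre={centre:.2f} limits=({lower:.2f}, {upper:.2f}) flagged={flagged}")
```

In practice the limits are usually calculated from a baseline period and then held fixed; this sketch recalculates them from all the points for brevity.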

<Bob> Good.  So why can we not use that approach here?

<Leslie> Because the seasonal cycle is usually a big signal and it can swamp the smaller change I am looking for.

<Bob> Exactly so. Which is why we have to abandon the XmR chart and fall back on the two-points-in-time comparison?

<Leslie> That is what I see. That is the argument I am presented with and I have no answer.

<Bob> OK. It is important to appreciate that the XmR chart was not designed for doing this.  It was designed for monitoring the output quality of a stable and capable process. It was designed to look for early warning signs; small but significant signals that suggest future problems. The purpose is to alert us so that we can identify the root causes, correct them and then avoid a future problem.

<Leslie> So we are using the wrong tool for the job. I sort of knew that. But surely there must be a better way than a two-points-in-time comparison!

<Bob> There is, but first we need to understand why a TPIT is a poor design.

<Leslie> Excellent. I’m all ears.

<Bob> A two point comparison is looking at the difference between two values, and that difference can be positive, zero or negative.  In fact, it is very unlikely to be zero because noise is always present.

<Leslie> OK.

<Bob> Now, both of the values we are comparing are single samples from two bigger pools of data.  It is the difference between the pools that we are interested in but we only have single samples of each one … so they are not measurements … they are estimates.

<Leslie> So, when we do a TPIT comparison we are looking at the difference between two samples that come from two pools that have inherent variation and may or may not actually be different.

<Bob> Well put.  We give that inherent variation a name … we call it variance … and we can quantify it.

<Leslie> So if we do many TPIT comparisons then they will show variation as well … for two reasons; first because the pools we are sampling have inherent variation; and second just from the process of sampling itself.  It was the first lesson in the ISP-1 course.

<Bob> Well done!  So the question is: “How does the variance of the TPIT sample compare with the variance of the pools that the samples are taken from?”

<Leslie> My intuition tells me that it will be less because we are subtracting.

<Bob> Your intuition is half-right.  The effect of the variation caused by the signal will be less … that is the rationale for the TPIT after all … but the same does not hold for the noise.

<Leslie> So the noise variation in the TPIT is the same?

<Bob> No. It is increased.

<Leslie> What! But that would imply that when we do this we are less likely to be able to detect a change because a small shift in signal will be swamped by the increase in the noise!

<Bob> Precisely.  And the degree that the variance increases by is mathematically predictable … it is increased by a factor of two.

<Leslie> So as we usually present variation as the square root of the variance, to get it into the same units as the metric, then that will be increased by the square root of two … 1.414

<Bob> Yes.
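The factor of two can be written out as a short derivation. This is a sketch under two stated assumptions that the dialogue implies but does not spell out: the noise in the two samples is independent, and both pools have the same variance. The symbols X_1, X_2, D and sigma are introduced here for illustration only.

```latex
% X_1, X_2: the two sampled values; D: their difference; \sigma^2: variance of each pool.
\begin{align}
D &= X_2 - X_1 \\
\operatorname{Var}(D) &= \operatorname{Var}(X_1) + \operatorname{Var}(X_2) - 2\operatorname{Cov}(X_1, X_2) \\
                      &= \sigma^2 + \sigma^2 - 0 = 2\sigma^2 \\
\sqrt{\operatorname{Var}(D)} &= \sqrt{2}\,\sigma \approx 1.414\,\sigma
\end{align}
```

The minus sign in the difference does not subtract the variances: the variances of independent quantities add whether we add or subtract the quantities themselves.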

<Leslie> I need to put this counter-intuitive theory to the test!

<Bob> Excellent. Accept nothing on faith. Always test assumptions. And how will you do that?

<Leslie> I will use Excel to generate a big series of normally distributed random numbers; then I will calculate a series of TPIT differences using a fixed time interval; then I will calculate the means and variations of the two sets of data; and then I will compare them.

<Bob> Excellent.  Let us reconvene in ten minutes when you have done that.

10 minutes later …

<Leslie> Hi Bob, OK I am ready and I would like to present the results as charts. Is that OK?

<Bob> Perfect!

<Leslie> Here is the first one.  I used our A&E performance data to give me some context. We know that on Mondays we have an average of 210 arrivals with an approximately normal distribution and a standard deviation of 44; so I used these values to generate the random numbers. Here is the simulated Monday Arrivals chart for two years.


<Bob> OK. It looks stable as we would expect and I see that you have plotted the sigma levels which look to be just under 50 wide.

<Leslie> Yes, it shows that my simulation is working. So next is the chart of the comparison of arrivals for each Monday in Year 2 compared with the corresponding week in Year 1.

<Bob> Oooookaaaaay. What have we here?  Another stable chart with a mean of about zero. That is what we would expect given that there has not been a change in the average from Year 1 to Year 2. And the variation has increased … sigma looks to be just over 60.

<Leslie> Yes!  Just as the theory predicted.  And this is not a spurious answer. I ran the simulation dozens of times and the effect is consistent!  So, I am forced by reality to accept the conclusion that when we do two-point-in-time comparisons to eliminate a cyclical signal we will reduce the sensitivity of our test and make it harder to detect other signals.
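Anyone who wants to repeat Leslie's experiment without Excel could try something like the sketch below. It is not her spreadsheet: Python is swapped in for Excel, the mean of 210 and standard deviation of 44 are the Monday figures quoted above, and the number of simulated years and the random seed are arbitrary assumptions chosen so that the estimate settles down.

```python
import random
import statistics

MEAN_ARRIVALS = 210    # Monday average quoted in the dialogue
SD_ARRIVALS = 44       # Monday standard deviation quoted in the dialogue
WEEKS_PER_YEAR = 52
YEARS = 2000           # assumption: far more than two years, to steady the estimate

rng = random.Random(2016)  # arbitrary fixed seed so the run is repeatable

# One simulated Monday per week, with no seasonal cycle and no change over time.
arrivals = [rng.gauss(MEAN_ARRIVALS, SD_ARRIVALS)
            for _ in range(WEEKS_PER_YEAR * YEARS)]

# TPIT difference: each Monday minus the same Monday one year earlier.
differences = [arrivals[i] - arrivals[i - WEEKS_PER_YEAR]
               for i in range(WEEKS_PER_YEAR, len(arrivals))]

sd_raw = statistics.stdev(arrivals)
sd_diff = statistics.stdev(differences)
mean_diff = statistics.fmean(differences)
ratio = sd_diff / sd_raw

print(f"sigma of arrivals:    {sd_raw:.1f}")
print(f"sigma of differences: {sd_diff:.1f}")
print(f"mean of differences:  {mean_diff:.2f}")
print(f"ratio:                {ratio:.3f}  (theory predicts about 1.414)")
```

If the theory holds, the ratio should sit close to the square root of two, and 44 multiplied by 1.414 is about 62, which is in line with the "just over 60" read off the chart.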

<Bob> Good work Leslie!  Now that you have demonstrated this to yourself using a carefully designed and conducted simulation experiment, you will be better able to explain it to others.

<Leslie> So how do we avoid this problem?

<Bob> An excellent question and one that I will ask you to ponder on until our next chat.  You know the answer to this … you just need to bring it to conscious awareness.


The NHS appears to be suffering from some form of obsessive-compulsive disorder.

OCD sufferers feel extreme anxiety in certain situations. Their feelings drive their behaviour which is to reduce the perceived cause of their feelings. It is a self-sustaining system because their perception is distorted and their actions are largely ineffective. So their anxiety is chronic.

Perfectionists demonstrate a degree of obsessive-compulsive behaviour too.

In the NHS the triggers are called ‘targets’ and usually take the form of failure metrics linked to arbitrary performance specifications.

The anxiety is the fear of failure and its unpleasant consequences: the name-shame-blame-game.

So a veritable industry has grown around ways to mitigate the fear. A very expensive and only partially effective industry.

Data is collected, cleaned, manipulated and uploaded to the Mothership (aka NHS England). There it is further manipulated, massaged and aggregated. Then the accumulated numbers are posted on-line, every month for anyone with a web-browser to scrutinise and anyone with an Excel spreadsheet to analyse.

An ocean of measurements is boiled and distilled into a few drops of highly concentrated and sanitized data and, in the process, most of the useful information is filtered out, deleted or distorted.

For example …

One of the failure metrics that sends a shiver of angst through a Chief Operating Officer (COO) is the failure to deliver the first definitive treatment for any patient within 18 weeks of referral from a generalist to a specialist.

The infamous and feared 18-week target.

Service providers, such as hospitals, are actually fined by their Clinical Commissioning Groups (CCGs) for failing to deliver-on-time. Yes, you heard that right … one NHS organisation financially penalises another NHS organisation for failing to deliver a result over which they have only partial control.

Service providers do not control how many patients are referred, or a myriad of other reasons that delay referred patients from attending appointments, tests and treatments. But the service providers are still accountable for the outcome of the whole process.

This ‘Perform-or-Pay-The-Price Policy’ creates the perfect recipe for a lot of unhappiness for everyone … which is exactly what we hear and what we see.

So what distilled wisdom does the Mothership share? Here is a snapshot …


Q1: How useful is this table of numbers in helping us to diagnose the root causes of long waits, and how does it help us to decide what to change in our design to deliver a shorter waiting time and more productive system?

A1: It is almost completely useless (in this format).

So what actually happens is that the focus of management attention is drawn to the part just before the speed camera takes the snapshot … the bit between 14 and 18 weeks.

Inside that narrow time-window we see a veritable frenzy of target-failure-avoiding behaviour.

Clinical priority is side-lined and management priority takes over.  This is a management emergency! After all, fines-for-failure are only going to make the already bad financial situation even worse!

The outcome of this fire-fighting is that the bigger picture is ignored. The focus is on the ‘whip’ … and avoiding it … because it hurts!

Message from the Mothership:    “Until morale improves the beatings will continue”.

The good news is that the indigestible data liquor does harbour some very useful insights.  All we need to do is to present it in a more palatable format … as pictures of system behaviour over time.

We need to use the data to calculate the work-in-progress (=WIP).

And then we need to plot the WIP in time-order so we can see how the whole system is behaving over time … how it is changing and evolving. It is a dynamic living thing, it has vitality.
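As a rough illustration of the idea, here is one way a WIP series could be built. It assumes we have a starting WIP and period-by-period counts of patients joining and leaving the queue; that is an assumption about the shape of the data, not a description of how the chart here was derived from the published figures. The function name and all the numbers are invented.

```python
from itertools import accumulate

def wip_series(starting_wip, joined, left):
    """Running work-in-progress: previous WIP plus those joining minus those leaving."""
    net_change = [j - l for j, l in zip(joined, left)]
    # accumulate() yields the starting value first, so drop it with [1:].
    return list(accumulate(net_change, initial=starting_wip))[1:]

# Hypothetical monthly counts, invented purely for illustration.
joined = [100, 120, 110, 90]
left = [95, 100, 115, 100]
wip = wip_series(500, joined, left)

# Listing the values in time order is the text version of plotting the dots.
for month, level in enumerate(wip, start=1):
    print(f"month {month}: WIP = {level}")
```

The point of the sketch is only that WIP is a running balance, so it carries the history of the system forward in a way that a single snapshot cannot.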

So here is the WIP chart using the distilled wisdom from the Mothership.


And this picture does not require a highly trained data analyst or statistician to interpret it for us … a Mark I eyeball linked to 1.3 kg of wetware running ChimpOS 1.0 is enough … and if you are reading this then you must already have that hardware and software.

Two patterns are obvious:

1) A cyclical pattern that appears to have an annual frequency, a seasonal pattern. The WIP is higher in the summer than in the winter. Eh? What is causing that?

2) After an initial rapid fall in 2008 the average level was steady for 4 years … and then after March 2012 it started to rise. Eh? What is causing that?

The purpose of a WIP chart is to stimulate questions such as:

Q1: What happened in March 2012 that might have triggered this change in system behaviour?

Q2: What other effects could this trigger have caused and is there evidence for them?

A1: In March 2012 the Health and Social Care Act 2012 became Law. In the summer of 2012 the shiny new and untested Clinical Commissioning Groups (CCGs) were authorised to take over the reins from the exiting Primary Care Trusts (PCTs) and Strategic Health Authorities (SHAs). The vast £80bn annual pot of tax-payer cash was now in the hands of well-intended GPs who believed that they could do a better commissioning job than non-clinicians. The accountability for outcomes had been deftly delegated to the doctors.  And many of the new CCG managers were the same ones who had collected their redundancy cheques when the old system was shut down. Now that sounds like a plausible system-wide change! A massive political experiment was underway and the NHS was the guinea-pig.

A2: Another NHS failure metric is the A&E 4-hour wait target which, worryingly, also shows a deterioration that appears to have started just after July 2010, i.e. just after the new Government was elected into power.  Maybe that had something to do with it? Maybe it would have happened whichever party won at the polls.


A plausible temporal association does not constitute proof – and we cannot conclude a political move to a CCG-led NHS has caused the observed behaviour. Retrospective analysis alone is not able to establish the cause.

It could just as easily be that something else caused these behaviours. And it is important to remember that there are usually many causal factors combining together to create the observed effect.

And unraveling that Gordian Knot is the work of analysts, statisticians, economists, historians, academics, politicians and anyone else with an opinion.

We have a more pressing problem. We have a deteriorating NHS that needs urgent resuscitation!

So what can we do?

One thing we can do immediately is to make better use of our data by presenting it in ways that are easier to interpret … such as a work in progress chart.

Doing that will trigger different conversations; ones spiced with more curiosity and laced with less cynicism.

We can add more context to our data to give it life and meaning. We can season it with patient and staff stories to give it emotional impact.

And we can deepen our understanding of what causes lead to what effects.

And with that deeper understanding we can begin to make wiser decisions that will lead to more effective actions and better outcomes.

This is all possible. It is called Improvement Science.

And as we speak there is an experiment running … a free offer to doctors-in-training to learn the foundations of improvement science in healthcare (FISH).

In just two weeks 186 have taken up that offer and 13 have completed the course!

And this vanguard of curious and courageous innovators have discovered a whole new world of opportunity that they were completely unaware of before. But not anymore!

So let us ease off applying the whip and ease in the application of WIP.


Here is a short video describing how to create, animate and interpret a form of diagnostic Vitals Chart® using the raw data published by NHS England.  This is a training exercise from the Improvement Science Practitioner (level 2) course.

How to create an 18 weeks animated Bucket Brigade Chart (BBC)



The blog last week seems to have caused a bit of a stir … so this week we will continue on the same theme.

I’m Dr Bob and I am a hospital doctor: I help to improve the health of poorly hospitals.

And I do that using the Science of Improvement – which is the same as all sciences, there is a method to it.

Over the next few weeks I will outline, in broad terms, how this is done in practice.

And I will use the example of a hospital presenting with pain in their A&E department.  We will call it St.Elsewhere’s ® Hospital … a fictional name for a real patient.

It is a while since I learned science at school … so I thought a bit of a self-refresher would be in order … just to check that nothing fundamental has changed.


This is what I found on page 2 of a current GCSE chemistry textbook.

Note carefully that the process starts with observations; hypotheses come after that; then predictions and finally designing experiments to test them.

The scientific process starts with study.

Which is reassuring because when helping a poorly patient or a poorly hospital that is exactly where we start.

So, first we need to know the symptoms; only then can we start to suggest some hypotheses for what might be causing those symptoms – a differential diagnosis; and then we look for more specific and objective symptoms and signs of those hypothetical causes.

<Dr Bob> What is the presenting symptom?

<StE> “Pain in the A&E Department … or more specifically the pain is being felt by the Executive Department who attribute the source to the A&E Department.  Their pain is that of 4-hour target failure.”

<Dr Bob> Are there any other associated symptoms?

<StE> “Yes, a whole constellation.  Complaints from patients and relatives; low staff morale, high staff turnover, high staff sickness, difficulty recruiting new staff, and escalating locum and agency costs. The list is endless.”

<Dr Bob> How long have these symptoms been present?

<StE> “As long as we can remember.”

<Dr Bob> Are the symptoms staying the same, getting worse or getting better?

<StE> “Getting worse. It is worse in the winter and each winter is worse than the last.”

<Dr Bob> And what have you tried to relieve the pain?

<StE> “We have tried everything and anything – business process re-engineering, balanced scorecards, Lean, Six Sigma, True North, Blue Oceans, Golden Hours, Perfect Weeks, Quality Champions, performance management, pleading, podcasts, huddles, cuddles, sticks, carrots, blogs  and even begging. You name it we’ve tried it! The current recommended treatment is to create a swarm of specialist short-stay assessment units – medical, surgical, trauma, elderly, frail elderly just to name a few.” 

<Dr Bob> And how effective have these been?

<StE> “Well some seemed to have limited and temporary success but nothing very spectacular or sustained … and the complexity and cost of our processes just seem to go up and up with each new initiative. It is no surprise that everyone is change weary and cynical.”

The pattern of symptoms is that of a chronic (longstanding) illness that has seasonal variation, which is getting worse over time and the usual remedies are not working.

And it is obvious that we do not have a clear diagnosis; or know if our unclear diagnosis is incorrect; or know if we are actually dealing with an incurable disease.

So first we need to focus on establishing the diagnosis.

And Dr Bob is already drawing up a list of likely candidates … with carveoutosis at the top.

<Dr Bob> Do you have any data on the 4-hour target pain?  Do you measure it?

<StE> “We are awash with data! I can send the quarterly breach performance data for the last ten years!”

<Dr Bob> Excellent, that will be useful as it should confirm that this is a chronic and worsening problem but it does not help establish a diagnosis.  What we need is more recent, daily data. Just the last six months should be enough. Do you have that?

<StE> “Yes, that is how we calculate the quarterly average that we are performance managed on. Here is the spreadsheet. We are ‘required’ to have fewer than 5% 4-hour breaches on average. Or else.”

This is where Dr Bob needs some diagnostic tools.  He needs to see the pain scores presented as a picture … so he can see the pattern over time … because it is a very effective way to generate plausible causal hypotheses.

Dr Bob can do this on paper, or with an Excel spreadsheet, or use a tool specifically designed for the job. He selects his trusted visualisation tool: BaseLine©.


<Dr Bob> This is your A&E pain data plotted as a time-series chart.  At first glance it looks very chaotic … that is shown by the wide and flat histogram. Is that how it feels?

<StE> “That is exactly how it feels … earlier in the year it was unremitting pain and now we have a constant background ache with sharp, severe, unpredictable stabbing pains on top. I’m not sure what is worse!”

<Dr Bob> We will need to dig a bit deeper to find the root cause of this chronic pain … we need to identify the diagnosis or diagnoses … and your daily pain data should offer us some clues.

So I have plotted your data in a different way … grouping by day of the week … and this shows there is a weekly pattern to your pain. It looks worse on Mondays and least bad on Fridays.  Is that your experience?

<StE> “Yes, the beginning of the week is definitely worse … because it is like a perfect storm … more people referred by their GPs on Mondays and the hospital is already full with the weekend backlog of delayed discharges so there are rarely beds to admit new patients into until late in the day. So they wait in A&E.”
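The grouping by day of the week that Dr Bob describes can be sketched in a few lines. This is not BaseLine©, just a plain illustration of the idea; the function name and the two weeks of daily breach percentages are invented, and they are shaped to echo the Monday-worst, Friday-best pattern described above.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import fmean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def mean_by_weekday(daily):
    """Group (date, value) pairs by day of the week and average each group."""
    groups = defaultdict(list)
    for day, value in daily:
        groups[WEEKDAYS[day.weekday()]].append(value)
    return {name: fmean(groups[name]) for name in WEEKDAYS if name in groups}

# Hypothetical daily 4-hour breach percentages for two weeks from a Monday.
start = date(2015, 1, 5)
breach_pct = [12, 9, 8, 7, 5, 8, 10, 14, 9, 8, 7, 5, 8, 10]
daily = [(start + timedelta(days=i), pct) for i, pct in enumerate(breach_pct)]

averages = mean_by_weekday(daily)
worst = max(averages, key=averages.get)
best = min(averages, key=averages.get)
for name, value in averages.items():
    print(f"{name:<9} {value:5.1f}")
print(f"worst: {worst}, least bad: {best}")
```

Averages alone hide the spread within each day, so the full picture is better seen with every point plotted, as on the chart.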

Dr Bob’s differential diagnosis is firming up … he still suspects acute-on-chronic carveoutosis as the primary cause but he now has identified an additional complication … Forrester’s Syndrome.

And Dr Bob suspects an unmentioned problem … that the patient has been traumatised by a blunt datamower!

So that is the evidence we will look for next …

This week an interesting report was published by Monitor – about some possible reasons for the A&E debacle that England experienced in the winter of 2014.

Summary At A Glance

“91% of trusts did not  meet the A&E 4-hour maximum waiting time standard last winter – this was the worst performance in 10 years”.

So it seems a bit odd that the very detailed econometric analysis and the testing of “Ten Hypotheses” did not look at the pattern of change over the previous 10 years … it just compared Oct-Dec 2014 with the same period for 2013! And the conclusion: “Hospitals were fuller in 2014”.  H’mm.

The data needed to look back 10 years is readily available on the various NHS England websites … so here it is plotted as simple time-series charts.  These are called system behaviour charts or SBCs. Our trusted analysis tools will be a Mark I Eyeball connected to the 1.3 kg of wetware between our ears that runs ChimpOS 1.0 …  and we will look back 11 years to 2004.

First we have the A&E Arrivals chart … about 3.4 million arrivals per quarter. The annual cycle is obvious … higher in the summer and falling in the winter. And when we compare the first five years with the last six years there has been a small increase of about 5% and that seems to associate with a change of political direction in 2010.

So over 11 years the average A&E demand has gone up … a bit … but only by about 5%.

In stark contrast the number of A&E arrivals that are admitted to hospital has risen relentlessly over the same 11 year period by about 50% … that is about 5% per annum … ten times the increase in arrivals … and with no obvious step in 2010. We can see the annual cycle too.  It is like a ratchet. Click click click.

But that does not make sense. Where are these extra admissions going to? We can only conclude that over 11 years we have progressively added more places to admit A&E patients into.  More space-capacity to store admitted patients … so we can stop the 4-hour clock perhaps? More emergency assessment units perhaps? Places to wait with the clock turned off perhaps? The charts imply that our threshold for emergency admission has been falling: Admission has become increasingly the ‘easier option’ for whatever reason.  So why is this happening? Do more patients need to be admitted?

In a recent empirical study we asked elderly patients about their experience of the emergency process … and we asked them just after they had been discharged … when it was still fresh in their memories. A worrying pattern emerged. Many said that they had been admitted despite them saying they did not want to be.  In other words they did not willingly consent to admission … they were coerced.

This is anecdotal data so, by implication, it is wholly worthless … yes?  Perhaps from a statistical perspective but not from an emotional one.  It is a red petticoat being waved that should not be ignored.  Blissful ignorance comes from ignoring anecdotal stuff like this. Emotionally uncomfortable anecdotal stories. Ignore the early warning signs and suffer the potentially catastrophic consequences.

And here is the corresponding A&E 4-hour Target Failure chart.  Up to 2010 the imposed target was 98% success (i.e. 2% acceptable failure) and, after a bit of “encouragement” in 2004-5, this was actually achieved in some of the summer months (when the A&E demand was highest remember).

But with a change of political direction in 2010 the “hated” 4-hour target was diluted down to 95% … so a 5% failure rate was now ‘acceptable’ politically, operationally … and clinically.

So it is no huge surprise that this is what was achieved … for a while at least.

In the period 2010-13 the primary care trusts (PCTs) were dissolved and replaced by clinical commissioning groups (CCGs) … the doctors were handed the ignition keys to the juggernaut that was already heading towards the cliff.

The charts suggest that the seeds were already well sown by 2010 for an evolving catastrophe that peaked last year; and the changes in 2010 and 2013 may have just pressed the accelerator pedal a bit harder. And if the trend continues it will be even worse this coming winter. Worse for patients and worse for staff and worse for commissioners and worse for politicians. Lose lose lose lose.

So to summarise the data from NHS England’s own website:

1. A&E arrivals have gone up 5% over 11 years.
2. Admissions from A&E have gone up 50% over 11 years.
3. Since lowering the threshold for acceptable A&E performance from 98% to 95% the system has become unstable and “fallen off the cliff” … but remember, a temporal association does not prove causation.
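The arithmetic behind points 1 and 2 can be sketched in a few lines of Python. The growth figures are the rounded ones quoted above. The function and variable names are illustrative only, and the sketch assumes both figures describe the same 11-year window.

```python
# Rounded growth figures quoted in the summary above.
YEARS = 11
ARRIVALS_GROWTH = 0.05     # A&E arrivals up 5% over the period
ADMISSIONS_GROWTH = 0.50   # admissions from A&E up 50% over the period


def conversion_ratio_change(arrivals_growth, admissions_growth):
    """Relative change in the fraction of A&E arrivals that get admitted."""
    return (1 + admissions_growth) / (1 + arrivals_growth) - 1


def compound_annual_rate(total_growth, years):
    """Constant yearly growth rate that compounds to total_growth."""
    return (1 + total_growth) ** (1 / years) - 1


conversion_change = conversion_ratio_change(ARRIVALS_GROWTH, ADMISSIONS_GROWTH)
simple_rate = ADMISSIONS_GROWTH / YEARS
compound_rate = compound_annual_rate(ADMISSIONS_GROWTH, YEARS)

print(f"Change in admission fraction: {conversion_change:.1%}")
print(f"Admissions growth per year (simple average): {simple_rate:.1%}")
print(f"Admissions growth per year (compounded): {compound_rate:.1%}")
```

On these rounded figures the chance that an A&E arrival ends in admission has risen by roughly two fifths, which is equivalent to admissions growing by about 4.5% a year on a simple average, or a little under 4% a year compounded.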

So what has triggered the developing catastrophe?

Well, it is important to appreciate that when a patient is admitted to hospital it represents an increase in workload for every part of the system that supports the flow through the hospital … not just the beds.  Beds represent space-capacity. They are just where patients are stored.  We are talking about flow-capacity; and that means people, consumables, equipment, data and cash.

So if we increase emergency admissions by 50% then, if nothing else changes, we will need to increase the flow-capacity by 50% and the space-capacity to store the work-in-progress by 50% too. The second part follows from Little’s Law: the average work-in-progress equals the average flow rate multiplied by the average time spent in the system. It is a mathematically proven Law of Flow Physics. It is not negotiable.
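A minimal sketch of that relationship, assuming the usual statement of Little’s Law (average work-in-progress = average flow rate × average time in the system). The admission rate and length of stay below are invented for illustration and are not NHS data.

```python
def work_in_progress(admissions_per_day, mean_stay_days):
    """Little's Law: average number of patients in the system."""
    return admissions_per_day * mean_stay_days


def stay_needed(beds_available, admissions_per_day):
    """Little's Law rearranged: mean stay that fits a fixed number of beds."""
    return beds_available / admissions_per_day


# Hypothetical hospital: 100 emergency admissions a day, 5-day mean stay.
baseline_beds = work_in_progress(100, 5)

# Same length of stay, 50% more admissions.
beds_after_growth = work_in_progress(150, 5)

# If the bed stock cannot grow, the mean stay has to shrink instead.
stay_if_beds_fixed = stay_needed(baseline_beds, 150)

print(baseline_beds, beds_after_growth, round(stay_if_beds_fixed, 2))
```

With the length of stay unchanged, 50% more admissions needs 50% more occupied beds. The only other way to balance the equation is to shorten the average stay by a third, which is where the pressure comes from.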

So have we increased our flow-capacity and our space-capacity (and our costs) by 50%? I don’t know. That data is not so easy to trawl from the websites. It will be there though … somewhere.

What we have seen is an increase in bed occupancy (the red box on Monitor’s graphic above) … but not a 50% increase … that is impossible if the occupancy is already over 85%.  A hospital is like a rigid metal box … it cannot easily expand to accommodate a growing queue … so the inevitable result is an increase in the ‘pressure’ inside.  We have created an emergency care pressure cooker. Well, lots of them actually.
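The occupancy ceiling is simple arithmetic, sketched here on the assumption that occupancy cannot exceed 100% of the existing beds. The function name is illustrative.

```python
def max_relative_increase(current_occupancy):
    """Largest possible relative rise in occupied beds before hitting 100%."""
    if not 0 < current_occupancy <= 1:
        raise ValueError("occupancy must be a fraction between 0 and 1")
    return (1.0 - current_occupancy) / current_occupancy


headroom = max_relative_increase(0.85)
print(f"Headroom at 85% occupancy: {headroom:.1%}")
```

Starting from 85% occupancy, the number of occupied beds can rise by less than 18% before every bed is full, which is far short of 50%.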

And that is exactly what the staff who work inside hospitals say it feels like.

And eventually the relentless pressure and daily hammering causes the system to start to weaken and fail, gradually at first then catastrophically … which is exactly what the NHS England data charts are showing.

So what is the solution?  More beds?

Nope.  More beds will create more space and that will relieve the pressure … for a while … but it will not address the root cause of why we are admitting 50% more patients than we used to; and why we seem to need to increase the pressure inside our hospitals to squeeze the patients through the process and extrude them out of the various exit nozzles.

Those are the questions we need to have understandable and actionable answers to.

Q1: Why are we admitting 5% more of the same A&E arrivals each year rather than delivering what they need in 4 hours or less and returning them home? That is what the patients are asking for.

Q2: Why do we have to push patients through the in-hospital process rather than pulling them through? The staff are willing to work but not inside a pressure cooker.

A more sensible improvement strategy is to look at the flow processes within the hospital and ensure that all the steps and stages are pulling together to the agreed goals and plan for each patient. The clinical management plan that was decided when the patient was first seen in A&E. The intended outcome for each patient and the shortest and quickest path to achieving it.

Our target is not just a departure within 4 hours of arriving in A&E … it is a competent diagnosis (study) and an actionable clinical management plan (plan) within 4 hours of arriving; and then a process that is designed to deliver (do) it … for every patient. Right, first time, on time, in full and at a cost we can afford.

Q: Do we have that?
A: Nope.

Q: Is that within our gift to deliver?
A: Yup.

Q: So what is the reason we are not already doing it?
A: Good question.  Who in the NHS is trained how to do system-wide flow design like this?

by Julian Simcox & Terry Weight

Ben Goldacre has spent several years popularizing the idea that we ought all to be more interested in science.

Every day he writes and tweets examples of “bad science”, and about getting politicians and civil servants to be more evidence-based; about how governmental interventions should be more thoroughly tested before being rolled-out to the hapless citizen; about how the development and testing of new drugs should be more transparent to ensure the public get drugs that actually make a difference rather than risk harm; and about bad statistics – the kind that “make clever people do stupid things”(8).

Like Ben we would like to point the public sector, in particular the healthcare sector and its professionals, toward practical ways of doing more of the good kind of science, but just what is GOOD science?

In collaboration with the Cabinet Office’s Behavioural Insights Team, Ben has recently published a polemic (9) advocating evidence-based government policy. For us this too is commendable, yet there is a potentially grave error of omission in their paper which seems to fixate upon just a single method of research, and risks setting-up the unsuspecting healthcare professional for failure and disappointment – as Abraham Maslow once famously said:

“… it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail” (17)

We question the need for the new Test, Learn and Adapt (TLA) model he offers because the NHS already possesses such a model – one which in our experience is more complete and often simpler to follow – it is called the “Improvement Model”(15) – and via its P-D-S-A mnemonic (Plan-Do-Study-Act) embodies the scientific method.

Moreover there is a preexisting wealth of experience on how best to embed this thinking within organisations – from top-to-bottom and importantly from bottom-to-top; experience that has been accumulating for fully nine decades – and though originally established in industrial settings has long since spread to services.

We are this week publishing two papers, one longer and one shorter, in which we start by defining science, ruing the dismal way in which it is perennially conveyed to children and students, the majority of whom leave formal education without understanding the power of discovery or gaining any first hand experience of the scientific method.


We argue that if science were to be defined around discovery, and learning cycles, and built upon observation, measurement and the accumulation of evidence – then good science could vitally be viewed as a process rather than merely as an externalized entity. These things comprise the very essence of what Don Berwick refers to as Improvement Science (2) as embodied by the Institute for Healthcare Improvement (IHI) and in the NHS’s Model for Improvement.

We also aim to bring an evolutionary perspective to the whole idea of science, arguing that its time has been coming for five centuries, yet is only now more fully arriving. We suggest that in a world where many at school have been turned-off science, the propensity to be scientific in our daily lives – and at work – makes a vast difference to the way people think about outcomes and their achievement. This is especially so if those who take a perverse pride in saying they avoided science at school, or who freely admit they do not do numbers, can get switched on to it.

The NHS Model for Improvement has a pedigree originating with Walter Shewhart in the 1920s, then being famously applied by Deming and Juran after WWII. Deming in particular encapsulates the scientific method in his P-D-C-A model (three decades later he revised it to P-D-S-A in order to emphasize that the Check stage must not be short-changed) – his pragmatic way of enabling learning and improvement to evolve bottom-up in organisations.

After the 1980s Dr Don Berwick, standing on these shoulders, then applied the same thinking to the world of healthcare – initially in his native America. Berwick’s approach is to encourage people to ask questions such as: What works? … and How would we know? His method is founded upon a culture of evidence-based learning, providing a local context for systemic improvement efforts. A new organisational culture, one rooted in the science of improvement, if properly nurtured, may then emerge.

Yet, such a culture may initially jar with the everyday life of a conventional organisation, and the individuals within it. One of several reasons, according to Yuval Harari (21), is that for hundreds of generations our species has evolved such that imagined reality has lorded it over objective reality. Only relatively recently in our evolution has the advance of science been redressing this imbalance, and in our papers we argue that a method is now needed that enables these two realities to more easily coexist.

We suggest that a method that enables data-rich evidence-based storytelling – by those who most know about the context and intend growing their collective knowledge – will provide the basis for an approach whereby the two realities may do just that.

In people’s working lives, a vital enabler is the 3-paradigm “Accountability/Improvement/Research” measurement model (AIRmm), reflecting the three archetypal ways in which people observe and measure things. It was created by healthcare professionals (23) to help their colleagues and policy-makers to unravel a commonly prevailing confusion, and to help people make better sense of the different approaches they may adopt when needing to evidence what they’re doing – depending on the specific purpose. An amended version of this model is already widely quoted inside the NHS, though this is not to imply that it is yet as widely understood or applied as it needs to be.


This 3-paradigm A-I-R measurement model underpins the way that science can be applied by, and has practical appeal for, the stretched healthcare professional, managerial leader or civil servant.

Indeed for anyone who intuitively suspects there has to be a better way to combine goals that currently feel disconnected or even in conflict: empowerment and accountability; safety and productivity; assurance and improvement; compliance and change; extrinsic and intrinsic motivation; evidence and action; facts and ideas; logic and values; etc.

Indeed for anyone who is searching for ways to unify their actions with the system-based implementation of those actions as systemic interventions. Though widely quoted in other guises, we are returning to the original model (23) because we feel it better connects to the primary aim of supporting healthcare professionals to make best sense of their measurement options.

In particular the model makes it immediately plain that a way out of the apparent Research/Accountability dichotomy is readily available to anyone willing to “Learn, master and apply the modern methods of quality control, quality improvement and quality planning” – the recommendation made for all staff in the Berwick Report (3).

In many organisations, and not just in healthcare, the column 1 paradigm is the only game in town. Column 3 may feel attractive as a way-out, but it also feels inaccessible unless there is a graduate statistician on hand. Moreover, the mainstay of the Column 3 worldview, the Randomized Controlled Trial (RCT), can feel altogether overblown and lacking in immediacy. It can feel like reaching for a spanner and finding a lump hammer in your hand – as Berwick says “Fans of traditional research methods view RCTs as the gold standard, but RCTs do not work well in many healthcare contexts” (2).

Like us, Ben is frustrated by the ways that healthcare organisations conduct themselves – not just the drug companies that commercialize science and publish only the studies likely to enhance sales, but governments too, which commonly implement politically expedient policies only to have to invent evidence afterwards to support them.

Policy-based evidence rather than evidence-based policy.

Ben’s recommended Column 3-style T-L-A approach is often more likely to make day-to-day sense to people and teams on the ground if complemented by Column 2-style improvement science.

One reason why Improvement Science can sometimes fail to dent established cultures is that it gets corralled by organisational “experts” – some of whom then use what little knowledge they have gathered merely to make themselves indispensable, not realising the extent to which everyone else as a consequence gets dis-empowered.

In our papers we take the opportunity to outline the philosophical underpinnings, and to do this we have borrowed the 7-point framework from a recent paper by Perla et al (35) who suggest that Improvement Science:

1. Is grounded in testing and learning cycles – the aim is collective knowledge and understanding about cause & effect over time. Some scientific method is needed, together with a way to make the necessary inquiry a collaborative one. Shewhart realised this and so invented the concept “continual improvement”.

2. Embraces a combination of psychology and logic – systemic learning requires that we balance myth and received wisdom with logic and the conclusions we derive from rational inquiry. This balance is approximated by the Sensing-Intuiting continuum in the Jungian-based MBTI model (12) reminding us that constructing a valid story requires bandwidth.

3. Has a philosophical foundation of conceptualistic pragmatism (16) – it cannot be expected that two scientists when observing, experiencing, or experimenting will make the same theory-neutral observations about the same event – even if there is prior agreement about methods of inference and interpretation. The normative nature of reality therefore has to be accommodated. Whereas positivism ultimately reduces the relation between meaning and experience to a matter of logical form, pragmatism allows us to ground meaning in conceived experience.

4. Employs Shewhart’s “theory of cause systems” – Walter Shewhart created the Control Chart for tuning-in to systemic behaviour that would otherwise remain unnoticed. It is a diagnostic tool, but by flagging potential trouble also aids real time prognosis. It might have been called a “self-control chart” for he was especially interested in supporting people working in and on their system being more considered (less reactive) when taking action to enhance it without over-reacting – avoiding what Deming later referred to as “Tampering” (4).

5. Requires the use of Operational Definitions – Deming warned that some of the most important aspects of a system cannot be expressed numerically, and those that can require care because “there is no true value of anything measured or observed” (5). When it comes to metric selection therefore it is essential to understand the measurement process itself, as well as the “operational definition” that each metric depends upon – the aim being to reduce ambiguity to zero.

6. Considers the contexts of both justification and discovery – Science can be defined as a process of discovery – testing and learning cycles built upon observation, measurement and accumulating evidence or experience – shared for example via a Flow Chart or a Gantt chart in order to justify a belief in the truth of an assertion. To be worthy of the term “science” therefore, a method or procedure is needed that is characterised by collaborative inquiry.

7. Is informed by Systems Theory – Systems Theory is the study of systems, any system: as small as a quark or as large as the universe. It aims to uncover archetypal behaviours and the principles by which systems hang together – behaviours that can be applied across all disciplines and all fields of research. There are several types of systems thinking, but Jay Forrester’s “System Dynamics” has most pertinence to Improvement Science because of its focus on flows and relationships – recognising that the behaviour of the whole may not be explained by the behaviour of the parts.

In the papers, we say more about this philosophical framing, and we also refer to the four elements in Deming’s “System of Profound Knowledge”(5). We especially want to underscore that the overall aim of any scientific method we employ is contextualised knowledge – which is all the more powerful if continually generated in context-specific experimental cycles. Deming showed that good science requires a theory of knowledge based upon ever-better questions and hypotheses. We two aim now to develop methods for building knowledge-full narratives that can work well in healthcare settings.

We wholeheartedly agree with Ben that for the public sector – not just in healthcare – policy-making needs to become more evidence-based.

In a poignant blog (24), Richard Taunt of the Health Foundation (HF) describes attending two conferences on the same day. At the first one, policymakers from 25 countries had assembled to discuss how national policy can best enhance the quality of health care. When collectively asked which policies they would retain and repeat, their list included: use of data, building quality improvement capability, ensuring senior management are aware of improvement approaches, and supporting and spreading innovations.

In a different part of London, UK health politicians happened also to be debating Health and Care in order to establish the policy areas they would focus on if forming the next government. This second discussion brought out a completely different set of areas: the role of competition, workforce numbers, funding, and devolution of commissioning. These two discussions were supposedly about the same topic, but a Venn diagram would have contained next to no overlap.

Clare Allcock, also from the HF, then blogged to comment that “in England, we may think we are fairly advanced in terms of policy levers, but (unlike, for example in Scotland or the USA) we don’t even have a strategy for implementing health system quality.” She points in particular to Denmark, which has recently announced that it is phasing out its hospital accreditation scheme in favour of an approach strongly focused around quality improvement methodology and person-centred care. The Danes are in effect taking the 3-paradigm model and creating space for Column 2: improvement thinking.

The UK needs to take a leaf out of their book, for without changing fundamentally the way the NHS (and the public sector as a whole) thinks about accountability, any attempt to make column 2 the dominant paradigm is destined to be stillborn.

It is worth noting that in large part the AIRmm Column 2 paradigm was actually central to the 2012 White Paper’s values, and with it the subsequent Outcomes Framework consultation – both of which repeatedly used the phrase “bottom-up” to refer to how the new system of accountability would need to work, but somehow this seems to have become lost in legislative procedures that history will come to regard as having been overly ambitious. The need for a new paradigm of accountability however remains – and without it health workers and clinicians – and the managers who support them – will continue to view metrics more as something intrusive than as something that can support them in delivering enhancements in sustained outcomes. In our view the Stevens’ Five Year Forward View makes this new kind of accountability an imperative.

“Society, in general, and leaders and opinion formers, in particular, (including national and local media, national and local politicians of all parties, and commentators) have a crucial role to play in shaping a positive culture that, building on these strengths, can realise the full potential of the NHS.
When people find themselves working in a culture that avoids a predisposition to blame, eschews naïve or mechanistic targets, and appreciates the pressures that can accumulate under resource constraints, they can avoid the fear, opacity, and denial that will almost inevitably lead to harm.”
Berwick Report (3)

Changing cultures means changing our habits – it starts with us. It won’t be easy because people default to the familiar, to more of the same. Hospitals are easier to build than relationships; operations are easier to measure than knowledge, skills and confidence; and prescribing is easier than enabling. The two of us do not of course possess a monopoly on all possible solutions, but our experience tells us that now is the time for: evidence-rich storytelling by front line teams; by pharmaceutical development teams; by patients and carers conversing jointly with their physicians.

We know that measurement is not a magic bullet, but what frightens us is that the majority of people seem content to avoid it altogether. As Oliver Moody recently noted in The Times ..

Call it innumeracy, magical thinking or intrinsic mental laziness, but even intelligent members of the public struggle, through no fault of their own, to deal with statistics and probability. This is a problem. People put inordinate amounts of trust in politicians, chief executives, football managers and pundits whose judgment is often little better than that of a psychic octopus. Short of making all schoolchildren study applied mathematics to A level, the only thing scientists can do about this is stick to their results and tell more persuasive stories about them.

Too often, Disraeli’s infamous words: “Lies, damned lies, and statistics” are used as the refuge of busy professionals looking for an excuse to avoid numbers.

If Improvement Science is to become a shared language, Berwick’s recommendation that all NHS staff “Learn, master and apply the modern methods of quality control, quality improvement and quality planning” has to be taken seriously.

As a first step we recommend enabling teams to access good data in as near to real time as possible, data that indicates the impact that one’s intervention is having – this alone can prompt a dramatic shift in the type of conversation that people working in and on their system may have. Often this can be initiated simply by converting existing KPI data into System Behaviour Chart form which, using a tool like BaseLine® takes only a few mouse clicks.
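To make the idea of a System Behaviour Chart concrete, here is a minimal Python sketch of the limits for an individuals (XmR) chart, one common form of Shewhart chart. It uses the standard XmR scaling constant of 2.66 applied to the average moving range. It does not reproduce how BaseLine® or any other tool works, the weekly figures are invented, and for simplicity the limits are calculated from all the points, including the one being tested.

```python
from statistics import mean

XMR_SCALING = 2.66  # standard constant for individuals (XmR) charts


def xmr_limits(values):
    """Return (lower limit, centre line, upper limit) for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two points to form a moving range")
    centre = mean(values)
    # Moving range = absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = XMR_SCALING * mean(moving_ranges)
    return centre - spread, centre, centre + spread


def signal_points(values):
    """Indices of points that fall outside the natural process limits."""
    lower, _, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical weekly KPI values (for example, a count of some event).
weekly_kpi = [52, 48, 50, 51, 49, 50, 47, 53, 50, 49, 51, 70]

lower, centre, upper = xmr_limits(weekly_kpi)
print(f"limits: {lower:.1f} to {upper:.1f}, centre {centre:.1f}")
print("signals at weeks:", signal_points(weekly_kpi))
```

Points inside the limits are treated as routine variation (noise) and points outside as signals worth investigating. That distinction is what changes the conversation from reacting to every wobble to asking what changed.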

In our longer paper we offer three examples of Improvement Science in action – combining to illustrate how data may be used to evidence both sustained systemic enhancement, and to generate engagement by the people most directly connected to what in real time is systemically occurring.

1. A surgical team using existing knowledge established by column 3-type research as a platform for column 2-type analytic study – to radically reduce post-operative surgical site infection (SSI).

2. 25 GP practices are required to collect data via the Friends & Family Test (FFT) and decide to experiment with being more than merely compliant. In two practices they collectively pilot a system run by their PPG (patient participation group) to study the FFT score – patient by patient – as they arrive each day. They use IS principles to separate signal from noise in a way that prompts the most useful response to the feedback in near to real time. Separately they summarise all the comments as a whole and feed their analysis into the bi-monthly PPG meeting. The aim is to address both “special cause” feedback and “common cause” feedback in a way that, in what most feel is an over-loaded system, can prompt sensibly prioritised improvement activity.

3. A patient is diagnosed with NAFLD and receives advice from their doctor to get more exercise e.g. by walking more. The patient uses the principles of IS to monitor what happens – using the data not just to show how they are complying with their doctor’s advice, but to understand what drives their personal mind/body system. The patient hopes that this knowledge can lead them to better decision-making and sustained motivation.

The landscape of NHS improvement and innovation support is fragmented, cluttered, and currently pretty confusing. Since May 2013 Academic Health Science Networks (AHSNs) funded by NHS England (NHSE) have been created with the aim of bringing together health services, and academic and industry members. Their stated purpose is to improve patient outcomes and generate economic benefits for the UK by promoting and encouraging the adoption of innovation in healthcare. They have a 5-year remit and have spent the first 2 years establishing their structures and recruiting; it is not yet clear if they will be able to deliver what’s really needed.

Patient Safety Collaboratives linked with AHSN areas have also been established to improve the safety of patients and ensure continual patient safety learning. The programme, coordinated by NHSE and NHSIQ, will provide safety improvements across a range of healthcare settings by tackling the leading causes of avoidable harm to patients. The intention is to empower local patients and healthcare staff to work together to identify safety priorities and develop solutions – implemented and tested within local healthcare organisations, then later shared nationally.

We hope our papers will significantly influence the discussions about how improvement and innovation can assist with these initiatives. In the shorter paper, to echo Deming, we even include our own 14 points for how healthcare organisations need to evolve. We will know that we have succeeded if the papers are widely read; if we enlist activists like Ben to the definition of science embodied by Improvement Science; and if we see a tidal wave of improvement science methods being applied across the NHS.

As patient volunteers, we each intend to find ways of contributing in any way that appears genuinely helpful. It is our hope that Improvement Science enables the cultural transformation we have envisioned in our papers and with our case studies. This is what we feel most equipped to help with. When in your sixties it is easy to feel that time is short, but maybe people of every age should feel this way? In the words of Francis Bacon, the father of the scientific method.





Many organisations proclaim that their mission is to achieve excellence but then proceed to deliver mediocre performance.

Why is this?

It is certainly not from lack of purpose, passion or people.

So the flaw must lie somewhere in the process.

The clue lies in how we measure performance … and to see the collective mindset behind the design of the performance measurement system we just need to examine the key performance indicators or KPIs.

Do they measure failure or success?

Let us look at some from the NHS …. hospital mortality, hospital acquired infections, never events, 4-hour A&E breaches, cancer wait breaches, 18 week breaches, and so on.

In every case the metric reported is a failure metric. Not a success metric.

And the focus of action is getting away from failure.

Damage mitigation, damage limitation and damage compensation.

So we have the answer to our question: we know we are doing a good job when we are not failing.

But are we?

When we are not failing we are not doing a bad job … is that the same as doing a good job?

Q: Does excellence  = not excrement?

A: No. There is something between these extremes.

The succeed-or-fail dichotomy is a distorting simplification created by applying an arbitrary threshold to a continuous measure of performance.
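A small Python sketch shows what the threshold hides. The waiting times are invented, and the 240-minute threshold simply stands in for the 4-hour A&E target.

```python
from statistics import median

THRESHOLD_MINUTES = 240  # stands in for the 4-hour A&E target


def breach_rate(waits, threshold=THRESHOLD_MINUTES):
    """Fraction of waits that exceed the threshold (the failure metric)."""
    return sum(w > threshold for w in waits) / len(waits)


# Two hypothetical departments, ten patients each (waits in minutes).
dept_a = [60, 90, 120, 150, 180, 200, 210, 220, 230, 250]
dept_b = [200, 210, 220, 225, 230, 235, 238, 239, 240, 600]

for name, waits in (("A", dept_a), ("B", dept_b)):
    print(name, f"breach rate {breach_rate(waits):.0%}",
          f"median wait {median(waits)} min",
          f"longest wait {max(waits)} min")
```

Both departments report the same 10% failure rate, yet the typical wait in department B sits just under the threshold and one patient waited ten hours. The pass-or-fail metric cannot tell these two systems apart, while the continuous measure can.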

And how, specifically, have we designed our current system to avoid failure?

Usually by imposing an arbitrary target connected to a punitive reaction to failure. Management by fear.

This generates punishment-avoidance and back-covering behaviour which is manifest as a lot of repeated checking and correcting of the inevitable errors that we find.  A lot of extra work that requires extra time and that requires extra money.

So while an arbitrary-target-driven-check-and-correct design may avoid failing on safety, the additional cost may cause us to then fail on financial viability.

Out of the frying pan and into the fire.

No wonder Governance and Finance come into conflict!

And if we do manage to pull off an uneasy compromise … then what level of quality are we achieving?

Studies show that if we take a random sample of 100 people from the pool of ‘disappointed by their experience’ and ask whether they are prepared to complain, then only 5% will do so.

So if we use complaints as our improvement feedback loop and we react to that and make changes that eliminate these complaints then what do we get? Excellence?


We get what we designed … just good enough to avoid the 5% of complaints but not the 95% of disappointment.

We get mediocrity.
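The size of the blind spot is easy to estimate. This sketch takes the 5% complaint rate quoted above as an assumption and works backwards from the complaints we see to the disappointment we do not. The function names are illustrative.

```python
COMPLAINT_RATE = 0.05  # assumed share of disappointed people who complain


def estimated_disappointed(complaints, complaint_rate=COMPLAINT_RATE):
    """Estimate how many people were disappointed, given complaints received."""
    return round(complaints / complaint_rate)


def silent_disappointed(complaints, complaint_rate=COMPLAINT_RATE):
    """Disappointed people who never show up in the complaints feedback loop."""
    return estimated_disappointed(complaints, complaint_rate) - complaints


for received in (5, 20):
    print(received, "complaints suggest",
          estimated_disappointed(received), "disappointed people, of whom",
          silent_disappointed(received), "said nothing")
```

Every complaint stands for about twenty disappointed people, so a feedback loop built only on complaints is blind to nineteen out of every twenty of them.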

And what do we do then?

We start measuring ‘customer satisfaction’ … which is actually asking the question ‘did your experience meet your expectation?’

And if we find that satisfaction scores are disappointingly low then how do we improve them?

We have two choices: improve the experience or reduce the expectation.

But as we are very busy doing the necessary checking-and-correcting then our path of least resistance to greater satisfaction is … to lower expectations.

And we do that by donning the black hat of the pessimist and laying out the risks and dangers.

And by doing that we generate anxiety and fear.  Which was not the intended outcome.

Our mission statement proclaims ‘trusted to achieve excellence’ not ‘designed to deliver mediocrity’.

But mediocrity is what the evidence says we are delivering. Just good enough to avoid a smack from the Regulators.

And if we are honest with ourselves then we are forced to conclude that:

A design that uses failure metrics as the primary feedback loop can achieve no better than mediocrity.

So if we choose  to achieve excellence then we need a better feedback design.

We need a design that uses success metrics as the primary feedback loop and we use failure metrics only in safety critical contexts.

And the ideal people to specify the success metrics are those who feel the benefit directly and immediately … the patients who receive care and the staff who give it.

Ask a patient what they want and they do not say “To be treated in less than 18 weeks”.  In fact I have yet to meet a patient who has even heard of the 18-week target!

A patient will say ‘I want to know what is wrong, what can be done, when it can be done, who will do it, what do I need to do, and what can I expect to be the outcome’.

Do we measure any of that?

Do we measure accuracy of diagnosis? Do we measure use of best evidenced practice? Do we know the possible delivery time (not the actual)? Do we inform patients of what they can expect to happen? Do we know what they can expect to happen? Do we measure outcome for every patient? Do we feed that back continuously and learn from it?


So …. if we choose and commit to delivering excellence then we will need to start measuring-4-success and feeding what we see back to those who deliver the care.

Warts and all.

So that we know when we are doing a good job, and we know where to focus further improvement effort.

And if we abdicate that commitment and choose to deliver mediocrity-by-default then we are the engineers of our own chaos and despair.

We have the choice.

We just need to make it.

There is a condition called SFQPosis which is an infection that is transmitted by a vector called an ISP.

The primary symptom of SFQPosis is sudden clarity of vision and a new understanding of how safety, flow, quality and productivity improvements can happen at the same time …

… when they are seen as partners on the same journey.

There are two sorts of ISP … Solitary and Social.

Solitary ISPs infect one person at a time … often without them knowing.  And there is often a long lag time between the infection and the appearance of symptoms. Sometimes years – and often triggered by an apparently unconnected event.

In contrast the Social ISPs will tend to congregate together and spend their time foraging for improvement pollen and nectar and bringing it back to their ‘hive’ to convert into delicious ‘improvement honey’ which once tasted is never forgotten.

It appears that Jeremy Hunt, the Secretary of State for Health, has recently been bitten by an ISP and is now exhibiting the classic symptoms of SFQPosis.

Here is the video of Jeremy describing his symptoms at the recent NHS Confederation Conference. The talk starts at about 4 minutes.

His account suggests that he was bitten while visiting the Virginia Mason Hospital in the USA and on return home then discovered some Improvement hives in the UK … and some of the Solitary ISPs that live in England.

Warwick and Sheffield NHS Trusts are buzzing with ISPs … and the original ISP that infected them was one Kate Silvester.

The repeated message in Jeremy’s speech is that improved safety, quality and productivity can happen at the same time and are within our gift to change – and the essence of achieving that is to focus on flow.

The sequence is safety first (eliminate the causes of avoidable harm), then flow second (eliminate the causes of avoidable chaos), then quality (measure both expectation and experience) and then productivity will soar.

And everyone will  benefit.

This is not a zero-sum win-lose game.

So listen for the buzz of the ISPs …. follow it and ask them to show you how … ask them to inoculate you with SFQPosis.

And here is a recent video of Dr Steve Allder, a consultant neurologist and another ISP that Kate infected with SFQPosis a few years ago.  Steve is describing his own experience of learning how to do Improvement-by-Design.

Resistance-to-change is an oft quoted excuse for improvement torpor. The implied sub-message is more like “We would love to change but They are resisting”.

Notice the Us-and-Them language.  This is the observable evidence of a “We’re OK and They’re Not OK” belief.  And in reality it is this unstated belief and the resulting self-justifying behaviour that is an effective barrier to systemic improvement.

This Us-and-Them language generates cultural friction, erodes trust and erects silos that are effective barriers to the flow of information, of innovation and of learning.  And the inevitable reactive solutions to this Us-versus-Them friction create self-amplifying positive feedback loops that ensure the counter-productive behaviour is sustained.

One tangible manifestation is DRATs: Delusional Ratios and Arbitrary Targets.

So when a plausible, rational and well-evidenced candidate for an alternative approach is discovered then it is a reasonable reaction to grab it and to desperately spray the ‘magic pixie dust’ at everything.

This is a recipe for disappointment: because there is no such thing as ‘improvement magic pixie dust’.

The more uncomfortable reality is that the ‘magic’ is the result of a long period of investment in learning and the associated hard work in practising and polishing the techniques and tools.

It may look like magic but it isn’t. That is an illusion.

And some self-styled ‘magicians’ choose to keep their hard-won skills secret … because by sharing them they know that they will lose their ‘magic powers’ in a flash of ‘blindingly obvious in hindsight’.

And so the chronic cycle of despair-hope-anger-and-disappointment continues.

System-wide improvement in safety, flow, quality and productivity requires that the benefits of synergism overcome the benefits of antagonism.  This requires two changes to the current hope-and-despair paradigm.  Both are necessary and neither is sufficient alone.

1) The ‘wizards’ (i.e. magic folk) share their secrets.
2) The ‘muggles’ (i.e. non-magic folk) invest the time and effort in learning ‘how-to-do-it’.

The transition to this awareness is uncomfortable so it needs to be managed pro-actively … by being open about the risk … and how to mitigate it.

That is what experienced Practitioners of Improvement Science (ISPs) will do. Be open about the challenges ahead.

And those who desperately want the significant and sustained SFQP improvements; and an end to the chronic chaos; and an end to the gaming; and an end to the hope-and-despair cycle …. just need to choose. Choose to invest and learn the ‘how to’ and be part of the future … or choose to be part of the past.

Improvement science is simple … but it is not intuitively obvious … and so it is not easy to learn.

If it were, we would all be doing it.

And it is the behaviour of a wise leader of change to set realistic and mature expectations of the challenges that come with a transition to system-wide improvement.

That is demonstrating the OK-OK behaviour needed for synergy to grow.

One of the most rewarding experiences for an improvement science coach is to sense when an individual or team shifts up a gear and starts to accelerate up their learning curve.

It is like there is a mental gearbox hidden inside them somewhere.  Before they were thrashing themselves by trying to go too fast in a low gear. Noisy, ineffective, inefficient and at high risk of blowing a gasket!

Then, they discover that there is a higher gear … and that to get to it they have to take a risk … depress the emotional clutch, ease back on the gas, slip into neutral, and trust themselves to find the new groove and … click … into the higher gear, and then ease up the power while letting out the clutch.  And then accelerate up the learning  curve.  More effective, more efficient. More productive. More fun.

Organisations appear to behave in much the same way.

Some scream along in the slow-lane … thrashing their employee engine. The majority chug complacently in the middle-lane of mediocrity. A few accelerate past in the fast-lane to excellence.

And they are all driving exactly the same model of car.

So it is not the car that is making the difference … it is the driving.

Those who have studied organisations have observed five cultural “gears”; and which gear an organisation is in most of the time can be diagnosed by listening to the sound of the engine – the conversations of the employees.

If they are muttering “work sucks” then they are in first gear.  The sense of hopelessness, futility, despair and anger consumes all their emotional fuel. Fortunately this is uncommon.

If we mainly hear “my work sucks” then they are in second gear.  The feeling is of helplessness and apathy and the behaviour is Victim-like.  They believe that they cannot solve their own problems … someone else must do it for them or tell them what to do. They grumble a lot.

If the dominant voice is “I’m great but you lot suck” then we are hearing third gear attitudes. The selfishly competitive behaviour of the individualist achiever. The “keep your cards close to your chest” style of dyadic leadership.  The advocate of “it is OK to screw others to get ahead”. They grumble a lot too – about the apathetic bunch.

And those who have studied organisations suggest that about 80% of healthcare organisations are stuck in first, second or third cultural gear.  And we can tell who they are … the lower 80% of the league tables. The ones clamouring for more … of everything.

So how come so many organisations are so stuck? Unable to find fourth gear?

One cause is the design of their feedback loops. Their learning loops.

If an organisation only uses failure as a feedback loop then it is destined to get no more than mediocrity.  Third gear at best, and usually only second.

We all feel disappointment when our experience does not live up to our expectation.  But only the most angry of us will actually do something and complain.  Especially when we have no other choice of provider!

Suppose we are commissioners of healthcare services and we are seeing a rising tide of patient and staff complaints. We want to improve the safety and quality of the services that we are paying for; so we draw up a league table using complaints as feedback fodder and we focus on the worst performing providers … threatening them with dire consequences for being in the bottom 20%.  What happens? Fear of failure motivates them to ‘pull up their socks’ and the number of complaints falls.

Job done?

Unfortunately not.

All we have done is to bully those stuck in first or second gear into thrashing their over-burdened employee engine even harder.  We have not helped anyone find their higher gear. We have hit the target, missed the point, and increased the risk of system failure!

So what about those organisations stuck in third gear?

Well they are ticking their performance boxes, meeting our targets, keeping their noses clean.  Some are just below, and some just above the collective mean of barely acceptable mediocrity.

But expectation is changing.

The 20% who have discovered fourth gear are accelerating ahead and are demonstrating what is possible. And they are raising expectation, increasing the variation of service quality … for the better.

And the other 80% are falling further and further behind; thrashing their tired and demoralised staff harder and harder to keep up.  Complaining increasingly that life is unfair and that they need more time, money and staff engagement. Eventually their executive head gaskets go “pop” and they fall by the wayside.

Finding cultural fourth gear is possible but it is not easy. There are no short cuts.  We have to work our way up the gears and we have to learn when and how to make smooth transitions from first to second, second to third and then third to fourth.

And when we do that the loudest voice we hear is “We are OK”.

We need to learn how to do a smooth cultural hill start on the steep slope from apathy to excellence.

And we need to constantly listen to the sound of our improvement engine; to learn to understand what it is saying; and learn how and when to change to the next cultural gear.

For a system to be both effective and efficient the parts need to work in synergy. This requires both alignment and collaboration.

Systems that involve people and processes can exhibit complex behaviour. The rules of engagement also change as individuals learn and evolve their beliefs and their behaviours.

The values and the vision should be more fixed. If the goalposts are obscure or oscillate then confusion and chaos are inevitable.

So why is collaborative alignment so difficult to achieve?

One factor has been mentioned. Lack of a common vision and a constant purpose.

Another factor is distrust of others. Our fear of exploitation, bullying, blame, and ridicule.

Distrust is a learned behaviour. Our natural inclination is trust. We have to learn distrust. We do this by copying trust-eroding behaviours that are displayed by our role models. So when leaders display these behaviours then we assume it is OK to behave that way too.  And we dutifully emulate.

The most common trust eroding behaviour is called discounting.  It is a passive-aggressive habit characterised by repeated acts of omission, such as not replying to emails, not sharing information, not offering constructive feedback, not asking for other perspectives, and not challenging disrespectful behaviour.

There are many causal factors that lead to distrust … so there is no one-size-fits-all solution to dissolving it.

One factor is ineptitude.

This is the unwillingness to learn and to use available knowledge for improvement.

It is one of the many manifestations of incompetence.  And it is an error of omission.

Whenever we are unable to solve a problem then we must always consider the possibility that we are inept.  We do not tend to do that.  Instead we prefer to jump to the conclusion that there is no solution or that the solution requires someone else doing something different. Not us.

The impossibility hypothesis is easy to disprove.  If anyone has solved the problem, or a very similar one, and if they can provide evidence of what and how then the problem cannot be impossible to solve.

The someone-else’s-fault hypothesis is trickier because proving it requires us to influence others effectively.  And that is not easy.  So we tend to resort to easier but less effective methods … manipulation, blame, bullying and so on.

A useful way to view this dynamic is as a set of four concentric circles – with us at the centre.

The outermost circle is called the ‘Circle of Ignorance’. The collection of all the things that we do not know we do not know.

Just inside that is the ‘Circle of Concern’.  These are things we know about but feel completely powerless to change. Such as the fact that the world turns and the sun rises and sets with predictable regularity.

Inside that is the ‘Circle of Influence’ and it is a broad and continuous band – the further away the less influence we have; the nearer in the more we can do. This is the zone where most of the conflict and chaos arises.

The innermost is the ‘Circle of Control’.  This is where we can make changes if we so choose to. And this is where change starts and from where it spreads.

So if we want system-level improvements in safety, flow, quality and productivity (or cost) then we need to align these four circles. Or rather the gaps in them.

We start with the gaps in our circle of control. The things that we believe we cannot do … but when we try … we discover that we can (and always could).

With this new foundation of conscious competence we can start to build new relationships, develop trust and to better influence others in a win-win-win conversation.

And then we can collaborate to address our common concerns – the ones that require coherent effort. We can agree and achieve our common purpose, vision and goals.

And from there we will be able to explore the unknown opportunities that lie beyond. The ones we cannot see yet.

There comes a point in every improvement-by-design journey when it is time for the improvement guide to leave.

An experienced coach knows when that time has arrived and the expected departure is in the contract.

The Nanny McPhee Coaching Contract:

“When you need me but do not want me then I have to stay. And when you want me but do not need me then I have to leave.”

The science of improvement can appear like ‘magic’ at first because seemingly impossible simultaneous win-win-win benefits are seen to happen with minimal effort.

It is not magic.  It requires years of training and practice to become a ‘magician’.  So those who have invested in learning the know-how are just catalysts.  When their catalysts-of-change work is done then they must leave to do it elsewhere.

The key to managing this transition is to set this expectation clearly and right at the start; so it does not come as a surprise. And to offer reminders along the way.

And it is important to follow through … when the time is right.

It is not always easy though.

There are three commonly encountered situations that will test the resolve of the guide.

1) When things are going very badly because the coaching contract is being breached; usually by old, habitual, trust-eroding, error-of-omission behaviours such as: not communicating, not sharing learning, and not delivering on commitments. The coach, fearing loss of reputation and face, is tempted to stay longer and to try harder. Often getting angry and frustrated in the process.  This is an error of judgement. If the coaching contract is being persistently breached then the Exit Clause should be activated clearly and cleanly.

2) When things are going OK, it is easy to become complacent and the temptation then is to depart too soon, only to hear later that the solo-flyers “crashed and burned”, because they were not quite ready and could not (or would not) see it.  This is the “need but do not want” part of the Nanny McPhee Coaching Contract.  One role of the ISP coach is to respectfully challenge the assertion that ‘We can do it ourselves’ … by saying ‘OK, please demonstrate’.

3) When things are going very well it is tempting to blow the Trumpet of Success too early, attracting the attention of others who will want to take short cuts, to bypass the effort of learning for themselves, and to jump onto someone else’s improvement bus.  The danger here is that they bring their counter-productive, behavioural baggage with them. This can cause the improvement bus to veer off course on the twists and turns of the nerve curve; or grind to a halt on the steeper parts of the learning curve.

An experienced ISP coach will respectfully challenge the individuals and the teams to help them develop their experience, competence and confidence. And just as they start to become too comfortable with having someone to defer to for all decisions, the ISP coach will announce their departure and depart as announced.

This is the “want but do not need” part of the Nanny McPhee Coaching Contract.

And experience teaches us that this mutually respectful behaviour works better.