Archive for the ‘6M Design’ Category

In medical training we have to learn about lots of things. That is one reason why it takes a long time to train a competent and confident clinician.

First, we learn the anatomy (structure) and physiology (function) of the normal, healthy human.

Then we learn about how this amazingly complex system can go wrong.  We learn pathology.  And we do that so that we understand the relationship between the cause (disease) and the effect (symptoms and signs).

Then we learn diagnostics – which is how to work backwards from the effects to the most likely cause(s).

And only then can we learn therapeutics – the design and delivery of a treatment plan that we are confident will relieve the symptoms by curing the disease.


The NHS is an amazingly complex system, and it too can go wrong.  It can exhibit a wide spectrum of symptoms and signs: medical errors, long delays, unhappy patients and staff, and overspent budgets.

But, there is no equivalent training in how to diagnose and treat a sick health care system.  And this is not acceptable, especially given that the knowledge of how to do this is already available.  It is called complex adaptive systems engineering (CASE).


Before the Renaissance, the understanding of how the body works was primitive and it was believed that illness was “God’s Will” so we had to just grin and bear it.

The Scientific Revolution brought us profound theories, innovative techniques and capability extending tools.  And the impact has been dramatic – those who have access to this live better and longer than ever.  Those who don’t … don’t.

Our current understanding of how health care systems work is, to be blunt, medieval.  The current solutions amount to little more than rune reading, incantations and the prescription of purgatives and leeches.  And the impact is about as effective.

So we need to study the anatomy, physiology and pathology of complex adaptive systems like healthcare.

And just this week a prototype complex system pathology training system was tested …

… and it employed cutting-edge 21st Century technology: Pasta Twizzles.

The specific topic under scrutiny was variation.  A brain-bending concept that is usually relegated to the mystical smoke-and-mirrors world called Sadistics.

But no longer!

The Mists of Jargon and Fog of Formulae were blown away as we switched on the Light of Simulation and went exploring. Empirically. Pragmatically.


And what we discovered was jaw-dropping.

A disease called the “Flaw of Averages” and its malignant manifestation “Carveoutosis”.


And with our new knowledge we opened the door to a hidden world of opportunity and improvement.

Then we directed the Laser of Insight and evaporated the queues and chaos that, before our new understanding, we had accepted as inevitable and beyond our understanding or control.

They were neither. And never had been. We were deluding ourselves.

Welcome to the Primary Care Access One Day Workshop.

Validation Test: Passed.

A story was shared this week.

A story of hope for the hard-pressed NHS, its patients, its staff, its managers and its leaders.

A story that says “We can learn how to fix the NHS ourselves”.

And the story comes with evidence; hard, objective, scientific, statistically significant evidence.


The story starts almost exactly three years ago when a Clinical Commissioning Group (CCG) in England made a bold strategic decision to invest in improvement, or as they termed it “Achieving Clinical Excellence” (ACE).

They invited proposals from their local practices with the “carrot” of enough funding to allow GPs to carve-out protected time to do the work.  And a handful of proposals were selected and financially supported.

This is the story of one of those proposals which came from three practices in Sutton who chose to work together on a common problem – the unplanned hospital admissions in their over 70’s.

Their objective was clear and measurable: “To reduce the cost of unplanned admissions in the 70+ age group by working with hospital to reduce length of stay.”

Did they achieve their objective?

Yes, they did.  But there is more to this story than that.  Much more.


One innovative step they took was to invest in learning how to diagnose why the current ‘system’ was costing what it was; then learning how to design an improvement; and then learning how to deliver that improvement.

They invested in developing their own improvement science skills first.

They did not assume they already knew how to do this and they engaged an experienced health care systems engineer (HCSE) to show them how to do it (i.e. not to do it for them).

Another innovative step was to create a blog to make it easier to share what they were learning with their colleagues; and to invite feedback and suggestions; and to provide a journal that captured the story as it unfolded.

And they measured stuff before they made any changes and afterwards so they could measure the impact, and so that they could assess the evidence scientifically.

And that was actually quite easy because the CCG was already measuring what they needed to know: admissions, length of stay, cost, and outcomes.

All they needed to learn was how to present and interpret that data in a meaningful way.  And as part of their IS training,  they learned how to use system behaviour charts, or SBCs.


By Jan 2015 they had learned enough of the HCSE techniques and tools to establish the diagnosis and start making changes to the parts of the system that they could influence.


Two years later they subjected their before-and-after data to robust statistical analysis and they had a surprise. A big one!

Reducing hospital mortality was not a stated objective of their ACE project, and they only checked the mortality data to be sure that it had not changed.

But it had, and the “p=0.014” part of the statement above means that, if nothing had really changed, a reduction in hospital mortality as large as this 20.0% would be expected by random chance alone … only about 1.4% of the time.  [This is well below the 5% threshold that we usually accept as “statistically significant” in a clinical trial.]

But …

This was not a randomised controlled trial.  This was an intervention in a complicated, ever-changing system; so they needed to check that the hospital mortality for comparable patients who were not their patients had not changed as well.

And the statistical analysis of the hospital mortality for the ‘other’ practices for the same patient group, and the same period of time confirmed that there had been no statistically significant change in their hospital mortality.

So, it appears that what the Sutton ACE Team did to reduce length of stay (and cost) had also, unintentionally, reduced hospital mortality. A lot!


And this unexpected outcome raises a whole raft of questions …


If you would like to read their full story then you can do so … here.

It is a story of hunger for improvement, of humility to learn, of hard work and of hope for the future.

This is a snapshot of an experiment in progress.  The question being asked is “Can consultant surgeons be trained to be system flow designers in one day?”

On the left are Kate Silvester and Phil Debenham … their doctor/trainers. On the right are some brave volunteer consultant surgeons.

It is a tense moment. The focused concentration is palpable. It is a tough design assignment … a chronically chaotic one-stop outpatient clinic. They know it well.


They have the raw, unprocessed data and they are deep into diagnosis mode.  On the other side of the room is another team of consultant surgeon volunteers who are struggling with the same challenge. Competition is in the air. Reputations are on the line. The game is on.

They are racing to generate this … a process template chart … that illustrates the conversion of raw event data into something visible and meaningful. A Gantt chart.

Their tools are basic – coloured pens and squared paper – just as Henry L. Gantt used in 1916 – a hundred years ago.

Hidden in this Gantt chart is the diagnosis, the open door to the path to improving this clinic design.  It is as plain as the nose on your face … if you know what to look for. They don’t. Well, … not yet.


Skip forwards to later in the experiment. Both teams have solved the ‘impossible’ problem. They have diagnosed the system design flaw that was causing the queues, chaos and waiting … and they have designed and verified a solution. With no more than squared paper and coloured pens.  Henry G would be delighted.

And they are justifiably proud of their achievement because, when they tested their design in the real world, it showed that the queues and chaos had “evaporated”.  And it cost … nothing.


At the start of the experiment they were unaware of what was possible. At the end of the experiment they knew how to do it. In one day.

The question: “Can consultant surgeons be trained to be system flow designers in one day?”

The answer: “Yes”


For more posts like this please vote here.
For more information please subscribe here.

About a year ago we looked back at the previous 10 years of NHS unscheduled care performance …

click here to read

… and warned that a catastrophe was on the way because we had created an urgent care “pressure cooker”.

Did waving the red warning flag make any difference?

It seems not.

The catastrophe happened just as predicted … A&E performance slumped to an all-time low, and has not recovered.


A pressure cooker is an elegantly simple system – a strong metal box with a sealed lid and a pressure-sensitive valve.  Food cooks more quickly at a higher temperature, and we can increase the boiling point of water by increasing the ambient pressure, so all we need to do is put some water in the cooker, close the lid, set the pressure limit we want (i.e. the temperature we want) and apply some heat.  Simple.  As the water boils the steam increases the pressure inside, until the regulator valve opens and lets a bit of steam out. The more heat we apply – the faster the steam comes out – but the internal pressure and temperature remain constant. An elegant self-regulating system.


Our unscheduled care acute hospital pressure cooker design is very similar – but it has an additional feature – we can squeeze raw patients in through a one-way valve labelled “admissions” and the internal pressure will squeeze them out through another one-way pressure-sensitive valve called “discharges”.

But there is not much head-space inside our hospital (i.e. empty beds) so pushing patients in will increase the pressure inside, and it will trigger an internal reaction called “fire-fighting” that generates heat (but sadly no insight).  When the internal pressure reaches the critical level, patients are squeezed out; ready-or-not.

What emerges from the chaotic cauldron is a mixture of under-cooked, just-right, and over-cooked patients.  And we then conduct quality control audits and we label what we find as “quality variation”, but it looks random so it gives us no clues as to what to do next.

Equilibrium is eventually achieved – what goes in comes out – the pressure and temperature auto-regulate – the chaos becomes chronic – and the quality of the output is predictably unpredictable, with some of it badly but randomly spoiled (i.e. harmed).

And our auto-regulating pressure cooker is very resistant to external influences, which after all is one of its key design features.


Squeezing a bit less in (i.e. admissions avoidance) does not make any difference to the internal pressure and temperature.  It auto-regulates.  The reduced flow means longer cooking time and we just get less under-cooked and more over-cooked output.  Oh, and we go bust because our revenue has reduced but our costs have not.

Building a bigger pressure cooker (i.e. adding more beds) does not make any sustained difference either.  Again the system auto-regulates.  The extra space allows a longer cooking time – and again we get less under-cooked and more over-cooked output.  Oh, and we still go bust (same revenue but increased cost).

Turning down the heat (i.e. reducing the 4 hr A&E lead time target yield from 98% to 95%) does not make any difference. Our elegant auto-regulating design adjusts itself to sustain the internal pressure and temperature.  Output is still variable, but at least we do not go bust.


This metaphor may go some way to explain why the intuitively obvious “initiatives” to improve unscheduled care performance have had no significant or sustained impact.

And what is more worrying is that they may even have made the situation worse.

Working inside an urgent care pressure cooker is dangerous.  People get emotionally damaged and scarred.


The good news is that a different approach is available … a health and social care systems engineering (HSCSE) approach … one that we could use to change the fundamental design from fire-fighter to flow-facilitator.

Using HSCSE theory, techniques and tools we could specify, design, build, verify, implement and validate a low-pressure, low-resistance, low-wait, low-latency, high-efficiency unscheduled care flow design that is safe, timely, effective and affordable.

An emergency care “Dyson” so to speak.

But we are not training our people how to do that.

Why is that?



The path from chaos to calm is not clearly marked.  If it were we would not have chaotic health care processes, anxious patients, frustrated staff and escalating costs.

Many believe that there is no way out of the chaos. They have given up trying.

Some still nurture the hope that there is a way and are looking for a path through the fog of confusion.

A few know that there is a way out because they have been shown a path from chaos to calm and can show others how to find it.

Someone, a long time ago, explored the fog and discovered clarity of understanding on the far side, and returned with a Map of the Mind-field.


Q: What is causing The Fog?

When hot rhetoric meets cold reality the fog of disillusionment forms.

Q: Where does the hot rhetoric come from?

Passionate, well-intended and ill-informed people in positions of influence, authority and power. The orators, debaters and commentators.

They do not appear to have an ability to diagnose and to design, so cannot generate effective decisions and coordinate efficient delivery of solutions.

They have not learned how and seem to be unaware of it.

If they had, then they would be able to show that there is a path from chaos to calm.

A safe, quick, surprisingly enjoyable and productive path.

If they had the know-how then they could pull from the front in the ‘right’ direction, rather than push from the back in the ‘wrong’ one.


And the people who are spreading this good news are those who have just emerged from the path.  Their own fog of confusion evaporating as they discovered the clarity of hindsight for themselves.

Ah ha!  Now I see! Wow!  The view from the far side of The Fog is amazing and exciting. The opportunity and potential is … unlimited.  I must share the news. I must tell everyone! I must show them how-to.

Here is a story from Chris Jones who has recently emerged from The Fog.

And here is a description of part of the Mind-field Map, narrated in 2008 by Kate Silvester, a doctor and manufacturing systems engineer.

Imagine this scenario:

You develop some non-specific symptoms.

You see your GP who refers you urgently to a 2 week clinic.

You are seen, assessed, investigated and informed that … you have cancer!


The shock, denial, anger, blame, bargaining, depression, acceptance sequence kicks off … it is sometimes called the Kübler-Ross grief reaction … and it is a normal part of the human psyche.

But there is better news. You also learn that your condition is probably treatable, but that it will require chemotherapy, and that there are no guarantees of success.

You know that time is of the essence … the cancer is growing.

And time has a new relevance for you … it is called life time … and you know that you may not have as much left as you had hoped.  Every hour is precious.


So now imagine your reaction when you attend your local chemotherapy day unit (CDU) for your first dose of chemotherapy and have to wait four hours for the toxic but potentially life-saving drugs.

They are very expensive and they have a short shelf-life so the NHS cannot afford to waste any.   The Aseptic Unit team wait until all the safety checks are OK before they proceed to prepare your chemotherapy.  That all takes time, about four hours.

Once the team get to know you it will go quicker. Hopefully.

It doesn’t.

The delays are not the result of unfamiliarity … they are the result of the design of the process.

All your fellow patients seem to suffer repeated waiting too, and you learn that they have been doing so for a long time.  That seems to be the way it is.  The waiting room is well used.

Everyone seems resigned to the belief that this is the best it can be.

They are not happy about it but they feel powerless to do anything.


Then one day someone demonstrates that it is not the best it can be.

It can be better.  A lot better!

And they demonstrate that this better way can be designed.

And they demonstrate that they can learn how to design this better way.

And they demonstrate what happens when they apply their new learning …

… by doing it and by sharing their story of “what-we-did-and-how-we-did-it”.

[Image: CDU_Waiting_Room]

If life time is so precious, why waste it?

And perhaps the most surprising outcome was that their safer, quicker, calmer design was also 20% more productive.

Safe means avoiding harm, and safety is an emergent property of a well-designed system.

Frail means infirm, poorly, wobbly and at higher risk of harm.

So we want our health care system to be a FrailSafe Design.

But is it? How would we know? And what could we do to improve it?


About ten years ago I was involved in a project to improve the safety design of a specific clinical stream flowing through the hospital that I work in.

The ‘at risk’ group of patients were frail elderly patients admitted as an emergency after a fall and who had suffered a fractured thigh bone. The neck of the femur.

Historically, the outcome for these patients was poor.  Many did not survive, and many of the survivors never returned to independent living. They became even more frail.


The project was undertaken during an organisational transition: the hospital was being ‘taken over’ by a bigger one.  This created a window of opportunity for some disruptive innovation, and the project was labelled as a ‘Lean’ one because we had been inspired by similar work done at Bolton some years before and Lean was the flavour of the month.

The actual change was small: it was a flow design tweak that cost nothing to implement.

First we asked two flow questions:
Q1: How many of these high-risk frail patients do we admit a year?
A1: About one per day on average.
Q2: What is the safety critical time for these patients?
A2: The first four days.  The sooner they have hip surgery and are able to actively mobilise, the better their outcome.

Second we applied Little’s Law which showed the average number of patients in this critical phase is four. This was the ‘work in progress’ or WIP.
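The arithmetic behind that step is short enough to write down. This is a minimal sketch using only the two answers given above (about one admission per day, and a four-day critical phase); the function name is just an illustrative label.

```python
def littles_law_wip(arrivals_per_day, lead_time_days):
    """Little's Law: average work in progress = average arrival rate x average time in the system."""
    return arrivals_per_day * lead_time_days


# A1: about one patient per day.  A2: the first four days are critical.
average_in_critical_phase = littles_law_wip(arrivals_per_day=1, lead_time_days=4)
print(average_in_critical_phase)
```

Note that Little's Law only gives the average; it says nothing about how far the actual number wanders around that average from day to day.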

And we knew that variation is always present, and we knew that having all these patients in one place would make it much easier for the multi-disciplinary teams to provide timely care and to avoid potentially harmful delays.

So we suggested that one six-bedded bay on one of the trauma wards be designated the Fractured Neck Of Femur bay.
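Why six beds when the average is four? A rough sketch of the reasoning, under an assumption that is not stated in the original project: that the number of patients in the critical phase on any given day behaves like a Poisson-distributed count with a mean of four. Real admission patterns may well differ, so treat the numbers as illustrative only.

```python
from math import exp, factorial


def poisson_cdf(k, mean):
    """Probability of seeing k or fewer, for a Poisson-distributed count with the given mean."""
    return sum(exp(-mean) * mean**i / factorial(i) for i in range(k + 1))


MEAN_IN_CRITICAL_PHASE = 4  # the average number in the critical phase, from the text

# Assumed model: daily occupancy ~ Poisson(4).  This is an illustration, not project data.
fits_in_four_beds = poisson_cdf(4, MEAN_IN_CRITICAL_PHASE)
fits_in_six_beds = poisson_cdf(6, MEAN_IN_CRITICAL_PHASE)

print(f"Days when four beds would be enough: {fits_in_four_beds:.0%}")
print(f"Days when six beds would be enough:  {fits_in_six_beds:.0%}")
```

Under that assumption a bay sized to the average would be too small on more than a third of days, while a six-bedded bay would cope on roughly nine days out of ten.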

That was the flow diagnosis and design done.

The safety design was created by the multi-disciplinary teams who looked after these patients: the geriatricians, the anaesthetists, the perioperative emergency care team (PECT), the trauma and orthopaedic team, the physiotherapists, and so on.

They designed checklists to ensure that all #NOF patients got what they needed when they needed it and so that nothing important was left to chance.

And that was basically it.

And the impact was remarkable. The stream flowed. And one measured outcome was a dramatic and highly statistically significant reduction in mortality.

[Image: Injury_2011_Results]
The full paper was published in Injury 2011; 42: 1234-1237.

We had created a FrailSafe Design … which implied that what was happening before was clearly not safe for these frail patients!


And there was an improved outcome for the patients who survived: A far larger proportion rehabilitated and returned to independent living, and a far smaller proportion required long-term institutional care.

By learning how to create and implement a FrailSafe Design we had added both years-to-life and life-to-years.

It cost nothing to achieve and the message was clear, as this quote from the 2011 paper illustrates …

[Image: Injury_2011_Message]

What was a bit disappointing was the gap of four years between delivering this dramatic and highly significant patient safety and quality improvement and the sharing of the story.


What is more exciting is that the concept of FrailSafe is growing, evolving and spreading.

It was time for Bob and Leslie’s regular Improvement Science coaching session.

<Leslie> Hi Bob, how are you today?

<Bob> I am getting over a winter cold but otherwise I am good.  And you?

<Leslie> I am OK and I need to talk something through with you because I suspect you will be able to help.

<Bob> OK. What is the context?

<Leslie> Well, one of the projects that I am involved with is looking at the elderly unplanned admission stream which accounts for less than half of our unplanned admissions but more than half of our bed days.

<Bob> OK. So what were you looking to improve?

<Leslie> We want to reduce the average length of stay so that we free up beds to provide resilient space-capacity to ease the 4-hour A&E admission delay niggle.

<Bob> That sounds like a very reasonable strategy.  So have you made any changes and measured any improvements?

<Leslie> We worked through the 6M Design® sequence. We studied the current system, diagnosed some time traps and bottlenecks, redesigned the ones we could influence, modified the system, and continued to measure to monitor the effect.

<Bob> And?

<Leslie> It feels better but the system behaviour charts do not show an improvement.

<Bob> Which charts, specifically?

<Leslie> The BaseLine XmR charts of average length of stay for each week of activity.

<Bob> And you locked the limits when you made the changes?

<Leslie> Yes. And there still were no red flags. So that means our changes have not had a significant effect. But it definitely feels better. Am I deluding myself?

<Bob> I do not believe so. Your subjective assessment is very likely to be accurate. Our Chimp OS 1.0 is very good at some things! I think the issue is with the tool you are using to measure the change.

<Leslie> The XmR chart?  But I thought that was THE tool to use?

<Bob> Like all tools it is designed for a specific purpose.  Are you familiar with the term Type II Error?

<Leslie> Doesn’t that come from research? I seem to remember that is the error we make when we have an under-powered study.  When our sample size is too small to confidently detect the change in the mean that we are looking for.

<Bob> A perfect definition!  The same error can happen when we are doing before and after studies too.  And when it does, we see the pattern you have just described: the process feels better but we do not see any red flags on our BaseLine© chart.

<Leslie> But if our changes only have a small effect how can it feel better?

<Bob> Because some changes have cumulative effects and we omit to measure them.

<Leslie> OMG!  That makes complete sense!  For example, if my bank balance is stable my average income and average expenses are balanced over time. So if I make a small-but-sustained improvement to my expenses, like using lower cost generic label products, then I will see a cumulative benefit over time to the balance, but not the monthly expenses; because the noise swamps the signal on that chart!

<Bob> An excellent analogy!

<Leslie> So the XmR chart is not the tool for this job. And if this is the only tool we have then we risk making a Type II error. Is that correct?

<Bob> Yes. We do still use an XmR chart first though, because if there is a big enough and fast enough shift then the XmR chart will reveal it.  If there is not then we do not give up just yet; we reach for our more sensitive shift detector tool.

<Leslie> Which is?

<Bob> I will leave you to ponder on that question.  You are a trained designer now so it is time to put your designer hat on and first consider the purpose of this new tool, and then create the outline of a fit-for-purpose design.

<Leslie> OK, I am on the case!
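For readers who want to see the pattern Leslie describes, here is a minimal sketch. The weekly length-of-stay figures are invented for illustration, and the function names are hypothetical. The XmR limits use the conventional 2.66 multiplier on the average moving range. The running total of deviations from the baseline average simply mirrors Leslie's bank-balance analogy; it is not necessarily the tool Bob has in mind.

```python
from statistics import mean


def xmr_limits(baseline):
    """Lower limit, centre line and upper limit for an XmR (individuals) chart."""
    centre = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    average_moving_range = mean(moving_ranges)
    # 2.66 is the conventional XmR scaling constant
    spread = 2.66 * average_moving_range
    return centre - spread, centre, centre + spread


def running_balance(values, centre):
    """Cumulative sum of deviations from the baseline centre (the 'bank balance')."""
    balance = 0.0
    history = []
    for value in values:
        balance += value - centre
        history.append(balance)
    return history


# Invented weekly average length of stay (days): before, then after a small sustained shift of -1
before = [10, 12, 9, 11, 10, 12, 9, 11, 10, 12, 9, 11]
after = [9, 11, 8, 10, 9, 11, 8, 10, 9, 11, 8, 10]

lower, centre, upper = xmr_limits(before)          # limits locked on the 'before' data
flags = [week for week in after if not lower < week < upper]
balance = running_balance(after, centre)

print("Points outside the locked limits:", flags)
print("Running balance after 12 weeks:", balance[-1])
```

With these made-up numbers every 'after' point sits inside the locked limits, yet the running balance drifts steadily downwards: the small shift is hidden on the weekly chart and visible in the cumulative one.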

Hypothesis: Chaotic behaviour of healthcare systems is inevitable without more resources.

This appears to be a rather widely held belief, but what is the evidence?

Can we disprove this hypothesis?

Chaos is a predictable, emergent behaviour of many systems, both natural and man-made, a discovery that was made rather recently, in the 1970s.  Chaotic behaviour is not the same as random behaviour.  The fundamental difference is that random implies independence, while chaos requires the opposite: chaotic systems have interdependent parts.

Chaotic behaviour is complex and counter-intuitive, which may explain why it took so long for the penny to drop.
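A small illustration may help separate chaotic from random. The sketch below uses the logistic map, a textbook example chosen here for convenience and not taken from healthcare. There is no randomness in it at all: each value depends only on the one before. Yet two starting points that differ by one part in ten million soon follow completely different paths.

```python
def logistic_map(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x); fully deterministic, no random numbers involved."""
    values = [x0]
    for _ in range(steps):
        values.append(r * values[-1] * (1 - values[-1]))
    return values


first = logistic_map(0.2, steps=50)
second = logistic_map(0.2000001, steps=50)   # almost the same starting point

gaps = [abs(a - b) for a, b in zip(first, second)]
print("Gap after 1 step:   ", gaps[1])
print("Largest gap reached:", max(gaps))
```

Running it twice from the same starting point gives exactly the same sequence, which is what makes it chaotic, not random.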


Chaos is a complex behaviour and it is tempting to assume that complicated structures always lead to complex behaviour.  But they do not.  A mechanical clock is a complicated structure but its behaviour is intentionally very stable and highly predictable – that is the purpose of a clock.  It is a fit-for-purpose design.

The healthcare system has many parts; it too is a complicated system; it has a complicated structure.  It is often seen to demonstrate chaotic behaviour.

So we might propose that a complicated system like healthcare could also be stable and predictable. If it were designed to be.


But there is another critical factor to take into account.

A mechanical clock only has inanimate cogs and springs that only obey the Laws of Physics – and they are neither adaptable nor negotiable.

A healthcare system is different. It is a living structure. It has patients, providers and purchasers as essential components. And the rules of how people work together are both negotiable and adaptable.

So when we are thinking about a healthcare system we are thinking about a complex adaptive system or CAS.

And that changes everything!


The good news is that adaptive behaviour can be a very effective anti-chaos strategy, if it is applied wisely.  The not-so-good news is that if it is not applied wisely then it can actually generate even more chaos.


Which brings us back to our hypothesis.

What if the chaos we are observing in our healthcare system is actually iatrogenic?

What if we are unintentionally and unconsciously generating it?

These questions require an answer because if we are unwittingly contributing to the chaos, then with insight, understanding and wisdom we can intentionally calm it too.

These questions also challenge us to study our current way of thinking and working.  And in that challenge we will need to demonstrate a behaviour called humility. An ability to acknowledge that there are gaps in our knowledge and our understanding. A willingness to learn.


This all sounds rather too plausible in theory. What about an example?

Let us consider the highest flow process in healthcare: the outpatient clinic stream.

The typical design is a three-step process called the New-Test-Review design. This sequential design is simpler because the steps are largely independent of each other. And this simplicity is attractive because it is easier to schedule so is less likely to be chaotic. The downsides are the queues and delays between the steps and the risk of getting lost in the system. So if we are worried that a patient may have a serious illness that requires prompt diagnosis and treatment (e.g. cancer), then this simpler design is actually a potentially unsafe design.

A one-stop clinic is a better design because the New-Test-Review steps are completed in one visit, and that is better for everyone. But, a one-stop clinic is a more challenging scheduling problem because all the steps are now interdependent, and that is fertile soil for chaos to emerge.  And chaos is exactly what we often see.

Attending a chaotic one-stop clinic is a frustrating experience for both patients and staff, and it is also a less productive use of resources. So the chaos and cost appear to be the price we are asked to pay for a quicker and safer design.

So is the one stop clinic chaos inevitable, or is it avoidable?

Simple observation of a one stop clinic shows that the chaos is associated with queues – which are visible as a waiting room full of patients and front-of-house staff working very hard to manage the queue and to signpost and soothe the disgruntled patients.

What if the one stop clinic queue and chaos is iatrogenic? What if it was avoidable without investing in more resources? Would the chaos evaporate? Would the quality improve?  Could we have a safer, calmer, higher quality and more productive design?

Last week I shared evidence that proved the one-stop clinic chaos was iatrogenic – by showing it was avoidable.

A team of healthcare staff were shown how to diagnose the cause of the queue and were then able to remove that cause, and to deliver the same outcome without the queue and the associated chaos.

And the most surprising lesson that the team learned was that they achieved this improvement using the same resources as before; and that those resources also felt the benefit of the chaos evaporating. Their work was easier, calmer and more predictable.

The impossible-without-more-resources hypothesis had been disproved.
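The post does not say what the cause of the queue turned out to be in that clinic, so the sketch below is a purely hypothetical illustration and not a reconstruction of what the team found. It shows how a scheduling policy alone can create or remove a queue with exactly the same resources: one clinician, eight patients, fifteen minutes each, with patients either all booked for the start of the clinic or booked at fifteen-minute intervals.

```python
def total_waiting_minutes(arrival_times, service_minutes):
    """Total minutes patients spend queuing for a single clinician who sees them in arrival order."""
    clinician_free_at = 0
    waiting = 0
    for arrived in sorted(arrival_times):
        start = max(arrived, clinician_free_at)
        waiting += start - arrived
        clinician_free_at = start + service_minutes
    return waiting


SERVICE_MINUTES = 15   # assumed consultation time (hypothetical)
PATIENTS = 8           # assumed clinic size (hypothetical)

block_booked = [0] * PATIENTS                               # everyone told to arrive at the start
spaced = [i * SERVICE_MINUTES for i in range(PATIENTS)]     # one appointment per slot

print("Block-booked waiting:", total_waiting_minutes(block_booked, SERVICE_MINUTES), "minutes")
print("Spaced waiting:      ", total_waiting_minutes(spaced, SERVICE_MINUTES), "minutes")
```

In both cases the clinician does the same work and finishes at the same time; only the waiting room looks different. Real clinics have variable consultation times and several interdependent steps, so the diagnosis there is harder than this toy suggests.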

So, where else in our complicated and complex healthcare system might we apply anti-chaos?

Everywhere?


And for more about complexity science see Santa Fe Institute

It was the appointed time for Bob and Leslie’s regular coaching session as part of the improvement science practitioner programme.

<Leslie> Hi Bob, I am feeling rather despondent today so please excuse me in advance if you hear a lot of “Yes, but …” language.

<Bob> I am sorry to hear that Leslie. Do you want to talk about it?

<Leslie> Yes, please.  The trigger for my gloom was being sent on a mandatory training workshop.

<Bob> OK. Training to do what?

<Leslie> Outpatient demand and capacity planning!

<Bob> But you know how to do that already, so what is the reason you were “sent”?

<Leslie> Well, I am no longer sure I know how to do it.  That is why I am feeling so blue.  I went more out of curiosity and I came away utterly confused and with my confidence shattered.

<Bob> Oh dear! We had better start at the beginning.  What was the purpose of the workshop?

<Leslie> To train everyone in how to use an Outpatient Demand and Capacity planning model, an Excel one that we were told to download along with the User Guide.  I think it is part of a national push to improve waiting times for outpatients.

<Bob> OK. On the surface that sounds reasonable. You have designed and built your own Excel flow-models already; so where did the trouble start?

<Leslie> I will attempt to explain.  This was a paragraph in the instructions. I felt OK with this because my Improvement Science training has given me a very good understanding of basic demand and capacity theory.

[Image: IST_DandC_Model_01]

<Bob> OK.  I am guessing that other delegates may have felt less comfortable with this. Was that the case?

<Leslie> The training workshops are targeted at Operational Managers and the ones I spoke to actually felt that they had a good grasp of the basics.

<Bob> OK. That is encouraging, but a warning bell is ringing for me. So where did the trouble start?

<Leslie> Well, before going to the workshop I decided to read the User Guide so that I had some idea of how this magic tool worked.  This is where I started to wobble – this paragraph specifically …

IST_DandC_Model_02

<Bob> H’mm. What did you make of that?

<Leslie> It was complete gibberish to me and I felt like an idiot for not understanding it.  I went to the workshop in a bit of a panic and hoped that all would become clear. It didn’t.

<Bob> Did the User Guide explain what ‘percentile’ means in this context, ideally with some visual charts to assist?

<Leslie> No and the use of ‘th’ and ‘%’ was really confusing too.  After that I sort of went into a mental fog and none of the workshop made much sense.  It was all about practising using the tool without any understanding of how it worked. Like a black magic box.


<Bob> OK.  I can see why you were confused, and do not worry, you are not an idiot.  It looks like the author of the User Guide has unwittingly used some very confusing and ambiguous terminology here.  So can you talk me through what you have to do to use this magic box?

<Leslie> First we have to enter some of our historical data; the number of new referrals per week for a year; and the referral and appointment dates for all patients for the most recent three months.

<Bob> OK. That sounds very reasonable.  A run chart of historical demand and the raw event data for a Vitals Chart® is where I would start the measurement phase too – so long as the data creates a valid 3 month reporting window.

<Leslie> Yes, I thought so too … but that is not how the black box model seems to work. The weekly demand is used to draw an SPC chart, but the event data seems to disappear into the innards of the black box, and recommendations pop out of it.

<Bob> Ah ha!  And let me guess: the relationship between the term ‘percentile’ and the SPC chart of weekly new demand was not explained?

<Leslie> Spot on.  What does percentile mean?


<Bob> It is statistics jargon. Remember that we have talked about the distribution of the data around the average on a BaseLine chart; and how we use the histogram feature of BaseLine to show it visually.  Like this example.

IST_DandC_Model_03

<Leslie> Yes. I recognise that. This chart shows a stable system of demand with an average of around 150 new referrals per week and the variation distributed above and below the average in a symmetrical pattern, falling off to zero around the upper and lower process limits.  I believe that you said that over 99% will fall within the limits.

<Bob> Good.  The blue histogram on this chart is called a probability distribution function, to use the terminology of a statistician.

<Leslie> OK.

<Bob> So, what would happen if we created a Pareto chart of demand using the number of patients per week as the categories and ignoring the time aspect? We are allowed to do that if the behaviour is stable, as this chart suggests.

<Leslie> Give me a minute, I will need to do a rough sketch. Does this look right?

IST_DandC_Model_04

<Bob> Perfect!  So if you now convert the Y-axis to a percentage scale so that 52 weeks is 100% then where does the average weekly demand of about 150 fall? Read up from the X-axis to the line then across to the Y-axis.

<Leslie> At about 26 weeks or 50% of 52 weeks.  Ah ha!  So that is what a percentile means!  The 50th percentile is the average, the zeroth percentile is around the lower process limit and the 100th percentile is around the upper process limit!

<Bob> In this case the 50th percentile is the average, it is not always the case though.  So where is the 85th percentile line?

<Leslie> Um, 52 times 0.85 is 44.2 which, reading across from the Y-axis then down to the X-axis gives a weekly demand of about 170 per week.  That is about the same as the average plus one sigma according to the run chart.

<Bob> Excellent. The Pareto chart that you have drawn is called a cumulative probability distribution function … and that is usually what percentiles refer to. Comparative Statisticians love these but often omit to explain their rationale to non-statisticians!
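Leslie's chart-reading can also be sketched in a few lines of code. This is a minimal illustration, not the method used by the workshop tool: `percentile_nearest_rank` is a hypothetical helper that sorts the weekly counts and reads off the value at the required rank, and the normal model with a sigma of 20 is an assumption taken from Leslie's remark that the 85th percentile sits at about the average plus one sigma.

```python
import math
from statistics import NormalDist


def percentile_nearest_rank(values, p):
    """Return the p-th percentile (p from 0 to 100) using the nearest-rank rule.

    This mirrors reading the cumulative chart: sort the weeks by demand,
    go p% of the way through them, and read off the demand at that point.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Assumed model of the demand chart: mean 150 per week, sigma 20 (illustrative).
demand_model = NormalDist(mu=150, sigma=20)
print("50th percentile of the model:", round(demand_model.inv_cdf(0.50)))
print("85th percentile of the model:", round(demand_model.inv_cdf(0.85)))

# With 52 weeks of data the 85th percentile is the 45th sorted value,
# because 52 x 0.85 = 44.2, which is rounded up to the next whole week.
weeks = list(range(1, 53))  # stand-in values, only to show the rank
print("rank used for 52 weeks:", percentile_nearest_rank(weeks, 85))
```

The nearest-rank rule is only one of several percentile conventions; spreadsheet functions often interpolate between neighbouring values and so can give slightly different answers.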


<Leslie> Phew!  So, now I can see that the 65th percentile is just above average demand, and 85th percentile is above that.  But in the confusing paragraph how does that relate to the phrase “65% and 85% of the time”?

<Bob> It doesn’t. That is the really, really confusing part of  that paragraph. I am not surprised that you looped out at that point!

<Leslie> OK. Let us leave that for another conversation.  If I ignore that bit then does the rest of it make sense?

<Bob> Not yet alas. We need to dig a bit deeper. What would you say are the implications of this message?


<Leslie> Well.  I know that if our flow-capacity is less than our average demand then we will guarantee to create an unstable queue and chaos. That is the Flaw of Averages trap.

<Bob> OK.  The creator of this tool seems to know that.

<Leslie> And my outpatient manager colleagues are always complaining that they do not have enough slots to book into, so I conclude that our current flow-capacity is just above the 50th percentile.

<Bob> A reasonable hypothesis.
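The Flaw of Averages trap that Leslie describes can be shown with a very small queue model. This is a sketch under stated assumptions: the function name and the demand figures are invented for illustration, and the model assumes that capacity left unused in a quiet week is lost and cannot be carried forward.

```python
def queue_after_each_week(weekly_demand, weekly_capacity):
    """Return the size of the waiting list at the end of each week.

    Unused capacity in a quiet week is lost; it cannot be saved for later.
    """
    queue = 0
    history = []
    for demand in weekly_demand:
        waiting = queue + demand
        seen = min(weekly_capacity, waiting)
        queue = waiting - seen
        history.append(queue)
    return history


# Hypothetical demand: steady at 150 per week, then varying around 150.
steady = [150] * 10
varying = [130, 170] * 5

print(queue_after_each_week(steady, 140))   # capacity below average demand
print(queue_after_each_week(varying, 150))  # capacity equal to average demand
print(queue_after_each_week(varying, 170))  # capacity at the busiest week
```

With capacity below average demand the queue grows every week and never recovers. With capacity equal to average demand the queue does not grow without limit in this toy example, but it never stays empty either, because the spare slots in the quiet weeks are wasted.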

<Leslie> So to calm the chaos the message is saying I will need to increase my flow capacity up to the 85th percentile of demand which is from about 150 slots per week to 170 slots per week. An increase of about 13% which implies a 13% increase in costs.

<Bob> Good.  I am pleased that you did not fall into the intuitive trap that an increase from the 50th to the 85th percentile implies a 35/50 or 70% increase! Your estimate of about 13% is a reasonable one.

<Leslie> Well it may be theoretically reasonable but it is not practically possible. We are exhorted to reduce costs by at least that amount.

<Bob> So we have a finance versus governance bun-fight with the operational managers caught in the middle: FOG. That is not the end of the litany of woes … is there anything about Did Not Attends in the model?


<Leslie> Yes indeed! We are required to enter the percentage of DNAs and what we do with them. Do we discharge them or re-book them?

<Bob> OK. Pragmatic reality is always much more interesting than academic rhetoric and this aspect of the real system rather complicates things, at least for a comparative statistician. This is where the smoke and mirrors will appear and they will be hidden inside the black magic box.  To solve this conundrum we need to understand the relationship between demand, capacity, variation and yield … and it is rather counter-intuitive.  So, how would you approach this problem?

<Leslie> I would use the 6M Design® framework and I would start with a map and not with a model; least of all a magic black box one that I did not design, build and verify myself.

<Bob> And how do you know that will work any better?

<Leslie> Because at the One Day ISP Workshop I saw it work with my own eyes. The queues, waits and chaos just evaporated.  And it cost nothing.  We already had more than enough “capacity”.

<Bob> Indeed you did.  So shall we do this one as an ISP-2 project?

<Leslie> An excellent suggestion.  I already feel my confidence flowing back and I am looking forward to this new challenge. Thank you again Bob.

CAS_Diagram

The theme this week has been emergent learning.

By that I mean the ‘ah ha’ moment that happens when lots of bits of a conceptual jigsaw go ‘click’ and fall into place.

When, what initially appears to be smoky confusion suddenly snaps into sharp clarity.  Eureka!  And now new learning can emerge.


This did not happen by accident.  It was engineered.


The picture above is part of a bigger schematic map of a system – in this case a system related to the global health challenge of escalating obesity.

It is a complicated arrangement of boxes and arrows. There are  dotted lines that outline parts of the system that have leaky boundaries like the borders on a political map.

But it is a static picture of the structure … it tells us almost nothing about the function, the system behaviour.

And our intuition tells us that, because it is a complicated structure, it will exhibit complex and difficult to understand behaviour.  So, guided by our inner voice, we toss it into the pile labelled Wicked Problems and look for something easier to work on.


Our natural assumption that a complicated structure always leads to complex behaviour is an invalid simplification, and one that we can disprove in a matter of moments.


Exhibit 1. A system can be complicated and yet still exhibit simple, stable and predictable behaviour.

Harrison_H1

The picture is of a clock designed and built by John Harrison (1693-1776).  It is called H1 and it is a sea clock.

Masters of sailing ships required very accurate clocks to calculate their longitude, the East-West coordinate on the Earth’s surface. And in the 18th Century this was a BIG problem. Too many ships were getting lost at sea.

Harrison’s sea clock is complicated.  It has many moving parts, but it was the most stable and accurate clock of its time.  And his later ones were smaller, more accurate and even more complicated.


Exhibit 2.  A system can be simple yet still exhibit complex, unstable and unpredictable behaviour.

Double-compound-pendulum

The image is of a pendulum made of only two rods joined by a hinge.  The structure is simple yet the behaviour is complex, and this can only be appreciated with a dynamic visualisation.

The behaviour is clearly not random. It has structure. It is called chaotic.
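For readers without the animation to hand, the chaotic behaviour can be explored numerically. The sketch below is an illustration only and makes several assumptions: it models an idealised double pendulum with point masses on massless rods (not the rigid rods in the image), it uses the standard textbook equations of motion with a fourth-order Runge-Kutta step, and all masses, lengths and starting angles are invented. It runs two pendulums whose starting angles differ by one millionth of a radian and compares where they end up.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def derivatives(state, m1=1.0, m2=1.0, l1=1.0, l2=1.0):
    """Rates of change for an idealised point-mass double pendulum.

    state = (angle1, angular_velocity1, angle2, angular_velocity2)
    """
    t1, w1, t2, w2 = state
    d = t1 - t2
    den = 2 * m1 + m2 - m2 * math.cos(2 * d)
    a1 = (-G * (2 * m1 + m2) * math.sin(t1)
          - m2 * G * math.sin(t1 - 2 * t2)
          - 2 * math.sin(d) * m2 * (w2 * w2 * l2 + w1 * w1 * l1 * math.cos(d))
          ) / (l1 * den)
    a2 = (2 * math.sin(d) * (w1 * w1 * l1 * (m1 + m2)
                             + G * (m1 + m2) * math.cos(t1)
                             + w2 * w2 * l2 * m2 * math.cos(d))
          ) / (l2 * den)
    return (w1, a1, w2, a2)


def rk4_step(state, dt):
    """Advance the state by one time step using classical Runge-Kutta."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = derivatives(state)
    k2 = derivatives(shifted(state, k1, dt / 2))
    k3 = derivatives(shifted(state, k2, dt / 2))
    k4 = derivatives(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + e)
                 for s, a, b, c, e in zip(state, k1, k2, k3, k4))


def energy(state, m1=1.0, m2=1.0, l1=1.0, l2=1.0):
    """Total mechanical energy; it should stay constant without friction."""
    t1, w1, t2, w2 = state
    kinetic = (0.5 * m1 * (l1 * w1) ** 2
               + 0.5 * m2 * ((l1 * w1) ** 2 + (l2 * w2) ** 2
                             + 2 * l1 * l2 * w1 * w2 * math.cos(t1 - t2)))
    potential = -(m1 + m2) * G * l1 * math.cos(t1) - m2 * G * l2 * math.cos(t2)
    return kinetic + potential


def simulate(state, seconds, dt=0.001):
    """Return the state after the given number of seconds."""
    for _ in range(int(round(seconds / dt))):
        state = rk4_step(state, dt)
    return state


# Two starts that differ by one millionth of a radian (illustrative values).
start_a = (2.0, 0.0, 2.0, 0.0)
start_b = (2.0 + 1e-6, 0.0, 2.0, 0.0)
end_a = simulate(start_a, 10.0)
end_b = simulate(start_b, 10.0)
print("difference in first angle after 10 s:", abs(end_a[0] - end_b[0]))
print("energy drift for pendulum A:", abs(energy(end_a) - energy(start_a)))
```

The rules are fixed and fully deterministic, which is why the behaviour is called chaotic and not random. In a chaotic regime the tiny difference in starting position would be expected to grow by many orders of magnitude, while the energy check gives some reassurance that the growth comes from the system and not from numerical error.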

So, with these two real examples we have disproved our assumption that a complicated structure always leads to complex behaviour; and we have also disproved its inverse … that complex behaviour always comes from a complicated structure.


This deeper insight gives us hope.

We can design complicated systems to exhibit stable and predictable behaviour if, like John Harrison, we know how to.

But John Harrison was a rare, naturally-gifted, mechanical genius, and even with that advantage it took him decades to learn how to design and to build his sea clocks.  He was the first to do so and he was self-educated so his learning was emergent.

And to make it easier, he was working on a purely mechanical system comprised of non-living parts that only obeyed the Laws of Newtonian physics.


Our healthcare system is not quite like that.  The parts are living people whose actions are limited by physical Laws but whose decisions are steered by other policies … learned ones … and ones that can change.  They are called heuristics and they can vary from person-to-person and minute-to-minute.  Heuristics can be learned, unlearned, updated, and evolved.

This is called emergent learning.

And to generate it we only need to ‘engineer’ the context for it … the rest happens as if by magic … but only if we do the engineering well.


This week I personally observed over a dozen healthcare staff simultaneously re-invent a complicated process scheduling technique, at the same time as using it to eliminate the  queues, waiting and chaos in the system they wanted to improve.

Their queues just evaporated … without requiring any extra capacity or money. Eureka!


We did not show them how to do it so they could not have just copied what we did.

We designed and built the context for their learning to emerge … and it did.  On its own.

The ISP One Day Intensive Workshop delivered emergent learning … just as it was designed to do.

This engineering is called complex adaptive system design and this one example proves that CASD is possible, learnable and therefore teachable.


Telling a compelling story of improvement is an essential skill for a facilitator and leader of change.

A compelling story has two essential components: cultural and technical. Otherwise known as emotional and factual.

Many of the stories that we hear are one or the other; and consequently are much less effective.


Some prefer emotive language and use stories of dismay and distress to generate an angry reaction: “That is awful we must DO something about that!”

And while emotion is the necessary fuel for action,  an angry mob usually attacks the assumed cause rather than the actual cause and can become ‘mindless’ and destructive.

Those who have observed the dangers of the angry mob opt for a more reflective, evidence-based, scientific, rational, analytical, careful, risk-avoidance approach.

And while facts are the necessary informers of decision, the analytical mind often gets stuck in the ‘paralysis of analysis’ swamp as layer upon layer of increasing complexity is exposed … more questions than answers.


So in a compelling story we need a bit of both.

We need a story that fires our emotions … and … we need a story that engages our intellect.

A bit of something for everyone.

And the key to developing this compelling-story-telling skill is to start with something small enough to be doable in a reasonable period of time.  A short story rather than a lengthy legend.

A story, tale or fable.

Aesop’s Fables and Chaucer’s Canterbury Tales are still remembered for their timeless stories.


And here is a taste of such a story … one that has been published recently for all to read and to enjoy.

A Story of Learning Improvement Science

It is an effective blend of cultural and technical, emotional and factual … and to read the full story just follow the ‘Continue’ link.

One of the traps for the inexperienced Improvement Science Practitioner is to believe that applying the science in the real world is as easy as it is in the safety of the training environment.

It isn’t.

The real world is messier and more complicated and it is easy to get lost in the fog of confusion and chaos.


So how do we avoid losing our footing, slipping into the toxic emotional swamp of organisational culture and giving ourselves an unpleasant dunking?

We use safety equipment … to protect ourselves and others from unintended harm.

The Improvement-by-Design framework is like a scaffold.  It is there to provide structure and safety.  The techniques and tools are like the harnesses, shackles, ropes, crampons, and pitons.  They give us flexibility and security.

But we need to know how to use them. We need to be competent as well as confident.

We do not want to tie ourselves up in knots … and we do not want to discover that we have not tied ourselves to something strong enough to support us if we slip. Which we will.


So we need to learn and practise the basic skills to the point that they are second nature.

We need to learn how to tie secure knots, quickly and reliably.

We need to learn how to plan an ascent … identifying the potential hazards and designing around them.

We need to learn how to assemble and check what we will need before we start … not too much and not too little.

We need to learn how to monitor our progress against our planned milestones and be ready to change the plan as we go … and even to abandon the attempt if necessary.


We would not try to climb a real mountain without the necessary training, planning, equipment and support … even though it might look easy.

And we do not try to climb an improvement mountain without the necessary training, planning, tools and support … even though it might look easy.

It is not as easy as it looks.

There is a big bun-fight kicking off on the topic of 7-day working in the NHS.

The evidence is that there is a statistical association between mortality in hospital of emergency admissions and day of the week: and weekends are more dangerous.

There are fewer staff working at weekends in hospitals than during the week … and delays and avoidable errors increase … so risk of harm increases.

The evidence also shows that significantly fewer patients are discharged at weekends.


So the ‘obvious’ solution is to have more staff on duty at weekends … which will cost more money.


Simple, obvious, linear and wrong.  Our intuition has tricked us … again!


Let us unravel this Gordian Knot with a bit of flow science and a thought experiment.

1. The evidence shows that there are fewer discharges at weekends … and so demonstrates lack of discharge flow-capacity. A discharge process is not a single step, there are many things that must flow in sync for a discharge to happen … and if any one of them is missing or delayed then the discharge does not happen or is delayed.  The weakest link effect.

2. The evidence shows that the number of unplanned admissions varies rather less across the week; which makes sense because they are unplanned.

3. So add those two together and at weekends we see hospitals filling up with unplanned admissions – not because the sick ones are arriving faster – but because the well ones are leaving slower.

4. The effect of this is that at weekends the queue of people in beds gets bigger … and they need looking after … which requires people and time and money.

5. So the number of staffed beds in a hospital must be enough to hold the biggest queue – not the average or some fudged version of the average like a 95th percentile.

6. So a hospital running a 5-day model needs more beds because there will be more variation in bed use and we do not want to run out of beds and delay the admission of the newest and sickest patients. The ones at most risk.

7. People do not get sicker because there is better availability of healthcare services – but saying we need to add more unplanned care flow capacity at weekends implies that it does.  What is actually required is that the same amount of flow-resource that is currently available Mon-Fri is spread out Mon-Sun. The flow-capacity is designed to match the customer demand – not the convenience of the supplier.  And that means for all parts of the system required for unplanned patients to flow.  What, where and when. It costs the same.

8. Then what happens is that the variation in the maximum size of the queue of patients in the hospital will fall and empty beds will appear – as if by magic.  Empty beds that ensure there is always one for a new, sick, unplanned admission on any day of the week.

9. And empty beds that are never used … do not need to be staffed … so there is a quick way to reduce expensive agency staff costs.

So with a comprehensive 7-day flow-capacity model the system actually gets safer, less chaotic, higher quality and less expensive. All at the same time. Safety-Flow-Quality-Productivity.
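The thought experiment above can be sketched as a toy model. Everything in it is an assumption made for illustration: ten patients become ready to leave every day, the 5-day design has 12 discharge slots on weekdays and 5 at weekends, and the 7-day design has 10 slots every day, so both designs have the same 70 slots per week. The hypothetical function `peak_discharge_backlog` reports the largest number of patients who are well enough to leave but are still occupying a bed.

```python
def peak_discharge_backlog(ready_per_day, capacity_by_weekday, weeks=4):
    """Return the largest end-of-day backlog of patients waiting to leave.

    capacity_by_weekday lists the discharge slots for Monday to Sunday.
    Patients who cannot be discharged today stay in a bed until tomorrow.
    """
    backlog = 0
    peak = 0
    for _ in range(weeks):
        for capacity in capacity_by_weekday:
            waiting = backlog + ready_per_day
            backlog = waiting - min(capacity, waiting)
            peak = max(peak, backlog)
    return peak


five_day = [12, 12, 12, 12, 12, 5, 5]  # assumed: most slots on weekdays
seven_day = [10] * 7                   # assumed: same weekly total, spread evenly
assert sum(five_day) == sum(seven_day)

print("5-day design, extra beds needed:", peak_discharge_backlog(10, five_day))
print("7-day design, extra beds needed:", peak_discharge_backlog(10, seven_day))
```

In this toy model the 5-day design needs ten extra staffed beds to hold the weekend backlog, which is only cleared by Friday, and the 7-day design needs none. The weekly discharge capacity, and so the weekly cost of providing it, is the same in both.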

A recurring theme this week has been the concept of ‘quality’.

And it became quickly apparent that a clear definition of quality is often elusive.

Which seems to have led to a belief that quality is difficult to measure because it is subjective and has no precise definition.

The science of quality improvement is nearly 100 years old … and it was shown a long time ago, in 1924 in fact, that it is rather easy to measure quality – objectively and scientifically.

The objective measure of quality is called “yield”.

To measure yield we simply ask all our customers this question:

“Did your experience meet your expectation?”

If the answer is ‘Yes’ then we count this as OK; if it is ‘No’ then we count it as Not OK.

Yield is the number of OKs divided by the number of customers who answered.


But this tried-and-tested way of measuring quality has a design flaw:

Where does a customer get their expectation from?

Because if a customer has an unrealistically high expectation then whatever we do will be perceived by them as Not OK.

So to consistently deliver a high quality service (i.e. high yield) we need to be able to influence both the customer experience and the customer expectation.


If we set our sights on a worthwhile and realistic expectation and we broadcast that to our customers, then we also need a way of avoiding their disappointment … that our objective quality outcome audit may reveal.

One way to defuse disappointment is to set a low enough expectation … which is, sadly, the approach adopted by naysayers,  complainers, cynics and doom-mongers. The inept.

That is not the path to either improvement or to excellence. It is the path to apathy.

A better approach is to set ourselves some internal standards of expectation and to check at each step if our work meets our own standard … and if it fails then we know we have some more work to do.

This commonly used approach to maintaining quality is called a check-and-correct design.

So let us explore the ramifications of this check-and-correct approach to quality.


Suppose the quality of the product or service that we deliver is influenced by many apparently random factors. And when we actually measure our yield we discover that the chance of getting a right-first-time outcome is about 50%.  This amounts to little more than a quality lottery and we could simulate that ‘random’ process by tossing a coin.

So to set a realistic expectation for future customers there are two further questions we need to answer:
1. How long can a typical customer expect to wait for our product or service?
2. How much can a typical customer expect to pay for our product or service?

It is not immediately and intuitively obvious what the answers to these questions are … so we need to perform an experiment to find out.

Suppose we have five customers who require our product or service … we could represent them as Post It Notes; and suppose we have a clock … we could measure how long the process is taking; and suppose we have our coin … we can simulate the yield of the step; … and suppose we do not start the lead time clock until we start the work for each customer.

We now have the necessary and sufficient components to assemble a simple simulation model of our system … a model that will give us realistic answers to our questions.

So let us see what happens … just click the ‘Start Game’ button.


It is worth running this exercise about a dozen times and recording the data for each run … then plotting the results on a time-series chart.

The data to plot is the make-time (which is the time displayed on the top left) and the cost (which is displayed top middle).

The make-time is the time from starting the first game to completing the last task.

The cost is the number of coin tosses we needed to do to deliver all work to the required standard.

And here are the charts from my dozen runs (yours will be different).

PostItNote_MakeTimeChart

PostItNote_CostChart

The variation from run to run is obvious; as is the correlation between a long make-time and a high cost.

The charts also answer our two questions … a make time up to 90 would not be exceptional and an average cost of 10 implies that is the minimum price we need to charge in order to stay in business.
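For those who prefer code to coins, the Quality Lottery can also be sketched as a few lines of simulation. This is not the game itself, only a rough stand-in with assumptions: each coin toss is one attempt at the work, a head means right-first-time, and each attempt is assumed to take 4 time units, a figure inferred from the best case above of a make-time of 20 for a cost of 5. The names `run_once` and `average_cost` are invented for this sketch.

```python
import random

TIME_PER_TOSS = 4  # assumption: inferred from a make-time of 20 for a cost of 5


def run_once(rng, customers=5, p_right_first_time=0.5):
    """Play one game and return (make_time, cost).

    Each customer's work is repeated until it passes the check,
    so every failed attempt adds both time and cost.
    """
    tosses = 0
    for _ in range(customers):
        while True:
            tosses += 1
            if rng.random() < p_right_first_time:
                break
    return tosses * TIME_PER_TOSS, tosses


def average_cost(runs, seed=1, **kwargs):
    """Average the cost over many games, with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return sum(run_once(rng, **kwargs)[1] for _ in range(runs)) / runs


print("average cost at 50% yield:", average_cost(20000))
print("average cost at 90% yield:", average_cost(20000, p_right_first_time=0.9))
```

With a 50% chance of getting it right first time, each customer needs two attempts on average, so five customers cost about ten tosses, which agrees with the chart. Raising the right-first-time probability in the sketch shows how quickly the average cost falls towards the minimum of five.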

Our customers are waiting while we check-and-correct our own errors and we are expecting them to pay for the extra work!

In the NHS we have a name for this low-quality high-cost design: Payment By Results.


The charts also show us what is possible … a make time of 20 and a cost of 5.

That happened when, purely by chance, we tossed five heads in a row in the Quality Lottery.

So with this insight we could consider how we might increase the probability of ‘throwing a head’ i.e. doing the work right-first-time … because we can see from our charts what would happen.

The improved quality and cost of changing ourselves and our system to remove the root causes of our errors.

Quality Improvement-by-Design.

That is something worth learning how to do.

And can we honestly justify not doing it?

It was the time for Bob and Leslie’s regular coaching session. Bob was already on line when Leslie dialed in to the teleconference.

<Leslie> Hi Bob, sorry I am a bit late.

<Bob> No problem Leslie. What aspect of improvement science shall we explore today?

<Leslie> Well, I’ve been working through the Safety-Flow-Quality-Productivity cycle in my project and everything is going really well.  The team are really starting to put the bits of the jigsaw together and can see how the synergy works.

<Bob> Excellent. And I assume they can see the sources of antagonism too.

<Leslie> Yes, indeed! I am now up to the point of considering productivity and I know it was introduced at the end of the Foundation course but only very briefly.

<Bob> Yes, productivity was described as a system metric. A ratio of a stream metric and a stage metric … what we get out of the streams divided by what we put into the stages.  That is a very generic definition.

<Leslie> Yes, and that I think is my problem. It is too generic and I get it confused with concepts like efficiency.  Are they the same thing?

<Bob> A very good question and the short answer is “No”, but we need to explore that in more depth.  Many people confuse efficiency and productivity and I believe that is because we learn the meaning of words from the context that we see them used in. If others use the words imprecisely then it generates discussion, antagonism and confusion and we are left with the impression that it is a ‘difficult’ subject.  The reality is that it is not difficult when we use the words in a valid way.

<Leslie> OK. That reassures me a bit … so what is the definition of efficiency?

<Bob> Efficiency is a stream metric – it is the ratio of the minimum cost of the resources required to complete one task divided by the actual cost of the resources used to complete one task.

<Leslie> Um.  OK … so how does time come into that?

<Bob> Cost is a generic concept … it can refer to time, money and lots of other things.  If we stick to time and money then we know that if we have to employ ‘people’ then time will cost money because people need money to buy essential stuff that they need for survival. Water, food, clothes, shelter and so on.

<Leslie> So we could use efficiency in terms of resource-time required to complete a task?

<Bob> Yes. That is a very useful way of looking at it.

<Leslie> So how is productivity different? Completed tasks out divided by cash in to pay for resource time would be a productivity metric. It looks the same.

<Bob> Does it?  The definition of efficiency is possible cost divided by actual cost. It is not the same as our definition of system productivity.

<Leslie> Ah yes, I see. So do others define productivity the same way?

<Bob> Try looking it up on Wikipedia …

<Leslie> OK … here we go …

“Productivity is an average measure of the efficiency of production. It can be expressed as the ratio of output to inputs used in the production process, i.e. output per unit of input”.

Now that is really confusing!  It looks like efficiency and productivity are the same. Let me see what the Wikipedia definition of efficiency is …

“Efficiency is the (often measurable) ability to avoid wasting materials, energy, efforts, money, and time in doing something or in producing a desired result”.

But that is closer to your definition of efficiency – the actual cost is the minimum cost plus the cost of waste.
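The two definitions in this conversation can be put side by side as a small worked example. It is a sketch only: the function names and all the figures are invented, and it covers just the two ratios as Bob and Leslie have stated them so far.

```python
def efficiency(minimum_cost, actual_cost):
    """Stream metric: minimum possible cost divided by actual cost of a task."""
    return minimum_cost / actual_cost


def productivity(completed_tasks, cash_in):
    """System metric: what comes out of the streams per unit put into the stages."""
    return completed_tasks / cash_in


# Hypothetical task: 60 minutes of resource-time is the minimum required,
# and another 40 minutes are wasted, so the actual cost is 100 minutes.
minimum = 60.0
waste = 40.0
actual = minimum + waste
print("efficiency:", efficiency(minimum, actual))

# Hypothetical system: 20 completed tasks for 2000 units of cash paid in.
print("productivity:", productivity(completed_tasks=20, cash_in=2000.0))
```

Efficiency is a pure number between zero and one that compares two costs for the same task, while productivity has units of output per unit of input, which is one way to see that they are not the same thing.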

<Bob> Yes.  I think you are starting to see where the confusion arises.  And this is because there is a critical piece of the jigsaw missing.

<Leslie> Oh …. and what is that?

<Bob> Worth.

<Leslie> Eh?

<Bob> Efficiency has nothing to do with whether the output of the stream has any worth.  I can produce a worthless product with low waste … in other words very efficiently.  And what if we have the situation where the output of my process is actually harmful.  The more efficiently I use my resources the more harm I will cause from a fixed amount of resource … and in that situation it is actually safer to have a very inefficient process!

<Leslie> Wow!  That really hits the nail on the head … and the implications are … profound.  Efficiency is objective and relates only to flow … and between flow and productivity we have to cross the Safety-Quality line. Productivity also includes the subjective concept of worth or value. That all makes complete sense now. A productive system is a subjectively and objectively win-win-win design.

<Bob> Yup.  Get the safety, flow and quality perspectives of the design in synergy and productivity will sky-rocket. It is called a Fit-4-Purpose design.

knee_jerk_reflex

A commonly used technique for continuous improvement is the Plan-Do-Study-Act or PDSA cycle.

This is a derivative of the PDCA cycle first described by Walter Shewhart in the 1930s … where C is Check.

The problem with PDSA is that improvement does not start with a plan, it starts with some form of study … so SAPD would be a better order.


IHI_MFI

To illustrate this point, if we look at the IHI Model for Improvement … the first step is a pair of questions related to purpose: “What are we trying to accomplish?” and “How will we know a change is an improvement?”

With these questions we are stepping back and studying our shared perspective of our desired future.

It is a conscious and deliberate act.

We are examining our mental models … studying them … and comparing them.  We have not reached a diagnosis or a decision yet, so we cannot plan or do yet.

The third question is a combination of diagnosis and design … we need to understand our current state in order to design changes that will take us to our improved future state.

We cannot plan what to do or how to do it until we have decided and agreed what the future design will look like, and tested that our proposed future design is fit-4-purpose.


So improvement by discovery or by design does not start with plan, it starts with study.


And another word for study is ‘sense’ which may be a better one … because study implies a deliberate, conscious, often slow process … while sense is not so restrictive.

Very often our actions are not the result of a deliberative process … they are automatic and reflex. We do not think about them. They just sort of happen.

The image of the knee-jerk reflex illustrates the point.

In fact we have little conscious control over these automatic motor reflexes which respond much more quickly than our conscious thinking process can.  We are aware of the knee jerk after it has happened, not before, so we may be fooled into thinking that we ‘Do’ without a ‘Plan’.  But when we look in more detail we can see the sensory input and the hard-wired ‘plan’ that links it to the motor output.  Study-Plan-Do.


The same is true for many other actions – our unconscious mind senses, processes, decides, plans and acts long before we are consciously aware … and often the only clue we have is a brief flash of emotion … and usually not even that.  Our behaviour is largely habitual.


And even in situations when we need to make choices the sense-recognise-act process is fast … such as when a patient suddenly becomes very ill … we switch into the Resuscitate mode which is a pre-planned sequence of steps that is guided by what we are sensing … but it is not made up on the spot. There is no committee. No meetings. We just do what we have learned and practised how to do … because it was designed to.   It still starts with Study … it is just that the Study phase is very short … we just need enough information to trigger the pre-prepared plan. ABC – Airway … Breathing … Circulation. No discussion. No debate.


So, improvement starts with Study … and depending on what we sense what happens next will vary … and it will involve some form of decision and plan.

1. Unconscious, hard-wired, knee jerk reflex.
2. Unconscious, learned, habitual behaviour.
3. Conscious, pre-planned, steered response.
4. Conscious, deliberation-diagnosis-design then delivery.

The difference is just the context and the timing.   They are all Study-Plan-Do.

And the Plan may be to Do Nothing … the Deliberate Act of Omission.


And when we go-and-see and study the external reality we sometimes get a surprise … what we see is not what we expect. We feel a sense of confusion. And before we can plan we need to adjust our mental model so that it better matches reality. We need to establish clarity.  And in this situation we are doing Study-Adjust-Plan-Do …. S(A)PD.

When we start the process of learning to apply the Science of Improvement in practice we need to start within our circle of influence.

It is just easier, quicker and safer to begin there – and to build our capability, experience and confidence in steps.

And when we get the inevitable ‘amazing’ result it is natural and reasonable for us to want to share the good news with others.  We crossed the finish line first and we want to celebrate.   And that is exactly what we need to do.


We just need to be careful how we do it.

We need to be careful not to unintentionally broadcast an “I am Great (and You are Not)” message – because if we do that we will make further change even more difficult.


Competition can be healthy or unhealthy  … just as scepticism can be.

We want to foster healthy competition … and to do that we have to do something that can feel counter-intuitive … we have to listen to our competitors; and we have to learn from them; and we have to share our discoveries with them.

Eh?


Just picture these two scenarios in your mind’s eye:

Scenario One: The competition is a war. There can only be one winner … the strongest, most daring, most cunning, most ruthless, most feared competitor. So secrecy and ingenuity are needed. Information must be hoarded. Untruths and confusion must be spread.

Scenario Two: The competition is a race. There can only be one winner … the strongest, most resilient, hardest working, fastest learning, most innovative, most admired competitor.  So openness and humility are needed. Information must be shared. Truths and clarity must be spread.

Compare the likely outcomes of the two scenarios.

Which one sounds the more productive, more rewarding and more enjoyable?


So the challenge for the champions of improvement is to appreciate and to practice a different version of the “I’m Great … ” mantra …

I’m Great (And So Are You).

All healthcare organisations strive for excellence, which is good, and most achieve mediocrity, which is not so good.

Why is that?

One cause is the design of their model for improvement … the one that is driven by targets, complaints, near misses, serious untoward incidents (SUIs) and never events (which are not never).

A model for improvement that is driven by failure feedback loops can only ever achieve mediocrity, not excellence.

“Whaaaaaat?! That’s rubbish!” I hear you cry … so let us see.


Try this simple test …. just ask any employee in your organisation this question (and start with yourself):

How do you know you are doing a good job?

If the first answer heard is “When no one is complaining” then you have a Mediocrity Design.


When customers have a disappointing experience most do not pen a letter or write an email to complain.  Most just sigh and lower their expectations to avoid future disappointment; many will grumble to family and friends; and only a few (about 5%) will actually complain. They are the really angry extreme.  So they can easily be fobbed off with platitudes … just being earnestly listened to and unreservedly apologised to is usually enough to take the wind out of their sails.  It will escort them back to the silent but disappointed majority whose expectation is being gradually eroded by relentless disappointment. Nothing fundamental needs to change because eventually the complaints dry up, apathy is re-established and chronic mediocrity is assured.


To achieve excellence we need a better answer to the “How do you know you are doing a good job?” question.

We need to be able to say “I know I am doing a good job because this is what a good outcome looks like; this is my essential contribution to achieving that outcome; and here are the measures of the intended outcomes that we are achieving.”

In short we need a clear purpose, a defined part in the process that delivers that purpose, and we need an objective feedback loop that tells us that the purpose has been achieved and that our work is worthwhile.

And if any of those components is missing then we know we have some improvement work to do.

The first step is usually answering the question “What is our purpose?”

The second step is using the purpose to design and install the how-are-we-doing feedback loop.

And the third step is to learn to use the success feedback loop to ensure that we are always working to have a necessary-and-sufficient process that delivers the intended outcome and that we are playing a part in that.

And when we are reliably achieving our purpose, we set ourselves an even better outcome – an even safer, calmer, higher quality and more productive one … and doing that will generate more improvement work to do.  We will not be idle.


That is the essence of Excellence-by-Design.

It was the appointed time for the ISP coaching session and both Bob and Leslie were logged on and chatting about their Easter breaks.

<Bob> OK Leslie, I suppose we had better do some actual work, which seems a shame on such a wonderful spring day.

<Leslie> Yes, I suppose so. There is actually something I would like to ask you about because I came across it by accident and it looked very pertinent to flow design … but you have never mentioned it.

<Bob> That sounds interesting. What is it?

<Leslie> V.U.T.

<Bob> Ah ha!  You have stumbled across the Queue Theorists and the Factory Physicists.  So, what was your take on it?

<Leslie> Well it all sounded very impressive. The context is I was having a chat with a colleague who is also getting into the improvement stuff and who had been to a course called “Factory Physics for Managers” – and he came away buzzing about the VUT equation … and claimed that it explained everything!

<Bob> OK. So what did you do next?

<Leslie> I looked it up of course and I have to say the more I read the more confused I got. Maybe I am just a bit dim and not up to understanding this stuff.

<Bob> Well you are certainly not dim so your confusion must be caused by something else. Did your colleague describe how the VUT equation is applied in practice?

<Leslie> Um. No, I do not remember him describing an example – just that it explained why we cannot expect to run resources at 100% utilisation.

<Bob> Well he is correct on that point … though there is a bit more to it than that.  A more accurate statement is “We cannot expect our system to be stable if there is variation and we run flow-resources at 100% utilisation”.

<Leslie> Well that sounds just like the sort of thing we have been talking about, what you call “resilient design”, so what is the problem with the VUT equation?

<Bob> The problem is that it gives an estimate of the average waiting time in a very simple system called a G/G/1 system.

<Leslie> Eh? What is a G/G/1 system?

<Bob> Arrgh … this is the can of queue theory worms that I was hoping to avoid … but as you brought it up let us grasp the nettle.  This is called Kendall’s Notation and it is a short cut notation for describing the system design. The first letter refers to the arrivals or demand and G means a general distribution of arrival times; the second G refers to the size of the jobs or the cycle time and again the distribution is general; and the last number refers to the number of parallel resources pulling from the queue.

<Leslie> OK, so that is a single queue feeding into a single resource … the simplest possible flow system.

<Bob> Yes. But that isn’t the problem.  The problem is that the VUT equation gives an approximation to the average waiting time. It tells us nothing about the variation in the waiting time.

<Leslie> Ah I see. So it tells us nothing about the variation in the size of the queue either … so does not help us plan the required space-capacity to hold the varying queue.

<Bob> Precisely.  There is another problem too.  The ‘U’ term in the VUT equation refers to utilisation of the resource … denoted by the symbol ρ or rho.  The actual term is ρ / (1-ρ) … so what happens when rho approaches one … or in practical terms the average utilisation of the resource approaches 100%?

<Leslie> Um … 1 divided by (1-1) is 1 divided by zero which is … infinity!  The average waiting time becomes infinitely long!

<Bob> Yes, but only if we wait forever – in reality we cannot and anyway – reality is always changing … we live in a dynamic, ever-changing, unstable system called Reality. The VUT equation may be academically appealing but in practice it is almost useless.
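For readers who have not met it, the VUT equation is usually written as Kingman's approximation: average wait ≈ V × U × T, where V is the average of the squared coefficients of variation of the inter-arrival and service times, U is ρ / (1-ρ), and T is the average service time. The sketch below assumes that form; the function name `vut_wait` and the example numbers are illustrative and do not come from the conversation.

```python
import math

def vut_wait(ca2, cs2, rho, mean_service):
    """Kingman (VUT) approximation to the mean wait in queue for G/G/1.

    ca2, cs2     : squared coefficients of variation of the inter-arrival
                   and service times (the V term is their average)
    rho          : average utilisation of the resource (feeds the U term)
    mean_service : average service (cycle) time (the T term)
    """
    if not 0 <= rho < 1:
        return math.inf  # at or above 100% utilisation there is no finite average
    v = (ca2 + cs2) / 2
    u = rho / (1 - rho)
    return v * u * mean_service

# Illustrative values only: same variation, rising utilisation
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(rho, round(vut_wait(1.0, 1.0, rho, 10.0), 1))
```

The printed averages climb steeply as utilisation approaches 100%, which is Leslie's point. As Bob says, the result is only an average and says nothing about the spread of waits around it.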

<Leslie> Ah ha! Now I see why you never mentioned it. So how do we design for resilience in practice? How do we get a handle on the behaviour of even the G/G/1 system over time?

<Bob> We use an Excel spreadsheet to simulate our G/G/1 system and we find a fit-for-purpose design using an empirical, experimental approach. It is actually quite straightforward and does not require any Queue Theory or VUT equations … just a bit of basic Excel know-how.

<Leslie> Phew!  That sounds more up my street. I would like to see an example.

<Bob> Welcome to the first exercise in ISP-2 (Flow).
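Bob's exercise uses an Excel spreadsheet, which is not reproduced here. As a rough stand-in, the same row-by-row logic can be sketched in Python. Everything in this sketch is an assumption made for illustration: the function name `simulate_waits`, the choice of exponential gaps between arrivals, the uniform spread of service times, and the numbers.

```python
import random

def simulate_waits(n_jobs, mean_interarrival, mean_service, seed=1):
    """Simulate one queue feeding one resource, job by job.

    Each pass of the loop mimics a spreadsheet row: a job starts when it
    has arrived AND the resource has finished the previous job.
    """
    rng = random.Random(seed)
    arrival = 0.0
    resource_free_at = 0.0
    waits = []
    for _ in range(n_jobs):
        arrival += rng.expovariate(1.0 / mean_interarrival)
        service = rng.uniform(0.5 * mean_service, 1.5 * mean_service)
        start = max(arrival, resource_free_at)
        waits.append(start - arrival)
        resource_free_at = start + service
    return waits

# Try three levels of average utilisation and look at the spread of waits
for utilisation in (0.7, 0.9, 0.98):
    waits = sorted(simulate_waits(5000, mean_interarrival=10.0,
                                  mean_service=10.0 * utilisation))
    print(utilisation,
          "mean wait", round(sum(waits) / len(waits), 1),
          "95th percentile", round(waits[int(0.95 * len(waits))], 1))
```

Unlike the VUT equation, a simulation like this gives the whole distribution of waiting times, so the design can be tested against a maximum-wait specification and not just an average.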

[Bing] Bob logged in for the weekly Webex coaching session. Leslie was not yet online, but joined a few minutes later.

<Leslie> Hi Bob, sorry I am a bit late, I have been grappling with a data analysis problem and did not notice the time.

<Bob> Hi Leslie. Sounds interesting. Would you like to talk about that?

<Leslie> Yes please! It has been driving me nuts!

<Bob> OK. Some context first please.

<Leslie> Right, yes. The context is an improvement-by-design assignment with a primary care team who are looking at ways to reduce the unplanned admissions for elderly patients by 10%.

<Bob> OK. Why 10%?

<Leslie> Because they said that would be an operationally very significant reduction.  Most of their unplanned admissions, and therefore costs for admissions, are in that age group.  They feel that some admissions are avoidable with better primary care support and a 10% reduction would make their investment of time and effort worthwhile.

<Bob> OK. That makes complete sense. Setting a new design specification is OK.  I assume they have some baseline flow data.

<Leslie> Yes. We have historical weekly unplanned admissions data for two years. It looks stable, though rather variable on a week-by-week basis.

<Bob> So has the design change been made?

<Leslie> Yes, over three months ago – so I expected to be able to see something by now but there are no red flags on the XmR chart of weekly admissions. No change.  They are adamant that they are making a difference, particularly in reducing re-admissions.  I do not want to disappoint them by saying that all their hard work has made no difference!

<Bob> OK Leslie. Let us approach this rationally.  What are the possible causes that the weekly admissions chart is not signalling a change?

<Leslie> That there has not been a change in admissions. This could be because they have indeed reduced re-admissions but new admissions have gone up and are masking the effect.

<Bob> Yes. That is possible. Any other ideas?

<Leslie> That their intervention has made no difference to re-admissions and their data is erroneous … or worse still … fabricated!

<Bob> Yes. That is possible too. Any other ideas?

<Leslie> Um. No. I cannot think of any.

<Bob> What about the idea that the XmR chart is not showing a change that is actually there?

<Leslie> You mean a false negative? That the sensitivity of the XmR chart is limited? How can that be? I thought these charts will always signal a significant shift.

<Bob> It depends on the degree of shift and the amount of variation. The more variation there is the harder it is to detect a small shift.  In a conventional statistical test we would just use bigger samples, but that does not work with an XmR chart because the run tests are all fixed length. Pre-defined sample sizes.
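Bob's point about sensitivity can be seen with a little arithmetic. The sketch below uses the standard XmR calculation (centre line plus or minus 2.66 times the average moving range). The weekly counts are made up for illustration and are not Leslie's data, and the function name `xmr_limits` is hypothetical.

```python
def xmr_limits(values):
    """Return (lower limit, centre line, upper limit) for an X chart."""
    centre = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_mr = sum(moving_ranges) / len(moving_ranges)
    return centre - 2.66 * mean_mr, centre, centre + 2.66 * mean_mr

# Made-up weekly counts, for illustration only
baseline = [100, 112, 91, 105, 97, 118, 88, 103, 109, 95]
lower, centre, upper = xmr_limits(baseline)
shifted_mean = 0.9 * centre   # where the average would sit after a 10% reduction
print(round(lower, 1), round(centre, 1), round(upper, 1), round(shifted_mean, 1))
```

With this much week-to-week variation the shifted average sits well inside the limits, so individual points after the change are unlikely to fall outside them.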

<Leslie> So that means we can miss small but significant changes and come to the wrong conclusion that our change has had no effect! Isn’t that called a Type 2 error?

<Bob> Yes, it is. And we need to be aware of the limitations of the analysis tool we are using. So, now that you know that, how might you get around the problem?

<Leslie> One way would be to aggregate the data over a longer time period before plotting on the chart … we know that will reduce the sample variation.

<Bob> Yes. That would work … but what is the downside?

<Leslie> That we have to wait a lot longer to show a change, or not. We do not want that.

<Bob> I agree. So what we do is we use a chart that is much more sensitive to small shifts of the mean.  And that is called a cusum chart. These were not invented until 30 years after Shewhart first described his time-series chart.  To give you an example, do you recall that the work-in-progress chart is much more sensitive to changes in flow than either demand or activity charts?

<Leslie> Yes, and the WIP chart also reacts immediately if either demand or activity change. It is the one I always look at first.

<Bob> That is because a WIP chart is actually a cusum chart. It is the cumulative sum of the difference between demand and activity.
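Bob's description translates almost directly into code. This is a minimal sketch with made-up daily counts; the variable names and the starting work-in-progress are assumptions for illustration.

```python
from itertools import accumulate

# Made-up daily counts, for illustration only
demand   = [12, 15, 9, 14, 11, 13, 10]    # jobs arriving each day
activity = [11, 13, 12, 12, 12, 12, 12]   # jobs completed each day
starting_wip = 5

# WIP is the starting WIP plus the running total of (demand - activity)
net_flow = [d - a for d, a in zip(demand, activity)]
wip = [starting_wip + total for total in accumulate(net_flow)]
print(wip)   # [6, 8, 5, 7, 6, 7, 5]
```

Because each day's difference is carried forward, a small but persistent gap between demand and activity shows up as a steady slope on the WIP chart.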

<Leslie> OK! That makes sense. So how do I create and use a cusum chart?

<Bob> I have just emailed you some instructions and a few examples. You can try with your unplanned admissions data. It should only take a few minutes. I will get a cup of tea and a chocolate Hobnob while I wait.

[Five minutes later]

<Leslie> Wow! That is just brilliant!  I can see clearly on the cusum chart when the shifts happened and when I split the XmR chart at those points the underlying changes become clear and measurable. The team did indeed achieve a 10% reduction in admissions just as they claimed they had.  And I checked with a statistical test which confirmed that it is statistically significant.

<Bob> Good work.  Cusum charts take a bit of getting used to and we have to be careful about the metric we are plotting and a few other things but it is a useful trick to have up our sleeves for situations like this.

<Leslie> Thanks Bob. I will bear that in mind.  Now I just need to work out how to explain cusum charts to others! I do not want to be accused of using statistical smoke-and-mirrors! I think a golf metaphor may work with the GPs.
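Bob's emailed instructions are not reproduced here. One common way to build a basic cusum, sketched below, is to take the cumulative sum of the difference between each value and a reference value such as the baseline average. The data are made up (the second half is simply the first half minus 10 per week) and the function name `cusum` is hypothetical.

```python
from itertools import accumulate

def cusum(values, target):
    """Cumulative sum of the differences between each value and the target."""
    return list(accumulate(v - target for v in values))

# Made-up weekly counts, for illustration only
before = [100, 112, 91, 105, 97, 118, 88, 103, 109, 95]
after = [v - 10 for v in before]          # roughly a 10% lower average
target = sum(before) / len(before)        # baseline average as the reference

series = cusum(before + after, target)
print([round(x, 1) for x in series])
```

While the average matches the reference the cusum wanders around zero. Once the average drops the cusum heads steadily downwards, and the point where the slope changes suggests where to split the XmR chart.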

[Bing Bong] Bob was already logged into the weekly coaching Webex when Leslie arrived, a little late.

<Bob> Hi Leslie, how has your week been?

<Leslie> Hi Bob, sorry I am a bit late. It has been a very interesting week.

<Bob> My curiosity is piqued … are you willing to share?

<Leslie> Yes indeed! First an update on the improvement project we talked about a few weeks ago.

<Bob> The call centre one?

<Leslie> Yes.  The good news is that the improvement has been sustained. It was not a flash in the pan. The chaos is gone and the calm has continued.

<Bob> That is very good to hear. And how did the team react?

<Leslie> That is one of the interesting things. They went really quiet.  There was no celebration, no cheering, no sounds of champagne corks popping.  It was almost as if they did not believe what they were seeing and they feared that if they celebrated too early they would somehow trigger a failure … or wake up from a dream.

<Bob> That is a very common reaction.  It takes a while for reality to sink in – the reality that they have changed something, that the world did not end, and that their chronic chaos has evaporated.  It is like a grief reaction … they have to mourn the loss of their disbelief. That takes time. About six weeks usually.

<Leslie> Yes, that is exactly what has happened – and I know they have now got over the surprise because the message I got this week was simply “OK, that appears to have worked exactly as you predicted it would. Will you help us solve the next impossible problem?”

<Bob> Well done Leslie!  You have helped them break through the “Impossibility Barrier”.  So what was your answer?

<Leslie> Well I was really tempted to say “Of course, let me at it!” but I did not. Instead I asked a question: “What specifically do you need my help to do?”

<Bob> OK.  And how was that reply received?

<Leslie> They were surprised, and they said “But we could not have done this on our own. You knew what to do right at the start and even with your help it took us months to get to the point where we were ready to make the change. So you can do this stuff much more quickly than we can.”

<Bob> Well they are factually correct.

<Leslie> Yes I know, so I pointed out that although the technical part of the design does not take very long … that was not the problem … what slowed us down was the cultural part of the change.  And that is done now so does not need to be repeated. The next study-plan-do cycle will be much quicker and they only need me for the technical bits they have not seen before.

<Bob> Excellent. So how would you now describe your role?

<Leslie> More of a facilitator and coach with a bit of only-when-needed training thrown in.

<Bob> Exactly … and I have a label for this role … I call it a Catalyst.

<Leslie> That is interesting, why so?

<Bob> Because the definition of a catalyst fits rather well. Using the usual scientific definition, a catalyst increases the yield and rate of a chemical reaction. With a catalyst, reactions occur faster and with less energy and catalysts are not consumed, they are recycled, so only tiny amounts are required.

<Leslie> Ah yes, that feels about right.  But I am not just catalysing the reaction that produced the desired result am I?

<Bob> No. What else are you doing?

<Leslie> I am also converting some of the substrate into potential future catalysts too.

<Bob> Yes, you are. And that is what is needed for the current paradigm to shift.

<Leslie> Wow! I see that. This is powerful stuff!

<Bob> It is indeed. And the reaction you are catalysing is the combination of wisdom with ineptitude.

<Leslie> Eh? Can you repeat that? Wisdom and ineptitude? Those are not words that I hear very often. I hear words like dumb, stupid, ignorant, incompetent and incapable. What is the reason you use those words?

<Bob> Simply because the dictionary definitions fit. Ineptitude means not knowing what to do to get the result we want, which is not the same as just not knowing stuff or not having the necessary skills.  What we need are decisions which lead to effective actions and to intended outcomes. Wise decisions. If we demonstrate ineptitude we reveal that we lack the wisdom to make those effective decisions.  So we need to combine ineptitude with wisdom to get the capability to achieve our purpose.

<Leslie> But why use the word “wisdom”? Why not just “knowledge”?

<Bob> Because knowledge is not enough.  Knowledge just implies that I recognise what I am seeing. “I know this. I have seen it before“.  Appreciating the implication of what I recognise is something more … it is called “understanding”.

<Leslie> Ah! I know this. I have seen this before. I know what a time-series chart is and I know how to create one but it takes guidance, time and practice to understand the implications of what the chart is saying about the system.  But where does wisdom fit?

<Bob> Understanding is past-focussed. We understand how we got to where we are in the present. We cannot change the past so understanding has nothing to do with wise decisions or effective actions or intended outcomes. It is retrospection.

<Leslie> So wisdom is future-focussed. It is prospective. It is the ability to predict the outcome of an action and that ability is necessary to make wise decisions. That is why wisdom is the antidote to ineptitude!

<Bob> Well put! And that is what you did long before you made the change in the call centre … you learned how to make reliable predictions … and the results have confirmed yours was a wise decision.  They got their intended outcome. You are not inept.

<Leslie> Ah! Now I understand the difference. I am a catalyst for improvement because I am able to diagnose and treat ineptitude. That is what you did for me. You are a catalyst.

<Bob> Welcome to the world of the Improvement Science Practitioner.  You have earned your place.


The word “ineptitude” is used by Dr Atul Gawande in the first of the 2014 Reith Lectures entitled “Why Do Doctors Fail?“.

Click HERE to listen to his first lecture (30 minutes).

In his second lecture he describes how it is the design of the system that delivers apparently miraculous outcomes.  It is the way that the parts work together and the attention to context and to detail that counts.

Click HERE to hear his second lecture  “The Century of the System” (30 minutes).

And Atul has a proven track record in system improvement … he is the doctor-surgeon-instigator of the WHO Safer Surgery Check List – a simple idea borrowed from aviation that is now used worldwide and is preventing thousands of easily avoidable deaths during and after surgery.

Click HERE to hear his third lecture  “The Problem of Hubris” (30 minutes).

Click HERE to hear his fourth lecture  “The Idea of Wellbeing” (30 minutes).


[Beep] It was time again for the weekly Webex coaching session. Bob dialled into the teleconference to find Leslie already there … and very excited.

<Leslie> Hi Bob, I am so excited. I cannot wait to tell you about what has happened this week.

<Bob> Hi Leslie. You really do sound excited. I cannot wait to hear.

<Leslie> Well, let us go back a bit in the story.  You remember that I was really struggling to convince the teams I am working with to actually make changes.  I kept getting the ‘Yes … but‘ reaction from the sceptics.  It was as if they were more comfortable with complaining.

<Bob> That is the normal situation. We are all very able to delude ourselves that what we have is all we can expect.

<Leslie> Well, I listened to what you said and I asked them to work through what they predicted could happen if they did nothing.  Their healthy scepticism then worked to build their conviction that doing nothing was a very dangerous choice.

<Bob> OK. And I am guessing that insight was not enough.

<Leslie> Correct.  So then I shared some examples of what others had achieved and how they had done it, and I started to see some curiosity building, but no engagement still.  So I kept going, sharing stories of ‘what’, and ‘how’.  And eventually I got an email saying “We have thought about what you said about a one day experiment and we are prepared to give that a try“.

<Bob> Excellent. How long ago was that?

<Leslie> Three months. And I confess that I was part of the delay.  I was so surprised that they said ‘OK‘ that I was not ready to follow on.

<Bob> OK. It sounds like you did not really believe it was possible either. So what did you do next?

<Leslie> Well I knew for sure that we would only get one chance.  If the experiment failed then it would be Game Over. So I needed to know before the change what the effect would be.  I needed to be able to predict it accurately. I also needed to feel reassured enough to take the leap of faith.

<Bob> Very good, so did you use some of your ISP-2 skills?

<Leslie> Yes! And it was a bit of a struggle because doing it in theory is one thing; doing it in reality is a lot messier.

<Bob> So what did you focus on?

<Leslie> The top niggle of course!  At St Elsewhere® we have a call-centre that provides out-of-office-hours telephone advice and guidance – and it is especially busy at weekends.  We are required to answer all calls quickly, which we do, and then we categorise them into ‘urgent’  and ‘non-urgent’ and pass them on to the specialists.  They call the clients back and provide expert advice and guidance for their specific problem.

<Bob> So you do not use standard scripts?

<Leslie> No, that does not work. The variety of the problems we have to solve is too wide. And the specialist has to come to a decision quite quickly … solve the problem over the phone, arrange a visit to an out-of-hours clinic, or dispatch a mobile specialist to the client immediately.

<Bob> OK. So what was the top niggle?

<Leslie> We have contractual performance specifications we have to meet for the maximum waiting time for our specialists to call clients back; and we were not meeting them.  That implied that we were at risk of losing the contract and that meant loss of revenue and jobs.

<Bob> So doing nothing was not an option.

<Leslie> Correct. And asking for more resources was not either … the contract was a fixed price one. We got it because we offered the lowest price. If we employed more staff we would go out of business.  It was a rock-and-a-hard-place problem.

<Bob> OK.  So if this was ranked as your top niggle then you must have had a solution in mind.

<Leslie> I had a diagnosis.  The Vitals Chart© showed that we already had enough resources to do the work. The performance failure was caused by a scheduling policy – one that we created – our intuitively-obvious policy.

<Bob> Ah ha! So you suggested doing something that felt counter-intuitive.

<Leslie> Yes. And that generated all the ‘Yes .. but‘  discussion.

<Bob> OK. Do you have the Vitals Chart© to hand? Can you send me the Wait-Time run chart?

<Leslie> Yes, I expected you would ask for that … here it is.

StE_CallCentre_Before

<Bob> OK. So I am looking at the run chart of waiting time for the call backs for one Saturday, and it is in call arrival order, and the blue line is the maximum allowed waiting time. Is that correct?

<Leslie> Yup. Can you see the diagnosis?

<Bob> Yes. This chart shows the classic pattern of ‘prioritycarveoutosis’.  The upper border is the ‘non-urgents’ and the lower group are the ‘urgents’ … the queue jumpers.

<Leslie> Spot on.  It is the rising tide of non-urgent calls that spill over the specification limit.  And when I shared this chart the immediate reaction was ‘Well that proves we need more capacity!’

<Bob> And the WIP chart did not support that assertion.

<Leslie> Correct. It showed we had enough total flow-capacity already.
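The queue-jumping pattern Bob describes can be reproduced with a toy model. The sketch below is not the St Elsewhere® design, and the source does not say what the new scheduling policy was. It simply compares "urgent first" with plain arrival order for one resource with a fixed service time, so that the total capacity is identical in both runs. All names and numbers are assumptions made for illustration.

```python
import random

def simulate(jobs, service_time, urgent_first):
    """One resource, constant service time, no interruption of a job in progress.

    jobs: list of (arrival_time, is_urgent) sorted by arrival_time.
    Returns the wait of each job, in arrival order.
    """
    waits = [None] * len(jobs)
    queue = []          # indices of jobs that have arrived but not started
    next_arrival = 0    # index of the next job still to arrive
    clock = 0.0
    done = 0
    while done < len(jobs):
        # Admit every job that has arrived by now
        while next_arrival < len(jobs) and jobs[next_arrival][0] <= clock:
            queue.append(next_arrival)
            next_arrival += 1
        if not queue:
            clock = jobs[next_arrival][0]   # idle until the next arrival
            continue
        if urgent_first:
            # Oldest urgent job if there is one, otherwise the oldest job
            pick = next((i for i in queue if jobs[i][1]), queue[0])
        else:
            pick = queue[0]
        queue.remove(pick)
        waits[pick] = clock - jobs[pick][0]
        clock += service_time
        done += 1
    return waits

# Made-up demand: mean gap of 5 minutes between calls, about 30% urgent
rng = random.Random(7)
t, jobs = 0.0, []
for _ in range(300):
    t += rng.expovariate(1 / 5.0)
    jobs.append((t, rng.random() < 0.3))

for label, urgent_first in (("urgent first", True), ("arrival order", False)):
    waits = simulate(jobs, 4.5, urgent_first)
    non_urgent = [w for w, (_, urgent) in zip(waits, jobs) if not urgent]
    print(label, "longest non-urgent wait:", round(max(non_urgent), 1))
```

The same work is done by the same resource in both runs, so any difference in the longest non-urgent wait comes from the scheduling policy alone and not from a shortage of capacity.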

<Bob> So you suggested a change in the scheduling policy would solve the problem without costing any money.

<Leslie> Yes. And the reaction to that was ‘That is impossible. We are already working flat out. We need more capacity because to work quicker will mean cutting corners and it is unsafe to cut-corners‘.

<Bob> So how did you get around that invalid but widely held belief?

<Leslie> I used one of the FISH techniques. I got a few of them to play a table top game where we simulated a much simpler process and demonstrated the same waiting time pattern on a hand-drawn run chart.

<Bob> Excellent.  Did that get you to the ‘OK, we will give it a go for one day‘ decision?

<Leslie> Yes. But then I had to come up with a new design and I had to test it so I knew it would work.

<Bob> Because that was a step too far for them. And it sounds like you achieved that.

<Leslie> Yes.  It was tough though because I knew I had to prove to myself I could do it. If I had asked you I know what you would have said – ‘I know you can do this‘.  And last Saturday we ran the ‘experiment’. I was pacing up and down like an expectant parent!

<Bob> I expect rather like the ESA team who have just landed Rosetta’s little probe-child on a comet travelling at 38,000 miles per hour, billions of miles from Earth after a 10 year journey through deep space!  Totally inspiring stuff!

<Leslie> Yes. And that is why I am so excited because OUR DESIGN WORKED!  Exactly as predicted.

<Bob> Three cheers for you!  You have experienced that wonderful feeling when you see the effect of improvement-by-design with your own eyes. When that happens then you really believe what opportunities become possible.

<Leslie> So I want to show you the ‘after’ chart …

StE_CallCentre_After

<Bob> Wow!  That is a spectacular result! The activity looks very similar, and other than a ‘blip’ between 15:00 and 19:00 the prioritycarveoutosis has gone. The spikes have assignable causes I assume?

<Leslie> Spot on again!  The activity was actually well above average for a Saturday.  The subjective feedback was that the new design felt calm and under-control. The chaos had evaporated.  The performance was easily achieved and everyone was very positive about the whole experience.  The sceptics were generous enough to say it had gone better than they expected.  And yes, I am now working through the ‘spikes’ and excluding them … but only once I have a root cause that explains them.

<Bob> Well done Leslie! I sense that you now believe what is possible whereas before you just hoped it would be.

<Leslie> Yes! And the most important thing to me is that we did it ourselves. Which means improvement-by-design can be learned. It is not obvious, it feels counter-intuitive, so it is not easy … but it works.

<Bob> Yes. That is the most important message. And you have now earned your ISP Certificate of Competency.

The dictionary definition of resilience is “something that is capable of returning to its original shape after being stretched, bent or otherwise deformed“.

The term is applied to inanimate objects, to people and to systems.

A rubber ball is resilient … it is that physical property that gives it bounce.

A person is described as resilient if they are able to cope with stress without being psychologically deformed in the process.  Emotional resilience is regarded as an asset.

Systems are described as resilient when they are able to cope with variation without failing. And this use of the term is associated with another concept: strength.

Strong things can withstand a lot of force before they break. Strength is not the same as resilience.

Engineers use another term – strain – which means the amount of deformation that happens when a force is applied.

Stress is the force applied, strain is the deformation that results.

So someone who is strong and resilient will not buckle under high pressure and will absorb variation – like the suspension of your car.

But is strength-and-resilience always an asset?


Suppose some strong and resilient people find themselves in a relentlessly changing context … one in which they actually need to adapt and evolve to survive in the long term.

How well does their highly valued strength-and-resilience asset serve them?  Not very well.

They will resist the change – they are resilient – and they will resist it for a long time – they are strong.

But the change is relentless and eventually the limit of their strength will be reached … and they snap!

And when that happens all the stored energy is suddenly released. So they do not just snap – they explode!

Just like the wall in the animation above.

The final straw that triggers the sudden failure may appear insignificant … and at any other time  it would be.

But when the pressure is really on and the system is at the limit then it can be just enough to trigger the catastrophic failure from which there is no return.


Social systems behave in exactly the same way.

Those that have demonstrated durability are both strong and resilient – but in a relentlessly changing context even they will fail eventually, and when they do the collapse is sudden and catastrophic.

Structural engineers know that catastrophic failure usually starts at a localised failure and spreads rapidly through the hyper-stressed structure; each part failing in sequence as it becomes exposed and exceeds its limit of strength.  That is how the strong and resilient Twin Towers failed and fell on Sept 11th 2001. They were not knocked over. They were weakened to the point of catastrophic failure.

When systems are exposed to variable strains then these localised micro-fractures only occur at the peaks of stress and may not have time to spread very far. The damage is done though. The system is a bit weaker than it was before. And catastrophic failure is more likely in the future.

That is what caused the sudden loss of some of the first jet airliners which inexplicably just fell out of the sky on otherwise uneventful flights.  It took a long time for the root cause to be uncovered … the square windows.

Jet airliners fly at high altitude because it allows higher speeds and requires less fuel, and so allows long-distance flight over wide oceans, steppes, deserts and icecaps. But the air pressure is low at high altitude and passengers could not tolerate that: so the air pressure inside an airliner at high altitude is much higher than outside. It is a huge flying pressurised metal canister.

And as it goes up and down the thin metal skin is exposed to high variations in stress, which a metal tube can actually handle rather well … until we punch holes in it to fit windows to allow our passengers a nice view of the clouds outside.  We are used to square windows in our houses (because they are easier to make) so the aircraft engineers naturally put square windows in the early airliners.

And that is where the problem arose … the corners of the windows concentrate the stress and over time, with enough take-offs and landings, the metal skin at the corners of the windows accumulates invisible micro-fractures. The metal actually fatigues. Then one day – pop – a single rivet at the corner of a square window fails and triggers the catastrophic failure of the whole structure. But the aircraft designers did not understand that.

The solution? A more resilient design – use round-cornered windows. It was that simple!


So what is the equivalent resilient design for a social system? Adaptability.

But how is it possible for a system to be strong, resilient and adaptable?

The trick is to install “emotional strain gauges”, or niggleometers, that indicate when and where the internal cultural stress is being concentrated and where the emotional strain shows first.

These niggleometers will alert us to where the stresses and strains are being felt strongest and most often – rather like pain detectors. We use the patterns of information from our network of niggleometers to help us focus our re-design attention to continuously adapt parts of our system to relieve the strain and to reduce the system wide risk of catastrophic failure.

And by installing niggleometers across our system we will move towards a design that is strong, resilient and that continuously adapts to a changing environment.

It really is that simple.

[Beep] It was time for the weekly e-mentoring session so Bob switched on his laptop, logged in to the virtual meeting site and found that Lesley was already there.

<Bob> Hi Lesley. What shall we talk about today?

<Lesley> Hello Bob. Another old chestnut I am afraid. Queues.  I keep hitting the same barrier where people who are fed up with the perpetual queue chaos have only one mantra: “If you want to avoid long waiting times then we need more capacity.”

<Bob> So what is the problem? You know that is not the cause of chronic queues.

<Lesley> Yes, I know that mantra is incorrect – but I do not yet understand how to respectfully challenge it and how to demonstrate why it is incorrect and what the alternative is.

<Bob> OK. I understand. So could you outline a real example that we can work with.

<Lesley> Yes. Another old chestnut: the Emergency Department 4-hour breaches.

<Bob> Do you remember the Myth of Sisyphus?

<Lesley> No, I do not remember that being mentioned in the FISH course.

<Bob> Ho ho! No indeed,  it is much older. In Greek mythology Sisyphus was a king of Ephyra who was punished by the Gods for chronic deceitfulness by being compelled to roll an immense boulder up a hill, only to watch it roll back down, and then to repeat this action forever.

<Lesley> Ah! I see the link. Yes, that is exactly how people in the ED feel.  Every day it feels like they are pushing a heavy boulder uphill – only to have to repeat the same labour the next day. And they do not believe it can ever be any better with the resources they have.

<Bob> A rather depressing conclusion! Perhaps a better metaphor is the story in the film “Groundhog Day” where Bill Murray plays the part of a rather arrogant TV weatherman who enters a recurring nightmare where the same day is repeated, over and over. He seems powerless to prevent it.  He does eventually escape when he learns the power of humility and learns how to behave differently.

<Lesley> So the message is that there is a way out of this daily torture – if we are humble enough to learn the ‘how’.

<Bob> Well put. So shall we start?

<Lesley> Yes please!

<Bob> OK. As you know very well it is important not to use the unqualified term ‘capacity’.  We must always state if we are referring to flow-capacity or space-capacity.

<Lesley> Because they have different units and because they are intimately related to lead time by Little’s Law.

<Bob> Yes.  Little’s Law is a mathematically proven law of flow physics – it is not negotiable.

<Lesley> OK. I know that but how does it solve the problem we started with?

<Bob> Little’s Law is necessary but it is not sufficient. Little’s Law relates to averages – and is therefore just the foundation. We now need to build the next level of understanding.

<Lesley> So you mean we need to introduce variation?

<Bob> Yes. And the tool we need for this is a particular form of time-series chart called a Vitals Chart.

<Lesley> And I am assuming that will show the relationship between flow, lead time and work in progress … over time ?

<Bob> Exactly. It is the temporal patterns on the Vitals Chart that point to the root causes of the Sisyphean Chaos. The flow design flaws.

<Lesley> Which are not lack of flow-capacity or space-capacity.

<Bob> Correct. If the chaos is chronic then there must already be enough space-capacity and flow-capacity. Little’s Law shows that, because if there were not the system would have failed completely a long time ago. The usual design flaw in a chronically chaotic system is one or more misaligned policies.  It is as if the system hardware is OK but the operating software is not.

<Lesley> So to escape from the Sisyphean Recurring ED 4-Hour Breach Nightmare we just need enough humility and enough time to learn how to diagnose and redesign some of our ED system operating software? Some of our own policies? Some of our own mantras?

<Bob> Yup.  And not very much actually. Most of the software is OK. We need to focus on the flaws.

<Lesley> So where do I start?

<Bob> You need to do the ISP-1 challenge that is called Brainteaser 104.  That is where you learn how to create a Vitals Chart.

<Lesley> OK. Now I see what I need to do and the reason:  understanding how to do that will help me explain it to others. And you are not going to just give me the answer.

<Bob> Correct. I am not going to just give you the answer. You will not fully understand unless you are able to build your own Vitals Chart generator. You will not be able to explain the how to others unless you demonstrate it to yourself first.

<Lesley> And what else do I need to do that?

<Bob> A spreadsheet and your raw start and finish event data.

<Lesley> But we have tried that before and neither I nor the database experts in our Performance Department could work out how to get the real time work in progress from the events – so we assumed we would have to do a head count or a bed count every hour which is impractical.

<Bob> It is indeed possible as you are about to discover for yourself. The fact that we do not know how to do something does not prove that it is impossible … humility means accepting our inevitable ignorance and being open to learning. Those who lack humility will continue to live the Sisyphean Nightmare of ED Groundhog Day. The choice to escape is ours.

<Lesley> I choose to learn. Please send me BT104.

<Bob> It is on its way …
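For readers who would like a glimpse of the kind of calculation Lesley is about to attempt, here is a minimal sketch. It is written in Python rather than a spreadsheet, and it is not the BT104 method: it is just one common way to recover work in progress from start and finish events, by treating each start as +1 and each finish as -1 and keeping a running total. The function names and the four example patients are invented for illustration, and times are in minutes from the start of the observation window. The sketch also checks Little's Law on the averages: average work in progress equals flow multiplied by average lead time.

```python
import math

def wip_steps(events):
    """Turn (start, finish) pairs into a list of (time, work_in_progress) steps.

    Each start adds one to the running total and each finish subtracts one.
    If a finish and a start share the same time, the finish is counted first.
    """
    changes = []
    for start, finish in events:
        changes.append((start, +1))
        changes.append((finish, -1))
    changes.sort(key=lambda change: (change[0], change[1]))

    steps = []
    wip = 0
    for when, delta in changes:
        wip += delta
        steps.append((when, wip))
    return steps

def time_average_wip(steps):
    """Average work in progress over the window, weighted by time spent at each level."""
    area = 0
    for (t0, level), (t1, _) in zip(steps, steps[1:]):
        area += level * (t1 - t0)
    window = steps[-1][0] - steps[0][0]
    return area / window

# Hypothetical data: four patients, (arrival, departure) in minutes.
events = [(0, 120), (30, 90), (60, 300), (200, 260)]

steps = wip_steps(events)
peak_wip = max(level for _, level in steps)
average_wip = time_average_wip(steps)

# Little's Law check on the averages.
window = steps[-1][0] - steps[0][0]
flow = len(events) / window                       # patients per minute
mean_lead_time = sum(f - s for s, f in events) / len(events)

print(steps)
print(peak_wip, average_wip, flow * mean_lead_time)
```

The step list is the raw material for the work-in-progress line on a time-series chart. Because every patient in this made-up window both arrives and leaves inside it, the time-averaged work in progress and the product of flow and mean lead time agree exactly, which is Little's Law at work. The averages hide the peak, though, and that is why the temporal pattern matters.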

Systems are made of interdependent parts that link together – rather like a jigsaw.

If pieces are distorted, missing, or in the wrong place then the picture is distorted and the system does not work as well as it could.

And if pieces of one jigsaw are mixed up with those of another then it is even more difficult to see any clear picture.

A system of improvement is just the same.

There are many improvement jigsaws, each of which has pieces that fit well together and form a synergistic whole. Lean, Six Sigma, and Theory of Constraints are three well known ones.

Each improvement jigsaw evolved in a different context so naturally the picture that emerges is from a particular perspective: such as manufacturing.

So when the improvement context changes then the familiar jigsaws may not work as well: such as when we shift context from products to services, and from commercial to public.

A public service such as healthcare requires a modified improvement jigsaw … so how do we go about getting that?


One way is to ‘evolve’ an old jigsaw into a new context. That is tricky because it means adding new pieces and changing old pieces and the ‘zealots’ do not like changing their familiar jigsaw so they resist.

Another way is to ‘combine’ several old jigsaws in the hope that together they will provide enough perspectives. That is even more tricky because now you have several tribes of zealots who resist having their familiar jigsaws modified.

What about starting with a blank canvas and painting a new picture from scratch? Well it is actually very difficult to create a blank canvas for learning because we cannot erase what we already know. Our current mental model is the context we need for learning new knowledge.


So what about using a combination of the above?

What about first learning a new creative approach called design? And within that framework we can then create a new improvement jigsaw that better suits our specific context using some of the pieces of the existing ones. We may need to modify the pieces a bit to allow them to fit better together, and we may need to fashion new pieces to fill the gaps that we expose. But that is part of the fun.


The improvement jigsaw shown here is a new hybrid.

It has been created from a combination of existing improvement knowledge and some innovative stuff.

Pareto analysis was described by Vilfredo Pareto over 100 years ago.  So that is tried and tested!

Control charts, a form of time-series chart, were introduced by Walter Shewhart almost 100 years ago. So they are tried and tested too!

The combination of Pareto and Shewhart tools has been used very effectively for over 50 years. The combination is well proven.
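As a small illustration of the Pareto piece, here is a minimal sketch in Python. The function name and the list of niggles are invented for the example and are not taken from any real data set. It counts how often each category occurs, ranks the categories from most to least frequent, and adds a running cumulative percentage so that the "vital few" stand out.

```python
from collections import Counter

def pareto_table(observations):
    """Return (category, count, cumulative_percent) rows, most frequent first."""
    counts = Counter(observations)
    total = sum(counts.values())

    rows = []
    running = 0
    for category, count in counts.most_common():
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows

# Hypothetical niggles logged over one week.
niggles = (
    ["missing notes"] * 5
    + ["late transport"] * 3
    + ["phone unanswered"] * 2
)

table = pareto_table(niggles)
for row in table:
    print(row)
```

In this made-up week the top two categories account for 80% of the logged niggles, which is the kind of pattern that tells us where to look first.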

The other two pieces are innovative. They have different parents and different pedigrees. And different purposes.

The Niggle-o-Gram® is related to 2-by-2, FMEA and EIQ and the 4N Chart®.  It is the synthesis of them that creates a powerful lens for focussing our improvement efforts on where the greatest return-on-investment will be.

The Right-2-Left Map® is a descendant of the Design family and has been crossed with Graph Theory and Causal Network exemplars to introduce their best features.  Its purpose is to expose errors of omission.

The emergent system is synergistic … much more effective than each part individually … and even more effective than their linear sum.


So when learning this new Science of Improvement we have to focus first on learning about the individual pieces and we do that by seeing examples of them used in practice.  That in itself is illuminating!

As we learn about more pieces a fog of confusion starts to form and we run the risk of mutating into a ‘tool-head’.  We know about the pieces in detail but we still do not see the bigger picture.

To avoid the tool-head trap we must balance our learning wheel and ensure that we invest enough time in learning-by-doing.

Then one day something apparently random will happen that triggers a ‘click’.  Familiar pieces start to fit together in an unfamiliar way and as we see the relationships, the sequences, and the synergy – then a bigger picture will start to emerge. Slowly at first and then more quickly as more pieces aggregate.

Suddenly we feel a big CLICK as the final pieces fall into place.  The fog of confusion evaporates in the bright sunlight of a paradigm shift in our thinking.

The way forward that was previously obscured becomes clearly visible.

Ah ha!

And we are off on the next stage  of our purposeful journey of improvement.