Goodhart’s Law And The Quest To Quantify

45 years ago, British economist Charles Goodhart inserted what he considered a snarky aside into an otherwise serious academic publication:

Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

Was his humor hard to recognize due to his being a British academic or due to his being an economist? Hard to say. The former cultivate a dry wit and erudite vocabulary, while the latter have earned their field of inquiry the apt nickname "the dismal science."

The implications of what came to be known as Goodhart's Law can be more easily appreciated in paraphrased form:

When a measure becomes a target, it ceases to be a good measure.

Expressed through the lens of a skeptic, when an organization decides to target a specific measure, agents of that organization will respond by finding a way to achieve the desired target, even if it potentially undermines the outcome that the metric was intended to improve.

An example in medicine was an initiative by the British government to reduce waiting times for evaluation in Accident and Emergency Departments (the ER) to under four hours. The UK mandated that all A&Es report the time lag between when an ambulance patient exited the rig and when that patient was seen by a provider.

How did A&Es respond? They asked ambulances to keep patients inside the rig (often for extended periods of time) until the now distorted time of exit would allow them to conform with the new four-hour mandate.

Spirit of the law, zero.

Letter of the law, one.

My professional experience bears out the insight contained in Goodhart's Law.

Between 2003 and 2005, new core quality measures adopted by the Centers for Medicare and Medicaid Services (CMS) required reporting of how many adult patients in the ED had blood cultures obtained prior to the administration of antibiotics for pneumonia. Eventually, the time from presentation to administration of antibiotics began to be tracked as well.

These measures were created based on expert society guidelines (if not evidence) with the best of intentions - identify sick patients earlier and tailor their treatment more quickly. The problem is that the available evidence suggested obtaining blood cultures rarely changed our management for most pneumonia patients.

At the esteemed tertiary academic medical center where I worked, this new measure changed our workflow so that all patients with fever and respiratory symptoms had automated orders placed for a chest x-ray from triage.

The attending physician supervising residents in the lower acuity area of the ED was given the disruptive task of reviewing these chest films within minutes.

The goal was to comply with the mandate and retain our reputation for excellence via the metrics CMS planned to put forth for public scrutiny. After all, we could now more rapidly diagnose pneumonia and order blood cultures prior to antibiotics.

Our numbers with the new pathway were stellar, the envy of our peer institutions. Our patients interpreted more and earlier x-rays prior to provider evaluation to mean they were getting superior care. Thanks to our excellent numbers, we rated highly based on CMS core measures and enhanced our strong reputation.

CMS was proud to increase transparency in an admittedly opaque health care marketplace. Consumers could now see how the hospital they sought care at compared with nearby competitors on standard quality measurements.

Insurers were likelier to prefer hospitals with higher quality rankings for their patients. Hospital administrators lauded our success as well - thanks to our high rankings they now held that much more leverage in negotiating contracts with local payors.

Since every emergency medicine group, as a hospital-based specialty, serves at the pleasure of hospital administration, our departmental leadership improved (or at least did not jeopardize) our academic group's standing with our administrators.

Did all the patients need additional x-rays? What harm was caused as a byproduct of the "therapeutic radiation" that resulted?

Was the excess cost to the system justified? Did anyone do the math to decide whether the entire effort was the best use of finite resources?

Did the residents receive less instruction and supervision by tasking their attending physician with one more distraction from teaching?

There is ample evidence that every added distraction from the task at hand increases the potential for medical error. Did non-pneumonia patients suffer due to the status interruptus imposed on the attending physician reviewing the additional x-rays?

Was the added cost borne by the payors or shifted to the patients?

This initiative to turn a measurement into a target produced numerous winners (and, although I have not highlighted them, losers).

We practice in a time where every hospital is reduced to a U.S. News and World Report Ranking and every provider is considered the sum of his or her metrics.

We take more continual measurements in the name of quality than we ever have before, yet we fail to account for the distorting effects of our quest to quantify.

Has quality improved as a result?

Amused or dismayed by just how accurately the description fits what is happening to your practice of medicine? Listen to the terrific podcast that introduced me to this concept: Planet Money #877: The Laws of the Office.