Category Archives: Safety

#FHIR and Alt-health

Unless you’ve been living under a rock, you’ve probably heard the terms ‘alt-right’ and ‘fake news’ by now. According to Wikipedia: “the alt-right (short for “alternative right”) is a loose group of people with far right ideologies who reject mainstream conservatism in the United States”, and “Fake news websites (also referred to online as hoax news) deliberately publish hoaxes, propaganda, and disinformation to drive web traffic inflamed by social media”. Note: I’m well aware that fake news websites aren’t confined to just the alt-right, but there’s a strong link between the alt-right and fake news. It’s become clear, as time has progressed, that this is just another security risk in the eco-system that is the internet. Viruses, phishing, and now fake news. Something for Google and Facebook to work on – here are some thoughts about that.

Waiting in the wings is something else I call ‘Alt-Health’ – the spread of bad healthcare advice running like wildfire through social media. One particular aspect of it – the anti-vaxxer campaign – is getting air time in the mainstream press, but it’s a much broader problem than just that. Bad medical advice on the internet is nothing new – e.g. Google have a team devoted to working on the quality of web pages returned for medical related searches. But the spread of bad advice on social media is outside that. And it’s not always wrong advice, actually. Sometimes, it’s extremely good advice for one patient, but wrong for another patient. But it’s handed on as gospel – ‘this worked for me, so ignore your doctor and do what’s proven to work’… if only life were so simple. Looking forward, I expect that this is going to turn into an epidemic, as people turn away from complexity and cost, seeking simple solutions. What they’ll get is advice that is outright wrong, or wrong in context, and it’s going to kill people. On the other hand, we know that while a lot of the advice is bad, it can also be very good as well. In some circumstances, better than the clinical advice a patient gets, particularly for rare diseases.

People are going to need trusted healthcare advisers to sort good advice from bad. Unfortunately, there’s a challenge here: some advice will be 100% stupid and wrong (you can cure cancer by eating the right vitamins) while other advice will be 100% right and true (you should stay on the treatment advised by your clinical team, or talk to them if it’s not working out), but a lot of advice is going to fit into the category ‘maybe true, depending on circumstances’.

That’s where FHIR enters the picture: the FHIR standard has coverage of this space:

  • Patients can get their clinical data in the FHIR format – their past records, their diagnosis with supporting evidence, and their care plan (see the example after this list)
  • They can share that data with their trusted adviser
  • As well, decision support services, criteria, quality measures, clinical evidence, provenance chains – all these can be represented in FHIR (including from the micro-patient communities that have the expertise)
  • We’re working with the biopharma industry to formalise trial data reporting too
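To make that first point concrete, here’s a minimal sketch of what ‘a diagnosis with supporting evidence’ can look like as a FHIR Condition in JSON – the patient and observation references are illustrative only:

{
  "resourceType": "Condition",
  "subject": { "reference": "Patient/example" },
  "code": {
    "coding": [{
      "system": "http://snomed.info/sct",
      "code": "44054006",
      "display": "Diabetes mellitus type 2"
    }]
  },
  "evidence": [{
    "detail": [{ "reference": "Observation/example-hba1c" }]
  }]
}

A patient can retrieve records like this over the FHIR API and pass them on to whoever they choose to trust.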

Though there are many other standards at play here, FHIR is the one that links all this space together, and naturally fits into the web/internet eco-system.

So I believe that one of our key activities in the FHIR community over the next few years will be ensuring that the standards are in place to address the challenge that alt-health brings – that is, to enable clinical decision support to rank the reliability of random advice on the internet.

 

Support for Testing in the #FHIR Specification

On the last day of the Baltimore meeting, a small group of us met to talk about the future of testing in the FHIR specification. In particular, the participants included both Aegis and Mitre, who develop Touchstone and Crucible respectively. Our small group made 3 decisions:

  1. We will work towards the adoption of a formal testing framework for FHIR interfaces
  2. We will simplify the existing TestScript resource
  3. We will draft and propose a TestReport resource

Taking these backwards:

The TestReport will contain a summary of the outcomes of a test run – which tests were run, whether they passed, and references to further information (logs, etc). It’s intended to help maintain current registries of implementations and outcomes. I’ll blog about this further once there’s a draft to look at.
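Purely as a placeholder to show the kind of content we have in mind – every element name below is hypothetical and will certainly change once the real draft exists – a TestReport might look something like this:

{
  "resourceType": "TestReport",
  "status": "completed",
  "testScript": { "reference": "TestScript/patient-read-write" },
  "result": "pass",
  "tester": "Example Test Engine",
  "issued": "2016-10-01T09:00:00+10:00",
  "test": [{
    "name": "read-patient",
    "action": [{
      "operation": {
        "result": "pass",
        "message": "GET [base]/Patient/example returned 200 as expected"
      }
    }]
  }]
}

The point is just to capture which tests ran, how they went, and pointers to the detail, so that registries can stay current without anyone re-reading raw logs.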

I’ll comment on #2 in time, when we have a draft for people to consider.

For #1, a testing framework, we have several motivations for this:

  • It’s well known that a problem with testing processes like IHE’s is that, with the right care (e.g. configuration) and some judicious off-road modifications, systems can pass their IHE testing – but that often doesn’t equate to conformant implementations in practice
  • Many systems undergo stringent testing in-house in continuous builds, but real world integrations are extremely hard to test
  • As a consequence, applications are often poorly tested prior to upgrade
  • Every real clinical system I’ve worked with has had test patients so that staff can check how the system works. In Australia, the custom is that the patients are called Donald Duck, Mickey Mouse, etc. Staff just have to magically ‘know’ that these are test patients, and they pollute stats etc a little
  • One notable exception is the Australian MyHR, for which there was no test system, and no possibility of test patients. In practice, people trying to use it had to use their own personal records as test patients (many told me that they did this, and bits of those records drifted into the media)

We’d like to normalise testing – to make clear expectations about how production systems could be tested, and to start encouraging production systems to actually provide such support. Two key corollaries of this: administrators can download the official tests (e.g. IHE, or others) and run them against their system, with robots playing the part of the other system. And also, the testing framework starts to acquire a clinical safety role – administrators can add new tests to their own automated tests for the application (e.g. prior to an upgrade or configuration change) and assure themselves that known requirements with regard to system safety are met. Note that this is something you can only do in the context of a fully configured and linked system of systems – which is why support for testing in production systems is so important in this regard.

Anyway, here’s a set of ideas to get people thinking about how this all might work:

  • Resources on the FHIR API are either ‘Production’ or ‘Test’. There will be a ‘test’ tag that labels all the test resources accordingly; no tag means the resource is not a test resource (see the sketch after this list)
  • Resources labelled ‘test’ are allowed to refer to production records (e.g. for configuration) , but never the other way around
  • Requests may be labelled ‘test only’ or ‘test and production’. Default is production.
  • Clients that have either ‘test’ or ‘test-only’ on requests SHALL indicate clearly to users that this is the case
  • Operations on test resources that use production resources internally on the server (e.g. $expand) are allowed in ‘test only mode’
  • Conformance resource: add a ‘testCompartment’ flag (boolean), where true means that the system offers the same services as in production, but in test mode
  • If a system supports a test compartment, it also supports a $clear operation on the test compartment – it clears all the records associated with the test compartment (including audit records), so as to reset for an automated test run
  • Testable Systems can be the target of a test engine that executes a set of tests, and which can produce a test report for archival / comparison purposes
  • System administrators can write tests against their system – e.g. clinical safety focused using TestScripts – and these test scripts can cross multiple systems
  • Systems are allowed to make their own arrangements regarding multiple test compartments (e.g. per authorised user)
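To make the tagging idea at the top of the list concrete, here’s a rough sketch of what a test resource might look like – the tag system URL is a placeholder; the real tag would have to be defined in the specification:

{
  "resourceType": "Patient",
  "meta": {
    "tag": [{
      "system": "http://example.org/fhir/tags",
      "code": "test",
      "display": "Test record - not a real patient"
    }]
  },
  "name": [{ "family": "Duck", "given": ["Donald"] }]
}

A server that supports a test compartment would keep records like this strictly separate from production data (and out of production statistics), while still allowing them to refer to production configuration resources.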

Anyone with any engineering experience will immediately recognise that implementing this set of requirements is a major design undertaking. It’s not going to happen quickly in existing EHRs, unless a conceptually similar system already exists – but I’ve never seen one.

But if we regularise the interface, and the test servers implement it (not too hard in my server) – then we might encourage movement in this direction gradually, and that would make it cheaper to deliver truly reliable systems….

Profiles and Exceptions to the Rules

One of the key constructs in FHIR is a “profile”. A profile is a statement of how FHIR resources are used for a particular solution – or, how they should be used. The FHIR resources are a general purpose construct, and you can do general purpose things with them, such as store the data in a PHR, or provide a generally useful display of a clinical record, etc.

But if you’re going to do something more specific, then you need to be specific about the contents. Perhaps, for instance, you’re going to write a decision support module that takes in ongoing glucose and HbA1c measurements, and keeps the patient informed about how well they are controlling their diabetes. In order for a patient or an institution to use that decision support module well, the author of the module is going to have to be clear about what are acceptable input measurements – and it’s very likely, unfortunately, that the answer is ‘not all of them’. Conversely, if the clinical record system is going to allow its users to hook up decision support modules like this, it’s going to have to be clear about what kind of glucose measurements it might feed to the decision support system.

If both the decision support system and the clinical record system produce profiles, a system administrator might even be able to get an automated comparison to see whether they’re compatible. At least, that’s where we’d like to end up.

For now, however, let’s just consider the rules themselves. A clinical record system might find itself in this situation:

  • We can provide a stream of glucose measurements to the decision support system
  • They’ll come from several sources – labs, point of care testing devices, inpatient monitoring systems, and wearables
  • There’s usually one or more intermediary systems between the actual glucose measurement, and the clinical record system (diagnostic systems, bedside care systems, home health systems – this is a rapidly changing space)
  • Each measurement will have one of a few LOINC codes (say, 39480-9: Glucose [Moles/volume] in Venous blood, 41652-9: Glucose [Mass/volume] in Venous blood, 14743-9: Glucose [Moles/volume] in Capillary blood by Glucometer)
  • The units of measure will be mg/dL or mmol/L
  • There’ll be a numerical value, perhaps with a greater than or less than comparator (e.g. >45 mmol/L)

So you can prepare a FHIR profile that says this one way or another. And then a decision support engine can have a feel for what kind of data it might get, and make sure it can handle it all appropriately.
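For example, an Observation that meets the constraints above might look like this – the values and patient reference are illustrative only:

{
  "resourceType": "Observation",
  "status": "final",
  "code": {
    "coding": [{
      "system": "http://loinc.org",
      "code": "14743-9",
      "display": "Glucose [Moles/volume] in Capillary blood by Glucometer"
    }]
  },
  "subject": { "reference": "Patient/example" },
  "valueQuantity": {
    "value": 6.3,
    "unit": "mmol/L",
    "system": "http://unitsofmeasure.org",
    "code": "mmol/L"
  }
}

A comparator can be added to the valueQuantity for the ‘>45 mmol/L’ style of result.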

So that’s all fine. But…

Eventually, the integration engineers that actually bring the data into the system discover – by looking at rejected messages (usually) – that 1 in a million inbound glucose measurements from the lab contains a text message instead of a numerical value. The message might be “Glucose value too high to determine”. Now what? From a clinical safety perspective, it’s almost certain that the integration engineers won’t replace “too high to determine” with “>N” where N is some arbitrarily chosen number – there’s no number they can choose that isn’t wrong. And they won’t be able to get the source system to change its interface either – that would have other knock-on effects for other customers / partners of the source system. Nor can they drop the data from the clinical record – it’s the actual test result. So they’ll find a way to inject that value into the system.
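So the record ends up holding something like this – a perfectly sensible clinical statement, but not what the profile promised (again, a sketch):

{
  "resourceType": "Observation",
  "status": "final",
  "code": {
    "coding": [{
      "system": "http://loinc.org",
      "code": "39480-9",
      "display": "Glucose [Moles/volume] in Venous blood"
    }]
  },
  "subject": { "reference": "Patient/example" },
  "valueString": "Glucose value too high to determine"
}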

Btw – as an aside – some of the things that go in this string value could go in Observation.dataAbsentReason, but they’re not coded, and it’s not possible to confidently decide which are missing reasons, and which are ‘text values’. So dataAbsentReason isn’t a solution to this case, though it’s always relevant.

Now the system contains data that doesn’t conform to the profile it claimed to use. What should happen?

  1. The system hides the data and doesn’t let the decision support system see it
  2. The system changes its profile to say that it might also send text instead of a number
  3. The system exposes the non-conformant data to the decision support system, but flags that it’s not valid according to its own declarations

None of these options is palatable. I assume that #1 isn’t possible, at least, not as a blanket policy. There’s going to be some clinical safety reason why the value has to be passed on, just the same as the integration engineers passed it on in the first place, so that they’re not liable.

Option #2 is a good system/programmer choice – just tell me what you’re going to do, and don’t beat around the bush. And the system can do this – it can revise the statement ‘there’ll be a numerical value’ to something like ‘there’ll be a numerical value, or some text’. At least this is clear.

Only it creates a problem – now, the consumer of the data knows that they might get a number, or a string. But why might they get a string? What does it mean? Someone does know, somewhere, that the string option is used 1 in a million times, but there’s no way (currently, at least) to say this in the profile – it just says what’s possible, not what’s good, or ideal, or common. If you start considering the impact of data quality on every element – which you’re going to have to do – then you’re going to end up with a profile that’s technically correct but quite non-communicative about what the data might be, and that provides no guidance as to what it should be, so that implementers know what they should do. (And observationally, if you say that it can be a string, then, hey, that’s what the integration engineers will do too, because it’s quicker….)

That’s what leads to the question about option #3: maybe the best thing to do is to leave the profile saying what’s ideal, what’s intended, and let systems flag non-conforming resources with a tag, or wrong elements with an extension? Then the consumer of the information can always check, and ignore it if they want to.
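A hypothetical sketch of what option #3 might look like – the tag system and code here are invented for illustration, and nothing like this is currently defined:

{
  "resourceType": "Observation",
  "meta": {
    "tag": [{
      "system": "http://example.org/fhir/tags",
      "code": "profile-exception",
      "display": "Does not conform to the declared profile"
    }]
  },
  "status": "final",
  "code": { "text": "Glucose" },
  "valueString": "Glucose value too high to determine"
}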

That is, if they know about the flag, and remember. Which means we’d need to define it globally, and the standard itself would have to tell people to check for data that isn’t consistent with its claims… and then we’d have to add overrides to say that some rules actually mean what they say, as opposed to not actually meaning that…. It all sounds really messy to me.

Perhaps, the right way to handle this is to have ideal and actual profiles? That would mean an extension to the Conformance resource so you could specify both – but already the interplay between system and use case profiles is not well understood.

I think this area needs further research.

p.s. There’s more than some passing similarity between this case and the game of ‘hot potato’ I used to play as a kid: ‘who’s going to have to do something about this bad data?’

FHIR: Usability vs Reusability

I regularly get requests to look at a RESTful API for healthcare data from some company out there. Generally, people ask me one of 3 questions:

  • Should this company use FHIR?
  • Is this compatible with FHIR?
  • Why is FHIR not as simple as this API?

Whenever I look at one of these APIs, 3 basic themes stand out:

  1. These APIs solve a single problem
  2. These APIs can be changed
  3. These APIs are unsafe

1. A Single Problem

The first is the biggest point; the company has designed the API to do exactly what they want, and it is perfectly tailored to their solution to that problem. They don’t need to ask painful questions like “what if the institution uses a different coding system?”, or “what if information is structured differently in some cases?”. Companies generally can simply decide what to do, and do it. So their APIs will always be simpler than the equivalent FHIR API, where we have to confront the need to deal with these things. Compared to FHIR, the company’s API will be more usable. Well, actually, it will look more usable – and if you write de novo code to integrate with it (e.g. a greenfields client) then it will actually be easier – that’s really ‘operability’. But if you have an already existing system with a different set of assumptions, it will get very difficult really quickly, or may even be impossible. FHIR, on the other hand, spreads the pain around (as equally as we can) – this means that everyone pays a higher price, but ‘interoperability’ is possible. In this way, the FHIR interface is much more reusable across multiple systems and use cases, at the cost of being less easy to use.

Here’s an example. A company doing diabetes monitoring might say, for instance, that a set of vital signs consists of the following JSON:

{
  "HBA1C" : "[value as %]",
  "CrCl" : "[value as mL/min]"
}

But FHIR could never adopt this kind of approach – it serves only that single use case, and not a more general one involving lab results.
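For contrast, the same HbA1c measurement expressed as a general purpose FHIR Observation looks something like this – more verbose, but it carries the coding, units and status that an arbitrary downstream consumer of lab results needs (values illustrative):

{
  "resourceType": "Observation",
  "status": "final",
  "code": {
    "coding": [{
      "system": "http://loinc.org",
      "code": "4548-4",
      "display": "Hemoglobin A1c/Hemoglobin.total in Blood"
    }]
  },
  "subject": { "reference": "Patient/example" },
  "valueQuantity": {
    "value": 6.2,
    "unit": "%",
    "system": "http://unitsofmeasure.org",
    "code": "%"
  }
}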

2. Dealing with Change

Part of the reason that companies can make these APIs simple is because if it turns out that the API is insufficient, they can just decide to extend it. That will work varyingly well for them, depending on how tightly the partner applications are linked to the host application life cycle. The slower it is to upgrade the clients (perhaps you have a 4 week app store cycle to deal with, or a 15 month institutional upgrade approval process to deal with), the less you can rely on fixing things as you go, and the more you have to get it right in advance.

As a standard, FHIR changes much more slowly than that again; that’s both its great strength and its weakness. So FHIR will be more reusable over time too, but this also must come at the cost of not being as easy to use.

3. Unsafe APIs

This is my final theme. Again and again I see that the underlying clinical content model is too simple; it doesn’t deal with the exceptional clinical cases. 99% of patients – the mainstream – will fit the simple model just fine, but for the other 1% – either they cannot use the API, or they have to provide flawed or fictional data, or the data is too simple and the subsequent analysis services the company claims to provide will be misleading. FHIR can’t afford to do that – the underlying clinical content model has to be robust.

The single most common way that this appears is in the codes – typically, a company will only allow a simple code system, with no exceptions for text, for instance. I understand this – dealing with text exceptions is one of the most painful parts of dealing with HL7 content.
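For reference, this is how FHIR handles it: the CodeableConcept datatype allows a code, a text representation, or both, so the exceptional cases can still be carried faithfully (a fragment, values illustrative):

{
  "code": {
    "coding": [{
      "system": "http://loinc.org",
      "code": "14743-9",
      "display": "Glucose [Moles/volume] in Capillary blood by Glucometer"
    }],
    "text": "fingerprick glucose"
  }
}

When no code can be assigned at all, the coding is simply omitted and only the text is supplied.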

I suppose most of these companies will say that the problems I am concerned about simply don’t arise in practice, and that if they did, they could respond by changing their API (see #2). But often the patients affected are the most sick and most needy, and also the least able to express their position. However, as connected health information and the mobile/social web for health become more pervasive, this will become a political issue about access, like disabled access (which is relevant to the apps, but not the APIs). In the meantime, the FHIR API has to work in all these cases; this means that it has to be more complex for the simple case.

Usability and Re-usability are very often in conflict with each other.

p.s. FHIR is known for its core data element philosophy (sometimes described as its 80% philosophy), and so some readers might think my comments above are surprising. But no, this is an important part of the core rule – while we say that we do what 80% of systems do, we never think in terms of 80% of clinical cases. All clinical cases will flow across FHIR interfaces. (When I write operational clinical systems, my rule of thumb is “1 in a million cases happen every week”, since roughly a million cases flow across them every week.)

 

Security cases for EHR systems

Well, security is the flavor of the week. And one thing we can say for sure is that many application authors and many healthcare users do not care about security on the grounds that malicious behaviour is not expected behaviour. One example that sticks in my mind is one of the major teaching hospitals in Australia that constantly had a few keys on the keyboard wear out early: the doctors had discovered that the password “963.” was valid, and could be entered by running your finger down the numeric keypad, so they all used this password.

So here are some examples of real malicious behaviour:

Retrospectively Altering Patient Notes

From Doctors in the Dock:

A Queensland-based GP who retrospectively changed his notes to falsely claim that a patient refused to go to hospital – and then submitted those notes as evidence in an inquest – has been found to have committed professional misconduct, and has been suspended from practising medicine for 12 months (six months, plus another six if he breaches conditions within three years) and banned from conducting most surgical procedures except under supervision for at least a year.

This is very similar to a case here in Melbourne more than a decade ago, where a doctor missed a pathology report that indicated cancer was recurring in a patient in remission. By the time the patient returned for another check-up, it was too late, and she died a few weeks later. The doctor paid an IT student to go into the database and alter the report to support his attempt to place the blame on the pathology service. Unfortunately for him, the student missed the audit trail, and the lab was able to show that it wasn’t their fault. (I couldn’t find any web references to it, but it was big news at the time.)

Both these involve highly motivated insider attacks. As does the next one, which is my own personal story.

Altering Diagnostic Reports

Late one evening, in maybe about 2001, I had a phone call from the owner of the company I used to work for, and he told me that the clinical records application for which I was lead programmer had distributed some laboratory reports with the wrong patients’ data on them. This, of course, would be a significant disaster, and we had considerable defense in depth against this occurring. But since it had, the customer – a major teaching hospital laboratory service – had a crisis of confidence in the system, and I was to drop everything and make my way immediately to their site, where I would be shown the erroneous reports.

Indeed, I was shown a set of reports – photocopies of originals, which had been collected out of a research trial record set, where what was printed on the paper did not match what was currently in the system. For nearly 48 hours, I investigated – prowled the history records, the system logs, the databases, the various different code bases, did joint code reviews, anything. I hardly slept… and I came up with nothing: I had no idea how this could be. We had other people in the hospital doing spot reviews – nothing. Eventually I made my way to the professor running the trial from which the records came, and summarised my findings: nothing. Fail. As I was leaving his office, he asked me casually, “had I considered fraud”? Well, no…

Once I was awake again, I sat and looked afresh at the reports, and I noticed that the wrong data on the set of reports I had was all internally shuffled. There was no data from other reports. So then I asked for information about the trial: it was a multi-center prospective interventional study of an anti-hyperlipidaemic drug. And it wasn’t double-blind, because the side-effects were too obvious to ignore. Once I had the patient list, and I compared real and reported results, I had a very clear pattern: the fake results showed that patients on the drug had a decrease in cholesterol readings, and patients that were on placebo didn’t. The real results in the lab system showed no significant change in either direction for the cohort (but significant changes up and down for individual patients). For other centers in the trial, there was no significant effect of the drug at the cohort level.

So the evidence was clear: someone who knew which patient was on drug or placebo had been altering the results to meet their own expectations of a successful treatment (btw, this medication is a successful one, in routine clinical use now, so maybe it was the cohort, the protocol or the dosage, but it wasn’t effective in this study). How were the reports being altered? Well, the process was that they’d print the reports off the system, and then fax them to the central site coordinating the trial for collation and storage. Obviously, then, the altering happened between printing and faxing, and once I got to this point, I was able to show faint crease marks on the faxes, and even pixel offsets, where the various reports had literally been cut and pasted together.

I was enormously relieved for myself and my company, since that meant that we were off the hook, but I felt sorry for the other researchers involved in the trial, since the professor in charge cancelled the whole trial (inevitable, given that the protocol had been violated by an over-eager but dishonest trial nurse).

What lessons are there to be learnt from all this? I know this is a tiny sample size (3), but:

  • Many/most attacks on a medical record system will come from insiders, who may be very highly motivated indeed, and will typically have legitimate access to the system
  • In all cases, the audit trail was critical. Products need a good, solid, reliable, and well armoured audit log. Digital signatures where the vendor holds the private key are a good idea
  • Digital signatures on content and other IT controls can easily be subverted by manual workarounds. Until paper goes away, this will continue to be a challenge

If any readers have other cases of actual real world attacks on medical record systems, please contribute them in the comments, or to me directly. I’m happy to publish them anonymously.

CDA Security Issues and implications for FHIR

Overnight, Josh Mandel posted several security issues with regard to CDA:

This blog post describes security issues that have affected well-known 2014 Certified EHRs. Please note that I’ve already shared this information privately with the Web-based EHR vendors I could identify, and I’ve waited until they were able to investigate the issues and (if needed) repair their systems.

Josh identified 3 issues:

  1. Unsanitized nonXMLBody/text/reference/@value can execute JavaScript
  2. Unsanitized table/@onmouseover can execute JavaScript
  3. Unsanitized observationMedia/value/reference/@value can leak state via HTTP Referer headers

So how do these relate to FHIR?

  1. The nonXMLBody attack as described by Josh is an accident in the stylesheet. The real problem is that a nonXMLBody can contain unsanitized HTML of any type, and there’s no way to show it faithfully and securely. Impact for FHIR: none. FHIR already includes native HTML, and active content is not allowed
  2. This and lots of other attacks are possible – though illegal – in FHIR. Impact for FHIR: we need to tighten up the base schema and the reference implementations to reject this content (and also document that this is a wonderful attack vector, and that systems must validate the resources)
  3. This is by far the hardest of the issues to deal with. As an outright attack, any system processing resources must ensure that HTTP headers – including the HTTP referer header – don’t leak when the image is retrieved. But there’s a more pernicious issue – if you author a CDA document, or a resource, you can insert a uniquely identified image reference in the source that means you get notified every time someone reads the document – kind of a doubleclick.net for health records. This is a different kind of attack again (see the sketch below). Protecting against the second kind of abuse is only really possible if you whitelist the set of servers that are allowed to host images you will retrieve. My suggested whitelist: empty. Don’t allow external references at all. In practice, that’s probably too draconian. Impact for FHIR: provide advice around this issue. We can’t make it illegal to provide external references, but all implementations should start at that place
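To illustrate the second kind of abuse in FHIR terms: the narrative of a resource can carry an image reference, and a uniquely identified external URL like the one below (a made-up tracker address, in a resource fragment) would notify its host every time the record is rendered:

{
  "resourceType": "DiagnosticReport",
  "text": {
    "status": "additional",
    "div": "<div xmlns=\"http://www.w3.org/1999/xhtml\"><p>Full report below</p><img src=\"https://tracker.example.com/beacon?id=abc123\"/></div>"
  }
}

Rendering systems that refuse to fetch images from hosts outside their whitelist defeat this.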

Whitelist of allowed HTML elements and attributes in FHIR

I am updating the FHIR xhtml schema, the java and pascal reference implementations, and my server, so that they only accept html elements with these names:

  • p
  • br
  • div
  • h1
  • h2
  • h3
  • h4
  • h5
  • h6
  • a
  • span
  • b
  • em
  • i
  • strong
  • small
  • big
  • tt
  • dfn
  • q
  • var
  • abbr
  • acronym
  • cite
  • blockquote
  • hr
  • ul
  • ol
  • li
  • dl
  • dt
  • dd
  • pre
  • table
  • caption
  • colgroup
  • col
  • thead
  • tr
  • tfoot
  • th
  • td
  • code
  • samp

The following attributes will be allowed:

  • a.href
  • a.name
  • *.title
  • *.style
  • *.class
  • *.id
  • *.colspan (td.colspan, th.colspan)
  • img.src

Have I missed anything?

HL7 Standards and rules for handling errors

It’s a pretty common question:

What’s the required behavior if a (message | document | resource) is not valid? Do you have to validate it? What are you supposed to do?

That question can – and has been – asked about HL7 v2 messaging, v3 messaging, CDA documents, and now the FHIR API.

The answer is that HL7 itself doesn’t say. There are several reasons for that:

  • HL7’s main focus is to define what is and isn’t valid, and how implementation guides and trading partners can define what is and isn’t valid in their contexts
  • HL7 generally doesn’t even say what your obligations are when the content is valid either – that’s nearly always delegated to implementation guides and trading partner agreements (such as, say, XDS, but I can’t even remember XDS making many rules about this)
  • We often discuss this – the problem is that there’s no right rule around what action to take. A system is allowed to choose to accept invalid content, and most choose to do that (some won’t report an error no matter what you send them). Others, on the other hand, reject the content outright. All that HL7 says is that you can choose to reject the content
  • In fact, you’re allowed to reject the content even if it’s valid with regard to the specification, because of other reasons (e.g. ward not known trying to admit a patient)
  • We believe in Postel’s law:

Be conservative in what you do, be liberal in what you accept from others

  • HL7 doesn’t know what you can accept – so it doesn’t try to make rules about that.

So what’s good advice?

Validation

I don’t think that there’s any single advice on this. Whether you should validate instances in practice depends on whether you can deal with the consequences of failure, and whether you can’t deal with the consequences of not validating. Here’s an incident to illustrate this point:

We set up our new (HL7 v2) ADT feed to validate all incoming messages, and tested the interface thoroughly during pre-production. There were no problems, and it all looked good. However, as soon as we put the system into production mode, we started getting messages rejected because the clerical staff were inputting data that was not valid. On investigation we found that the testing had used a set of data based on the formally documented practices of the institution, but the clerical staff had informal policies around date entry that we didn’t test. Rejected messages left the applications out of sync, and caused much worse issues. Rather than try to change the clerical staff, we ended up turning validation off.

That was a real case. The question here is: what happens to the illegal dates now that you no longer accept them? If you hadn’t been validating, what would have happened? In fact, you have to validate the data; the only question is whether you validate everything up-front, or only what you need as you handle it.

Now consider the Australian PCEHR. It’s an XDS based system, and every submitted document is subjected to the full array of schema and schematron validation that we are able to devise. We do this because downstream processing of the documents – which happens to a limited degree (at the moment) – cannot be proven safe if the documents might be invalid. And we continuously add further validation around identified safety issues (at least, where we can, though many of the safety issues are not things that automated checks can do anything about).

But it has its problems too – because of Australian privacy laws, it’s really very difficult for vendors, let alone 3rd parties, to investigate incidents on site in production systems. The PCEHR has additional rules built around privacy and security which make it tougher (e.g. accidentally sharing the patient identifier with someone who is not providing healthcare services for the patient is a criminal offence). So in practice, when a document is rejected by the pcEHR, it’s the user’s problem. And the end-user has no idea what the problem is, or what to do about it (schematron errors are hard enough for programmers…).

So validation is a vexed question with no right answer. You have to do it to a degree, but you (or your users) will suffer for it too.

Handling Errors

You have to be able to reject content. You might choose to handle failed content in line (let the sender know) or out of line (put it in a queue for a system administrator). Both actions are thoroughly wrong and unsafe. And the unsafest thing about either is that they’ll both be ignored in practice – just another process failure in the degenerate process called “healthcare”.

When you reject content, provide as specific and verbose a message as you can, loaded with context, details, paths, reasons, etc – that’s for the person who debugs it. And also provide a human readable version for the users, something they could use to describe the problem to a patient (or even a manager).
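For FHIR interfaces, the natural vehicle for both is an OperationOutcome – something like this (the details are invented for illustration):

{
  "resourceType": "OperationOutcome",
  "text": {
    "status": "generated",
    "div": "<div xmlns=\"http://www.w3.org/1999/xhtml\">The patient's date of birth (30 Feb 2014) is not a valid date, so the record could not be filed. Please correct it and resend.</div>"
  },
  "issue": [{
    "severity": "error",
    "code": "value",
    "diagnostics": "Patient.birthDate: value '2014-02-30' is not a valid date",
    "location": ["Patient.birthDate"]
  }]
}

The diagnostics are for the integration engineer; the narrative is the version a user could actually act on.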

If you administer systems: it’s really good to be right on top of this and follow up every error, because they’re all serious – but my experience is that administrative teams are swamped under a stream of messages where the signal to noise ratio is low, and the real problems are beyond addressing anyway.

 

Interoperability and Safety: Testing your healthcare integration

John Moehrke has a new post up about the importance of testing:

Testing is critical, both early and often. Please learn from others’ failures. The Apple “goto fail” provides a chance to learn a testing governance lesson. It is just one in a long line of failures that one can learn the following governance lesson from. Learning these lessons is more than just knowing the right thing to do, but also putting it into practice.

Starting with the Apple Goto Fail: I was personally astounded that this bug existed for so long. John notes that this is an open-source library, though I think of this as “published source” not “open source”. And btw, NSA, thanks for letting Apple know about the bug when you found out about it – I’d hate to think that you preferred for us all to be insecure…

Anyway, the key thing for me is, why isn’t this tested? Surely such a key library, on which so much depends, is tested every which way until it’s guaranteed to be correct? Well, no, and it’s not the only security library that has problems – even very similar ones. Though it’s probably properly tested by Apple now – or soon anyway (in fact, I figure that’s probably how they found the issue).

The interesting thing about this is how hard this bug is to test for – because it’s actually a bug in the interoperability code. Testing this code in a set of functional tests isn’t exactly hard, it just needs a custom-written test server against which to test all the myriad variations for which it must be tested. That is, it’s not hard, it’s just a huge amount of work – and it’s work that programmers hate doing, because it’s repetitive variations with little useful functionality, and terribly hard to keep the details straight.

Well, we can poke fun at Apple all we like, but the reality is that interfaces that integrate between different products are rarely tested in any meaningful sense. The only ongoing healthcare interoperability testing I know about that regularly happens in Australia is that the larger laboratory companies maintain suites of GP products so that they can check that their electronic reports appear correctly in them. Beyond that, I’ve not seen any testing at all. (Well, of course, we sometimes do acceptance testing when the interface is first installed.)

Interface engines typically include features to allow the integration code – the transforms that run in the interface engine – to be tested, and these things often are tested. But I’m not aware of them providing framework support for running tests that test the integration between two products that exchange information over the interface engine. Testing this kind of integration is actually really hard, because effective test cases are actually real world test cases – they test real world processes, and they need business level support.

And just like programmers have discovered: maintaining test cases is a lot of work, a lot of overhead. It’s the same for organizations doing business level test cases. What I do see almost everywhere is that production systems contain test records (Donald Duck is a favourite name for this in Australia) that allow people to create test cases in the production system; most staff automatically recognise the standard test patients and ignore references to them in worklists, etc. Interestingly, the pcEHR has no such arrangement, and end-users find that a very difficult aspect of the pcEHR – how do they find out how things work? Sure, they can read the doco, but that doesn’t contain the detail they need. In practice, many of the users use their own patient record for experimentation, and I wonder how many of the 15000 real clinical documents the system contains are actually test records against the author’s own patient record.

HL7 v2 contains a field in which “test” messages can be flagged. It’s not intended for use against the test records I discussed in the previous paragraph, but to indicate that the message is intended for test purposes. The field is MSH-11, and in v2.7 it has the following values:

  • D – Debugging
  • P – Production
  • T – Training

I’ve never seen this field used – instead, the test system – in the few cases where it exists – thinks it is the production system; it’s just at a different address, or maybe on an entirely physically separate network. So there’s no equivalent for this in CDA/FHIR.

So, real testing is hard. We’ll continue to get exposed by this, though sometimes it’s simply cheaper to pick up the pieces than to prevent accidents – the degree to which that’s true depends on the level of human fall-back that systems have, but that’s gradually reducing, and it’s really hard to quantify. I don’t think we’re “safe” in this area at all.

What do you think? I’d welcome comments that describe actual testing regimes, or war stories about failures that happened due to lack of testing… I have more than a few myself.

 

Follow up workshops: Clinical Safety for Healthcare Applications

Back in November I ran a clinical safety workshop for the MSIA in Sydney. I had several requests for follow ups in Melbourne and Brisbane, so I will be holding follow up workshops on March 11 (Melbourne) and March 13 (Brisbane). I’ll do more if there’s sufficient interest.

Here are some quotes from the announcement:

This workshop provides a set of tools and knowledge that will help healthcare application designers create safer software. Attendees will work through real, practical examples that demonstrate clinical safety issues in application design, and also cover general clinical safety thinking, coding, presentation, data management issues, and the looming regulatory process.

The workshop is for people who make decisions about software design, or manage the software design process – whether their background is clinical, programming, or otherwise.

Testimonials

“The Clinical Safety Workshop was not what I expected, but so much more. Grahame did not condescend to dictate specific software tricks to write safe software. Rather, he engaged with participants to help them understand for themselves a philosophy and approach to developing safe and effective health software using skills we already know. Suited to senior technical management, architects and code cutters alike.”  Mat Hudson, CEO RadLogix Pty Ltd

Grahame brings together his years of experience in medical software and standards development to address one of the most important issues related to using information and associated machines in the provision of health care and that issue is safety.  The workshop I attended was both academically rigorous and practical. I would not hesitate to recommend it for anyone wanting to work with information and knowledge in health and in my view there is no-one that doesn’t! Michael Legg, Conjoint Associate Professor, School of Medical Sciences, University of New South Wales

The workshop will cost $700, or $500 for employees of companies that are MSIA members (the program was prepared in association with the eHealth Industry Clinical Safety and Security Committee, auspiced by the MSIA).

Hopefully, I’ll see you there.

Clinical Safety Workshop for Healthcare Application Designers

On November 12 in Sydney, I’ll be leading a “Clinical Safety for Healthcare Application Designers” workshop on behalf of the Clinical Safety committee of the Australian Medical Software Association (MSIA). This is the blurb that went out to MSIA members today:

Ensuring the clinical safety of any healthcare application is vital – but it’s hard. Not only do the economics of the industry work against it, most of the information available is targeted at the clinical users, and often isn’t relevant or useful to application designers. But it’s the designers who make the important choices – often in a context where they aren’t aware of all the consequences of their actions, and where feedback, if it comes at all, is unreliable and often unusable.

Attendees at the MSIA clinical safety workshop will work through practical examples (often real) to help raise awareness of clinical safety issues in application design, and provide attendees with a set of tools and plans to mitigate the issues. Topics covered include general clinical safety thinking, and identification, classification, presentation, and data access, migration and aggregation issues.

The workshop is sponsored by the MSIA Clinical Safety Committee, and will be led by Grahame Grieve, who has 20 years of experience in healthcare, application design, and information exchange, and also served on the PCEHR clinical safety work group.

Not all MSIA members are on the distribution list, and some of the target audience track my blog, so I thought I’d announce it here. Attendance is restricted to MSIA members, and it’s free. If you’re not on Bridget’s (the MSIA CEO) email list, and you’re interested in attending, send me an email.