Response to Critical Safety Issue for the PCEHR

While I was on leave at Tamboon Inlet (and completely off the grid), Eric Browne made a post strongly critical of CDA on his blog:

I contend that it is nigh on impossible with the current HL7 CDA design, to build sufficient checks into the e-health system to ensure these sorts of errors won’t occur with real data, or to detect mismatch errors between the two parts of the documents once they have been sent to other providers or lodged in PCEHR repositories.

Eric’s key issue is that

One major problem with HL7 CDA, as currently specified for the PCEHR, is that data can be supplied simultaneously in two distinct, yet disconnected forms – one which is “human-readable”, narrative text displayable to a patient or clinician in a browser  panel;  the other comprising highly structured  and coded clinical “entries” destined for later computer processing.

It’s odd to hear the central design tenet of CDA described as a “major problem with CDA”. I think this betrays a fundamental misunderstanding of what CDA is, and why it exists. These misunderstandings were echoed in a number of the comments. CDA is built around the notion of twin forms – a human presentation, and a computer processible version. Given this, how the two relate to each other is an obvious issue, and I spend at least an hour discussing this every time I do a CDA tutorial.

Eric complains that clinicians have no way to inspect how the data and the narrative relate, nor is there an algorithm to test this:

However, the critical part of the document containing the structured, computer-processable data upon which decision support  is to be based is totally opaque to clinicians, and cannot be readily viewed or checked in any meaningful way

This is true – and it would be a whole lot more concerning except that this is true of all the forms of exchange we currently have – it’s just that they don’t have any human fall back to act as a fail safe check. Of course, in a perfect world, this wouldn’t be necessary. The data elements would be clearly and unambiguously defined, everyone would agree with them, no one would use anything extra, and all the implementations would be perfect. This is not the world we live in – it’s a pure governance fantasy, but one that some of Eric’s commenters share:

I can’t imagine going into any true IT organisation and proposing storing the same information constructed in two different ways in the same document, and with no computable requirement to have them synchronised (Andrew Patterson)

My initial response was, no, of course not. But in actual fact, this is ubiquitous in health care, and CDA is designed for the reality that we have. Note that CDA is designed for exchange, for the cracks between systems that cannot or might not agree on data fields. People might not like that, but the PCEHR is very much living between the cracks of the existing clinical systems, unless we replace all of them now.

CDA itself doesn’t have much to say about the relationship between the data and the text. It implies that there must be one, but because CDA covers such a wide variety of use cases, CDA itself doesn’t make the rules; instead, the rules are delegated to CDA implementation guides to make comment about this. And a number of the NEHTA implementation guides do exactly that in response to the same concerns Eric expresses.

Back to Eric’s concerns:

Each clinician is expected to attest the validity of any document prior to sharing it with other healthcare providers, consumers or systems, and she can do so by viewing the HTML rendition of the “human-readable” part of the document… However, the critical part of the document containing the structured, computer-processable data upon which decision support  is to be based is totally opaque to clinicians, and cannot be readily viewed or checked in any meaningful way.

Whereas now, with HL7 v2, they can’t see it, and can’t attest the validity at all. Instead, they must trust their systems, and there is no level of human to human fall back position at all. With CDA, they still must trust their systems, because they still can’t see the data portion – that is no different. But they also have a level of human to human communication that doesn’t exist with v2. CDA solves this problem, but does not solve the fact that we still have to trust the systems to exchange the data correctly to get computable outcomes (aside: I’m far from convinced that the clinical community wants more than a modicum of computable outcomes at the moment).

Of course, this still leaves the question of whether the data and the narrative agree with each other or not. The important thing that you have to consider in this regard is how do you build a document? How would you actually build a document that contains narrative and data that disagree with each other? Once you start pursuing this question, it becomes clear that a system or clinician that produces CDA documents in which the narrative and data disagree has a serious underlying problem. Note the emphasis on clinician there. In a genuine clinical system producing genuine clinical documents, the system can’t prevent clinicians from producing incoherent documents – it’s up to the clinician. It seems to me, from watching the debate, that whether you think that is good depends on whether you’re a clinician or not.

I’ll illustrate this by describing two scenarios.

  1. A (e.g. pathology) system produces CDA entirely from structured data. The document is produced in the background with no human interaction. In this case, how can the narrative and the data disagree? Well, if a programmer or a user misunderstood the definitions or intended/actual usage of the data items.
  2. A (e.g. GP) system generates a CDA document using user selected available data from the patient record using a clinician defined template, loads the section narratives with their underlying data into an editor, and lets the clinician edit the narrative (usually in order to add additional details or clarifications not found in the structured data). In this case, the narrative can disagree with the data if the user updates the document to disagree with their own data.
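
To make the first scenario concrete, here is a minimal Python sketch of a system that derives its narrative purely from its structured data. The `CodedResult` type, the `render_narrative` function, and the code values are all hypothetical; none of them come from CDA or from a NEHTA implementation guide. The only point is that when the text is computed from the data, the two can diverge only if the rendering logic or the data definitions are misunderstood.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CodedResult:
    """A hypothetical structured result, not a real CDA entry."""
    code: str          # placeholder code, not from a real terminology
    code_system: str   # placeholder identifier, not a registered OID
    display_name: str
    value: str
    unit: str


def render_narrative(results: List[CodedResult]) -> str:
    """Build the human-readable text from the structured data only."""
    lines = []
    for r in results:
        lines.append(f"{r.display_name}: {r.value} {r.unit}".strip())
    return "\n".join(lines)


results = [
    CodedResult("EX-1", "urn:example:codes", "Haemoglobin", "135", "g/L"),
    CodedResult("EX-2", "urn:example:codes", "Platelets", "250", "x10^9/L"),
]
print(render_narrative(results))
```

In the second scenario the clinician edits the output of something like `render_narrative`, and from that point on nothing in the code can guarantee that the text still matches the data.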

In either case, there is an underlying problem that would not be detectable at the end-point were only the data provided. CDA can’t solve these problems – but the fact that CDA contains both narrative and data doesn’t create them either.

Finally, the actual usefulness of containing narrative and data is unexpectedly illustrated by Eric’s own post, in an example where he appears to think that he’s criticising the presence of both narrative and data.

As an illustration of the sort of problems  we might see arising, I proffer the following. I looked at 6 sample discharge summary CDA documents  provided by the National E-health Transition Authority recently. Each discharge summary looked fine when the human-readable part was displayed in a browser, yet unbeknownst to any clinician that might do the same, buried in the computer-processable part, I found that each patient was dead at the time of discharge. One patient had been flagged as having died on the day they had been born – 25 years prior to the date that they were purportedly discharged from hospital! Fortunately this was just test, not “live” data.

Firstly, Eric shouldn’t have used technical examples provided to illustrate syntax to application developers as if they are also semantically meaningful (most NEHTA examples aren’t due to time constraints, though I’ve done one or two – it’s a lot slower than you think to produce really meaningful examples). But the date of death is actually buried in the portion of CDA that is data only, not narrative. And because Eric chose the wrong stylesheet (not the NEHTA one), his system didn’t know about the date of death, and ignored it. Had CDA actually contained a narrative portion for the header too, this would not have been a problem. Which brings me back to my earlier point: in the world we live in, not everyone shares the same set of data and processes them with no errors etc.

CDA isn’t a perfect specification – nothing is (subject of a series of posts to come) – and it does have its own complexity. But the problems aren’t due to containing both narrative and data. Eric says:

I know of no software anywhere in the world that can compare the two distinct parts of these electronic documents to reassure the clinician that what is being sent in the highly structured and coded part matches the simple, narrative part of the document to which they attest. This is due almost entirely to the excessive complexity and design of the current HL7 CDA standard.

This I completely disagree with. The inability to automatically determine whether what is being sent in the highly structured part matches the narrative is not “entirely due” to the complexity of CDA, but is almost entirely due to the problem itself.
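
As a rough illustration of why the difficulty lies in the problem rather than in the format, consider the most naive check anyone could write: test whether each coded item's display name appears in the narrative. The function name and the sample strings below are hypothetical, and this is a sketch of a deliberately weak check, not a proposal.

```python
from typing import List


def naive_mismatches(narrative: str, display_names: List[str]) -> List[str]:
    """Return display names that do not appear verbatim in the narrative.

    This is plain case-insensitive substring matching. It knows nothing
    about negation, synonyms, or clinical meaning.
    """
    text = narrative.lower()
    return [name for name in display_names if name.lower() not in text]


coded = ["Allergy to penicillin"]

# The narrative says the opposite of the data, yet nothing is flagged.
print(naive_mismatches("No known allergy to penicillin.", coded))

# The narrative agrees with the data, yet a mismatch is flagged.
print(naive_mismatches("Patient is allergic to penicillin.", coded))
```

Doing better than this means deciding whether two pieces of clinical content mean the same thing, and that is hard regardless of how the document wraps them.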

Finally, Eric says that

NEHTA should provide an application, or an algorithm,  that allows users to decode and view all the hidden, coded clinical contents of any of the PCEHR electronic document types, so that those contents can be compared with the human-readable part of the document.

Actually, this is a pretty good idea, though I don’t know whether NEHTA can or not (on practical grounds). I guess that we should be able to, since the Implementation Guides will increasingly make rules about what the narrative must do, and we already provide examples based on rendering the structured data in the narrative. But my own experience trying to interpret AS 4700.1 HL7 v2 messages (Australian Diagnostic Reports) suggests that a canonical rendering application is even more necessary for that – but who could define such a beast?
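
A very small sketch of what such a viewer might start from is below. It uses only the Python standard library to list every element in a section that carries a `code` and a `codeSystem`, next to the narrative text. The XML fragment is a simplified, hypothetical example (placeholder codes and OID, no `xsi:type`, not a conformant CDA document), and the function names are my own. Only the `urn:hl7-org:v3` namespace is taken from CDA itself.

```python
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}

# Simplified, hypothetical fragment; codes and OID are placeholders.
SAMPLE_SECTION = """<section xmlns="urn:hl7-org:v3">
  <text>Patient reports a penicillin allergy.</text>
  <entry>
    <observation classCode="OBS" moodCode="EVN">
      <code code="EX-1" codeSystem="1.2.3.4" displayName="Allergy"/>
      <value code="EX-2" codeSystem="1.2.3.4"/>
    </observation>
  </entry>
</section>"""


def narrative_text(section_xml: str) -> str:
    """Return the plain text of the section's narrative block."""
    root = ET.fromstring(section_xml)
    text_el = root.find("hl7:text", NS)
    if text_el is None:
        return ""
    return "".join(text_el.itertext()).strip()


def coded_elements(section_xml: str) -> list:
    """List every element that has both a code and a codeSystem."""
    root = ET.fromstring(section_xml)
    found = []
    for el in root.iter():
        if "code" in el.attrib and "codeSystem" in el.attrib:
            found.append({
                "element": el.tag.split("}")[-1],  # strip the namespace
                "code": el.get("code"),
                "codeSystem": el.get("codeSystem"),
                "displayName": el.get("displayName"),  # None if absent
            })
    return found


print("Narrative:", narrative_text(SAMPLE_SECTION))
for item in coded_elements(SAMPLE_SECTION):
    label = item["displayName"] or "(no displayName, needs a terminology lookup)"
    print(item["element"], item["code"], item["codeSystem"], label)
```

Listing the codes is the easy part. Turning them into something a clinician can read depends on `displayName` being populated or on access to the relevant code systems, which is why a canonical rendering is hard to define.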



  1. Andrew Patterson says:

    Well I don’t think I am living in some sort of governance fantasy land. I understand this is all very hard and I certainly don’t have answers.

    But I think it’s clear that when faced with a large problem of general(?) algorithms for rendering structured content hl7 punted on the issue, stuck in a couple of sentences reminding programmers to make sure the two forms kind of match, and here we are.

    I really should note, I have no problem with the narrative form – I understand why a ‘break out’ mechanism of communication is needed between some systems. But if you have structured data the narrative form should be computed directly from the structured data – and contain nothing else.

    It’s the narrative form with random bits of structure behind it that I think is a recipe for disaster.

  2. Grahame Grieve says:

    Hi Andrew

    Let’s be clear on this: HL7 didn’t punt on the problem. For most of HL7 members, CDA was an irrelevance, a curio, one that was mainly a distraction from the real game, v3 – and that was intended to work after your own heart. But the market spoke, much to HL7’s consternation.

    As for only generating from the structured data, that’s fine and often how it’s done where the structured data is everything (and often includes in the NEHTA adoption context) (and it’s sometimes specified in the NEHTA implementation guides). But it’s not always appropriate and therefore not always required either. For instance, pathology report sections, ETP documents – these narratives should be auto-generated. But a clinical letter? no.

    Finally, random bits of data – yes, CDA allows this, but this is not how real people actually build solutions.

    • Andrew Patterson says:

      I don’t think v3 works after my own heart either but point taken about how cda ended up where it is.

      For the clinical letter , it should have one free typed narrative section – and then another section with any structured content computably transformed into narrative form. But not a hybrid section. (I haven’t looked how NEHTA are doing the specialist letters so maybe this is how they do it?)

      • Grahame Grieve says:

        why shouldn’t a system that is capable of producing relevant and coherent structured data for a narrative be able to do so? Must you prevent a hybrid section because you don’t think it’s possible?

        In practice, what you describe is how most implementations are actually being developed for the clinical letter work, because the problem of being confident about the data/narrative is a big problem to solve. And I’m not arguing that it’s easy – just that the narrative/data thing is a distraction from the real issues

  3. Thomas Beale says:

    Hi Grahame,
    I am confused. The first part of this post implies that the same problem exists ubiquitously in healthcare so it’s ok with CDA. I am not aware of this as a ubiquitous problem – I know that HL7v2 has a display segment that can carry a displayable form of structured data from elsewhere in the message (Andrew MacIntyre justifiably complains about this all the time). Ok, so some broken HL7v2 messages have this problem. Not all do, and there are Gb of health data carried around by other means. Other than that, I can’t think of any data structure that has this dual format concept.

    In any case, I am not aware that most HL7v2 message content is attested by clinicians in the way that Nehta CDAs are supposed to be. A lot of HL7 message content comes out of machines – humans aren’t routinely checking every message.

    It seems to me difficult to argue for a new system having a non-checkable dual format representation of the same information when there is no formal relationship specified between the two formats, nor correctness conditions stated generally for what the two parts of a CDA should contain – regardless of how bad some existing ehealth data might be today.

    Lastly, why wouldn’t a clinical letter be just a text atom in the structured part, and the narrative part just be blank, or contain a guaranteed copy?

  4. Grahame Grieve says:

    hi Tom

    The problem that exists ubiquitously in healthcare is disagreements over data, poor data, lack of agreement etc. One solution to this is the CDA R2 path. As discussed above, the market adoption of CDA came as quite a surprise to many people.

    Other examples – one that immediately comes to mind is HTML microdata. And, of course, v2 use in Australia, which has come to this point as a pragmatic solution to the problems we have.

    Obviously, in openEHR, your approach is to get agreement to a single data model. But when it comes to something like pcEHR, that’s just governance fantasy. It’s not going to happen.

    I agree that a checkable algorithm for the text would be a good thing – except that if you had that much control over the text, why include it in the first place? Just generate it when you need it, and everyone follows the same algorithm. The fact is that you might as well ask for an algorithm to check that an openEHR archetype has been populated correctly. (note, not that it’s valid, but that it’s actually been used properly). Why don’t you have an algorithm for that?

  5. I commented on this a few months ago on my blog. There certainly is a way to associate each text element in the narrative and ensure that it has associated machine readable data.

    I know of several applications that are capable of generating machine readable data from the narrative, especially in the imaging sector.

    In the US, many systems generate the narrative from the machine readable data.

    In both of these cases, the source systems don’t have any problems ensuring that the two components are aligned.

  6. Thomas Beale says:

    Grahame: CDA also has a ‘single data model’ – it’s just a different one. Why aren’t narrative segments just text atom(s) within it? In openEHR, that’s how it is.

    Keith: so currently the machine generation is custom / specific thing? If there are good solutions here, surely they need to become mainstream and part of the CDA spec? I don’t see any alternative if people are really thinking of using CDA for structured data.

  7. Thomas Beale says:

    Actually I have realised/remembered why the CDA is the way it is. It originally came from SGML document thinking (content = marked up text), and then a separate structured section was added by ‘data’ oriented people (content = a data structure). These two sections have never been consolidated.

  8. Eric Browne says:

    Grahame: Why do you make statements like “And because Eric chose the wrong stylesheet (not the NEHTA one), his system didn’t know about the date of death, and ignored it.”? I did not choose a stylesheet at all! I used the stylesheet bundled with the sample discharge summaries as supplied by NEHTA. The ?xml-stylesheet element was already included! NEHTA supplied samples with and without the stylesheet reference. You may recall in the first letter I wrote, back in November (you were the only person who bothered to reply), that I was also critical of the presentation of the 16-digit IHI in the same sample discharge summaries, as rendered by the supplied NEHTA stylesheet.

    Let’s put aside for one moment the question of whether the schizophrenic nature of CDA should be described as a “design flaw” or not.

    I’d like to ask you whether you consider that there are sufficient potential safety issues in its application as the basis for the PCEHR to warrant caution (if not serious investigation), or whether you would recommend steaming ahead with the PCEHR rollout unconcerned with the issues I raised?

    • Grahame Grieve says:

      Eric, my apologies on the stylesheet issue. But I know that the NEHTA stylesheet displays date of death. Perhaps this is a version not yet released. (I do try to only comment on released content)

      With regard to steaming ahead with the PCEHR roll out, I do not believe that your issue with CDA is one of the reasons not to proceed. It is a well known issue and a regular feature of CDA implementation discussions, including with nearly every implementer, and the clinical units. Note that I am not saying that no caution is needed – of course it is, as it would be were any other approach being used.

  9. Eric Browne says:

    Grahame: you write “I agree that a checkable algorithm for the text would be a good thing – except that if you had that much control over the text, why include it in the first place? Just generate it when you need it, and everyone follows the same algorithm.”

    This is the crux of the problem for CDA-based documents. It is so excruciatingly difficult to generate the text from the coded entries, that most systems would not have the capability to do so. Can you cite a single system in Australia that could go close to so doing? Not only would it require access to all the terminologies for all of the coded values, but it would require access to a vast array of other codesystems to dereference all the other codes riddled throughout each document. How could the local oncology system of a hospital possibly decode the structured part of a NEHTA CDA share health summary document? How would it know what code 103.16044.4.1.1 from codesystem meant? Or code “W” from codesystem ? How about code “4” from codesystem ? Or code “4” from codesystem 2.16.840.1.113883.3.879 ?

    Or the very likely to be encountered code 103.16302.120.1.2 from codesystem with an associated value of “1” from codesystem ? Is “1” actually a legal value from codesystem by the way? How do I find out what the values of codesystem are?

    • Grahame Grieve says:

      I completely don’t understand why you think this is a problem for CDA based documents? This is a central problem for any data exchange format. CDA at least eases it by providing a human readable form.

      As for the terminology issues – that’s a completely separate question. It would be somewhat palliated by having everyone using the same coding system, but there’s an awesome resistance to any movement in that direction by almost all clinicians. In the absence of that, interoperability is crippled. Hence human readable content….

      • Eric Browne says:

        It is a problem for CDA based documents precisely because CDA provides its human readable form, which is continually used as a get out of gaol card for nearly every implementation. I guarantee it will be used thusly by NEHTA come July 1. Please don’t go blaming clinicians for poorly engineered infrastructure and a lack of an effective conformance regime. Fundamental data like drug names, or names of pathology tests, can and should be standardised fairly readily. I see no evidence that clinicians are impeding that process. Likewise for the administrative terms that come out of METeOR, such as patient gender or indigenous status. The codes for these are a pain, and NEHTA hasn’t even been able to get these simple things correct in the CDA specs. The NEHTA stylesheet supplied with the Discharge Summary CCA bundle in December 2011 doesn’t process the standard gender codes. METeOR is not a codesystem with unique codes, so wherever that OID is used in the NEHTA CDA specs it is clearly rubbish. There would be hundreds of terms with code “4” under that OID!
        HL7 v2’s excessive use of codes makes implementation difficult, emphasised by the lack of proper code table management and distribution infrastructure. There has been virtually no support in Australia for a conformance regime, particularly in primary care.
        With CDA, the RIM-based entries take this implementation complexity to new heights, whilst at the same time making it easier to produce an independent, human readable display.

        • Grahame Grieve says:

          Eric, it’s hard to know what to say. You really think that I’d assign a single OID to all of meteor? Why didn’t you actually check?

          If it’s so easy to standardise names of drugs and pathology tests, why hasn’t it happened?

          • Eric Browne says:

            I never said that YOU had assigned a single OID to all of METeOR. Do you actually think I didn’t check? According to the HL7 OID registry, OID 2.16.840.1.113883.3.879 was registered by Paul Watt. Its symbolic name is registered as “meteor”. Its full name is registered as “METeOR”.

          • Grahame Grieve says:

            Ahh. Well, that was me, though I didn’t get my name on it. That is the *root* OID assigned to Meteor. Below that, each Meteor table has its own OID – i.e. 2.16.840.1.113883.3.879.1 for Meteor table 000001. I think it should’ve been documented better.

          • Eric Browne says:

            I should have been more expansive about the METeOR OID. There is nothing wrong with Paul’s assignment of an OID to the whole of METeOR. What is wrong is NEHTA’s use of that OID for identifying a codesystem in any of the CDA specifications. It has incorrectly done so in the Shared Health Summary, and probably others (but I haven’t checked).

          • Grahame Grieve says:

            oh dear. That’s a bad mistake, and it’s in all of them, and I missed it in the (in)famous quality review. That’ll be fixed in the next release. You find anything else like that?

          • Grahame Grieve says:

            btw, the OID for indigenous status should be 2.16.840.1.113883.3.879.291036

    • Grahame Grieve says:

      oh, and, btw, I could write a system that would generate correct narrative for any *fully* conformant CDA document – that is, a document that contains only those fields defined by NEHTA implementation guides, as long as displayName was properly populated on all the coded data types.

      But this is of limited usefulness, since most CDA documents will include additional other data that the NEHTA process has not modeled. (I suspect that most of the GP desktop vendor implementations will stick to the specifications fairly closely, but not the wave sites, by and large – this is due to the difference between providing infrastructure and solving problems)

      • Michael Lawley says:

        Actually, I think this might have more value than you suggest. I believe the existence of a mechanism to render the structured data in a standard way could act as a positive feedback mechanism for implementers who could use such a thing as a sanity check/litmus test that what they have implemented actually does what they think it does.

      • Peter Jordan says:

        Unless ALL of the CDAs passed to the PCEHR are 100% compliant with the NEHTA IGs, surely the whole concept will be compromised? Some of the issues raised by Eric might be resolved if those IGs (and a supporting API/Toolkit?) mandated that (non-null) coded data must contain either a populated displayName and/or the originalText.

  10. Grahame Grieve says:

    Hi Peter

    They have to be 100% compliant in some ways, and not others, in order to allow this behaviour. We do strongly encourage displayName and/or originalText, and for some important clinical codes, originalText is required.
