Monthly Archives: February 2013

Do I need to render the CDA Narrative?

In a couple of different CDA implementation contexts, the same question has come up: does an application receiving the CDA document need to render the document using the narrative, or can it simply ignore the narrative and process and display the structured data? If it can, under what conditions is that OK?

As background, CDA is a document that has a narrative presentation, a header that defines the clinical document, and optionally some structured data. By the original intent of CDA, it’s very much a document, and should be thought of as a standard way of doing a word document with embedded data, like this example from my usual CDA tutorial:


So ignoring the narrative is inherently a non-CDA type of thing to do. The anticipation in the CDA specification itself is that it’s alright to extract the data from the CDA document for other use, but once you’ve done that, it’s no longer part of the attested content of the document. The implication is that whenever you extract data, you should always retain a link to the source document, so that a user can see the original data in its context. For instance, the source system/clinician may have noted some qualifying information about the data in the narrative that is relevant to its interpretation.

However CDA is used in all sorts of contexts, some of them extremely data-centric. In practice, there are some uses where the document is pure human-written narrative, some where the CDA documents are pure data, and the constructed narrative is only a formality to satisfy the CDA specification itself, and others that are a mix, both in terms of implementations being more or less data driven, and different parts of the documents using different combinations. In some of these uses, it’s safe and even normal to ignore the narrative.

Given the diagram above, when is it ok to ignore the narrative? When:

  • The authoring application populates the structured data completely and knows that the narrative says nothing additional
  • The receiving application is able to correctly process all the structured data
  • The receiving application is able to know the correct way to display the data

So it’s a collaborative effort between the author and the receiver.

Given the wide flexibility of the entries, and the data types they use, I believe that the only way a receiving application can be sure that it is able to process all the structured data correctly, and know the correct way to present it, is where there is a tight implementation guide specifying exactly what data elements the CDA document can contain and how these are to be understood, and where there is a strict and reliable conformance-checking regime in place. The authoring application may know that the narrative is generated, or it may not (the most common reason it may not know is that the CDA document is being built by middleware – a greenCDA approach – and the narrative is an independent input of uncertain source).

So how can the rendering application know that the narrative doesn’t contain anything not in the structured data?


The answer to this question is found in the typeCode on each entry. Here’s the relevant part of the CDA RMIM:


A section has narrative (the “text”), one or more entries which have typeCode attributes, and nested sections. The possible values for typeCode are:

COMP (component) [default] The associated entry is a component of the section. No semantic relationship is implied.
DRIV (is derived from) The narrative was rendered from the CDA Entries, and contains no clinical content not derived from the entries.

So you can tell, from the entries in the section, whether the narrative was generated from the entries, and doesn’t contain any other data.

The typeCode=”DRIV” attribute is a little difficult to interpret. Here are some notes about how to understand it:

  • If there are no entries, and the section contains text, then the text is not generated from the entries (pretty obvious)
  • If there are one or more entries, and any of them claim typeCode=”DRIV”, then the narrative is entirely generated from the entries
  • If some of the entries don’t have a type code, or the typeCode=”COMP”, then those entries aren’t represented in the narrative
  • There’s no way to indicate that an entry is represented in the narrative unless you claim that the entire narrative for the section is generated
  • The implication is that the entries relate to the section that contains them, though this is never explicitly stated
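As a sketch of these rules in code (Python, with CDA namespace handling omitted for brevity – real CDA elements live in the urn:hl7-org:v3 namespace, and the function name is my own):

```python
import xml.etree.ElementTree as ET

def narrative_status(section):
    """Classify a section's narrative against its entries,
    following the rules listed above."""
    has_text = section.find('text') is not None
    entries = section.findall('entry')
    if not entries:
        # no entries: any text present is not generated from entries
        return 'human' if has_text else 'empty'
    if any(e.get('typeCode') == 'DRIV' for e in entries):
        # any DRIV entry claims the whole narrative is generated
        return 'generated'
    # only COMP (or untyped) entries: no claim about the narrative
    return 'unknown'

section = ET.fromstring(
    '<section><text>The patient has no problems</text>'
    '<entry typeCode="DRIV"/></section>')
print(narrative_status(section))  # generated
```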

Here’s some example fragments:

<!-- a section with autogenerated text -->
  <text>The patient has no problems</text>
  <entry typeCode="DRIV">
    <!-- an observation containing an assertion that there are no problems -->
  </entry>

<!-- a section with human-modified narrative -->
  <text>
    [a table of the patient's problems, built by the application]
    <!-- additional text added by a human: -->
    <paragraph>The patient also has renal failure</paragraph>
  </text>
  <entry typeCode="DRIV">
    <!-- an observation with a problem -->
  </entry>
  <entry typeCode="DRIV">
    <!-- an observation with a problem -->
  </entry>

<!-- a section with autogenerated text -->
  <text>The patient has no problems</text>
  <entry typeCode="DRIV">
    <!-- an observation containing an assertion that there are no problems -->
  </entry>
  <entry>
    <!-- an act with audit information that's not in the narrative -->
  </entry>

So, in theory, you don’t need to render the narrative if all sections in the CDA document have no <text> element, or they have at least one entry with typeCode=”DRIV”. For this purpose, you can ignore sections that contain other sections and no text of their own (though if they have no text, and nested sections, it may be necessary to take note of their title).

Lloyd McKenzie has kindly contributed an XPath predicate for this test:

    component/section[text and not(entry[@typeCode='DRIV'])]

Practical Considerations

Note that I said that in theory this test works. However, in practice, there’s a number of problems associated with this:

  • The CDA rules about typeCode are not very prominent in the specification, and are often overlooked
  • The CDA rules are not well documented, and even when documented more clearly, people find them very opaque
  • For a variety of reasons, the division between sections isn’t as clear as everyone would like, and entries sometimes end up in more than one section narrative, or in the wrong section for a different reason
  • Even amongst CDA implementation guide writers, the correct use and/or impact of typeCode=”DRIV” is easily overlooked.  In fact, the implementation guides I have contributed to have been wrong on this in the past (NEHTA guides).
  • Even if the CDA IG writers get it right, they may not anticipate the real world usage correctly, or they may provide an example which people copy without understanding (see the CCDA examples which include typeCode=”DRIV” on some examples and not others with no explanation)

What this means is that in practice, depending on typeCode=”DRIV” is unreliable due to poor compliance with the specification in this regard.

And in effect, then, you can only ignore the narrative and render the data if you’re really confident in the CDA implementation guide, and the conformance process associated with its implementation.

Explaining to the users what is going on

Ideally, an application would differentiate to the users between “displaying the data in the document” and “displaying the document”. Ideally, also, the users would actually understand the difference. But that’s not the world we live in. I think that this problem underlines the dangers of not rendering the narrative – you need to be really confident before you make that decision.


Question: Interpretation of multiple pre-conditions in CDA


Since a CDA R2 clinical statement can be associated with zero or more “Criterion” classes through precondition, what would happen in case there are TWO Criterions, one TRUE and the other FALSE? What should be the default operator in order to arrive at the final decision point whether a service/activity should be performed or not? Should it be allTrue (AND) or atLeastOneTrue (OR)? Or, in the context of a specific IG, can we make our own statements, since CDA is silent on this? I see that the HQMF specification explicitly states AND, OR and other operators; however, this is not the case for CDA.


A clinical statement in CDA R2 can be associated with zero or more “Criterion” through “precondition”:

CDA defines precondition like this:

“The precondition class, derived from the ActRelationship class, is used along with the Criterion class to express a condition that must hold true before some over activity occurs.”

This means that the precondition must be true for a service/activity to be performed.


There’s a property defined on the RIM ActRelationship (which is the base class underlying the CDA precondition) called conjunctionCode, which is defined as:

A code specifying the logical conjunction of the criteria among all the condition-links of Acts (e.g., “and”, “or”, or “exclusive-or”)

So if this attribute were present on the CDA precondition class, that would answer your question. However, as you can see, the CDA precondition class has no such attribute as “conjunctionCode” – although it inherits it from the RIM, the value has been fixed to the nullFlavor “No Information”, and so it doesn’t need to be shown on the diagram. And therefore, technically, in CDA, you can’t evaluate the meaning of multiple preconditions. But that’s being pedantic. In most cases, where a committee rules out the use of an attribute like that, there’s an implicit default value.

And it turns out that this is the case in CDA. When the CDA specification says, in text:

“The precondition class, derived from the ActRelationship class, is used along with the Criterion class to express a condition that must hold true before some over activity occurs.”

That answers the question: the condition must hold true. And so if you state more than one condition, then each of them is a condition that must hold true – so the implicit default conjunctionCode is “AND”.

So the direct answer to the question: they must all be true.
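A minimal sketch of that semantics – the criterion functions here are hypothetical stand-ins for evaluating actual CDA Criterion classes:

```python
def may_perform(context, criteria):
    """With the implicit conjunctionCode of AND, an activity may
    proceed only when every precondition criterion holds true."""
    return all(criterion(context) for criterion in criteria)

# two hypothetical criteria: one true, one false for this patient
age_over_18 = lambda ctx: ctx['age'] >= 18
is_fasting = lambda ctx: ctx['fasting']

context = {'age': 45, 'fasting': False}
print(may_perform(context, [age_over_18, is_fasting]))  # False
```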


Further, by the rules of CDA extension, you can’t effectively change this. The rules say:

Extensions should not change the meaning of any of the standard data items, and receivers must be able to safely ignore these elements

You can’t add conjunctionCode as an extension without changing the meaning of the standardised data elements (or, more strictly, you could add it, but the only value you could use would be “and”). Adding it by a different name wouldn’t make any difference either.

But if you really wanted to do this, the most you could do is add the or-constraints themselves as extensions – but then you have to somehow choose which preconditions a general CDA processor would be aware of – which sounds quite unsatisfactory to me (and you can’t – legally, at least – simply say that you can ignore the general CDA processor case either).

This is the kind of case where it’s very difficult to extend CDA and get it to do what you want.

Question: More on CDA/CCD Identifiers


A follow up question to your articles about how to fill out Identifiers:

As is typical in software development, we take unique ids very seriously, so when you say identifiers have to be globally unique, and also indicate that you can put the internal system patient id (for example) in the extension part of the id, I have the following questions:

1) Why would anyone outside of your system care about the patient id that is unique to your system?

2) For the document level Id, assuming your system outputs CCDs, is it the case that for patient X, every time you create a CCD for that patient the document ID should be the same? Or should it be different with each instance of the CCD that is created, such that two CCD documents should NEVER have the same ID, even if it’s the same patient.

3) Can you just plug in a non-registered UUID (universally unique ID) into every place an ID is required and call it good? Or is it better practice (or industry standard) to go get a registered OID (or UUID) and create some kind of hierarchical numbering scheme in the document.


  1. Well, the answer to why someone outside your system would care about your own patient Id depends on how closely their system is bound to yours. If they receive a stream of documents, then they are interested in your patient identifier to help match previous documents in some kind of EMPI algorithm. But let’s assume the least coupled case: they are randomly encountering this document in an XDS repository for some region to which your system is only one of many (1000s maybe) that contribute documents. In this case, a target system itself is not going to make anything of your source id – but it will probably display that to the user, so that if the user chooses to contact a source clinician about the document, they can offer them a patient id. It can also be used by system administrators to debug problems. Both of these use cases are small but real.
  2. Every time you create a CCD (or any CDA document) for a patient, the document ID must be different. Even if you re-issue the same document with a minor amendment, the id must be different: two CCD documents should NEVER have the same ID (it is the primary key for the document on any document indexing system)
  3. You can plug in a non-registered UUID in most places where an ID is required. The exception is codes that you are putting in your document. Any place you use a codeSystem attribute, whatever goes in there should be registered (and OIDs are more polite than UUIDs, but not required). But for any other ID in the document, registration is not required. However it’s polite and useful to register it anyway
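To illustrate points 2 and 3, a sketch of document id generation, using an unregistered UUID as the II root:

```python
import uuid

def new_document_id():
    """Each document instance gets a fresh id - even a re-issued
    amendment of the same document for the same patient."""
    return '<id root="%s"/>' % str(uuid.uuid4()).upper()

original = new_document_id()
amended = new_document_id()   # minor amendment, re-issued
assert original != amended    # two documents never share an id
```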


Question: Complex reference ranges for observations in CDA


I am wondering if it is possible to include a set of static reference data inside a CDA document, for example WHO percentile information for weight, as structured contents? I am not able to find a proper method to cater for this based on the CDA R2 R-MIMM diagram.


I’m afraid that the answer is not very encouraging. This content would definitely go in the reference range section, which looks like this:

To include the data in structured form, we would have to either use the data structures provided, or extensions. First, notes about the existing fields on Observation Range:

  • classCode – the type of class that this is. Effectively fixed to an observation, and none of the possible values relate to this discussion
  • moodCode – a fixed value which is irrelevant for this discussion
  • code – the kind of observation that this is. The code says “what type of observation this is”, and since the observation is a criterion on weight, that’s what the code should be.
  • text – a free text description of the reference range
  • value – a value that specifies the actual reference range
  • interpretationCode – a code that explains how the reference range should be interpreted. This can only be one of a limited set of codes

The following values are allowed for interpretation code:

  • B     better
  • D     decreased
  • U     increased
  • W     worse
  • <     low off scale
  • >     high off scale
  • A     abnormal
  • AA     abnormal alert
  • H     high
  • HH     high alert
  • L     low
  • LL     low alert
  • N     normal
  • I     intermediate
  • MS     moderately susceptible
  • R     resistant
  • S     susceptible
  • VS     very susceptible

Few of these codes, which primarily relate to the interpretation of a value rather than a reference range, are useful. I’d only count N, A, and AA as relevant. The percentile data is nearest to “normal”, but I don’t think that really covers something that has 1% and 99% values properly.

So, what are the options for including the percentile information as structured data?

  • It is possible to include this information in the text of the observation range, but that doesn’t count as “structured contents”, and it makes it pretty much impossible for a UI to do anything clever with it. Still, I’ve shown an example of this below.
  • We could include it as a series of reference ranges with a made up code that combines the age in months and the percentile, something like “5m-1%” as the code, and with the value a simple PQ (“5.2” in this case). This is kind of technically misusing code – though it certainly wouldn’t be the only case by a long shot (in fact, the only way you can use code is effectively to misuse it). Still, it doesn’t work very well in this case: the codes would be an arbitrary invented set of codes, and who would know how to interpret them? The display is likely to be misleading or wrong from a system that doesn’t know them
  • we could add the information as a set of extensions using some XML of personal choice in a foreign namespace. This would work, though extensions are unwelcome for many CDA adopters. If you did this, you should really include the text for the sake of systems that don’t understand the extensions – which is why I show the example below, as a base for extending
  • It would also be possible to put the structured data straight in as the content of the text, with some media type to indicate what type it has. I guess it would be text/xml plus any xml of your choice to represent the content. But this is likely a bad idea, since most systems would not interpret it correctly, and it’s not really consistent with the idea of a text representation of the reference range

None of those are good options, I’m afraid.

Note for v3 experts: The CDA model is just too stripped down here. The Lab RMIM defines a criterion on the reference range, which clearly qualifies the reference range, but you can’t introduce that as an extension to CDA; it includes some problems of its own, and the correct interpretation codes are still missing.

Example with the percentiles as text:

  <observation classCode="OBS" moodCode="EVN">
    <id root="$UUID"/>
    <!-- SNOMED CT code for body weight -->
    <code code="$Code" codeSystem="$codeSystem" codeSystemName="$codeSystemName" displayName="Body Weight"/>
    <statusCode code="completed"/>
    <effectiveTime value="20120913"/>
    <!-- Weight -->
    <value unit="kg" value="5.30" xsi:type="PQ"/>
Weight-for-age GIRLS: Birth to 5 years (percentiles), weight in kg

Y: M  Month   1st   3rd   5th   15th  25th  50th  75th  85th  95th  97th  99th 
0: 0  0        2.3   2.4   2.5   2.8   2.9   3.2   3.6   3.7   4.0   4.2   4.4 
0: 1  1        3.0   3.2   3.3   3.6   3.8   4.2   4.6   4.8   5.2   5.4   5.7 
0: 2  2        3.8   4.0   4.1   4.5   4.7   5.1   5.6   5.9   6.3   6.5   6.9 
0: 3  3        4.4   4.6   4.7   5.1   5.4   5.8   6.4   6.7   7.2   7.4   7.8 
0: 4  4        4.8   5.1   5.2   5.6   5.9   6.4   7.0   7.3   7.9   8.1   8.6 
0: 5  5        5.2   5.5   5.6   6.1   6.4   6.9   7.5   7.8   8.4   8.7   9.2 
0: 6  6        5.5   5.8   6.0   6.4   6.7   7.3   7.9   8.3   8.9   9.2   9.7 
0: 7  7        5.8   6.1   6.3   6.7   7.0   7.6   8.3   8.7   9.4   9.6  10.2 
0: 8  8        6.0   6.3   6.5   7.0   7.3   7.9   8.6   9.0   9.7  10.0  10.6 
0: 9  9        6.2   6.6   6.8   7.3   7.6   8.2   8.9   9.3  10.1  10.4  11.0 
0:10  10       6.4   6.8   7.0   7.5   7.8   8.5   9.2   9.6  10.4  10.7  11.3 
0:11  11       6.6   7.0   7.2   7.7   8.0   8.7   9.5   9.9  10.7  11.0  11.7 
1: 0  12       6.8   7.1   7.3   7.9   8.2   8.9   9.7  10.2  11.0  11.3  12.0 
1: 1  13       6.9   7.3   7.5   8.1   8.4   9.2  10.0  10.4  11.3  11.6  12.3 
1: 2  14       7.1   7.5   7.7   8.3   8.6   9.4  10.2  10.7  11.5  11.9  12.6 
1: 3  15       7.3   7.7   7.9   8.5   8.8   9.6  10.4  10.9  11.8  12.2  12.9 
1: 4  16       7.4   7.8   8.1   8.7   9.0   9.8  10.7  11.2  12.1  12.5  13.2 
1: 5  17       7.6   8.0   8.2   8.8   9.2  10.0  10.9  11.4  12.3  12.7  13.5 
1: 6  18       7.8   8.2   8.4   9.0   9.4  10.2  11.1  11.6  12.6  13.0  13.8 
1: 7  19       7.9   8.3   8.6   9.2   9.6  10.4  11.4  11.9  12.9  13.3  14.1 
1: 8  20       8.1   8.5   8.7   9.4   9.8  10.6  11.6  12.1  13.1  13.5  14.4 
1: 9  21       8.2   8.7   8.9   9.6  10.0  10.9  11.8  12.4  13.4  13.8  14.6 
1:10  22       8.4   8.8   9.1   9.8  10.2  11.1  12.0  12.6  13.6  14.1  14.9 
1:11  23       8.5   9.0   9.2   9.9  10.4  11.3  12.3  12.8  13.9  14.3  15.2 
2: 0  24       8.7   9.2   9.4  10.1  10.6  11.5  12.5  13.1  14.2  14.6  15.5 
2: 1  25       8.9   9.3   9.6  10.3  10.8  11.7  12.7  13.3  14.4  14.9  15.8 
2: 2  26       9.0   9.5   9.8  10.5  10.9  11.9  12.9  13.6  14.7  15.2  16.1 
2: 3  27       9.2   9.6   9.9  10.7  11.1  12.1  13.2  13.8  15.0  15.4  16.4 
2: 4  28       9.3   9.8  10.1  10.8  11.3  12.3  13.4  14.0  15.2  15.7  16.7 
2: 5  29       9.5  10.0  10.2  11.0  11.5  12.5  13.6  14.3  15.5  16.0  17.0 
2: 6  30       9.6  10.1  10.4  11.2  11.7  12.7  13.8  14.5  15.7  16.2  17.3 
2: 7  31       9.7  10.3  10.5  11.3  11.9  12.9  14.1  14.7  16.0  16.5  17.6 
2: 8  32       9.9  10.4  10.7  11.5  12.0  13.1  14.3  15.0  16.2  16.8  17.8 
2: 9  33      10.0  10.5  10.8  11.7  12.2  13.3  14.5  15.2  16.5  17.0  18.1 
2:10  34      10.1  10.7  11.0  11.8  12.4  13.5  14.7  15.4  16.8  17.3  18.4 
2:11  35      10.3  10.8  11.1  12.0  12.5  13.7  14.9  15.7  17.0  17.6  18.7 
3: 0  36      10.4  11.0  11.3  12.1  12.7  13.9  15.1  15.9  17.3  17.8  19.0 
3: 1  37      10.5  11.1  11.4  12.3  12.9  14.0  15.3  16.1  17.5  18.1  19.3 
3: 2  38      10.6  11.2  11.6  12.5  13.0  14.2  15.6  16.3  17.8  18.4  19.6 
3: 3  39      10.8  11.4  11.7  12.6  13.2  14.4  15.8  16.6  18.0  18.6  19.9 
3: 4  40      10.9  11.5  11.8  12.8  13.4  14.6  16.0  16.8  18.3  18.9  20.2 
3: 5  41      11.0  11.6  12.0  12.9  13.5  14.8  16.2  17.0  18.6  19.2  20.5 
3: 6  42      11.1  11.8  12.1  13.1  13.7  15.0  16.4  17.3  18.8  19.5  20.8 
3: 7  43      11.3  11.9  12.2  13.2  13.9  15.2  16.6  17.5  19.1  19.7  21.1 
3: 8  44      11.4  12.0  12.4  13.4  14.0  15.3  16.8  17.7  19.3  20.0  21.4 
3: 9  45      11.5  12.1  12.5  13.5  14.2  15.5  17.0  17.9  19.6  20.3  21.7 
3:10  46      11.6  12.3  12.6  13.7  14.3  15.7  17.3  18.2  19.9  20.6  22.0 
3:11  47      11.7  12.4  12.8  13.8  14.5  15.9  17.5  18.4  20.1  20.8  22.3 
4: 0  48      11.8  12.5  12.9  14.0  14.7  16.1  17.7  18.6  20.4  21.1  22.6 
4: 1  49      11.9  12.6  13.0  14.1  14.8  16.3  17.9  18.9  20.6  21.4  22.9 
4: 2  50      12.1  12.8  13.2  14.3  15.0  16.4  18.1  19.1  20.9  21.7  23.2 
4: 3  51      12.2  12.9  13.3  14.4  15.1  16.6  18.3  19.3  21.2  22.0  23.5 
4: 4  52      12.3  13.0  13.4  14.5  15.3  16.8  18.5  19.5  21.4  22.2  23.9 
4: 5  53      12.4  13.1  13.5  14.7  15.4  17.0  18.7  19.8  21.7  22.5  24.2 
4: 6  54      12.5  13.2  13.7  14.8  15.6  17.2  18.9  20.0  22.0  22.8  24.5 
4: 7  55      12.6  13.4  13.8  15.0  15.8  17.3  19.1  20.2  22.2  23.1  24.8 
4: 8  56      12.7  13.5  13.9  15.1  15.9  17.5  19.3  20.4  22.5  23.3  25.1 
4: 9  57      12.8  13.6  14.0  15.3  16.1  17.7  19.6  20.7  22.7  23.6  25.4 
4:10  58      12.9  13.7  14.2  15.4  16.2  17.9  19.8  20.9  23.0  23.9  25.7 
4:11  59      13.1  13.8  14.3  15.5  16.4  18.0  20.0  21.1  23.3  24.2  26.0 
5: 0  60      13.2  14.0  14.4  15.7  16.5  18.2  20.2  21.3  23.5  24.4  26.3 

Note: no claim about interpretationCode in this example.
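If a receiver did somehow obtain the percentile table in structured form (say, via the extension option above), using it is straightforward. This sketch includes just the 5-month row of the table; the function name is my own invention:

```python
# (age in months) -> list of (percentile, weight in kg), ascending
TABLE = {
    5: [(1, 5.2), (3, 5.5), (15, 6.1), (50, 6.9), (85, 7.8), (99, 9.2)],
}

def percentile_band(age_months, weight_kg):
    """Return the highest tabulated percentile at or below the
    observed weight (None if below the lowest tabulated column)."""
    band = None
    for pct, w in TABLE[age_months]:
        if weight_kg >= w:
            band = pct
    return band

print(percentile_band(5, 5.30))  # 1 - between the 1st and 3rd percentiles
```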

Problems with PBS codes

The Australian PBS codes are used to describe the medications for which the Australian government offers rebates under the Australian Pharmaceutical Benefits Scheme. The codes are published as part of the rules definitions tables on the PBS web site. In living memory, the codes have consisted of 4 digit codes followed by an alphabetic check letter. For instance, the code for a particular packaging of Simvastatin (used to be a clinical interest of mine) is “2011W”, and you can look up the full details for this code on the PBS web site (the lookup is not case sensitive).

However, the 4 digit codes are starting to run out. As I previously discussed, the PBS team won’t be reusing old codes for new meanings, so that means that they’re going to start using longer codes. That’s forecasted to happen in May/June this year.

That’s where the fun starts. From my post from before, the proper representation for a PBS code in a CDA document is this:

<code code="1471K" codeSystem=""/>

and in a v2 message, it would be something like this:


(Though there’s no v2 registered code system for PBS – I’d like to think that means that no one is exchanging PBS codes in v2, but I think people are. We talked about defining PBS on an Australian basis, but we never formalised that)

The problem

However there’s a problem: when Medicare were updating their systems to handle longer PBS codes (something that was done last year after it was decided that PBS would use longer codes), they changed their interfaces so that the PBS code would be represented like this (quoted from one of several rules published as amendments to the National Health Act):

PBS/RPBS Item Code: Six bytes, right justified, zero filled, five bytes numeric followed by one byte alphabetic check character, being the code for the pharmaceutical benefit which appears in the Schedule of Pharmaceutical Benefits for Approved Pharmacists published by the Department of Health and Ageing. A zero code is to be used in the case of Repatriation items which are not included in the Schedule but have been prior approved by the Department of Veterans’ Affairs.

Valid values include: 0-9, A – Z. Alpha character must be in upper case.

So all the DHS/Medicare interfaces prefix the PBS code with a 0, like this: 01471K. Note that many of the Medicare interfaces are fixed-width file interfaces; they don’t really have a lot of choice but to fix the width of the field, and it’s going to be either prefixed with spaces or zeroes. Either way, the PBS Authority (DOHA) that publishes the codes doesn’t prefix with 0.

I discussed this with the team that defines the PBS codes. From their point of view, the code is an integer value, with an accompanying check character, and whether or not the integer value has any ‘0’ digits prefixed to it is not significant, doesn’t affect the check digit calculation, and doesn’t affect the interpretation of the code. Whether 0 is prefixed in any given context is an implementation detail, and they are comfortable with people using either form.
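That view of the code – an integer plus a check letter – suggests a simple normalisation when comparing codes from different sources (a sketch; the format regex is my own reading of the rules quoted above):

```python
import re

# up to 6 digits (per the DHS field width) plus one check letter
PBS_CODE = re.compile(r'^(\d{1,6})([A-Z])$')

def normalise_pbs(code):
    """Reduce a PBS item code to its canonical unprefixed form,
    so '01471K' and '1471K' compare equal."""
    m = PBS_CODE.match(code.strip().upper())
    if not m:
        raise ValueError('not a PBS item code: %r' % code)
    digits, check = m.groups()
    return str(int(digits)) + check  # int() drops leading zeroes

assert normalise_pbs('01471K') == normalise_pbs('1471K')
assert normalise_pbs('183p') == '183P'   # short special codes too
```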

Note that they don’t use prefixed codes – the ascii and xml formats that they distribute don’t include a leading 0, and the 0-prefixed form is not found on the PBS web site. That’d be the implementation aspect of it showing up.

As a side note, the PBS codes already include 2 and 3 digit codes for some special items. For instance, 183P is Chlorhexidine Acetate (use as additive only). Note that no variation of that code that I tried worked on the PBS web site, so I presume that these special codes are not published through it. Btw, the PBS team also informed me that they reserve the right to use codes longer than 5 digits in the future, but I think we can safely assume that this won’t happen, since DHS can’t support it without changing their interfaces again, which isn’t going to happen any time soon.

Because one of the first document types to be uploaded to the pcEHR was the Medicare records, and because Medicare uses the 0-prefixed form internally, naturally, the Medicare documents contained this form:

<code code="01471K" codeSystem=""/>

And because of the way this is implemented internally in the pcEHR, the pcEHR is therefore configured to only accept the 0-prefixed form. And now vendors that are testing their documents with the pcEHR are running into this as a problem, since they do not prefix the existing codes with 0s.


There are various possible solutions:

  1. Get DHS/Medicare to change to not prefix the codes with 0
  2. Insist that PBS codes are always prefixed with 0 values to 5 digits when they are exchanged between systems (but not displayed to users?)
  3. Leave it that all systems should accept codes prefixed with 0s or not and know that the code itself is the same either way
  4. Say that the medicare interfaces and the pcEHR documents always (and must) prefix with 0, but in other contexts, it doesn’t matter (see rule #3)

I think we can safely say that the first isn’t a good idea. The second sounds like a good idea – unless one day we start using 6 digit codes, and then what happens to existing interfaces? #3 sounds like a good idea, but is really tough for systems, since the codes are suffixed with an alpha character – this violates normal rules for handling code systems, and special cases are always bad. The 4th is a variation of the 3rd that spreads the pain around differently (it still involves special cases).

It doesn’t seem obvious to me what is the right solution here. This blog post is to solicit opinions from the various vendors etc that are affected by this problem. You can either comment on the post here, or if that is counter to your policy, you can comment to me directly. (Note that if you have a relationship with NEHTA, and it’s necessary, you can comment to me under NDA if you wish).

At some stage I’ll have to update the registration for the code system to clarify this, but I’ll wait for now. Also, I think it’s probably necessary for us to get a v2 code system for PBS codes. Let me know if you exchange PBS codes in v2 messages.

Update (13-Feb 2013): DHS advises that: “PBS Online will accept either a zero-filled or non zero-filled PBS item code. In practice some Software Vendors zero-fill and some do not.”

FHIR Resources and Unicode

In the FHIR specification we say that the basic language for resources is unicode:

The XML character set is always Unicode.

Actually, that’s not the right wording – what it should have said is “The character set of a resource is always Unicode”.

Now if the character set is unicode, then any character encoding that is fully mapped to unicode is therefore valid. However, elsewhere in the specification, it says:

FHIR uses UTF-8 for all request and response bodies

This attracted several comments, all along the same lines – why require UTF-8? Well, the logic is fairly simple:

  • content type negotiation doesn’t work very well for character sets
  • while it might be legal to represent a resource in any character encoding mapped to unicode, what would you do if someone asked you to represent a resource in a character set that doesn’t have a mapping for one or more of the characters in the resource?
  • Even though it’s possible to convert resources between character sets, what happens to digital signatures?
  • What’s going to happen if systems with different encodings, or with different supported subsets try to interoperate?
  • As for which unicode encodings – why support more than one? UTF-8 is widely supported, and required by several HL7 Asian affiliates for v2
  • It’s just simpler to say, everyone use UTF-8.

One problem with requiring UTF-8 is that the HTTP default is ISO-8859-1. This means that you have to specify UTF-8 as the character set on all the http requests and responses. But since it’s a parameter of the content type, and you have to specify the content type anyway, I didn’t see that as particularly painful – but it did attract comment in the connectathons, because you do have to remember.
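In practice that just means always appending the charset parameter when the Content-Type header is built (the mime type shown here is an assumption about what a FHIR interface would send):

```python
def content_type(mime='application/xml+fhir'):
    # charset is a parameter of the content type, which has to be
    # sent anyway - but if it's omitted, HTTP says the receiver may
    # assume ISO-8859-1, not UTF-8
    return mime + '; charset=UTF-8'

print(content_type())  # application/xml+fhir; charset=UTF-8
```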

Unicode subsets

However, if you don’t support unicode natively – which is still true of a large subset of systems – then the fact that resources are always in UTF-8 presents you with a problem: you have to do something about the unicode issue, even if you are positive that all your trading partners are using pure ASCII. There are still so many systems that don’t support unicode (even though the platforms support unicode relatively well, to support it in your application the entire eco-system – database, UIs, printers, messaging formats, etc – all have to support unicode, and for many vendors sorting this out simply isn’t feasible in a financial sense).

What I see in practice, is systems that can’t interoperate safely because they thought they were using pure ASCII, but they weren’t. (In fact, it’s not that unusual to see systems that don’t fully operate, let alone interoperate.) So I’d always prefer unicode as the wire format – it makes everyone deal with the issue.
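One way to guard against that assumption (a sketch only; the function name is mine) is to check on receipt whether the text actually survives the encoding a legacy store assumes, rather than trusting that it’s pure ASCII:

```python
# Sketch: test whether text can be stored in a legacy (non-unicode) encoding.
def fits_in(text, encoding):
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

assert fits_in("simple ascii text", "ascii")
assert not fits_in("naïve", "ascii")       # "pure ASCII" assumptions often fail
assert fits_in("naïve", "iso-8859-1")      # but a latin-1 store would cope
```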

So, we have several comments – why require UTF-8? Why not allow at least ISO-8859-1? Or why not allow any round-trip encoding? What if we require all interfaces to “support” UTF-8 in addition to anything else that they also do? Or maybe we require all servers to support UTF-8 at least?

We’ve discussed this in committee several times, and we’re just not sure what to do here. Seen as an entire eco-system – and I do think FHIR interfaces will be highly interconnected – a simple blanket rule of always UTF-8 is obviously much simpler overall. But it imposes an entry cost on many systems – especially the existing data stores, which are generally older systems – and maybe this isn’t a very good idea?

HHS HIT Standards Committee & Character Set

The situation is somewhat complicated by this (a private communication that made its way to me):

The HHS HIT Standards Committee was asked how EHR language display should be certified using standards and the recommendation was ISO 8859-15 aka “Latin 9” which has character support for all the required ISO 639 languages including direct support for the Eastern European languages and transliteration to Latin characters for e.g. cyrillic and mandarin.  This EHR certification requirement is anticipated to raise issues for HL7 standards and HL7 implementers particularly for systems with interfaces to certified EHRS.  

I’ve got to say, I don’t really understand this. If you’re going to recommend something, why not Unicode? The point is, US EHR vendors (which includes all the multinationals) are going to be forced to change towards whatever this committee recommends. But now, instead of migrating to unicode, which is at least a sensible long-term option, they’re going to spend their money changing from ISO-8859-1 – which is the default for all the systems I’ve ever looked at personally – to ISO-8859-15. I can only see that as a sideways move, and not a good investment on behalf of the end users. And how will that play in other countries, where ISO-8859-15 is not on the list of supported character sets in national standards?

In terms of FHIR and unicode, I’m not exactly sure what the impact of this is. ISO-8859-15 is fully mapped to unicode, so it probably doesn’t really change the basic question – unicode, or something else that makes subset support explicit? But EHR vendors are going to be important adopters of FHIR, so I think this weighs on the decision.
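For what it’s worth, ISO-8859-15 is ISO-8859-1 with eight code points replaced – the euro sign being the most visible – and both map cleanly to unicode, which is why the recommendation looks like a sideways move:

```python
# The euro sign exists in ISO-8859-15 (Latin-9)...
assert "€".encode("iso-8859-15") == b"\xa4"

# ...but not in ISO-8859-1 (Latin-1):
try:
    "€".encode("iso-8859-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False
assert not euro_in_latin1

# Both round-trip through unicode, so the FHIR question is unchanged.
```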


Identifiers in NEHTA Clinical Documents

There’s not a lot of good information around about handling identifiers in CDA documents. I’ve written about it before, but I’m not aware of anything else that’s publicly available. My previous post was “Identifiers in CDA Documents- Reporting Tool” and I really recommend that you read that one first. This post provides advice for how to use identifiers in NEHTA Clinical Document CDAs. In the course of time, this advice will develop into an official NEHTA FAQ, but I’m writing the guts of it here to get comment on it first.

In NEHTA documents, Acts, Roles and Entities all have an id element, which usually has the following description:


In addition to the id element, some of the roles have an additional set of identifiers hanging off them:


In effect, the specification draws a distinction between “technical identifiers” and “real world identifiers”. Real world identifiers – the ones that go in the “EntityIdentifier” class above – are ones that are mapped from the SCS (the logical definition of the document contents), or introduced from local usage such as an MRN. Note the definition of the real world identifier: “it is known how it fits into a wider mediated process concerning the entity”. That leaves the “technical identifier” as the identifier where it isn’t known how it fits into the wider process. This post explains how that identifier element is meant to be used, and the limitations to be aware of when handling it.

The id element is a required element in some places of the CDA document, and not in others. Here’s an example:


Here, the author and the authenticators must have an id, the organisation might, and the Person who acts as Author can’t. Wherever the CDA implementation guides use a CDA class that has an id, the id is specifically called out in the mappings, whether or not it is mandatory (it is noted when it’s mandatory):


The content of the id is a root, and optionally an extension. The root can either be an OID or a UUID (a GUID, for Windows programmers). The comment that is used is a little opaque:

This is a technical identifier that is used for system purposes such as matching. If a suitable internal key is not available, a UUID may be used.

btw, I wrote those words, so I have the right to call them opaque ;-).

The basic idea here is that the identifier (root + extension if present) is unique for this object, and that you could use this to match this object against another copy of the same object that you saw in a previous document (or elsewhere in the current document). That’s what “used for system purposes such as matching” means.

Before I talk about the limitations and caveats around how you can – or can’t – match using this identifier, what do you put in the id element?

If you have an integer/string primary key for the concept (e.g. from a database), then you use that. In this case, you either assign an OID to this client’s copy of the table, or create a GUID that identifies it, and use this as the root. In the extension, you put the primary key directly:

<id root="5c9ef1f6-1292-448f-9568-9c5166314613" extension="141234"/>

That way, any time you represent the same concept in any CDA document, it gets the same identifier, and a destination system can be sure, for instance, that if the code for the type of problem has been changed, that the original problem has been edited, rather than a new problem created.

This does mean that you have to be careful to choose a primary key that does reflect this kind of behaviour – not one that would be re-used for a different concept if it becomes free, for instance.
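The matching behaviour described above can be sketched like this (the values are hypothetical, and the case-insensitive comparison of roots is my own convention, since OID/UUID roots are sometimes rendered in different cases):

```python
# Sketch: match objects across documents by their (root, extension) pair.
def id_key(root, extension=None):
    # Assumption: normalise root case so differently-rendered UUIDs compare equal.
    return (root.lower(), extension)

earlier = id_key("5c9ef1f6-1292-448f-9568-9c5166314613", "141234")
later   = id_key("5C9EF1F6-1292-448F-9568-9C5166314613", "141234")
assert earlier == later   # same record edited, not a new one created
```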

If the source system uses GUIDs as the primary key, then that would be represented like this:

<id root="047fa6e4-5e12-46e5-932e-d52c5677af35"/>

Ideally, the generator of the CDA document always has access to the primary source tables from which it is generated. But we don’t live in such an ideal world: many of the CDA documents are generated from some HL7 v2 or XML source document by middleware. And very often, the source documents simply don’t include the source primary key. Wherever possible, it’s best to get the source primary key added to the intermediate source from which the CDA document is generated, but sometimes it’s just not possible. In this case, the middleware can only assign a random GUID for the id:

<id root="0c2a809e-fca7-4452-86e6-dde2f54766bc"/>

Note that you can’t tell the difference between the last two cases. So you can’t tell, when you look at an id element in a CDA document, whether the id element is the real primary key or not.

Also note that there’s space for more than one id element – but you should only ever put one identifier there. Any other identifiers you have – MRNs, Provider identifiers, medication ids, etc – they go in the real world identifier.

Processing the id element – warnings and caveats

Ideally, the id element should be able to be used for matching when content from CDA documents is imported into an application. For instance, if a patient presents at a GP for the first time, all the patient’s documents would be downloaded from the pcEHR, and the system would collate the records based on the id elements so that only the latest version of each problem, medication, pathology report etc. would end up in the system.

But there’s a series of caveats that mean this may not be a good idea:

  • You can’t tell the difference between a real identifier and a fake identifier made up to fill the space (provided the source system generating fake identifiers generates unique ones, you won’t get different records overwriting each other, but you will get identical records duplicating each other)
  • It’s very hard to determine the correct chronological order of the records. While the CDA document timestamp is clear and unequivocal, it’s not clear that the records in the document necessarily share that timestamp. In most cases they probably do – it’s the obvious thing to do – but there’s no rule that it has to be so, and no conformance checking around it
  • The CDA documents are a kind of snapshot anyway – if an existing record is deleted, it is simply omitted from later documents. So when collating records, id matching won’t catch deletions; in effect, you’ll have to rely on the latest document
  • It’s not clear that you even want to match records like this in principle. In the PCEHR core, the underlying atomic data store does have the capacity to match records and overwrite earlier records with later copies. But when the consolidated views are generated out of that store, we don’t want the very latest information for an author or a problem etc to show as the content of earlier documents – we want what was shown in the original source document to appear in the view against that document. So the PCEHR data store never matches by the id element
  • Finally, a couple of RIM-specific notes (skip this if you aren’t familiar with the RIM). RIM-based processing engines might naturally assume a rule that objects with the same id have consistent immutable attributes – that those attributes can’t change, and that it’s an error if an object with the same id but different immutable attributes is encountered. However, this is not the case – the quick explanation is that the value of the attribute (especially classCode/typeCode) may be constrained differently in different places. I’ll do a full explanation in a separate post if anyone wants
  • And then, there’s Role.id

The Role.id is a most slippery identifier. Note the definition from the RIM:

A unique identifier for the player Entity in this Role

That’s a little opaque: is that the identifier for the player when it plays this role, or the identifier for the player that is in this role? There’s a clarifying note:

The identifier of the Role identifies the Entity playing the role in that role

Only that doesn’t really clarify anything, does it? I believe that the correct interpretation is the second: this is the identifier for the player – and therefore the same role id may be encountered in multiple places for different roles, where the same entity plays those different roles.

On these grounds, some of the CDA implementation guides say some variation of the following language:

When the author is also the legalAuthenticator, then ClinicalDocument/legalAuthenticator/assignedEntity/id SHALL be the same as ClinicalDocument/author/assignedAuthor/id

When the CDA implementation guides say this, then you can rely on this: the ids will match. That will be checked during the conformance process.
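A rule like that is easy to check mechanically. Here’s a sketch of a conformance-style check, using a minimal, hypothetical document (real NEHTA documents have far more content around these elements):

```python
import xml.etree.ElementTree as ET

# Minimal hypothetical document where the author is also the legalAuthenticator.
doc = ET.fromstring("""<ClinicalDocument xmlns="urn:hl7-org:v3">
  <author><assignedAuthor>
    <id root="047fa6e4-5e12-46e5-932e-d52c5677af35"/>
  </assignedAuthor></author>
  <legalAuthenticator><assignedEntity>
    <id root="047fa6e4-5e12-46e5-932e-d52c5677af35"/>
  </assignedEntity></legalAuthenticator>
</ClinicalDocument>""")

ns = {"v3": "urn:hl7-org:v3"}
author_id = doc.find("v3:author/v3:assignedAuthor/v3:id", ns).get("root")
legal_id = doc.find("v3:legalAuthenticator/v3:assignedEntity/v3:id", ns).get("root")
assert author_id == legal_id   # the SHALL rule quoted above
```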

However, other than that, the ids probably aren’t going to be very useful in the near term. But you still have to fill them out where CDA requires them.