Category Archives: Data Types

Question: Multi-part surnames

Question:

How do we handle multiple family/last names? And how do we reconstruct the complete family name from multiple parts stored in the database?
How about, for an example: Josep Puig i Cadafalch – Puig is the last name of his father, Cadafalch of his mother; i means “and” (see Iberian naming customs).

Answer:

Human names are quite difficult in healthcare standards, because there’s a fairly wide variety of cultural practices, because they are involved ubiquitously across the process, and because most implementations work within a narrow cultural profile.

Let’s start with the simple case – how should this be represented in v2, CDA, and FHIR:

V2.7 XPN
Puig i Cadafalch^Josep
CDA PN
<name>
  <family>Puig</family>
  <family>i</family>
  <family>Cadafalch</family>
  <given>Josep</given>
</name>
FHIR HumanName
<name>
  <family value="Puig"/>
  <family value="i"/>
  <family value="Cadafalch"/>
  <given value="Josep"/>
</name>

So these are different. You can’t reproduce the CDA/FHIR structure in v2, but you can represent the v2 structure in CDA and FHIR:

CDA PN
<name>
  <family>Puig i Cadafalch</family>
  <given>Josep</given>
</name>
FHIR HumanName
<name>
  <family value="Puig i Cadafalch"/>
  <given value="Josep"/>
</name>

So that creates an inherent problem: there’s no one right way to represent this name in CDA (in FHIR, the second form is something you SHOULD NOT do – see below). The first form is preferred, but the second is still legal. We can’t make it illegal, because maybe the space really is part of a single surname (that would play out in a legal/jurisdictional system where pragmatic solutions aren’t always possible). So why do we even allow it to be broken up? That’s because not all parts of the surname have equal meaning, and the different meanings affect how you use names (searching and matching considerations). So you can break names up in CDA and FHIR to allow parts to be marked up. CDA provides the qualifier attribute for this, and in FHIR it would be extensions.

But there are two different forms that (in the case above) mean the same thing. How does this work in practice? Well, that brings the database question into focus. In practice, there’s a range of approaches. There are applications out there that handle names exactly as CDA/FHIR/ISO 21090 handle them, but in my experience, these were written from scratch to be conformant, not to solve particular real world problems. Mostly, though, there’s some combination of single or multiple given names and surnames, with prefixes, sometimes suffixes, and occasionally surname prefixes (“van”). In English-speaking countries, applications mostly only have a single surname.

So the ambiguity in the specification is faithfully reproducing variation in practice. We’d like to impose a single model of naming, but there isn’t one. In fact, the situation is further complicated by a person with a name from one culture trying to represent their name in another culture’s systems (tourists or immigrants; immigrants often actually take a different name just to get by), which leads to some weird intersections of practice.

Implementation in FHIR

In FHIR, we’ve made this implementation note:

The parts of a name SHOULD NOT contain whitespace. For family name, hyphenated names such as “Smith-Jones” are a single name, but names with spaces such as “Smith Jones” are broken into multiple parts

and this one:

Systems that do not support as many name parts as are provided in an instance they are processing may wish to append parts together using spaces, so that this becomes “van Hentenryck”.

This makes things workable – it doesn’t matter how you store it, there’s guidance for interoperating with other applications, and in most circumstances, this will work fine.

But it’s SHOULD, not SHALL, because we know of edge cases like the jurisdictional one above where you mightn’t be able to follow the simple rules.
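
The two implementation notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, and splitting on whitespace is exactly the simple rule that the SHOULD leaves room to break when a space really is part of a single surname.

```python
def join_family_parts(parts):
    """Append name parts together using single spaces, for systems that
    cannot store as many parts as they receive."""
    return " ".join(p.strip() for p in parts if p and p.strip())


def split_family_name(family):
    """Break a stored surname into parts on whitespace.
    Hyphenated names contain no whitespace, so they stay whole."""
    return family.split()


print(join_family_parts(["Puig", "i", "Cadafalch"]))
print(split_family_name("Smith-Jones"))
print(split_family_name("Smith Jones"))
```

Joining and then splitting only round-trips when no individual part contains whitespace, which is why it matters that the parts follow the first note.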

 

Documents are only immutable as a matter of policy

One of the issues that kept coming up when we were discussing documents in FHIR last week is the notion that documents are immutable, and can’t change. Take, for instance, this, from the comments:

Document should be immutable

Only, documents aren’t immutable. That’ll be a shock for CDA centric readers, where the immutability of documents is a fundamental notion that is hammered into implementers by the specification and the community around it, but it’s true.

First, you have to build the document. Clearly, it’s not immutable while you’re doing that – it needs to be edited, changed, assembled, and this can be a piecemeal process. So the technical artifacts that underlie building a document aren’t immutable. Even after you have it, you can break it down, change things, reassemble them. It’s still the same technical constructs – so even then, it’s not immutable.

The immutability is affixed as a matter of policy – once a document is “done”, then it’s treated as immutable by choice, to establish clinical traceability. This is important, and necessary.

In CDA, the existence of a CDA document is evidence that the document is done, and so they are treated as unchangeable. If you, privately, choose to keep partially formed CDA documents that you don’t treat as immutable, well, that’s your private business, and there’s no need to let anyone else see your private business. So even with CDA, it turns out, immutability is a matter of policy.

FHIR is different to CDA, because the FHIR spec defines a whole lot of functionality that’s relevant and important during the building and processing phases. So it would be a mistake to restrict that functionality at the technical level to ensure that documents are immutable – because they actually aren’t at that level. What FHIR needs to do is ensure that the control points to enforce and audit immutability exist so that the policy imperative of clinical traceability can be delivered. That’s something that we’re keeping in mind as we work on the document design.

This same dichotomy exists, btw, in the v3 data types, where the abstract specification describes the data types as “immutable”, because they are, in concept. But the ISO 21090 concrete version implicitly caters for non-immutable definitions, and there’s discussion in the spec around the difference between immutable in technical terms, and immutable in policy terms.

 

 

More fun with original Text

As I’ve described before (and here) originalText is a challenge.

Firstly, a consistent pattern throughout the HL7 data types (v2, v3, and FHIR) is that a coded value is represented by some variation of a cluster that contains

  • (code / system / display) 0..*
  • original text

There are variations on this theme amongst CE/CNE/CWE, CD/CE/CV, and Coding/CodeableConcept, but they all have the same basic pattern. The multiple codes are for equivalent codes that say the same thing in different coding systems. Here’s a v3 example (from the spec):

<code code='195967001' codeSystem='2.16.840.1.113883.19.6.96' 
    codeSystemName='SNOMED CT' displayName='Asthma'>
  <originalText>Mild Asthma</originalText>
  <translation code='49390' codeSystem='2.16.840.1.113883.19.6.2' 
   codeSystemName='ICD9CM' displayName='ASTHMA W/O STATUS ASTHMATICUS'/>
</code>

This has the original text “Mild Asthma”, and two different codes, one from ICD-9-CM, and one from Snomed-CT. That’s a pretty straightforward idea.

The problem

But what if the original text is something a little more complex?

Diabetes & Hypertension

The problem with this original text is that there are going to be two codes. If we stick to just Snomed-CT, these are codes 73211009: Diabetes mellitus, and 38341003: Hypertensive disorder, systemic arterial (I think). There’s no appropriate code that covers both. And this:

<code code='73211009' codeSystem='2.16.840.1.113883.19.6.96' 
    codeSystemName='SNOMED CT' displayName='Diabetes mellitus'>
  <originalText>Diabetes &amp; Hypertension</originalText>
  <translation code='38341003' codeSystem='2.16.840.1.113883.19.6.96' 
   codeSystemName='SNOMED CT' displayName='Hypertensive disorder'/>
</code>

is not legal, because there’s no way that these two codes fit anywhere near the definition of

The translations are quasi-synonyms of one real-world concept. Every translation in the set is supposed to express the same meaning “in other words.”

And this becomes more than a merely theoretical problem if you’re trying to provide translations to yet another code system as well. So this is a problem – you have to pick one of the codes.

I suppose you could tell the user that “Diabetes & Hypertension” is not a valid text to insert. I’m sure they’ll be fine with that.

Not

2nd try

An alternative is to say that we ensure that in the circumstance this can arise – say, the reason for prescribing a medication – allows 0..* coded data types, not just 0..1. Then we can do this:

<code code='73211009' codeSystem='2.16.840.1.113883.19.6.96' 
    codeSystemName='SNOMED CT' displayName='Diabetes mellitus'/>
<code code='38341003' codeSystem='2.16.840.1.113883.19.6.96' 
   codeSystemName='SNOMED CT' displayName='Hypertensive disorder'/>

Only, where did the original text go? Well, you could repeat it, I suppose. That’d be technically correct:

<code code='73211009' codeSystem='2.16.840.1.113883.19.6.96' 
    codeSystemName='SNOMED CT' displayName='Diabetes mellitus'>
  <originalText>Diabetes &amp; Hypertension</originalText>
</code>
<code code='38341003' codeSystem='2.16.840.1.113883.19.6.96' 
    codeSystemName='SNOMED CT' displayName='Hypertensive disorder'>
  <originalText>Diabetes &amp; Hypertension</originalText>
</code>

So now, a receiving application has to rip through the codes and eliminate duplicate original texts – that’s not my favourite outcome.
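
To make that cost concrete, here is a rough Python sketch of what the receiving application ends up doing. It makes several assumptions that are not part of any specification: the wrapper element <reason> is hypothetical, the fragment carries no XML namespace (real CDA content lives in urn:hl7-org:v3), and two original texts count as duplicates when they match after whitespace normalisation.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element; only the repeated <code> pattern comes from the post.
FRAGMENT = """
<reason>
  <code code='73211009' codeSystem='2.16.840.1.113883.19.6.96'
      codeSystemName='SNOMED CT' displayName='Diabetes mellitus'>
    <originalText>Diabetes &amp; Hypertension</originalText>
  </code>
  <code code='38341003' codeSystem='2.16.840.1.113883.19.6.96'
      codeSystemName='SNOMED CT' displayName='Hypertensive disorder'>
    <originalText>Diabetes &amp; Hypertension</originalText>
  </code>
</reason>
"""


def distinct_original_texts(xml_text):
    """Collect each originalText once, in document order."""
    root = ET.fromstring(xml_text)
    seen = []
    for code in root.iter("code"):
        text = code.findtext("originalText")
        if text is None:
            continue
        text = " ".join(text.split())  # normalise whitespace before comparing
        if text and text not in seen:
            seen.append(text)
    return seen


print(distinct_original_texts(FRAGMENT))
```

Even this small amount of logic has to guess whether equal strings really are the same original text, which is part of why repeating the text is an unattractive option.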

FHIR

We have a third option in FHIR: break apart the data type, and declare that the containing element (the reason for prescription in this case) isn’t a CodeableConcept, but a complex element that has a text element, and 0..* coding elements that don’t have to be equivalent:

<reasonForPrescription>
 <coding>
   <code value="73211009"/>
   <system value="http://snomed.info/id"/>
   <display value="Diabetes mellitus"/>
 </coding>
 <coding>
   <code value="38341003"/>
   <system value="http://snomed.info/id"/>
   <display value="Hypertensive disorder"/>
 </coding>
 <text value="Diabetes &amp; Hypertension"/>
</reasonForPrescription>

Actually, this has exactly the same look on the wire, but it’s defined differently. So this is ok, but there’s no facility for providing translations to other code systems (and the RIM mapping has now become impossible because we haven’t solved this for the RIM).

Conclusion

Original Text continues to be a problem because it cross-cuts the coding structures. If only we could force end-users to value coding enough that we could get rid of text ;-)

 


			

On the subject of original text for Codes

In the various coded data types defined by HL7 across v2, v3, and FHIR, there’s a property named text or originalText that is defined using some variant of these words:

The text as seen and/or selected by the user who entered the data which represents the intended meaning of the user.

Original text can be used in a structured user interface to capture what the user saw as a representation of the code or expression on the data input screen, or in a situation where the user dictates or directly enters text, it is the text entered or uttered by the user.

Unfortunately, what this exactly means is a matter of interpretation. The key question is, to what degree does the context affect the interpretation of the text that represents the code, and therefore, to what degree does the context contribute to the original text?

I’ll illustrate the discussion with an example. In SNOMED CT, there’s a (large) hierarchy for organism type. Part of the hierarchy contains codes for viruses. A subset of this is found in the PHINVADS value set “Virus types answer list specific to Arbovirus/ArboNet reporting”. This lists 17 codes for type of virus. So you could easily imagine some kind of UI, for instance, where users would select one of the codes from a pick list:

In this case, the original text is the same as the Snomed-CT preferred name, and it’s pretty straight-forward to understand. If, for instance, the user picked “Eastern equine encephalitis virus”, then that’s the original text, and nothing further is needed.

However, a lot of system designers will look at this and say, the word “virus” is repeated in every entry, and that’s just a tax on the users. We should get rid of it. That would give you an entry like this:

Actually, in this case, the example is pretty trivial. “Virus” isn’t hard to read. But how about this SNOMED CT preferred term: “Cholecystectomy with exploration of common bile duct and choledochoenterostomy” – there’s quite a lot of potential for useful simplification there, especially where the set of codes are all siblings, such as the variations of strength in a particular medication:

This somewhat extreme example is from AMT. I doubt any reader can even figure out the differences between those 4 codes. How much easier this is:

Hopefully that example will serve to illustrate that this isn’t just a UI best practices issue – as the codes become finer, it starts to become a clinical safety issue too.

Back to the virus case: if the user picks “Eastern equine encephalitis”, then is the original text “Virus: Eastern equine encephalitis” or just “Eastern equine encephalitis”? What actually works best depends on quite how the original text is going to be used. If the original text is used as the faithful reproduction of the meaning of the user in a similar context as the user entered, then the minimal text the user actually picked is the useful original text – but how similar? If, on the other hand, the original text is used out of context, the full context of the data entry of the code should be represented – but this could be a combination of the text the user actually picked, the field name, additional words taken out of the explicit context on the screen, and even some text that is implicit in the clinical context.

To make things even more fun, a contributor on the HL7 vocabulary mailing list offered this example:

[image of the example from the mailing list; not recoverable]

 

I’m not sure what the best way to resolve this is. How do you make original text reliably useful for both uses when the user interface isn’t nailed down?

Well, one way is to rely on the value set – the value set description should contain the information that is implicit in the context. So the true original text would be the value set description + the user picked text. Though I don’t think that any particular field in the value set (either v3 or FHIR) is defined with this purpose in mind. Perhaps that’s something we should address?

 

Question: What does an empty name mean?

I was recently asked whether this fragment is legal CDA:

 <patient>
   <name/>
 </patient>

The answer is that it’s legal, but I have no idea what it should be understood to mean.

It’s legal

Firstly, it’s legal. There’s a rule in the abstract data types specification:

invariant(ST x) where x.nonNull { 
   x.headCharacter.notEmpty;  
}

In other words: if a string is not null, it must have at least one character content in it. In a CDA context, then, this would be illegal:

 <material>
   <lotNumberText/>
 </material>

Aside: the schema doesn’t disallow this, and few schematrons check this. But it’s still illegal. It’s a common error to encounter where a CDA document is generated using some object model or an xslt – it’s not so common where the CDA document is generated by some xml scripting language a la PHP.

However the definition of PN (the type of <name/>) is not a simple string: it’s a list of part names, each of which is a string. PN = LIST<ENXP>. So while this is illegal:

 <patient>
   <name>
    <given/>
   </name>
 </patient>

- because the given name can’t be empty. This isn’t:

 <patient>
   <name/>
 </patient>

That’s because a list is allowed to be empty.

Aside: I don’t know whether this is legal:

 <patient>
   <name>
    <given> </given>
   </name>
 </patient>

 That’s because we run into XML whitespace processing issues. Nasty, murky XML. Next time, we’ll use JSON.

What does it mean?

So what does this mean?

 <patient>
   <name/>
 </patient>

Well, it’s a name with no parts that is not null. Literally: this patient has a name, and the name has no parts.

Note the proper use of nullFlavors:

  • The patient’s name is unknown: <name nullFlavor="UNK"/>
  • We didn’t bother collecting the patient name: <name nullFlavor="NI"/>
  • We’re telling you that we’re not telling you the patient name: <name nullFlavor="MSK"/> (unusual – usually you’d just go for NI when suppressing the name)
  • The patient’s name can’t be meaningful in this context: <name nullFlavor="NA"/> – though I find operational uses of NA very difficult to understand
  • The patient doesn’t have a name: <name nullFlavor="ASKU"/> – because if you know that there’s no name, you’ve asked, and it’s unknown. Note that it would be a very unusual circumstance where a patient doesn’t have a working name (newborns or unidentified unconscious patients get assigned names), but it might make much more sense for other things like drugs

So this is none of these. It’s a positive statement that the patient has a name, and the name is… well, empty? I can’t think of how you could have an empty name (as opposed to having no name). I think that we should probably have a rule that names can’t have no parts either.

It’s not clear that this is a statement that the name is empty though. Consider this fragment:

 <patient>
   <name> 
    <family>Grieve</family>
    <given>Grahame</given>
   </name>
 </patient>

If you saw this in a CDA document, would you think that you should understand that I have no middle name? (I do, though I have friends that don’t.) We could be explicit about that:

 <patient>
   <name> 
    <family>Grieve</family>
    <given>Grahame</given>
    <given nullFlavor="ASKU"/>
   </name>
 </patient>

Though I think that ASKU is getting to be wrong here – you could say that the middle name is unknown, but it would be better to say that the middle name count is 0 – it’s just that we didn’t design the name structure that way because of how names work in cultures other than the English-derived one (which, btw, means that some of this post may be inapplicable outside the English context).

The first case would be normal, so this means, we don’t say anything about middle names. So why would not including any given name or family name mean anything more than “we don’t say anything about the name”? And, in fact, that’s the most likely cause of the form – nothing is known, but the developer put the <name> element in irrespective of the fact that the parts weren’t known.

So it’s quite unclear what it should mean, it most likely arises as a logic error, and I recommend that implementations ensure that an empty name never appears on the wire.
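
A check along these lines is easy to add to a document generator’s test suite. The Python below is a minimal sketch, not a conformance tool, and it rests on assumptions: the function name is invented, the fragment has no XML namespace (real CDA uses urn:hl7-org:v3), and a part containing only whitespace is treated as empty even though, as noted above, it’s not clear whether that is strictly illegal.

```python
import xml.etree.ElementTree as ET


def find_name_problems(xml_text):
    """Report names with no content and name parts that are empty strings."""
    problems = []
    for name in ET.fromstring(xml_text).iter("name"):
        if name.get("nullFlavor"):
            continue  # an explicit null is a deliberate statement, not an accident
        parts = list(name)
        # A name may carry plain text instead of parts, so check that as well.
        if not parts and not (name.text or "").strip():
            problems.append("name has no parts and no nullFlavor")
        for part in parts:
            if part.get("nullFlavor"):
                continue
            if len(part) == 0 and not (part.text or "").strip():
                problems.append("empty <%s> part" % part.tag)
    return problems


print(find_name_problems("<patient><name/></patient>"))
print(find_name_problems("<patient><name><given/></name></patient>"))
print(find_name_problems('<patient><name nullFlavor="UNK"/></patient>'))
```

The first two fragments are reported and the third is not, which matches the recommendation: say nothing, or say why you are saying nothing, but don’t send an empty name.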

Guest Post: HL7 Language Codes

This guest post is written by Lloyd McKenzie.  I’ve been meaning to get to it since the January WGM, but I’ve been wrapped up in other things (most recently HIMSS).  However, I agree with what Lloyd says.

Question: I have a need to communicate a wide variety of language codes in HL7 v3 instances, but the ISO Data Type 21090 specification declares that ED.language (and thus ST.language) are constrained to IETF 3066.  This is missing many of the languages found in ISO 639-3 – which I need.  Also, IETF 3066 is deprecated.  It’s been replaced twice.  Can I just use ISO 639-3 instead?

Answer:

The language in the 21090 specification was poorly chosen.  It explicitly says “Valid codes are taken from the IETF RFC 3066”.  What it should have said is “Valid codes are taken from IETF language tags, the most recent version of which at the time of this publication is IETF RFC 3066”.  (Actually, by the time ISO 21090 actually got approved, the most recent RFC was 4646, but we’ll ignore that for now.)  This should be handled as a technical correction, though that’s not terribly easy to do.  However, implementers are certainly welcome to point to this blog as an authoritative source of guidance on ISO 21090 implementation and make use of any language codes supported in subsequent versions of the IETF Language Tags code system – including RFC 4646 and RFC 5646 as well as any subsequent version there-of.

The RFC 5646 version incorporates all of the languages found in ISO 639-3 and 639-5.  However, be aware that while all languages are covered, there are constraints on the codes that can be used for a given language.  Specifically, if a language is represented in ISO 639-1 (2-character codes), that form must be used.  The 3-character variants found in ISO 639-2 cannot be used.  For example, you must send “en” for English, not “eng”.

Question: But I want to send the 3-character codes.  That’s what my system stores.  Can’t I use ISO 639-2 directly?

Answer:

No.  In the ISO 21090 specification, the “language” property is defined as a CS.  That means the data type is fixed to a single code system.  The code system used is IETF Language Tags, which is consistent with what the w3c uses for language in all of their specifications and encompasses all languages in all of the ISO 639 specifications plus many others (for example, country-specific dialects as well as additional language tags maintained by IANA.)

Question: Well, ok, but what about in the RIM for LanguageCommunication.code.  Can I send ISO 639-2 codes there?

Answer:

Yes, though with a caveat.  LanguageCommunication.code is defined as a CD, meaning you can send multiple codes – one primary code and as many translations as you see fit.  You are free to send ISO 639-2 codes (the 3-character ones) or any other codes as a translation.  However, LanguageCommunication.code has a vocabulary assertion of the HumanLanguage concept domain, which is universally bound to a value set defined as “all codes from the ietf3066 code system”.  That means the primary code within the CD must be an IETF code.  So that gives you two options:

  1. Fill the root code with the appropriate IETF code – copying the ISO code most of the time and translating the 3-character code to the correct 2-character code for those 84 language codes found in ISO 639-1; or
  2. Omit the root code property and set the null flavor to “UNC” (unencoded), essentially declaring that you haven’t bothered to try translating the code you captured into the required code system.

And before you mention it, yes, the reference to IETF 3066 is a problem.  The actual code system name in the HL7 specification is “Tags for the Identification of Languages”, which is the correct name.  However the short name assigned was “ietf3066” and the description in the OID registry refers explicitly to the 3066 version.  This is an error, as IETF 3066 is a version of the IETF “Tags for the Identification of Languages” code system and the OID is for the code system, not a particular version of it.  (There have actually been 4 versions so far – 1766, 3066, 4646 and 5646.)  We’ll try to get the short name and description corrected via the HL7 harmonization process.

Question: But I don’t want to translate to 2-character codes and I don’t want to use a null flavor.  Can’t we just relax the universal binding?

Answer:

We can’t relax the binding because the HumanLanguage concept domain is shared by both the ED.language property in the abstract data types specification (which ISO 21090 is based on) and the LanguageCommunication.code attribute.  The ED.language is a CS and so must remain universally bound.

In theory, we could split into two separate domains – one for data types and one for LanguageCommunication.code.  The second one could have a looser binding.  However, it’s hard to make a case for doing that.  There are several issues:

First, having two different bindings for essentially the same sort of information is just going to cause grief for implementers.  You could be faced with declaring what language the patient reads in one code system, but identifying the language of the documentation the patient’s supposed to read in a second code system.

Second, the IETF code system fully encompasses all languages covered by all the ISO 639-x code systems, plus thousands of others expressible using various sub-tags such as identifying country-specific dialects.  In the unlikely situation that you need a language that can’t be expressed using any of those, there’s even a syntax for sending local codes (and a mechanism for registering supplemental codes with IANA if you want to be more official).  So there should never be a situation where you can’t express your desired language using the IETF Language Tags code system.

Question: I don’t really care that I can express my languages in IETF.  I’ve already pre-adopted using ISO 639-2 and -3 in my v3 implementation and I don’t want to change.  Why are you putting constraints in place that prevent implementers from doing what they want to do?

Answer:

Well, technically your implementation right now is non-conformant.  And implementers always have the right to be non-conformant.  HL7 doesn’t require anyone to follow any of its specifications.  So long as your communication partners are willing to do what you want to do, anything goes by site-specific agreement.

However, the standards process is about giving up a degree of implementation flexibility in exchange for greater interoperability.  By standardizing on a single set of codes for human language, we’re able to ensure interoperability across all systems.  Natively, those systems may use other code systems, but for communication purposes, they translate to the common code system so everyone can safely exchange information.

If the premise for loosening a standard was “we won’t require any system to translate data from their native syntax”, there’d be no standards at all.  Yes, translation and mapping requires extra effort (though a look-up table for 84 codes with direct 1..1 correspondence is pretty easy compared to a lot of the mapping effort needed in other areas.)  But that’s the price of interoperability.
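
As a rough illustration of how small that look-up is, here is a Python sketch. The table is deliberately a tiny, hand-picked subset and the function name is invented for this example; a real implementation would need an entry for every language that has a 2-character code, because the pass-through at the end is only correct for languages that have none (such as “haw”, Hawaiian).

```python
# Illustrative subset only: 3-character ISO 639-2 codes (both the
# bibliographic and terminology variants) mapped to 2-character codes.
THREE_TO_TWO = {
    "eng": "en",
    "fra": "fr", "fre": "fr",
    "deu": "de", "ger": "de",
    "spa": "es",
}


def to_ietf_language_code(code):
    """Translate a stored language code to the form the binding requires."""
    code = code.strip().lower()
    if len(code) == 2:
        return code  # already in the 2-character form
    # Assumption: anything missing from a COMPLETE table has no
    # 2-character form, so the 3-character code is sent unchanged.
    return THREE_TO_TWO.get(code, code)


print(to_ietf_language_code("eng"))
print(to_ietf_language_code("haw"))
```

With the complete table in place, the result can go in the root code and the original 3-character code can still be sent as a translation.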

Question: GTS cardinality in CDA

Question:

I have an RMIM that states “effectiveTime” is a GTS[0..1], that implies a SET<TS>. (CDA R2, SubstanceAdministration.effectiveTime). Furthermore, I have a (Schematron) constraint that effectiveTime is [1..1].

The following snippet is ok by schema, but the schematron constraint fails:

 <effectiveTime xsi:type="IVL_TS">
  <low value="20110301"/>
  <high value="20120301"/>
 </effectiveTime>
 <effectiveTime institutionSpecified="true" operator="A" xsi:type="PIVL_TS">
  <period unit="h" value="6"/>
 </effectiveTime>

What does the snippet mean, and is it legal?

Answer:
The snippet identifies the intersection of “every 6 hours” and “1-Mar 2011 to 1-Mar 2012”. It’s meaningful, and it’s valid CDA. Whether it’s valid against the schematron is a much harder question. As background, the cardinality stated against an item in the RIM is the UML cardinality. There’s a little bit of ambiguity here with collections – does a collection<T> with cardinality 1..2 mean one or two items in a collection, or 1 or 2 collections of items? UML is unclear, but most interpretations consider it to be the former, as does the HL7 v3 methodology that underlies the RIM.

So, the cardinality in the CDA document is the UML cardinality. When the CDA document is represented in XML, using the XML representations defined in the base v3 methodology, there is one xml element for each data type, and therefore there is a direct correspondence between the UML cardinality and the XML cardinality: UML cardinality 1..2 means 1..2 XML elements… for all cases. Except, that is, for GTS.

GTS is a set of items. A single set – a set with a cardinality of 1 – can be built by specifying an element, and then adding repeated elements, each of which defines an operation that is to be performed on the set using the operator attribute. No matter how many XML elements there are, there is only one set. So the UML cardinality is 1, irrespective of the XML element cardinality.

Aside – I need to make several caveats to this statement:

  • All the repeating elements must have an operator. If there’s a repeating element that doesn’t have an operator, then this is an invalid instance (Act.effectiveTime has an upper cardinality of 1 in the RIM itself)
  • Strictly, there are ways to build a bag<T> using BXIT<T> and operators where the UML cardinality doesn’t match the XML element cardinality – but there’s no use for this in the context of CDA (and no rational use for this anywhere)
  • some UML attributes are represented as XML attributes, and cardinality applies differently

So, given this, the question is, what does it mean to say “I have a (Schematron) constraint that effectiveTime is [1..1]” – does it intend to address the UML or the XML cardinality? That’s not clear. Consider this, from the CCDA ballot:

The cardinality indicator (0..1, 1..1, 1..*, etc.) specifies the allowable occurrences within a document instance.

Figure 2: Constraints format – only one allowed

1. SHALL contain exactly one [1..1] participant (CONF:2777).

a. This participant SHALL contain exactly one [1..1] @typeCode=”LOC”
(CodeSystem: 2.16.840.1.113883.5.90 HL7ParticipationType)
(CONF:2230).

The language in this extract is unclear – do they mean, the XML representation of the document instance, or a logical document at the UML level? Not clear. “shall contain one participant”… I’m inclined to think that this is different to “shall contain one participant element”, and that the cardinalities should be understood to line up with the UML cardinalities. Of course, there’s only one case where this matters, which is this one.

So I reckon that fragment at the top should be valid. But there’s a way to work around this – this fragment is equivalent in meaning:

 <effectiveTime xsi:type="SXPR_TS">
  <comp xsi:type="IVL_TS">
   <low value="20110301"/>
   <high value="20120301"/>
  </comp>
  <comp institutionSpecified="true" operator="A" xsi:type="PIVL_TS">
   <period unit="h" value="6"/>
  </comp>
 </effectiveTime>

But it has a cardinality of 1 in the XML. So I recommend using this form, and not the other. Bingo, problem solved.
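
If your generator already emits the repeated form, the rewrite into the single-element form is mechanical. The Python below is a sketch under several assumptions: the function name is invented, the parent element name is only for the example, the fragment omits the CDA namespace, and the repeated effectiveTime elements are taken to be adjacent siblings. It refuses to combine elements that lack an operator, in line with the first caveat above.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSI_TYPE = "{" + XSI + "}type"
ET.register_namespace("xsi", XSI)  # keep the xsi prefix when serialising

SOURCE = """
<substanceAdministration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <effectiveTime xsi:type="IVL_TS">
  <low value="20110301"/>
  <high value="20120301"/>
 </effectiveTime>
 <effectiveTime institutionSpecified="true" operator="A" xsi:type="PIVL_TS">
  <period unit="h" value="6"/>
 </effectiveTime>
</substanceAdministration>
"""


def combine_effective_times(parent):
    """Wrap repeated effectiveTime elements in one SXPR_TS element."""
    times = parent.findall("effectiveTime")
    if len(times) < 2:
        return parent  # nothing to combine
    if any(t.get("operator") is None for t in times[1:]):
        raise ValueError("every repeated effectiveTime needs an operator")
    position = list(parent).index(times[0])
    wrapper = ET.Element("effectiveTime", {XSI_TYPE: "SXPR_TS"})
    for t in times:
        parent.remove(t)
        t.tag = "comp"  # attributes and children are carried over unchanged
        wrapper.append(t)
    parent.insert(position, wrapper)
    return parent


root = combine_effective_times(ET.fromstring(SOURCE))
print(ET.tostring(root, encoding="unicode"))
```

After the rewrite there is exactly one effectiveTime element, so a constraint of [1..1] is satisfied whichever cardinality its author had in mind.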

Identifiers in CDA Documents- Reporting Tool

This post is prompted by the intersection of two issues:

  • Conclusions from quality checking CDA documents in the Australian National EHR Program
  • A series of questions I took privately about how identifiers work in Consolidated CDA

This post explains how identifiers are supposed to work in a CDA document, and introduces a reporting tool to help implementers assess the quality of identifier usage in a CDA document.

@ID attribute

The first kind of identifier in a CDA document is the “ID” attribute that can appear in the following places:

  • Section
  • ObservationMedia
  • RegionOfInterest

These are added so they can be the target of a linkHtml or renderMultiMedia – i.e. the narrative element <linkHtml href="#a1"> points to the section <section ID="a1">. Note that the attribute is defined as an xml:id, and values of xml:id must be unique within the XML document that contains them – using this makes it difficult to combine single CDA documents into groups in a single XML document (an atom feed, for instance) – and impossible if they are signed.

The ID attribute also exists on most of the narrative elements, so that they can be the target of an originalText reference in a CD data type, to indicate that the source of this code is this particular text. This is a very advanced usage. It would also be possible, using this method, to make any narrative element the target of a linkHtml reference, and to my knowledge the CDA specification doesn’t say that this is not legal (though I think it’s intended that it’s not).

The ID attribute is also used for references from footnoteRefs to footnotes.

These are the only allowed uses of xml:id attributes in a CDA document. It can’t be used to indicate that this [thing] here is the same as that [thing] over there (i.e. this section and that section share the same author). To do that, you have to use logical object references using the id element.
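The two rules above (ID values are unique within the document, and an internal linkHtml reference has to point at one of them) are easy to check mechanically. Here is a minimal sketch in Python using only the standard library. The function name check_internal_links and the sample fragment are made up for illustration, and it only looks at linkHtml, not at originalText or footnoteRef references.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_internal_links(cda_xml):
    """Return (duplicate_ids, dangling_hrefs) for a CDA document string."""
    root = ET.fromstring(cda_xml)
    # Count every ID attribute (the xml:id style attribute, not the <id> element).
    ids = Counter(el.get("ID") for el in root.iter() if el.get("ID") is not None)
    duplicates = sorted(value for value, count in ids.items() if count > 1)
    dangling = []
    for el in root.iter():
        # Compare on the local name so the namespace prefix does not matter.
        if el.tag.rsplit("}", 1)[-1] == "linkHtml":
            href = el.get("href", "")
            # Only internal references (#something) are checked here.
            if href.startswith("#") and href[1:] not in ids:
                dangling.append(href)
    return duplicates, dangling


# Hypothetical fragment: one duplicated ID and one reference to nothing.
sample = """<section xmlns="urn:hl7-org:v3" ID="a1">
  <text>
    <paragraph ID="p1"><linkHtml href="#a1">this section</linkHtml></paragraph>
    <paragraph ID="p1"><linkHtml href="#a9">missing</linkHtml></paragraph>
  </text>
</section>"""

duplicates, dangling = check_internal_links(sample)
print(duplicates)  # ['p1']
print(dangling)    # ['#a9']
```

A schema-validating parser would catch the duplicate ID for you; the dangling reference is the part that usually slips through.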

id element

Many elements in the CDA document have a child “id” element that serves to identify the class that contains them. Technically, these are the RIM classes Entity, Role, and Act, which generally are allowed to carry one or more identifiers in the id element.

Note that this means that some CDA elements have both a child element “id” and an attribute “ID”. Some tools struggle with this. I would’ve thought that such tools were long fixed – it’s not that uncommon to have duplicate names between attributes and elements, since it’s not wrong, but I’ve found a few dev tools that don’t cope with this in the last 12 months. The only solution is to get back to the maintainer of the tools, and screech loudly at them till they fix their tool.

The id element has two important attributes: root and extension. The root has to be either an OID or a UUID, and an extension – any string not including whitespace – may be present. The identifier (either the root alone if no extension, or the root+extension) must be globally unique.

I’ve found that this root with optional extension business is a lot harder to grasp than it sounds, partly because OIDs have an internal root/extension structure, and so it’s really unclear whether your leaf concept should be in the root or the extension. Say, for example, you have a medical record number, a six digit number, and you assign an OID for it, 2.16.840.1.113883.19.1. Should you represent your MRNs as

<id root="2.16.840.1.113883.19.1.45235"/>

or as

<id root="2.16.840.1.113883.19.1" extension="45235"/>

Generally, I prefer the second (it allows leading zeros, alpha characters if they become required, and is easier to pull out just the MRN), but both forms are valid, and the decision rests with the person who first registers the OID. And if you look in the OID registry, registered OIDs rarely explain which form is correct.

Another confusing thing is whether an extension is allowed or required if the root is a UUID. It’s allowed – and whether it’s required depends on where the unique part comes from. If I’m going to use a stream of unique numbers to actually make the value unique, and I’m just using the UUID to provide a globally unique space for them, then there’ll be an extension:

<id root="655f67b1-2b11-4038-b82f-f6ab2f566f87" extension="1234"/>

As a rule of thumb, if the UUID is registered in the HL7 OID registry (as 655F67B1-2B11-4038-B82F-F6AB2F566F87 is) then you need an extension for the actual unique part. (Note that the UUID is supposed to be represented in lowercase even though the schema doesn’t say so – and irrespective of what case is registered in the registry).
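To make the root/extension rules concrete, here is a rough sketch in Python of how a receiver might turn an id element into a comparable key. The helper name identifier_key and the two regular expressions are my own illustration, not something taken from the CDA specification; the sketch assumes the root is either a dotted OID or a UUID, and lowercases UUIDs as described above.

```python
import re
import xml.etree.ElementTree as ET

# Assumed patterns for the two legal kinds of root.
UUID_RE = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")
OID_RE = re.compile(r"^[0-2](\.(0|[1-9][0-9]*))+$")


def identifier_key(id_element):
    """Return (root, extension) for an <id> element, with the UUID case normalised."""
    root = id_element.get("root")
    extension = id_element.get("extension")  # None when there is no extension
    if root is None:
        raise ValueError("id element has no root")
    if UUID_RE.match(root):
        root = root.lower()  # UUIDs are compared in lowercase
    elif not OID_RE.match(root):
        raise ValueError("root is neither an OID nor a UUID: " + root)
    return (root, extension)


a = ET.fromstring('<id root="2.16.840.1.113883.19.1.45235"/>')
b = ET.fromstring('<id root="2.16.840.1.113883.19.1" extension="45235"/>')
c = ET.fromstring('<id root="655F67B1-2B11-4038-B82F-F6AB2F566F87" extension="1234"/>')

print(identifier_key(a))  # ('2.16.840.1.113883.19.1.45235', None)
print(identifier_key(b))  # ('2.16.840.1.113883.19.1', '45235')
print(identifier_key(c))  # ('655f67b1-2b11-4038-b82f-f6ab2f566f87', '1234')
```

Note that the first two keys are different: a receiver has no way to know that the two MRN forms were meant to be the same identifier, which is why the choice has to be made once and then stuck to.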

For any Australian readers, if you aren’t sure about this: consult the new Australian handbook on representing identifiers, see my earlier blog post, or ask me.

Unique Identifiers

The fact that identifiers are required to be unique means two things:

  • The identifier uses a properly allocated OID, or a generated UUID, so that no one else would accidentally use it. This sounds hard, but it’s actually relatively easy; generate a GUID (Ctrl-Alt-G in most IDEs), or just register an OID at the HL7 OID registry, but register it carefully, at a fine enough scope that this is what you want to use.
  • You have to use it in a disciplined fashion, so that you only use it for one thing.

The second part turns out to be harder than it sounds. The problem is that there’s no tool to alert you when you copy paste an identifier from one part of the document to another (or from one part of your code to another). I see too many documents that contain duplicate identifiers – that is, the same identifier is used on different elements that represent different objects.

One of my correspondents asked why we don’t simply make a rule that you can’t have duplicated identifiers in a CDA document, like we have with the ID attribute. This would prevent accidental or lazy use of the same identifier again – but it’s not possible, because there are valid cases for using the same identifier more than once.
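Since a hard rule isn't possible, the next best thing is a report that a human can review. The following is a minimal sketch in Python of that idea, assuming a namespaced CDA document (urn:hl7-org:v3); the function name shared_identifiers and the sample document are invented for illustration. It lists every identifier that appears on more than one element and leaves the judgement about whether that is legitimate to the reader.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = "{urn:hl7-org:v3}"


def shared_identifiers(cda_xml):
    """Map (root, extension) to the elements carrying it, for ids used more than once."""
    root = ET.fromstring(cda_xml)
    owners = defaultdict(list)
    for parent in root.iter():
        for child in parent:
            # Only <id> children are counted; templateId is a different element.
            if child.tag == NS + "id":
                key = (child.get("root", "").lower(), child.get("extension"))
                owners[key].append(parent.tag.replace(NS, ""))
    return {key: names for key, names in owners.items() if len(names) > 1}


# Hypothetical document: the same person identifier on two roles, one unique id.
sample = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <author><assignedAuthor>
    <id root="7FCB0EC4-0CD0-11E0-9DFC-8F50DFD72085"/>
  </assignedAuthor></author>
  <legalAuthenticator><assignedEntity>
    <id root="7FCB0EC4-0CD0-11E0-9DFC-8F50DFD72085"/>
  </assignedEntity></legalAuthenticator>
  <custodian><assignedCustodian><representedCustodianOrganization>
    <id root="2.16.840.1.113883.19.5"/>
  </representedCustodianOrganization></assignedCustodian></custodian>
</ClinicalDocument>"""

report = shared_identifiers(sample)
for (root_value, extension), elements in report.items():
    print(root_value, extension, elements)
```

An identifier shared by an assignedAuthor and an assignedEntity is probably fine; the same identifier on a section and an observation probably is not. The code can't tell the difference, which is the whole point.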

Identifiers are not unique in a document

This occurs when the same concept can appear multiple times in the document. For example:

  • When the same template is used multiple times
  • When the same person is both author and legalAuthenticator
  • When the same organisation employs all the personnel and scopes the patient for the document

So these are all common cases. Other than for the template id, a natural question arises about the relationship between two instances of the same object in the same document. Take, for example, this fragment of an author from an Australian CDA example:

<author>
 <assignedAuthor>
  <id root="7FCB0EC4-0CD0-11E0-9DFC-8F50DFD72085" />
  <id root="1.2.36.1.2001.1003.0.8003611234567890" />
  <addr use="WP">
   <streetAddressLine>1 Clinician Street</streetAddressLine>
   <city>Nehtaville</city>
   <state>QLD</state>
   <postalCode>5555</postalCode>
  </addr>
  <telecom use="WP" value="tel:0712341234" />
  <assignedPerson>
   <name>
    <prefix>Dr.</prefix>
    <given>Good</given>
    <family>Doctor</family>
   </name>
  </assignedPerson>
 </assignedAuthor>
</author>

Note to alert Australian readers: yes, I moved HPI-I from its normal place, since this is for international readers.

This author has two identifiers, what we might call a technical identifier (the UUID) and the real-world identifier, which is the number by which the author is registered with the national authority. That’s an arbitrary distinction that’s not made in the document itself – the only way to know this is to consult the definitions of the identifiers.

For most documents, the author is also the legal authenticator, so we’re going to repeat all the same information there too:

<legalAuthenticator>
 <assignedEntity>
  <id root="7FCB0EC4-0CD0-11E0-9DFC-8F50DFD72085" />
  <id root="1.2.36.1.2001.1003.0.8003611234567890" />
  <addr use="WP">
   <streetAddressLine>1 Clinician Street</streetAddressLine>
   <city>Nehtaville</city>
   <state>QLD</state>
   <postalCode>5555</postalCode>
  </addr>
  <telecom use="WP" value="tel:0712341234" />
  <assignedPerson>
   <name>
    <prefix>Dr.</prefix>
    <given>Good</given>
    <family>Doctor</family>
   </name>
  </assignedPerson>
 </assignedEntity>
</legalAuthenticator>

Note that the element is different, but everything else is the same. However, you could argue that this is redundant – we already provided all the information about the person the first time, and the second time, all we need to do is provide an identifier:

<legalAuthenticator>
 <assignedEntity>
  <id root="7FCB0EC4-0CD0-11E0-9DFC-8F50DFD72085" />
 </assignedEntity>
</legalAuthenticator>

 

On reaching the second case, you go and resolve the first identifier, know that this is referring to the same actual object as the first case, and fill in all the details accordingly. However this is complicated by the fact that in some cases where you can do this, the kind of information you can represent is different in each case (author and custodian, for instance), so you mightn’t be able to provide all the details in the first instance. So what do you do if the second case contains different details from the first? Is that an accident, or the correct way to represent it? Unfortunately, the only way to know is to examine the details on each instance, and reason from the underlying RIM classes – there’s no easy rule of thumb.

One notion that this section suggests is that you can extract these RIM entities, roles and acts out to a persistent data store, and use the identifiers to trace the objects across various documents as you see them. This should be safe, after all, because the identifiers are unique. Only, not so much.

Re-using identifiers between documents

Firstly, there’s no guarantee that a given object will have the same identifier across different CDA documents from the same source. Commonly, CDA documents are generated from some intermediary XML or v2 object that doesn’t have the underlying identifiers in it, even if they exist in the original source. In these cases, the objects may acquire a transient identifier that is used multiple times within each document, but is not maintained across the documents. It’s very difficult to consistently identify an object across documents in this case.

Another problem is that some identifiers actually identify the business process that the object represents, and may end up being attached to multiple different objects that all relate to the same real-world process. Lab Order Ids are a classic case here – they’ll be associated with the object that identifies their acknowledgement response to a request for tests, and to the results that represent the outcomes of the request. Driver’s licenses are another example – they’re used to identify multiple different objects that represent the same person (usually from different institutions).

The upshot of this is that even when done well by the author, you can’t simply rely on the identifiers behaving in any particular way.

Reporting Tool

But very often, identifiers aren’t done well. And there’s no conformance tooling that can automatically figure out whether identifiers are being done properly in a document. So I’ve created a little reporting service that takes a CDA document, scans all the identifiers in it, and produces a report that helps visualise the identifiers, and see whether they are being used properly. We’ll be using it in the Australian national program to help check that a document has good identifiers in it. Feel free to use it in other contexts, and I’d welcome suggestions for how to make it more useful (and crash reports for how to break it).

Follow this link to http://hl7connect.healthintersections.com.au/svc/ids, paste your CDA document into the page, click the button, and then read the report… all the steps up to the last one are real easy. Good luck and happy CDA writing/reading…

Link from a CDA narrative to another CDA document in an XDS repository

So, a use case has come up in the pcEHR for a CDA document that links to another CDA document in an XDS repository (Consolidated View, actually, for those who want to know). So how to do that?

Well, firstly, this use case corresponds relatively directly to the one described here:

For a variety of reasons, it is desirable to refer to the document by its identity, rather than by linking through a URL.

  1. The identity of a document does not change, but the URLs used to access it may vary depending upon location, implementation, or other factors.
  2. Referencing clinical documents by identity does not impose any implementation specific constraints on the mechanism used to resolve these references, allowing the content to be implementation neutral. For example, in the context of an XDS Affinity domain the clinical system used to access documents would be an XDS Registry and one or more XDS Repositories where documents are stored. In other contexts, access might be through a Clinical Data Repository (CDR), or Document Content Management System (DCMS). Each of these may have different mechanisms to resolve a document identifier to the document resource.
  3. The identity of a document is known before the document is published (e.g., in an XDS Repository, Clinical Data Repository, or Document Content Management System), but its URL is often not known. Using the document identity allows references to existing documents to be created before those documents have been published to a URL. This is important to document creators, as it does not impose workflow restrictions on how links are created during the authoring process.

h/t to Keith Boone for that link, btw. So that’s pretty much our use case: to link to a document that’s found in the pcEHR. And you can’t do that by URL, because there’s no URL that directly addresses an XDS getDocument call, and anyway, some background machinery to do with trust is required. So how to do that?

Firstly, in the structured data representation of the CDA, we need to assert that some entry is a reference to another document. For that, we use an external document reference:

The fragment below says that this entry is a reference (typeCode="REFR") to the external document. While we’re at it, we can use the code to indicate what type of document it is, and the templateId to indicate which particular implementation guide it conforms to. The id of the document is the id by which to request the document from the XDS repository (the pcEHR in this case):

<entry>
 <!-- snip -->
 <reference typeCode='REFR'>
  <externalDocument classCode='DOC' moodCode='EVN'> 
   <templateId extension="1.3" root="1.2.36.1.2001.1001.101.100.1002.120" />  
   <id root="d22de837-9d35-438d-933d-74f08d7657f5"/>
   <code code="60591-5" codeSystem="2.16.840.1.113883.6.1" displayName="Patient Summary" />
  </externalDocument>
 </reference>
</entry>

So that’s the logical reference to the Shared Health Summary with id “d22de837-9d35-438d-933d-74f08d7657f5”. A clinical system encountering this knows that it needs to retrieve that document from its cache or the pcEHR or some other local repository as configured. But what about the narrative?

Linking to an external document from Narrative

The Consolidated view is delivered as a CDA document because... what other standard way to do it is there? Everything else is CDA anyway, so we might as well use it, which means we need to have a narrative, and it needs to refer to the document. This, it turns out, is not so easy. We need to use <linkHtml> to refer to the other document:

 <text>
  <paragraph>
    <linkHtml href="...">Other document</linkHtml>
  </paragraph>
 </text>

The problem is what to put in the href attribute. The obvious thing to put in is an id reference (e.g. #a3) to the external document, which refers to the external document directly:

 <reference typeCode='REFR'>
  <externalDocument ID="a3" classCode='DOC' moodCode='EVN'>

But you can’t have an ID on the externalDocument, and anyway, the rules for an internal reference on linkHtml are:

The target of an internal reference is an identifier of type XML ID, which can exist on other elements in the same or a different narrative block, or XML ID attributes that have been added to the <section>, <ObservationMedia>, or <renderMultiMedia> elements

So that’s a dead end. The IHE page I referred to above, which describes the problem nicely, suggests a way to link the external reference to the linkHtml:

<text><paragraph><linkHtml ID="a3" href="..."/></paragraph></text>
<!-- snip -->
<entry>
 <!-- snip -->
 <reference typeCode='REFR'>
  <externalDocument classCode='DOC' moodCode='EVN'> 
   <templateId extension="1.3" root="1.2.36.1.2001.1001.101.100.1002.120" />  
   <id root="d22de837-9d35-438d-933d-74f08d7657f5"/>
   <code code="60591-5" codeSystem="2.16.840.1.113883.6.1" displayName="Patient Summary" />
   <text><reference value="#a3"/></text>
  </externalDocument>
 </reference>
</entry>

Only, I don’t think that’s right. The contents of the external document are not the contents of the linkHtml element (which is what that precisely means), and it’s not even really right to say that the contents of the document are the target of the linkHtml, and anyway, we still haven’t resolved what that actually is – pointing something else to the linkHtml element doesn’t resolve what it points to in the href attribute. So while I think that IHE described the problem very nicely, I don’t think they’ve solved it.

This brings us back to the URL portion of linkHtml. It just happens to be defined as an xs:string, with the rule: “It can be used to reference identifiers that are either internal or external to the document”. I guess this means that the right way to fill it for external references is with a logical identifier URI. Three come to mind:

<linkHtml href="oid:0.1.2.3.4.5.6.7.8.9..."/>
<linkHtml href="uuid:d22de837-9d35-438d-933d-74f08d7657f5"/>
<linkHtml href="hl7-att:root[:extension]"/>

The first two (and their logical variants urn:uuid:… and urn:oid:… – which of these to use is complicated but largely, in the end, irrelevant, because everyone’s going to have to cut code to support whatever goes in here anyway) are obvious places to look, since they are defined externally, but they both suffer the problem that what if the document identifier includes an extension? (and it’s allowed to). For this reason, the V3 data types R2 defined the protocol hl7-att as:

the form hl7-att:[II.literal], such as hl7-att:2.1.16.3.9.12345.2.39.3:ABC123. The scheme hl7-att is used to make references to HL7 Attachments. HL7 attachments may be located in the instance itself as an attachment on the Message class, or in some wrapping entity such as a MIME package, or stored elsewhere. ..[snip].. Attachments SHALL be globally uniquely identified. Attachment id is mandatory, and an ID SHALL never be re-used. Once assigned, an attachment id SHALL be associated with exactly one byte-stream as defined for ED.data.

The language there is that of the v3 data types, but the concepts are applicable in this case, and the ground rules are all applicable. Hence, this is the right way to do a reference from one CDA document to another by a logical identifier. To recap, in the structured data:

<entry>
 <!-- snip -->
 <reference typeCode='REFR'>
  <externalDocument classCode='DOC' moodCode='EVN'> 
   <templateId extension="1.3" root="1.2.36.1.2001.1001.101.100.1002.120" />  
   <id root="d22de837-9d35-438d-933d-74f08d7657f5"/>
   <code code="60591-5" codeSystem="2.16.840.1.113883.6.1" displayName="Patient Summary" />
  </externalDocument>
 </reference>
</entry>

and in the narrative:

<text><paragraph><linkHtml href="hl7-att:d22de837-9d35-438d-933d-74f08d7657f5"/></paragraph></text>
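Since everyone is going to have to cut code to support this anyway, here is a small sketch in Python of building and reading such a reference. The function names to_hl7_att and from_hl7_att are hypothetical, and the sketch assumes the literal form is root, optionally followed by a colon and the extension, as in the example quoted above from the data types.

```python
def to_hl7_att(root, extension=None):
    """Build an hl7-att URI from the root and optional extension of a document id."""
    literal = root if extension is None else root + ":" + extension
    return "hl7-att:" + literal


def from_hl7_att(uri):
    """Split an hl7-att URI back into (root, extension); extension may be None."""
    scheme, _, literal = uri.partition(":")
    if scheme != "hl7-att" or not literal:
        raise ValueError("not an hl7-att reference: " + uri)
    # An OID or UUID root never contains a colon, so the first colon ends the root.
    root, separator, extension = literal.partition(":")
    return root, (extension if separator else None)


# The document id used in the externalDocument example above.
href = to_hl7_att("d22de837-9d35-438d-933d-74f08d7657f5")
print(href)  # hl7-att:d22de837-9d35-438d-933d-74f08d7657f5

# The example literal quoted from the data types specification.
print(from_hl7_att("hl7-att:2.1.16.3.9.12345.2.39.3:ABC123"))
# ('2.1.16.3.9.12345.2.39.3', 'ABC123')
```

A receiving system would take the (root, extension) pair, match it against the id of an externalDocument in the entries, and hand it to whatever retrieval mechanism is configured.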

Thanks to Keith Boone for assistance with this post.

p.s. Is it valid to use a URL scheme defined in data types R2 in CDA R2, which uses data types R1? Yes, it is, because it’s valid to use schemes defined anywhere else as well.

Technical Correction in Data Types R2 / ISO 21090

According to the data types, the PostalAddressUse Enumeration is based on the code system identified by the OID 2.16.840.1.113883.5.1012. But if you look up that OID in the OID registry (or the MIF, for insiders), you see that:

Retired as of the November 2008 Harmonization meeting, and replaced by 2.16.840.1.113883.5.1119 AddressUse.

This appears to be my error – I should’ve used the OID 2.16.840.1.113883.5.1119. We’ll have to see about issuing a technical correction.