Category Archives: Data Types

Observation of Titers in HL7 Content

Several important diagnostic measures take the form of a Titer. Quoting from Wikipedia:

A titer (or titre) is a way of expressing concentration. Titer testing employs serial dilution to obtain approximate quantitative information from an analytical procedure that inherently only evaluates as positive or negative. The titer corresponds to the highest dilution factor that still yields a positive reading. For example, positive readings in the first 8 serial twofold dilutions translate into a titer of 1:256 (i.e., 2−8). Titers are sometimes expressed by the denominator only, for example 1:256 is written 256.

A specific example is a viral titer, which is the lowest concentration of virus that still infects cells. To determine the titer, several dilutions are prepared, such as 10−1, 10−2, 10−3, … 10−8.

So the higher the titer, the higher the concentration. 1:2 means a lower concentration than 1:128 (note that this means the clinical intent is the opposite of the literal numeric intent – as the titre gets lower, the concentration gets higher).

Titers are pretty common in clinical diagnostics – I found about 2600 codes for titer type tests in LOINC v2.48 (e.g. Leptospira sp Ab.IgG).

Representing Titers in HL7 Content

In diagnostic reports, titers are usually presented in the text narrative (or the printed form) using the form 1:64, since this makes clear the somewhat arbitrary nature of the numbers in the value. However it’s not unusual for labs to report just the denominator (e.g. “64”) and the person interpreting the reports is required to understand that this is a titer test (this is usually stated in the name).

When it comes to reporting a Titer in structured computable content, there’s several general options:

  • represent it as a string, and leave it up to the recipient to parse that if they really want
  • represent it as an integer, the denominator
  • use a structured form for representing the content

Each of the main HL7 versions (v2, CDA, and FHIR) offer options for each of these approaches:

String Integer Structured
V2 OBX||ST|{test}||1:64 OBX||NM|{test}||64 OBX||SN|{test}||^1^:^64
CDA <value xsi:type=”ST”> 1:64 </value> <value xsi:type=”INT” value=”1:64″/> <value xsi:type=”RTO_INT_INT”> <numerator value=”1”/> <denominator value=”64”> </value>
FHIR “valueString “ : “1:64” “valueInteger” : ”64” “valueRatio”: { “numerator” : { “value” : “1” }, “denominator” : { “value” : “64” } }

(using the JSON form for FHIR here)

One of the joys of titres is that there’s no consistency between the labs – some use one form, some another. A few even switch between representations for the same test (e.g. one LOINC code, different forms, for the same lab).

This is one area where there would definitely be some benefit – to saying that all labs should use the same form. That’s easy to say, but it would be really hard to get the labs to agree, and I don’t know what the path to pushing for conformance would be (in the US, it might be CLIA; in Australia, it would be PITUS; for other countries, I don’t know).

One of the problems here is that v2 (in particular) is ambiguous about whether OBX-5 is for presentation or not. It depends on the use case. And labs are much more conservative about changing human presentation than changing computable form – because of good safety considerations. (Here in Australia, the OBX05 should not be used for presentation, if both sender and receiver are fully conformant to AS 4700.2, but I don’t think anyone would have any confidence in that). In FHIR and CDA, the primary presentation is the narrative form, but the structured data would become the source of presentation for any derived presentation; this is not the primary attested presentation, which generally allays the lab’s safety concerns around changing the content.

If that’s not enough, there’s a further issue…

Incomplete Titers

Recently I came across a set of lab data that included the titer “<1:64”. Note that because the intent of the titre is reversed, it’s not perfectly clear what this means. Does this mean that titre was <64? or that the dilution was greater than 64. Well, fortunately, it’s the first. Quoting from the source:

There are several tests (titers for Rickettsia rickettsii, Bartonella, certain strains of Chlamydia in previously infected individuals, and other tests) for which a result that is less than 1:64 is considered Negative.  For these tests the testing begins at the 1:64 level and go up, 1:128, 1:256, etc.   If the 1:64 is negative then the titer is reported as less than this.

The test comes with this sample interpretation note:

Rickettsia rickettsii (Rocky Mtn. Spotted Fever) Ab, IgG:

  • Less than 1:64: Negative – No significant level of Rickettsia rickettsii IgG Antibody detected.
  • 1:64 – 1:128: Low Positive – Presence of Rickettsia rickettsii IgG Antibody detected
  • 1:256 or greater: Positive – Presence of Rickettsia rickettsii IgG Antibody, suggestive of recent or current infection.

So, how would you represent this one in the various HL7 specifications?

String Integer Structured
V2 OBX||ST|{test}||<1:64 {can’t be done} OBX||SN|{test}||<^1^:^64
CDA <value xsi:type=”ST”> &lt;1:64 </value>  {can’t be done} <value xsi:type=”IVL_RTO_INT_INT”> <high> <numerator value=”1”/> <denominator value=”64”> </high> </value>
FHIR “valueString “ : “<1:64” {can’t be done} “valueRatio”: { “numerator” : { “comparator” : “<”, “value” : “1” }, “denominator” : { “value” : “64” } }

This table shows how the stuctured/ratio form is better than the simple numeric – but there’s a problem: the CDA example, though legal in general v3, is illegal in CDA because CDA documents are required to be valid against the CDA schema, and IVL_RTO_INT_INT was not generated into the CDA schema. I guess that means that the CDA form will have to be the string form?



Exchanging Codes with Translations in practice

All the mainstream clinical coding data types across HL7 (v2, v3, FHIR) and elsewhere (openEHR, IHE, DICOM to a lesser degree) allow for a translations of codes. These are provided because institutions are often not able to get consistency over which terminologies are in use. On the other hand, actually exchanging translations of codes is not that common in my experience. I’ve seen it in the following cases:

  • Coding for lab tests and panels (typically, a combination of local, LOINC, and Snomed codes)
  • Translations for medication codes in the Australian PCEHR (from one system only)

That’s the cases where I’ve seen it in practice. Where else have you actually seen it used? Can we gather some specific cases where it *has* been used (not where it might be a good idea, since complexity always sounds like a good idea at first).

If you have, make a comment on this post, please

Explaining the v3 II Type again

Several conversations in varying places make me think that it’s worth explaining the v3 II type again, since there’s some important subtleties about how it works that are often missed (though by now the essence of it is well understood by many implementers). The II data type is “An identifier that uniquely identifies a thing or object“, and it has 2 important properties:

root A unique identifier that guarantees the global uniqueness of the instance identifier. The root alone may be the entire instance identifier.
extension A character string as a unique identifier within the scope of the identifier root.

The combination of root and extension must be globally unique – in other words, if any system ever sees the same identifer elsewhere, it can know beyond any doubt that this it is the same identifier. In practice, this fairly simple claim causes a few different issues:

  • How can you generate a globally unique identifier?
  • The difference between root only vs. root+extension trips people up
  • How do you know what the identifier is?
  • What if different roots are used for the same thing?
  • The difference between the identifier and the identified thing is an end-less source of confusion
  • How to handle incompletely known identifiers

How can you generate a globally unique identifier? 

At first glance, the rule that you have to generate a globally unique identifier that will never be used elsewhere is a bit daunting. How do you do that? However, in practice, it’s actually not that hard. The HL7 standard says that a root must be either a UUID, or an OID.


Wikipedia has this to say about UUIDs:

Anyone can create a UUID and use it to identify something with reasonable confidence that the same identifier will never be unintentionally created by anyone to identify something else. Information labeled with UUIDs can therefore be later combined into a single database without needing to resolve identifier (ID) conflicts.

UUIDs were originally used in the Apollo Network Computing System and later in the Open Software Foundation‘s (OSF) Distributed Computing Environment (DCE), and then in Microsoft Windows platforms as globally unique identifiers (GUIDs).

Most operating systems include a system facility to generate a UUID, or, alternatively, there are robust open source libraries available. Note that you can generate this with (more than) reasonable confidence that the same UUID will never be generated again – see Wikipedia or Google.

Note that UUIDs should be represented as uppercase (53C75E4C-B168-4E00-A238-8721D3579EA2), but lots of examples use lowercase (“53c75e4c-b168-4e00-a238-8721d3579ea2”). This has caused confusion in v3, because we didn’t say anything about this, so you should always use case-insensitive comparison. (Note: FHIR uses lowercase only, per the IETF OID URI registration).

For people for whom a 10^50 chance of duplication is still too much, there’s an alternative method: use an OID


Again, from Wikipedia:

In computing, an object identifier or OID is an identifier used to name an object (compare URN). Structurally, an OID consists of a node in a hierarchically-assigned namespace, formally defined using the ITU-T‘s ASN.1 standard, X.690. Successive numbers of the nodes, starting at the root of the tree, identify each node in the tree. Designers set up new nodes by registering them under the node’s registration authority.

The easiest way it illustrate this is to show it works for my own OID,

1 ISO OID root
2 ISO issue each member (country) a number in this space
36 Issued to Australia. See – you can use your Australian Company Number (ACN) in this OID space
146595217 Health Intersections ACN

All Australian companies automatically have their own OID, then. But if you aren’t an Australian company, you can ask HL7 or HL7 Australia to issue you an OID – HL7 charges $100 for this, but if you’re Australian (or you have reasons to use an OID in Australia), HL7 Australia will do it for free. Or you can get an OID from anyone else who’ll issue OIDs. In Canada, for instance, Canada Health Infoway issues them.

Within that space, I (as the company owner) can use it how I want. Let’s say that I decide that I’ll use .1 for my clients, I’ll give each customer a unique number, and then I’ll use .0 in that space for their patient MRN, then an MRN from my client will be in the OID space “″

So as long as each node within the tree of nodes is administered properly (each number only assigned/used once), then there’ll never be any problems with uniqueness. (Actually, I think that the chances of the OID space being properly administered are a lot lower than 1-1/10^50, so UUIDs are probably better, in fact)

Root vs Root + extension

In the II data type, you must have a root attribute (except in a special case, see the last point below). The extension is optional. The combination of the 2 must be unique. So, for example, if you the root is a GUID, and each similar object simply gets a new UUID, then there’s no need for a extension:

<id root="53C75E4C-B168-4E00-A238-8721D3579EA2"/>

If, on the other hand, you’re going to use “53C75E4C-B168-4E00-A238-8721D3579EA2” to identify the set of objects, then you need to assign each object an extension:

<id root="53C75E4C-B168-4E00-A238-8721D3579EA2" extension="42"/>

You’ll need some other system for uniquely assigning the extension to an object. Customarily, a database primary key is used.

With OIDs, the same applies. If you assign a new OID to each object (as is usually done in DICOM systems for images), then all you need is a root:

<id root="1.2.840.113663.1100.16233472.1.911832595.119981123.1153052"/> <!-- picked at random from a sample DICOM image -->

If, on the other hand, you use the same OID for the set of objects, then you use an extension:

<id root="1.2.840.113663.1100.16233472.1.911832595.119981123" extension="1153052"/>

And you assign a new value for the extension for each object, again, probably using a database primary key.

Alert readers will now be wondering, what’s the difference between those two? And the answer is, well, those 2 identifiers are *not* the same, even though they’re probably intended to be. When you use an OID, you have to decide: are sub-objects of the id represented as new nodes on the OID, or as values in the extension. That decision can only be made once for the OID. So it’s really useful how only about 0.1% of OIDs actually say in their registration (see below for discussing registration). So why even allow both forms? Why not simply say, you can’t use an extension with an OID? The answer is because not every extension is a simple number than can be in an OID. It’s common to see composite identifiers like “E34234” or “14P232344” or even simply MRNs with 0 prefixed, such as 004214 – none of these are valid OID node values.

Note for Australian readers: the Health Identifier OID requires that IHIs be represented as an OID node, not an extension, like this: Not everyone likes this (including me), but that’s how it is.

The notion that root + extension should be globally unique sounds simple, but I’ve found that it’s real tricky in practice for many implementers.

Identifier Type

When you look at an identifier, you have no idea what type of identifier it is:

<id root="53C75E4C-B168-4E00-A238-8721D3579EA2" extension="42"/>

What is this? You don’t know either what kind of object it identifies, or what kind of identifier it is. You’ll have to determine that from something else than the identifier – the II is *only* the identifier, not anything else. However it might be possible to look it up in a registry. For instance, the IHI OID is registered in the HL7 OID registry. You can look it up here. That can tell you what it is, but note two important caveats:

  • There’s no consistent definition of the type – it’s just narrative
  • There’s no machine API to the OID registry

For this reason, that’s a human mediated process, not something done on the fly in the interface. I’d like to change that, but the infrastructure for this isn’t in place.

So for a machine, there’s no way to determine what type of identifier this is from just reading the identifier – it has to be inferred  from context – either the content around the identifier, or reading an implementation guide.

Different Roots for the same thing?

So what stops two different people assigning different roots to the same object? In this case, you’ll get two different identifiers:

<id root="53C75E4C-B168-4E00-A238-8721D3579EA2" extension="42"/>
<id root="" extension="42"/>

Even though these are same thing. What stops this happening?

Well, the answer is, not much. The idea of registering identifiers is actually to try and discourage this – if the first person registers the identifier, and the second person looks for it before registering it themselves, then they should find it, and use the same value. but if they’re lazy, or the description and/or the search is bad, then there’ll be duplicates. And, in fact, there are many duplicates in the OID registry, and fixing them (or even just preventing them) is beyond HL7.

Difference between identifier and identified thing

In theory, any time you see the same object, it should have the same identity, and any time you see the same identifier, it should be the same object. Unfortunately, however, life is far from that simple:

  • The original system might not store the identifier, and it’s just assigned on the fly by middleware. Each time you see the same object, it’ll get a new identity. Personally, I think this is terrible, but it’s widely done.
  • The same identifier (e.g. lab report id) might be used for multiple different representations of the same report object
  • The same identifier (e.g. Patient MRN) might be used for multiple different objects that all represent the same real world thing
  • The same identifier (e.g. lab order number) might be used for multiple different real world entities that all connect to the same business process

R2 of the datatypes introduced a “scope” property so you could identify which of these the identifier is, but this can’t be used with CDA.

Incomplete identifiers

What if you know the value of the identifier, but you don’t know what identifier it actually is? This is not uncommon – here’s two scenarios:

  • A patient registration system that knows the patient’s driver license number, but not which country/state issued it
  • A point of care testing device that knows what the patient bar-code is, but doesn’t know what institution it’s in

In both these cases, the root is not known by the information provider. So in this case, you mark the identifer as unknown, but provide the extension:

<id nullFlavor="UNK" extension="234234234"/>

Note: there’s been some controversy about this over the years, but this is in fact true, even if you read something else somewhere else.

Error in R2 Datatypes

Today, Rick Geimer and Austin Kreisler discovered a rather egregious error in the Data Types R2 specifications. The ISO data types say:

This specification defines the following extensions to the URN scheme:

hl7ii – a reference to an II value defined in this specification. The full syntax of the URN is urn:hl7ii:{root}[:{extension}] where {root} and {extension} (if present) are the values from the II that is being referenced. Full details of this protocol are defined in the HL7 Abstract Data Types Specification

Reference: Section of the ISO Data types

However, if you look through the Abstract Data Types R2 for information about this, you won’t find anything about hl7ii. But there is this, in section

The scheme hl7-att is used to make references to HL7 Attachments. HL7 attachments may be located in the instance itself as an attachment on the Message class, or in some wrapping entity such as a MIME package, or stored elsewhere.

The following rules are required to make the hl7-att scheme work:

  1. Attachments SHALL be globally uniquely identified. Attachment id is mandatory, and an ID SHALL never be re-used. Once assigned, an attachment id SHALL be accosiated with exactly one byte-stream as defined for
  2. When receiving an attachment, a receiver SHOULD store that attachment for later reference. A sender is not required to resend the same attachment if the attachment has already been sent.
  3. Attachment references SHALL be resolved against all stored attachments using the globally unique attachment identifier in the address.

(p.s. references are behind HL7 registration wall, but are free)

These are meant to be the same thing, but they are named differently: urn:hl7ii:… vs hl7-att:…

I guess this will have to be handled as a technical correction to one of the two specifications. I prefer urn:hl7ii:.

Comments welcome.

Question: Multi-part surnames


How do we handle multiple family/last names? Andhow to re-construct the complete family name with multiple parts stored in db?
How about, for an example: Josep Puig i Cadafalch – Puig is the last name of his father, Cadafalch of his mother; i means “and” (see Iberian naming customs).


Human Names are quite difficult in healthcare standards, because there’s fairly wide variety of cultural practices, and because they get involved ubiquitously across the process – and because most implementations work within a narrow cultural profile.

Let start with the simple case – how should this be represented in v2, CDA, and FHIR:

V2.7 XPN
Puig i Cadafalch^Josep
FHIR HumanName
  <family value="Puig"/>
  <family value="i"/>
  <family value="Cadafalch"/>
  <given value="Josep"/>

So these are different. You can’t reproduce the CDA/FHIR structure in v2, but you can represent the v2 structure in CDA and FHIR:

  <family>Puig i Cadafalch</family>
FHIR HumanName
  <family value="Puig i Cadafalch"/>
  <given value="Josep"/>

So that creates an inherent problem: there’s no one right way to represent this name in CDA (for FHIR, this is something you SHOULD NOT do – see below). The first form is preferred, but the second is still legal. Well, the thing is, we can’t make it illegal, because maybe the space really is part of a single surname (that would play out in a legal/jurisdictional system where pragmatic solutions aren’t always possible). So why do we even allow it to be broken up? That’s because not all parts of the surname have equal meaning, and the different meaning impacts on how you use names (searching and matching considerations). So you can break names up in CDA and FHIR to allow parts to be marked up. CDA provides qualifier for this, and in FHIR it would be extensions.

But there’s two different forms that are (in the case above), identical. How does this work in practice? Well, that brings the database question into focus. In practice, there’s a range of approaches. There are applications out there that handle names exactly as CDA/FHIR/ISO 21090 handle them- but in my experience, these were written from scratch to be conformant, not to solve particular real world problems. But mostly, there’s some combination of single or multiple given and surnames, with prefixes, sometimes suffixes, and occasionally surname prefixes (“van”). In english speaking countries, applications mostly only have a single surname.

So the ambiguity in the specification is faithfully reproducing variation in practice. We’d like to impose a single model of naming, but there isn’t one. In fact, the situation is much complicated by a person with a name from one culture trying to represent their name in another culture’s systems (tourists, or immigrants. Immigrants actually often take different name just to get by), which leads to some weird intersections of practice.

Implementation in FHIR

In FHIR, we’ve make this implementation note:

The parts of a name SHOULD NOT contain whitespace. For family name, hyphenated names such as “Smith-Jones” are a single name, but names with spaces such as “Smith Jones” are broken into multiple parts

and this one:

Systems that do not support as many name parts as are provided in an instance they are processing may wish to append parts together using spaces, so that this becomes “van Hentenryck”.

This makes things workable – it doesn’t matter how you store it, there’s guidance for interoperating with other applications, and in most circumstances, this will work fine.

But its’ SHOULD not SHALL  because we know of edge cases like the jurisdictional one above where you mightn’t be able to follow the simple rules.


Documents are only immutable as a matter of policy

One of the issues that kept coming up when we were discussing documents in FHIR last week is that notion that documents are immutable, and can’t change. Take, for instance, this, from the comments:

Document should be immutable

Only, documents aren’t immutable. That’ll be a shock for CDA centric readers, where the immutability of documents is a fundamental notion that is hammered into implementers by the specification and the community around it, but it’s true.

First, you have to build the document. Clearly, it’s not immutable while you’re doing that – it needs to be edited, changed, assembled, and this can be a piece meal process. So the technical artifacts that underly building a document aren’t immutable. Even after you have it, you can break it down, change things, reassemble them. It’s still the same technical constructs – so even then, it’s not immutable.

The immutability is affixed as a matter of policy – once a document is “done”, then it’s treated as immutable by choice, to establish clinical traceability. This is important, and necessary.

In CDA, the existence of a CDA document is evidence that the document is done, and so they are treated as unchangeable. If you, privately, choose to keep partially formed CDA documents that you don’t treat as immutable, well, that’s your private business, and there’s no need to let anyone else see your private business. So even with CDA, it turns out, immutability is a matter of policy.

FHIR is different to CDA, because the FHIR spec defines a whole lot of functionality that’s relevant and important during the building and processing phases. So it would be a mistake to restrict that functionality at the technical level to ensure that documents are immutable – because they actually aren’t at that level. What FHIR needs to do is ensure that the control points to enforce and audit immutability exist so that the policy imperative of clinical traceability can be delivered. That’s something that we’re keeping in mind as we work on the document design.

This same dichotomy exists, btw, in the v3 data types, where the abstract specification describes the data types as “immutable”, because they are, in concept. But the ISO 21090 concrete version implicitly caters for non-immutable definitions, and there’s discussion in the spec around the difference between immutable in technical terms, and immutable in policy terms.



More fun with original Text

As I’ve described before (and here) originalText is a challenge.

Firstly, originalText is a consistent pattern throughout the HL7 data types (v2, v3, and FHIR) is that a coded value is represented by some variation of a cluster that contains

  • (code / system / display) 0..*
  • original text

There’s variations to this theme amongst the CE/CNE,/CWE, CD/CE/CV, and Coding/CodeableConcept, but they all have the same basic pattern. The multiple codes are for equivalent codes that say the same thing in different coding systems. Here’s a v3 example (from the spec):

<code code='195967001' codeSystem='2.16.840.1.113883.19.6.96' 
    codeSystemName='SNOMED CT' displayName='Asthma'>
  <originalText>Mild Asthma</originalText>
  <translation code='49390' codeSystem='2.16.840.1.113883.19.6.2' 
   codeSystemName='ICD9CM' displayName='ASTHMA W/O STATUS ASTHMATICUS'/>

This has the original text “Mild Asthma”, and two different codes, one from ICD-9-CM, and one from Snomed-CT. That’s a pretty straight forward idea.

The problem

But what if the original text is something a little more complex?

Diabetes & Hypertension

The problem with this original text is that there’s going to be two codes. If we stick to just Snomed-CT, this is codes 73211009: Diabetes mellitus, and 38341003: Hypertensive disorder, systemic arterial (I think). There’s no appropriate code that covers both. And this:

<code code='73211009' codeSystem='2.16.840.1.113883.19.6.96' 
    codeSystemName='SNOMED CT' displayName='Diabetes mellitus'>
  <originalText>Diabetes & Hypertension</originalText>
 <translation code='38341003' codeSystem='2.16.840.1.113883.19.6.96' 
   codeSystemName='SNOMED CT' displayName='Hypertensive disorder'/>

is not legal, because there’s no way that these two codes fit anywhere near the definition of

The translations are quasi-synonyms of one real-world concept. Every translation in the set is supposed to express the same meaning “in other words.”

And this becomes not merely a thereortical problem if you’re trying to provide translations to yet another code system as well. So this is a problem – you have to pick one of the codes.

I suppose you could tell the user that “Diabetes & Hypertension” is not a valid text to insert. I’m sure they’ll be fine with that.


2nd try

An alternative is to say that we ensure that in the circumstance this can arise – say, the reason for prescribing a medication – allows 0..* coded data types, not just 0..1. Then we can do this:

<code code='73211009' codeSystem='2.16.840.1.113883.19.6.96' 
    codeSystemName='SNOMED CT' displayName='Diabetes mellitus'/>
<code code='38341003' codeSystem='2.16.840.1.113883.19.6.96' 
   codeSystemName='SNOMED CT' displayName='Hypertensive disorder'/>

Only, where did the original text go? Well, you could repeat it, I suppose. That’d be technically correct:

<code code='73211009' codeSystem='2.16.840.1.113883.19.6.96' 
    codeSystemName='SNOMED CT' displayName='Diabetes mellitus'>
  <originalText>Diabetes & Hypertension</originalText>
<code code='38341003' codeSystem='2.16.840.1.113883.19.6.96' 
 codeSystemName='SNOMED CT' displayName='Hypertensive disorder'>
  <originalText>Diabetes & Hypertension</originalText>

So now, a receiving application has to rip through the codes and eliminate duplicate original texts – that’s not my favourite outcome.


We have a third option in FHIR: break apart the data type, and declare that the containing element (the reason for prescription in this case) isn’t a CodeableConcept, but a complex element that has a have a text element, and 0..* coding elements that don’t have to be equivalent:

   <code value='73211009'/>
   <system value="/>
   <display value='Diabetes mellitus'/>
   <code value='38341003'/>
   <system value="/>
   <display value='Hypertensive disorder'/>
 <text value="Diabetes & Hypertension"/>

Actually, this has exactly the same look on the wire, but it’s defined differently. So this is ok, but there’s no facility for providing translations to other code systems (and the RIM mapping has now become impossible because we haven’t solved this for the RIM)


Original Text continues to be a problem because it cross-cuts the coding structures. If only we could force end-users to value coding enough that we could get rid of text 😉



On the subject of original text for Codes

In the various coded data types defined by HL7 across v2, v3, and FHIR, there’s a property named text or originalText that is defined using some variant of these words:

The text as seen and/or selected by the user who entered the data which represents the intended meaning of the user.

Original text can be used in a structured user interface to capture what the user saw as a representation of the code or expression on the data input screen, or in a situation where the user dictates or directly enters text, it is the text entered or uttered by the user.

Unfortunately, what this exactly means is a matter of interpretation. The key question is, to what degree does the context affect the interpretation of the text that represents the code, and therefore, to what degree does the context contribute to the original text?

I’ll illustrate the discussion with an example. In SNOMED CT, there’s a (large) heirarchy for organism type. Part of the hierarchy contains codes for virii. A subset of this is found in the PHINVADS value set “Virus types answer list specific to Arbovirus/ArboNet reporting“. This lists 17 codes for type of virus. So you could easily imagine some kind of UI, for instance, where users would select one of the codes from a pick list:

In this case, the original text is the same as the Snomed-CT preferred name, and it’s pretty straight-forward to understand. If, for instance, the user picked “Eastern equine encephalitis virus”, then that’s the original text, and nothing further is needed.

However, a lot of system designers will look at this and say, the word “virus” is repeated in every entry, and that’s just a tax on the users. We should get rid of it. That would give you an entry like this:

Actually, in this case, the example is pretty trivial. “Virus” isn’t hard to read. But how about this SNOMED CT preferred term: “Cholecystectomy with exploration of common bile duct and choledochoenterostomy” – there’s quite a lot of potential for useful simplification there, especially where the set of codes are all siblings, such as the variations of strength in a particular medication:

This somewhat extreme example is from AMT. I doubt any reader can even figure out the differences between those 4 codes. How much easier this is:

Hopefully that example will serve to illustrate that this isn’t just a UI best practices issue – as the codes become finer, it starts to become a clinical safety issue too.

Back to the virus case: if the user picks “Eastern equine encephalitis”, then is the original text “Virus: Eastern equine encephalitis” or just “Eastern equine encephalitis”? What actually works best depends on quite how the original text is going to be used. If the original text is used as the faithful reproduction of the meaning of the user in a similar context as the user entered, then the minimal text the user actually picked is the useful original text – but how similar? If, on the other hand, the original text is used out of context, the full context of the data entry of the code should be represented – but this could be a combination of the text the user actually picked, the field name, additional words taken out of the explicit context on the screen, and even some text that is implicit in the clinical context.

To make things even more fun, a contributor on the HL7 vocabulary mailing list offered this example:



I’m not sure what the best way to resolve this. How do you make original text reliably useful for both uses when the user interface isn’t nailed down?

Well, one way is to rely on the value set – the value set description should contain the information that is implicit in the context. So the true original text would be the value set description + the user picked text. Though I don’t think that any particular field in the value set (either v3 or FHIR) is defined for this purpose in mind. Perhaps that’s something we should address?


Question: What does an empty name mean?

I was recently asked whether this fragment is legal CDA:


The answer is that it’s legal, but I have no idea what it should be understood to mean.

It’s legal

Firstly, it’s legal. There’s a rule in the abstract data types specification:

invariant(ST x) where x.nonNull { 

In other words: if a string is not null, it must have at least one character content in it. In a CDA context, then, this would be illegal:


Aside: the schema doesn’t disallow this, and few schematrons check this. But it’s still illegal. It’s a common error to enounter where a CDA document is generated using some object model or an xslt – it’s not so common where the CDA document is generated by some xml scripting language a la PHP.

However the definition of PN (the type of <name/>) is not a simple string: it’s a list of part names, each of which is a string. PN = LIST<ENXP>. So while this is illegal:


– because the given name can’t be empty. This isn’t:


That’s because a list is allowed to be empty.

Aside: I don’t know whether this is legal:

    <given> </given>

 That’s because we run into XML whitespace processing issues. Nasty, murky XML. Next time, we’ll use JSON.

What does it mean?

So what does this mean?


Well, it’s a name with no parts that is not null. Literally: this patient has a name, and the name has no parts.

Note the proper use of nullFlavors:

  • The patient’s name is unknown: <name nullFlavor=”UNK”/>
  • We didn’t bother collecting the patient name: <name nullFlavor=”NI”/>
  • We’re telling you that we’re not telling you the patient name: <name nullFlavor=”MSK”/>(unusual – usually you’d just go for NI when suppressing the name)
  • The patient’s name can’t be meaningful in this context: <name nullFlavor=”NA”/> – though I find operational uses of NA very difficult to understand
  • The patient doesn’t have a name: <name nullFlavor=”ASKU”/> – because if you know that there’s no name, you’ve asked, and it’s unknown. Note that it would be a very unusual circumstance where a patient doesn’t have a working name (newborns or unidentified unconscious patients get assigned names), but it might make much more sense for other things like drugs

So this is none of these. It’s a positive statement that the patient has a name, and the name is… well, empty? I can’t think of how you could have an empty name (as opposed to having no name). I think that we should probably have a rule that names can’t have no parts either.

It’s not clear that this is a statement that the name is empty though. Consider this fragment:


If you saw this in a CDA document, would you think that you should understand that I have no middle name (I do, though I have friends that don’t). We could be explicit about that:

    <given nullFlavor="ASKU"/>

Though I  think that ASKU is getting to be wrong here – you could say that the middle name is unknown, but it would be better to say that the middle name count is 0 – it’s just that we didn’t design the name structure that way because of how names work in other cultures than the English derived one. (which, btw, means that some of this post may be inapplicable outside the english context).

The first case would be normal, so this means, we don’t say anything about middle names. So why would not including any given name or family name mean anything more than “we don’t say anything about the name”? And, in fact, that’s the most likely cause of the form – nothing is known, but the developer put the <name> element in irrespective of the fact that the parts weren’t known.

So it’s quite unclear what it should mean, it most likely arises as a logic error, and I recommend that implementations ensure that an empty name never appears on the wire.

Guest Post: HL7 Language Codes

This guest post is written by Lloyd McKenzie.  I’ve been meaning to get to it since the January WGM, but I’ve been wrapped up in other things (most recently HIMSS).  However, I agree with what Lloyd says.

Question: I have a need to communicate a wide variety of language codes in HL7 v3 instances, but the ISO Data Type 21090 specification declares that ED.language (and thus ST.language) are constrained to IETF 3066.  This is missing many of the languages found in ISO 639-3 – which I need.  Also, IETF 3066 is deprecated.  It’s been replaced twice.  Can I just use ISO 639-3 instead?


The language in the 21090 specification was poorly chosen.  It explicitly says “Valid codes are taken from the IETF RFC 3066”.  What it should have said is “Valid codes are taken from IETF language tags, the most recent version of which at the time of this publication is IETF RFC 3066”.  (Actually, by the time ISO 21090 actually got approved, the most recent RFC was 4646, but we’ll ignore that for now.)  This should be handled as a technical correction, though that’s not terribly easy to do.  However, implementers are certainly welcome to point to this blog as an authoritative source of guidance on ISO 21090 implementation and make use of any language codes supported in subsequent versions of the IETF Language Tags code system – including RFC 4646 and RFC 5646 as well as any subsequent version there-of.

The RFC 5646 version incorporates all of the languages found in ISO 639-3 and 639-5.  However, be aware that while all languages are covered, there are constraints on the codes that can be used for a given language.  Specifically, if a language is represented in ISO 639-1 (2-character codes), that form must be used.  The 3-character variants found in ISO 639-2 cannot be used.  For example, you must send “en” for English, not “eng”.

Question: But I want to send the 3-character codes.  That’s what my system stores.  Can’t I use ISO 639-2 directly?


No.  In the ISO 21090 specification, the “language” property is defined as a CS.  That means the data type is fixed to a single code system.  The code system used is IETF Language Tags, which is consistent with what the w3c uses for language in all of their specifications and encompasses all languages in all of the ISO 639 specifications plus many others (for example, country-specific dialects as well as additional language tags maintained by IANA.)

Question: Well, ok, but what about in the RIM for LanguageCommunication.code.  Can I send ISO 639-2 codes there?


Yes, though with a caveat.  LanguageCommunication.code is defined as a CD, meaning you can send multiple codes – one primary code and as many translations as you see fit.  You are free to send ISO 639-2 codes (the 3-character ones) or any other codes as a translation.  However, LanguageCommunication.code has a vocabulary assertion of the HumanLanguage concept domain, which is universally bound to a value set defined as “all codes from the ietf3066 code system”.  That means the primary code within the CD must be an IETF code.  So that gives you two options:

  1. Fill the root code with the appropriate IETF code – copying the ISO code most of the time and translating the 3-character code to the correct 2-character code for those 84 language codes found in ISO 639-1; or
  2. Omit the root code property and set the null flavor to “UNC” (unencoded), essentially declaring that you haven’t bothered to try translating the code you captured into the required code sytem.

And before you mention it, yes, the reference to IETF 3066 is a problem.  The actual code system name in the HL7 specification is “Tags for the Identification of Languages”, which is the correct name.  However the short name assigned was “ietf3066” and the description in the OID registry refers explicitly to the 3066 version.  This is an error, as IETF 3066 is a version of the IETF “Tags for the Identification of Language” code system and the OID is for the code system, not a particular version of it.  (There have actually been 4 versions so far – 1766, 3066, 4646 and 5646.)  We’ll try to get the short name and description corrected via the HL7 harmonization process

Question: But I don’t want to translate to 2-character codes and I don’t want to use a null flavor.  Can’t we just relax the universal binding?


We can’t relax the binding because the HumanLanguage concept domain is shared by both the ED.language property in the abstract data types specification (which ISO 21090 is based on) and the LanguageCommunication.code attribute.  The ED.language is a CS and so must remain universally bound.

In theory, we could split into two separate domains – one for data types and one for LanguageCommunication.code.  The second one could have a looser binding.  However, it’s hard to make a case for doing that.  There are several issues:

First, having two different bindings for essentially the same sort of information is just going to cause grief for implementers.  You could be faced with declaring what language the patient reads in one code system, but identifying the language of the documentation the patient’s supposed to read in a second code system.

Second, the IETF code system fully encompasses all languages covered by all the ISO 639-x code systems, plus thousands of others expressible using various sub-tags such as identifying country-specific dialects.  In the unlikely situation that you need a language that can’t be expressed using any of those, there’s even a syntax for sending local codes (and a mechanism for registering supplemental codes with IANA if you want to be more official).  So there should never be a situation where you can’t express your desired language using the IETF Language Tags code system.

Question: I don’t really care that I can express my languages in IETF.  I’ve already pre-adopted using ISO 639-2 and -3 in my v3 implementation and I don’t want to change.  Why are you putting constraints in place that prevent implementers from doing what they want to do?


Well, technically your implementation right now is non-conformant.  And implementers always have the right to be non-conformant.  HL7 doesn’t require anyone to follow any of its specifications.  So long as your communication partners are willing to do what you want to do, anything goes by site-specific agreement.

However, the standards process is about giving up a degree of implementation flexibility in exchange for greater interoperability.  By standardizing on a single set of codes for human language, we’re able to ensure interoperability across all systems.  Natively, those systems may use other code systems, but for communication purposes, they translate to the common code system so everyone can safely exchange information.

If the premise for loosening a standard was “we won’t require any system to translate data from their native syntax”, there’d be no standards at all.  Yes, translation and mapping requires extra effort (though a look-up table for 84 codes with direct 1..1 correspondence is pretty easy compared to a lot of the mapping effort needed in other areas.)  But that’s the price of interoperability.