Category Archives: Data Types

Guest Post: HL7 Language Codes

This guest post is written by Lloyd McKenzie.  I’ve been meaning to get to it since the January WGM, but I’ve been wrapped up in other things (most recently HIMSS).  However, I agree with what Lloyd says.

Question: I have a need to communicate a wide variety of language codes in HL7 v3 instances, but the ISO Data Type 21090 specification declares that ED.language (and thus ST.language) are constrained to IETF 3066.  This is missing many of the languages found in ISO 639-3 – which I need.  Also, IETF 3066 is deprecated.  It’s been replaced twice.  Can I just use ISO 639-3 instead?

Answer:

The language in the 21090 specification was poorly chosen.  It explicitly says “Valid codes are taken from the IETF RFC 3066”.  What it should have said is “Valid codes are taken from IETF language tags, the most recent version of which at the time of this publication is IETF RFC 3066”.  (In fact, by the time ISO 21090 was approved, the most recent RFC was 4646, but we’ll ignore that for now.)  This should be handled as a technical correction, though that’s not terribly easy to do.  However, implementers are certainly welcome to point to this blog as an authoritative source of guidance on ISO 21090 implementation and make use of any language codes supported in subsequent versions of the IETF Language Tags code system – including RFC 4646 and RFC 5646, as well as any subsequent version thereof.

The RFC 5646 version incorporates all of the languages found in ISO 639-3 and 639-5.  However, be aware that while all languages are covered, there are constraints on the codes that can be used for a given language.  Specifically, if a language is represented in ISO 639-1 (2-character codes), that form must be used.  The 3-character variants found in ISO 639-2 cannot be used.  For example, you must send “en” for English, not “eng”.
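The rule above comes down to a table lookup. Here is a minimal Python sketch, assuming a hand-picked subset of the mapping (a real table would carry every language that has an ISO 639-1 code). The function name `to_ietf_language` is hypothetical.

```python
# Partial, illustrative subset of the ISO 639-2 -> ISO 639-1 mapping.
# A production table would cover every language that has a 2-character code.
ISO639_2_TO_1 = {
    "eng": "en",
    "fra": "fr", "fre": "fr",  # terminology and bibliographic forms
    "deu": "de", "ger": "de",
    "spa": "es",
    "jpn": "ja",
    "zho": "zh", "chi": "zh",
}


def to_ietf_language(code: str) -> str:
    """Return the primary language subtag to send for an ISO 639 code."""
    code = code.strip().lower()
    # Use the 2-character form when one exists; otherwise the 3-character
    # code is passed through unchanged.
    return ISO639_2_TO_1.get(code, code)


print(to_ietf_language("eng"))  # en
print(to_ietf_language("haw"))  # haw (Hawaiian has no 2-character code)
```

Because the table here is partial, a pass-through result is only trustworthy once the full ISO 639-1 list has been loaded.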

Question: But I want to send the 3-character codes.  That’s what my system stores.  Can’t I use ISO 639-2 directly?

Answer:

No.  In the ISO 21090 specification, the “language” property is defined as a CS.  That means the data type is fixed to a single code system.  The code system used is IETF Language Tags, which is consistent with what the W3C uses for language in all of its specifications and encompasses all languages in all of the ISO 639 specifications plus many others (for example, country-specific dialects as well as additional language tags maintained by IANA).

Question: Well, ok, but what about LanguageCommunication.code in the RIM?  Can I send ISO 639-2 codes there?

Answer:

Yes, though with a caveat.  LanguageCommunication.code is defined as a CD, meaning you can send multiple codes – one primary code and as many translations as you see fit.  You are free to send ISO 639-2 codes (the 3-character ones) or any other codes as a translation.  However, LanguageCommunication.code has a vocabulary assertion of the HumanLanguage concept domain, which is universally bound to a value set defined as “all codes from the ietf3066 code system”.  That means the primary code within the CD must be an IETF code.  So that gives you two options:

  1. Fill the root code with the appropriate IETF code – copying the ISO code most of the time and translating the 3-character code to the correct 2-character code for those 84 language codes found in ISO 639-1; or
  2. Omit the root code property and set the null flavor to “UNC” (unencoded), essentially declaring that you haven’t bothered to try translating the code you captured into the required code system.
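The two options can be sketched with Python's standard `xml.etree.ElementTree`. The helper name `language_code`, the element name `code`, and both OIDs below are assumptions made for illustration. The OIDs in particular are placeholders, so look up the real code system identifiers in the OID registry.

```python
import xml.etree.ElementTree as ET


def language_code(iso_code, iso_oid, ietf_code=None, ietf_oid=None):
    """Build a CD element carrying an ISO 639-2 code as a translation."""
    cd = ET.Element("code")
    if ietf_code is not None:
        # Option 1: the root code is the IETF language tag.
        cd.set("code", ietf_code)
        cd.set("codeSystem", ietf_oid)
    else:
        # Option 2: no root code; the UNC null flavor flags that the
        # captured code was not translated to the required code system.
        cd.set("nullFlavor", "UNC")
    translation = ET.SubElement(cd, "translation")
    translation.set("code", iso_code)
    translation.set("codeSystem", iso_oid)
    return cd


# Placeholder OIDs, for illustration only.
IETF_OID = "1.2.3.4.1"
ISO_639_2_OID = "1.2.3.4.2"

print(ET.tostring(language_code("eng", ISO_639_2_OID, "en", IETF_OID), encoding="unicode"))
print(ET.tostring(language_code("eng", ISO_639_2_OID), encoding="unicode"))
```

Either way the 3-character code travels in the translation, so a receiver that understands ISO 639-2 loses nothing.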

And before you mention it, yes, the reference to IETF 3066 is a problem.  The actual code system name in the HL7 specification is “Tags for the Identification of Languages”, which is the correct name.  However the short name assigned was “ietf3066” and the description in the OID registry refers explicitly to the 3066 version.  This is an error, as IETF 3066 is a version of the IETF “Tags for the Identification of Languages” code system and the OID is for the code system, not a particular version of it.  (There have actually been 4 versions so far – 1766, 3066, 4646 and 5646.)  We’ll try to get the short name and description corrected via the HL7 harmonization process.

Question: But I don’t want to translate to 2-character codes and I don’t want to use a null flavor.  Can’t we just relax the universal binding?

Answer:

We can’t relax the binding because the HumanLanguage concept domain is shared by both the ED.language property in the abstract data types specification (which ISO 21090 is based on) and the LanguageCommunication.code attribute.  The ED.language is a CS and so must remain universally bound.

In theory, we could split into two separate domains – one for data types and one for LanguageCommunication.code.  The second one could have a looser binding.  However, it’s hard to make a case for doing that.  There are two issues:

First, having two different bindings for essentially the same sort of information is just going to cause grief for implementers.  You could be faced with declaring what language the patient reads in one code system, but identifying the language of the documentation the patient’s supposed to read in a second code system.

Second, the IETF code system fully encompasses all languages covered by all the ISO 639-x code systems, plus thousands of others expressible using various sub-tags such as identifying country-specific dialects.  In the unlikely situation that you need a language that can’t be expressed using any of those, there’s even a syntax for sending local codes (and a mechanism for registering supplemental codes with IANA if you want to be more official).  So there should never be a situation where you can’t express your desired language using the IETF Language Tags code system.

Question: I don’t really care that I can express my languages in IETF.  I’ve already pre-adopted using ISO 639-2 and -3 in my v3 implementation and I don’t want to change.  Why are you putting constraints in place that prevent implementers from doing what they want to do?

Answer:

Well, technically your implementation right now is non-conformant.  And implementers always have the right to be non-conformant.  HL7 doesn’t require anyone to follow any of its specifications.  So long as your communication partners are willing to do what you want to do, anything goes by site-specific agreement.

However, the standards process is about giving up a degree of implementation flexibility in exchange for greater interoperability.  By standardizing on a single set of codes for human language, we’re able to ensure interoperability across all systems.  Natively, those systems may use other code systems, but for communication purposes, they translate to the common code system so everyone can safely exchange information.

If the premise for loosening a standard was “we won’t require any system to translate data from their native syntax”, there’d be no standards at all.  Yes, translation and mapping requires extra effort (though a look-up table for 84 codes with direct 1..1 correspondence is pretty easy compared to a lot of the mapping effort needed in other areas.)  But that’s the price of interoperability.

Question: GTS cardinality in CDA

Question:

I have an RMIM that states “effectiveTime” is a GTS[0..1], that implies a SET<TS>. (CDA R2, SubstanceAdministration.effectiveTime). Furthermore, I have a (Schematron) constraint that effectiveTime is [1..1].

The following snippet is ok by schema, but the Schematron constraint fails:

 <effectiveTime xsi:type="IVL_TS">
  <low value="20110301"/>
  <high value="20120301"/>
 </effectiveTime>
 <effectiveTime institutionSpecified="true" operator="A" xsi:type="PIVL_TS">
  <period unit="h" value="6"/>
 </effectiveTime>

What does the snippet mean, and is it legal?

Answer:
The snippet identifies the intersection of “every 6 hours” and “1-Mar 2011 to 1-Mar 2012”. It’s meaningful, and it’s valid CDA. Whether it’s valid against the Schematron... that’s a much harder question. As background, the cardinality stated against an item in the RIM is the UML cardinality. There’s a little bit of ambiguity here with collections: does a collection<T> with cardinality 1..2 mean one or two items in a collection, or 1 or 2 collections of items? UML is unclear, but most interpretations consider it to be the former, as does the HL7 v3 methodology that underlies the RIM.

So, the cardinality in the CDA document is the UML cardinality. When the CDA document is represented in XML, using the XML representations defined in the base v3 methodology, there is one XML element for each data type, and therefore there is a direct correspondence between the UML cardinality and the XML cardinality: UML cardinality 1..2 means 1..2 XML elements… for all cases. Except, that is, for GTS.

GTS is a set of items. A single set – a set with a cardinality of 1 – can be built by specifying an element, and then adding repeated elements, each of which define an operation that is to be performed on the set using the operator attribute. No matter how many XML elements there are, there is only one set. So the UML cardinality is 1, irrespective of the XML element cardinality.

Aside – I need to make several caveats to this statement:

  • All the repeating elements must have an operator. If there’s a repeating element that doesn’t have an operator, then this is an invalid instance (Act.effectiveTime has an upper cardinality of 1 in the RIM itself)
  • Strictly, there are ways to build a bag<T> using BXIT<T> and operators where the UML cardinality doesn’t match the XML element cardinality – but there’s no use for this in the context of CDA (and no rational use for this anywhere)
  • some UML attributes are represented as XML attributes, and cardinality applies differently
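The difference between the two cardinalities can be made concrete. The Python sketch below counts both for the snippet in question. The function name `gts_cardinality` is hypothetical, and the CDA namespace (urn:hl7-org:v3) is left out of the sample to keep the lookups short.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """
<substanceAdministration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <effectiveTime xsi:type="IVL_TS">
  <low value="20110301"/>
  <high value="20120301"/>
 </effectiveTime>
 <effectiveTime institutionSpecified="true" operator="A" xsi:type="PIVL_TS">
  <period unit="h" value="6"/>
 </effectiveTime>
</substanceAdministration>
"""


def gts_cardinality(parent, name="effectiveTime"):
    """Return (xml_element_count, logical_set_count) for a GTS attribute."""
    elements = parent.findall(name)
    if not elements:
        return 0, 0
    # Every element after the first must carry an operator; one without
    # would start a second set, which Act.effectiveTime does not allow.
    for element in elements[1:]:
        if "operator" not in element.attrib:
            raise ValueError("repeated %s without an operator" % name)
    return len(elements), 1


print(gts_cardinality(ET.fromstring(FRAGMENT)))  # (2, 1)
```

A constraint checker that wants the UML reading would test the second number, not the first.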

So, given this, the question is, what does it mean to say “I have a (Schematron) constraint that effectiveTime is [1..1]” – does it intend to address the UML or the XML cardinality? That’s not clear. Consider this, from the CCDA ballot:

The cardinality indicator (0..1, 1..1, 1..*, etc.) specifies the allowable occurrences within a document instance.

Figure 2: Constraints format – only one allowed

1. SHALL contain exactly one [1..1] participant (CONF:2777).

a. This participant SHALL contain exactly one [1..1] @typeCode=”LOC”
(CodeSystem: 2.16.840.1.113883.5.90 HL7ParticipationType)
(CONF:2230).

The language in this extract is unclear – do they mean, the XML representation of the document instance, or a logical document at the UML level? Not clear. “shall contain one participant”… I’m inclined to think that this is different to “shall contain one participant element”, and that the cardinalities should be understood to line up with the UML cardinalities. Of course, there’s only one case where this matters, which is this one.

So I reckon that fragment at the top should be valid. But there’s a way to work around this – this fragment is equivalent in meaning:

 <effectiveTime xsi:type="SXPR_TS">
  <comp xsi:type="IVL_TS">
   <low value="20110301"/>
   <high value="20120301"/>
  </comp>
  <comp institutionSpecified="true" operator="A" xsi:type="PIVL_TS">
   <period unit="h" value="6"/>
  </comp>
 </effectiveTime>

But it has a cardinality of 1 in the XML. So I recommend using this form, and not the other. Bingo, problem solved.
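If you already produce the repeated form, the rewrite to the single-element form is mechanical. Here is a rough Python sketch using the standard library. The function name `wrap_in_sxpr` is hypothetical, and as before the sample leaves out the CDA namespace.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSI_TYPE = "{%s}type" % XSI
ET.register_namespace("xsi", XSI)  # keep the xsi prefix when serialising


def wrap_in_sxpr(parent, name="effectiveTime"):
    """Replace repeated GTS elements with a single SXPR_TS element."""
    elements = parent.findall(name)
    if len(elements) < 2:
        return  # nothing to combine
    position = list(parent).index(elements[0])
    wrapper = ET.Element(name, {XSI_TYPE: "SXPR_TS"})
    for element in elements:
        parent.remove(element)
        element.tag = "comp"  # attributes and children are kept as they are
        wrapper.append(element)
    parent.insert(position, wrapper)


doc = ET.fromstring(
    '<substanceAdministration xmlns:xsi="%s">'
    '<effectiveTime xsi:type="IVL_TS">'
    '<low value="20110301"/><high value="20120301"/></effectiveTime>'
    '<effectiveTime institutionSpecified="true" operator="A" xsi:type="PIVL_TS">'
    '<period unit="h" value="6"/></effectiveTime>'
    '</substanceAdministration>' % XSI
)
wrap_in_sxpr(doc)
print(ET.tostring(doc, encoding="unicode"))
```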

Identifiers in CDA Documents – Reporting Tool

This post is prompted by the intersection of two issues:

  • Conclusions from quality checking CDA documents in the Australian National EHR Program
  • A series of questions I took privately about how identifiers work in Consolidated CDA

This post explains how identifiers are supposed to work in a CDA document, and introduces a reporting tool to help implementers assess the quality of identifier usage in a CDA document.

@ID attribute

The first kind of identifier in a CDA document is the “ID” attribute that can appear in the following places:

  • Section
  • ObservationMedia
  • RegionOfInterest

These are added so they can be the target of a linkHtml or renderMultiMedia – i.e. the narrative element <linkHtml href="#a1"> points to the section <section ID="a1">. Note that the attribute is defined as an xml:id, and values of xml:id must be unique within the XML document that contains them. This makes it difficult to combine individual CDA documents into a group inside a single piece of XML content (an Atom feed, for instance), and impossible if they are signed.

The ID attribute also exists on most of the narrative elements, so that they can be the target of an originalText reference in a CD data type, to indicate that the source of this code is this particular text. This is a very advanced usage. It would also be possible, using this method, to make any narrative element the target of a linkHtml reference, and to my knowledge the CDA specification doesn’t say that this is not legal (though I think it’s intended that it’s not).

The ID attribute is also used for references from footnoteRefs to footnotes.

These are the only allowed uses of xml:id attributes in a CDA document. It can’t be used to indicate that this [thing] here is the same as that [thing] over there (i.e. this section and that section share the same author). To do that, you have to use logical object references using the id element.
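Both rules for the ID attribute (unique within the document, and used as the target of internal references) are easy to check mechanically. Here is a small Python sketch, assuming a document without namespaces for brevity. The function name `check_id_attributes` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_id_attributes(root):
    """Return (duplicate IDs, internal hrefs with no matching ID)."""
    ids = Counter(e.get("ID") for e in root.iter() if e.get("ID") is not None)
    duplicates = sorted(i for i, n in ids.items() if n > 1)
    dangling = sorted(
        e.get("href") for e in root.iter("linkHtml")
        if e.get("href", "").startswith("#") and e.get("href")[1:] not in ids
    )
    return duplicates, dangling


SAMPLE = """
<component>
 <section ID="a1"><text><linkHtml href="#a1">this section</linkHtml>
  <linkHtml href="#a2">missing target</linkHtml></text></section>
 <section ID="a1"/>
</component>
"""
print(check_id_attributes(ET.fromstring(SAMPLE)))  # (['a1'], ['#a2'])
```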

id element

Many elements in the CDA document have a child “id” element that serves to identify the class that contains them. Technically, these are the RIM classes Entity, Role, and Act, which generally are allowed to carry one or more identifiers in the id element.

Note that this means that some CDA elements have both a child element “id” and an attribute “ID”. Some tools struggle with this. I would’ve thought that such tools were long fixed – it’s not that uncommon to have duplicate names between attributes and elements, since it’s not wrong, but I’ve found a few dev tools that don’t cope with this in the last 12 months. The only solution is to get back to the maintainer of the tools, and screech loudly at them till they fix their tool.

The id element has two important attributes: root and extension. The root has to be either an OID or a UUID, and an extension – any string not including whitespaces – may be present. The identifier (either the root alone if no extension, or the root+extension) must be globally unique.

I’ve found that this root with optional extension business is a lot harder to grasp than it sounds, partly because OIDs have an internal root/extension structure, and so it’s really unclear whether your leaf concept should be in the root or the extension. Say, for example, you have a medical record number, a six digit number, and you assign an OID for it, 2.16.840.1.113883.19.1. Should you represent your MRNs as

<id root="2.16.840.1.113883.19.1.45235"/>

or as

<id root="2.16.840.1.113883.19.1" extension="45235"/>

Generally, I prefer the second (it allows leading zeros, alpha characters if they become required, and is easier to pull out just the MRN), but both forms are valid, and the decision rests with the person who first registers the OID. And if you look in the OID registry, registered OIDs rarely explain which form is correct.

Another confusing thing is whether an extension is allowed or required if the root is a UUID. It’s allowed – and whether it’s required depends on where the unique part comes from. If I’m going to use a stream of unique numbers to actually make the value unique, and I’m just using the UUID to provide a globally unique space for them, then there’ll be an extension:

<id root="655f67b1-2b11-4038-b82f-f6ab2f566f87" extension="1234"/>

As a rule of thumb, if the UUID is registered in the HL7 OID registry (as 655F67B1-2B11-4038-B82F-F6AB2F566F87 is) then you need an extension for the actual unique part. (Note that the UUID is supposed to be represented in lowercase even though the schema doesn’t say so – and irrespective of what case is registered in the registry).
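The rules for root and extension can be written down as a small validation helper. This is a Python sketch under stated assumptions: the regular expressions are simplified patterns and not the ones from the schema, the names `root_kind` and `identifier_key` are hypothetical, and folding UUIDs to lowercase for comparison is a choice made for the sketch.

```python
import re

OID_RE = re.compile(r"^[0-2](\.(0|[1-9][0-9]*))+$")
UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def root_kind(root):
    """Classify an id root as 'oid', 'uuid', or None if it is neither."""
    if OID_RE.match(root):
        return "oid"
    if UUID_RE.match(root):
        return "uuid"
    return None


def identifier_key(root, extension=None):
    """Return a comparable (root, extension) key for an id element."""
    kind = root_kind(root)
    if kind is None:
        raise ValueError("root must be an OID or a UUID: %r" % root)
    if extension is not None and re.search(r"\s", extension):
        raise ValueError("extension must not contain whitespace")
    if kind == "uuid":
        root = root.lower()  # compare UUIDs case-insensitively
    return (root, extension)


# The two MRN representations are both valid, but they are different keys.
print(identifier_key("2.16.840.1.113883.19.1.45235"))
print(identifier_key("2.16.840.1.113883.19.1", "45235"))
```

The last two lines show why the choice between the two forms matters: a receiver comparing identifiers will not treat them as the same thing.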

For any Australian readers, if you aren’t sure about this: consult the new Australian handbook on representing identifiers, see my earlier blog post, or ask me.

Unique Identifiers

The fact that identifiers are required to be unique means two things:

  • The identifier uses a properly allocated OID, or a generated UUID, so that no one else would accidentally use it. This sounds hard, but it’s actually relatively easy: generate a GUID (Ctrl-Alt-G in most IDEs), or just register an OID at the HL7 OID registry, but register it carefully, at a fine enough scope that it is what you want to use
  • You have to use it in a disciplined fashion, so that you only use it for one thing.

The second part turns out to be harder than it sounds. The problem is that there’s no tool to alert you when you copy paste an identifier from one part of the document to another (or from one part of your code to another). I see too many documents that contain duplicate identifiers – that is, the same identifier is used on different elements that represent different objects.

One of my correspondents asked why we don’t simply make a rule that you can’t have duplicated identifiers in a CDA document, like we have with the ID attribute. This would prevent accidental or lazy use of the same identifier again – but it’s not possible, because there are valid cases for using the same identifier more than once.

Identifiers are not unique in a document

This occurs when the same concept can appear multiple times in the document. For example:

  • When the same template is used multiple times
  • When the same person is both author and legalAuthenticator
  • When the same organisation employs all the personnel and scopes the patient for the document

So these are all common cases. Other than the template id, a natural question that arises is about the relationship between two instances of the same object in the same document. Take, for example, this fragment of an author from an Australian CDA example:

<author>
 <assignedAuthor>
  <id root="7FCB0EC4-0CD0-11E0-9DFC-8F50DFD72085" />
  <id root="1.2.36.1.2001.1003.0.8003611234567890" />
  <addr use="WP">
   <streetAddressLine>1 Clinician Street</streetAddressLine>
   <city>Nehtaville</city>
   <state>QLD</state>
   <postalCode>5555</postalCode>
  </addr>
  <telecom use="WP" value="tel:0712341234" />
  <assignedPerson>
   <name>
    <prefix>Dr.</prefix>
    <given>Good</given>
    <family>Doctor</family>
   </name>
  </assignedPerson>
 </assignedAuthor>
</author>

Note to alert Australian readers: yes, I moved HPI-I from its normal place, since this is for international readers.

This author has two identifiers, what we might call a technical identifier (the UUID) and the real-world identifier, which is the number by which the author is registered with the national authority. That’s an arbitrary distinction that’s not made in the document itself – the only way to know this is to consult the definitions of the identifiers.

For most documents, the author is also the legal authenticator, so we’re going to repeat all the same information there too:

<legalAuthenticator>
 <assignedEntity>
  <id root="7FCB0EC4-0CD0-11E0-9DFC-8F50DFD72085" />
  <id root="1.2.36.1.2001.1003.0.8003611234567890" />
  <addr use="WP">
   <streetAddressLine>1 Clinician Street</streetAddressLine>
   <city>Nehtaville</city>
   <state>QLD</state>
   <postalCode>5555</postalCode>
  </addr>
  <telecom use="WP" value="tel:0712341234" />
  <assignedPerson>
   <name>
    <prefix>Dr.</prefix>
    <given>Good</given>
    <family>Doctor</family>
   </name>
  </assignedPerson>
 </assignedEntity>
</legalAuthenticator>

Note that the element is different, but everything else is the same. However, you could argue that this is redundant – we already provided all the information about the person the first time, and the second time, all we need to do is provide an identifier:

<legalAuthenticator>
 <assignedEntity>
  <id root="7FCB0EC4-0CD0-11E0-9DFC-8F50DFD72085" />
 </assignedEntity>
</legalAuthenticator>

 

On reaching the second case, you go and resolve the first identifier, know that this is referring to the same actual object as the first case, and fill in all the details accordingly. However this is complicated by the fact that in some cases where you can do this, the kind of information you can represent is different in each case (author and custodian, for instance), so you mightn’t be able to provide all the details in the first instance. So what do you do if the second case contains different details from the first? Is that an accident, or the correct way to represent it? Unfortunately, the only way to know is to examine the details on each instance, and reason from the underlying RIM classes – there’s no easy rule of thumb.

One notion that this section suggests is that you can extract these RIM entities, roles and acts out to a persistent data store, and use the identifiers to trace the objects across various documents as you see them. This should be safe, after all, because the identifiers are unique. Only, not so much.

Re-using identifiers between documents

Firstly, there’s no guarantee that a given object will have the same identifier across different CDA documents from the same source. Commonly, CDA documents are generated from some intermediary XML or v2 object that doesn’t have the underlying identifiers in it, even if they exist in the original source. In these cases, the objects may acquire a transient identifier that is used multiple times within each document, but is not maintained across the documents. It’s very difficult to consistently identify an object across documents in this case.

Another problem is that some identifiers actually identify the business process that the object represents, and may end up being attached to multiple different objects that all relate to the same real-world process. Lab Order Ids are a classic case here – they’ll be associated with the object that identifies their acknowledgement response to a request for tests, and to the results that represent the outcomes of the request. Driver’s licenses are another example – they’re used to identify multiple different objects that represent the same person (usually from different institutions).

The upshot of this is that even when done well by the author, you can’t simply rely on the identifiers behaving in any particular way.

Reporting Tool

But very often, identifiers aren’t done well. And there’s no conformance tooling that can automatically figure out whether identifiers are being done properly in a document. So I’ve created a little reporting service that takes a CDA document, scans all the identifiers in it, and produces a report that helps visualise the identifiers, and see whether they are being used properly. We’ll be using it in the Australian national program to help check that a document has good identifiers in it. Feel free to use it in other contexts, and I’d welcome suggestions for how to make it more useful (and crash reports for how to break it).
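To give a feel for what such a report involves, here is a rough Python sketch of the core scan. It is not the implementation of the service described here, only an illustration under stated assumptions: namespaces are ignored, the function names are hypothetical, and roots are lowercased so that UUIDs compare regardless of case.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def identifier_report(root):
    """Map each (root, extension) identifier to the elements that carry it."""
    usage = defaultdict(list)
    for parent in root.iter():
        for id_element in parent.findall("id"):
            key = (id_element.get("root", "").lower(), id_element.get("extension"))
            usage[key].append(parent.tag)
    return dict(usage)


def reused_identifiers(report):
    """Return only the identifiers that appear on more than one element."""
    return {key: tags for key, tags in report.items() if len(tags) > 1}


SAMPLE = """
<ClinicalDocument>
 <recordTarget><patientRole>
  <id root="2.16.840.1.113883.19.1" extension="45235"/>
 </patientRole></recordTarget>
 <author><assignedAuthor>
  <id root="7FCB0EC4-0CD0-11E0-9DFC-8F50DFD72085"/>
 </assignedAuthor></author>
 <legalAuthenticator><assignedEntity>
  <id root="7FCB0EC4-0CD0-11E0-9DFC-8F50DFD72085"/>
 </assignedEntity></legalAuthenticator>
</ClinicalDocument>
"""
for key, tags in reused_identifiers(identifier_report(ET.fromstring(SAMPLE))).items():
    print(key, tags)
```

A scan like this can only list the reuse. Deciding whether the author and the legal authenticator really are the same person still takes a human looking at the details.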

Follow this link to http://hl7connect.healthintersections.com.au/svc/ids, paste your CDA document into the page, click the button, and then read the report… all the steps up to the last one are real easy. Good luck and happy CDA writing/reading…

Link from a CDA narrative to another CDA document in an XDS repository

So, a use case has come up in the pcEHR for a CDA document that links to another CDA document in an XDS repository (Consolidated View, actually, for those who want to know). So how to do that?

Well, firstly, this use case corresponds relatively directly to the one described here:

For a variety of reasons, it is desirable to refer to the document by its identity, rather than by linking through a URL.

  1. The identity of a document does not change, but the URLs used to access it may vary depending upon location, implementation, or other factors.
  2. Referencing clinical documents by identity does not impose any implementation specific constraints on the mechanism used to resolve these references, allowing the content to be implementation neutral. For example, in the context of an XDS Affinity domain the clinical system used to access documents would be an XDS Registry and one or more XDS Repositories where documents are stored. In other contexts, access might be through a Clinical Data Repository (CDR), or Document Content Management System (DCMS). Each of these may have different mechanisms to resolve a document identifier to the document resource.
  3. The identity of a document is known before the document is published (e.g., in an XDS Repository, Clinical Data Repository, or Document Content Management System), but its URL is often not known. Using the document identity allows references to existing documents to be created before those documents have been published to a URL. This is important to document creators, as it does not impose workflow restrictions on how links are created during the authoring process.

h/t to Keith Boone for that link, btw. So that’s pretty much our use case: to link to a document that’s found in the pcEHR. And you can’t do that by URL, because there’s no URL that directly addresses an XDS getDocument call, and anyway, some background machinery to do with trust is required. So how to do that?

Firstly, in the structured data representation of the CDA, we need to assert that some entry is a reference to another document. For that, we use an external document reference:

<entry>
 <!-- snip -->
 <reference typeCode='REFR'>
  <externalDocument classCode='DOC' moodCode='EVN'>
   <templateId extension="1.3" root="1.2.36.1.2001.1001.101.100.1002.120" />
   <id root="d22de837-9d35-438d-933d-74f08d7657f5"/>
   <code code="60591-5" codeSystem="2.16.840.1.113883.6.1" displayName="Patient Summary" />
  </externalDocument>
 </reference>

This says that this entry is a reference (typeCode=”REFR”) to the external Document. And while we’re at it, we can use the code to indicate what type of document it is, and the templateId to indicate which particular implementation guide it conforms to. And the id of the document – that’s the id by which to request the document from the XDS repository (the pcEHR in this case).

So that’s the logical reference to the Shared Health Summary with id “d22de837-9d35-438d-933d-74f08d7657f5”. A clinical system encountering this knows that it needs to retrieve that document from its cache or the pcEHR or some other local repository as configured. But what about the narrative?

Linking to an external document from Narrative

The Consolidated view is delivered as a CDA document because... what other standard way to do it is there? Everything else is CDA anyway, so we might as well use it, which means we need to have a narrative, and it needs to refer to the document. This, it turns out, is not so easy. We need to use <linkHtml> to refer to the other document:

 <text>
  <paragraph>
    <linkHtml href="...">Other document</linkHtml>
  </paragraph>
 </text>

The problem is what to put in the href attribute. The obvious thing to put in is an id reference (e.g. #a3) to the external document, which refers to the external document directly:

 <reference typeCode='REFR'>
  <externalDocument ID="a3" classCode='DOC' moodCode='EVN'>

But you can’t have an ID on the externalDocument, and anyway, the rules for an internal reference on linkHtml are:

The target of an internal reference is an identifier of type XML ID, which can exist on other elements in the same or a different narrative block, or XML ID attributes that have been added to the <section>, <ObservationMedia>, or <renderMultiMedia> elements

So that’s a dead end. The IHE page I referred to above, which describes the problem nicely, suggests a way to link the external reference to the linkHtml:

<text><paragraph><linkHtml ID="a3" href="..."/></paragraph></text>
<!-- snip -->
<entry>
 <!-- snip -->
 <reference typeCode='REFR'>
  <externalDocument classCode='DOC' moodCode='EVN'> 
   <templateId extension="1.3" root="1.2.36.1.2001.1001.101.100.1002.120" />  
   <id root="d22de837-9d35-438d-933d-74f08d7657f5"/>
   <code code="60591-5" codeSystem="2.16.840.1.113883.6.1" displayName="Patient Summary" />
   <text><reference value="#a3"/></text>
  </externalDocument>
 </reference>

Only, I don’t think that’s right. The contents of the external document are not the contents of the linkHtml element (which is what that precisely means), and it’s not even really right to say that the contents of the document are the target of the linkHtml, and anyway, we still haven’t resolved what that actually is – pointing something else to the linkHtml element doesn’t resolve what it points to in the href attribute. So while I think that IHE described the problem very nicely, I don’t think they’ve solved it.

This brings us back to the URL portion of linkHtml. It just happens to be defined as an xs:string, with the rule: “It can be used to reference identifiers that are either internal or external to the document”. I guess this means that the right way to fill it for external references is with a logical identifier URI. And 3 come to mind:

<linkHtml href="oid:0.1.2.3.4.5.6.7.8.9..."/>
<linkHtml href="uuid:d22de837-9d35-438d-933d-74f08d7657f5"/>
<linkHtml href="hl7-att:root[:extension]"/>

The first two (and their logical variants urn:uuid:… and urn:oid:… – which of these to use is complicated but largely, in the end, irrelevant, because everyone’s going to have to cut code to support whatever goes in here anyway) are obvious places to look, since they are defined externally, but they both suffer from the same problem: what if the document identifier includes an extension (and it’s allowed to)? For this reason, the V3 data types R2 defined the protocol hl7-att as:

the form hl7-att:[II.literal], such as hl7-att:2.1.16.3.9.12345.2.39.3:ABC123. The scheme hl7-att is used to make references to HL7 Attachments. HL7 attachments may be located in the instance itself as an attachment on the Message class, or in some wrapping entity such as a MIME package, or stored elsewhere. ..[snip].. Attachments SHALL be globally uniquely identified. Attachment id is mandatory, and an ID SHALL never be re-used. Once assigned, an attachment id SHALL be associated with exactly one byte-stream as defined for ED.data.

The language there is that of the v3 data types, but the concepts are applicable in this case, and the ground rules are all applicable. Hence, this is the right way to do a reference from one CDA document to another by a logical identifier. To recap, in the structured data:

<entry>
 <!-- snip -->
 <reference typeCode='REFR'>
  <externalDocument classCode='DOC' moodCode='EVN'> 
   <templateId extension="1.3" root="1.2.36.1.2001.1001.101.100.1002.120" />  
   <id root="d22de837-9d35-438d-933d-74f08d7657f5"/>
   <code code="60591-5" codeSystem="2.16.840.1.113883.6.1" displayName="Patient Summary" />
  </externalDocument>
 </reference>

and in the narrative:

<text><paragraph><linkHtml href="hl7-att:d22de837-9d35-438d-933d-74f08d7657f5"/></paragraph></text>
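Since everyone has to cut code for this anyway, here is a minimal Python sketch of building and reading such a reference, following the hl7-att:root[:extension] form quoted above. The function names are hypothetical, and no escaping is attempted for extensions that contain unusual characters.

```python
def hl7_att_uri(root, extension=None):
    """Format an identifier as hl7-att:root[:extension]."""
    literal = root if extension is None else "%s:%s" % (root, extension)
    return "hl7-att:" + literal


def parse_hl7_att(uri):
    """Split an hl7-att URI back into (root, extension)."""
    scheme, _, literal = uri.partition(":")
    if scheme != "hl7-att" or not literal:
        raise ValueError("not an hl7-att URI: %r" % uri)
    # OIDs and UUIDs contain no colon, so the first colon ends the root.
    root, _, extension = literal.partition(":")
    return root, (extension or None)


print(hl7_att_uri("d22de837-9d35-438d-933d-74f08d7657f5"))
print(parse_hl7_att("hl7-att:2.1.16.3.9.12345.2.39.3:ABC123"))
```

The parsed (root, extension) pair is what a clinical system would then match against the id of the externalDocument in the structured data.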

Thanks to Keith Boone for assistance with this post.

p.s. Is it valid to use a URL scheme defined in data types R2 in CDA R2, which uses data types R1?  Yes, it is, because it’s valid to use schemes defined anywhere else as well.

Technical Correction in Data Types R2 / ISO 21090

According to the data types, the PostalAddressUse Enumeration is based on the code system identified by the OID 2.16.840.1.113883.5.1012. But if you look up that OID in the OID registry (or the MIF, for insiders), you see that:

Retired as of the November 2008 Harmonization meeting, and replaced by 2.16.840.1.113883.5.1119 AddressUse.

This appears to be my error – I should’ve used the OID 2.16.840.1.113883.5.1119. We’ll have to see about issuing a technical correction.

Question: should you use Concept Ids or Description Ids in HL7 instances?

Question:

In my brief look at the standards, I can’t seem to find any requirement or convention in SNOMED or HL7 regarding the use of Description IDs vs Concept IDs in messages or datasets.

Is there any preference or can Description IDs and Concept IDs be used interchangeably?

Answer:

It is correct to use Concept Ids as the code in both HL7 v2 messages and CDA (and other v3 instances). Description Ids should not be used.

Explanation:

This is, unfortunately, not documented anywhere. It is implicit in the language used by the datatypes in both v2 and v3, though this is more obvious in v3, with its explicit focus on concepts. While it’s true that description ids also uniquely identify concepts, they additionally identify display strings, which are duplicated by other fields in the appropriate types.

Everything you didn’t want to know about the GTS data type

Of all the HL7 / ISO 21090 data types, by far the most complex is the General Timing Specification (GTS). (Aside: I usually say that CD is the most complex data type, but it’s not; it’s just that people use CD as hard as they can, whereas one glance at GTS convinces almost everybody to keep things as simple as they possibly can.)

GTS in Data types R2 / ISO 21090

It’s easier to consider GTS in R2, and then work backwards to explain what GTS is in data types R1 (CDA). In ISO 21090, a GTS is a QSET(TS) – a set of TS. This is a mathematical set – some type of expression that defines what times are “in the set”, rather than a computational set, which lists the values of the set. In the case of ISO 21090, the expression can be built by combining simple times and intervals of times with grouped and nested operations such as Union, Intersection, etc.
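The difference between a mathematical set and a computational set can be sketched in a few lines of Python. This is only an illustration, not anything defined by ISO 21090: the function names (`in_august_2011`, `is_morning`, `intersection`) and the "before noon" rule are made up for the example.

```python
from datetime import datetime

# A "computational" set lists its members explicitly.
listed_times = {datetime(2011, 8, 10, 9, 0), datetime(2011, 8, 10, 17, 0)}


# A "mathematical" set is an expression that answers "is t in the set?".
def in_august_2011(t: datetime) -> bool:
    """Membership test for the half-open interval [2011-08-01, 2011-09-01)."""
    return datetime(2011, 8, 1) <= t < datetime(2011, 9, 1)


def is_morning(t: datetime) -> bool:
    """Hypothetical rule for the example: any time before noon."""
    return t.hour < 12


def intersection(*members):
    """Combine membership tests: t is in the result only if every operand contains it."""
    return lambda t: all(member(t) for member in members)


august_mornings = intersection(in_august_2011, is_morning)
```

The listed set can be enumerated but only ever holds two instants; the expression cannot be enumerated, but it can answer the membership question for any instant.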

Formally, these are defined as a set of classes like this:

 

Type Basic Summary
QSET Abstract base type. Anywhere QSET(T) is specified, one of the types below must be used
IVL A simple interval from start time to end time (may be open and therefore continuing)
PIVL A simple interval of time that repeats regularly (the interval might be assumed, as in “3 times a day”, or explicit, such as “twice a day for half an hour”)
QSI The intersection of two other sets (a simple case: the intersection of 2 times a day for 10 minutes from 10-Aug 2011 to 10-Sept 2011)
QSU The union of two sets of times (perhaps, 3 times a day on week days, and twice a day on weekends)
QSD The difference between two sets of times
QSP The periodic hull of two sets of times
QSH The convex hull of two sets of times
QSS A list of times, where the time covered by the imprecision of the time is in the set (i.e. 24-April 2012 means all of that day is in the set). This is simpler to use than IVL where single days are covered
QSC A code that identifies a set of times. The codes are only those defined by HL7 or ISO members, and include common clinical codes, as well as holidays:

  • AM  Every morning at institution specified times.
  • PM  Every afternoon at institution specified times.
  • BID  Two times a day at institution specified time
  • TID  Three times a day at institution specified time
  • QID  Four times a day at institution specified time

(note: Alert HL7 readers might wonder where EIVL is. Since this is my blog, I get to send it off to Antarctica from where I’ll retrieve it when it’s a cold day in hell)

That’s pretty confusing – so let’s clarify slightly by an example:  “every other Tuesday in the season from the (US holidays) Memorial Day to Labor Day in the years 2002 and 2003”. This is built as an expression of the intersection between 3 sets:

  • every other Tuesday;
  • the years 2002 and 2003;
  • the season between Memorial Day and Labor Day.
<example xsi:type="QSI_TS"><!-- intersection, because it is a QSI -->
 <!-- every other Tuesday -->
 <term xsi:type='PIVL_TS' alignment='DW'>
  <phase lowClosed='true' highClosed='false'>
   <low value='20001202'/>
   <high value='20001203'/>
  </phase>
  <period value='2' unit='wk'/>
 </term>

 <!-- 2002 and 2003 -->
 <term xsi:type='IVL_TS' lowClosed='true' highClosed='false'>
  <low value='20020101'/>
  <high value='20040101'/>
 </term>
 <!-- season between Memorial Day and Labor Day -->
 <!-- periodic hull between Memorial day and Labor Day -->
 <term xsi:type='QSP_TS'>
  <low xsi:type="QSI_TS">
  <!-- memorial day: intersection of last week of May and mondays -->
   <term xsi:type='PIVL_TS'>
    <phase highClosed='false'>
     <low value='19870525'/>
     <high value='19870601'/>  
    </phase>
    <period value='1' unit='a'/>
   </term>
   <term xsi:type='PIVL_TS'>
    <phase highClosed='false'>
     <low value='19870105'/>
     <high value='19870106'/>
    </phase>
    <period value='1' unit='wk'/>
   </term>
  </low>
  <high xsi:type="QSI_TS">
   <!-- labor day :  intersection of first week of Sept and mondays -->
   <term xsi:type='PIVL_TS'>
    <phase highClosed='false'>
     <low value='19870901'/>
     <high value='19870908'/>
    </phase>
    <period value='1' unit='a'/>
   </term>
   <term xsi:type='PIVL_TS'>
    <phase highClosed='false'>
     <low value='19870105'/>
     <high value='19870106'/>
    </phase>
    <period value='1' unit='wk'/>
   </term>
  </high>
 </term>
</example>

For me, what this example clarifies is the following:

  • You can build any timing description you like from this.
  • There’s too much power to grapple with the general case here in a computable way
  • This is a bad way to communicate timing specifications between people

That’s a particularly nasty combination for me – from a computable view point, it’s too powerful. From a human view point – which is where we fall back if we can’t get to compute these things – it’s very clunky.

In practice, in every use of the GTS data type that I’ve seen, people use either IVL, PIVL, or a bounded PIVL: a PIVL intersected with an IVL, where the IVL serves as start and end dates. Anything else gives nearly everybody fits, and I’ve never seen it used (will be happy to hear real experience otherwise in comments).
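A bounded PIVL is simple enough to evaluate directly. The following is a minimal sketch, assuming half-open intervals and ignoring alignment, time zones, and imprecision; the class names mirror the data type names, but the `contains` methods and the `bounded` helper are inventions for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IVL:
    low: datetime
    high: datetime  # treated as exclusive in this sketch

    def contains(self, t: datetime) -> bool:
        return self.low <= t < self.high


@dataclass(frozen=True)
class PIVL:
    phase: IVL         # one occurrence of the repeating interval
    period: timedelta  # distance between the starts of occurrences

    def contains(self, t: datetime) -> bool:
        # Position of t within its period, measured from the phase start.
        offset = (t - self.phase.low) % self.period
        return offset < (self.phase.high - self.phase.low)


def bounded(pivl: PIVL, bounds: IVL):
    """Intersection of a PIVL with an IVL that supplies start and end dates."""
    return lambda t: pivl.contains(t) and bounds.contains(t)


# Every 12 hours for 30 minutes, from 10-Aug 2011 to 10-Sept 2011.
twice_daily = PIVL(IVL(datetime(2011, 8, 10, 8, 0), datetime(2011, 8, 10, 8, 30)),
                   timedelta(hours=12))
course = bounded(twice_daily, IVL(datetime(2011, 8, 10), datetime(2011, 9, 10)))
```

For example, `course(datetime(2011, 8, 11, 20, 15))` is true, because 20:15 falls inside the evening occurrence, while any time on or after 10-Sept is excluded by the bounds.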

R1 vs R2

GTS appears rather different in R1 than R2. Firstly, we completely rewrote how we describe GTS, so that it’s comprehensible (I know what GTS means in R1, but only because I wrote data types R2, and then wrote my R1 -> R2 implementation). We also made some changes to the features of GTS, by adding

  • QSC – a coded GTS
  • QSS – a list of times

Other than that, there’s no semantic change.

GTS in R1

I’m not even going to explain how GTS is specified in R1, it just hurts my head. The XML is kind of goofy (though it works). But it has one rather sad side effect that not many people realise. Let’s take that same example as above:

<effectiveTime xsi:type='SXPR_TS'>
 <!-- memorial day -->
 <comp xsi:type='SXPR_TS'>
  <comp xsi:type='PIVL_TS'>
   <phase>
    <low value='19870525'/>
    <high value='19870601' inclusive='false'/>
   </phase>
   <period value='1' unit='a'/>
  </comp>
  <comp xsi:type='PIVL_TS' operator='A'>
   <phase>
    <low value='19870105'/>
    <high value='19870106' inclusive='false'/>
   </phase>
   <period value='1' unit='wk'/>
  </comp>
 </comp>
 <!-- labor day -->
 <comp xsi:type='SXPR_TS' operator='P'>
  <comp xsi:type='PIVL_TS'>
   <phase>
    <low value='19870901'/>
    <high value='19870908' inclusive='false'/>
   </phase>
   <period value='1' unit='a'/>
  </comp>
  <comp xsi:type='PIVL_TS' operator='A'>
   <phase>
    <low value='19870105'/>
    <high value='19870106' inclusive='false'/>
   </phase>
   <period value='1' unit='wk'/>
  </comp>
 </comp>
</effectiveTime>
<effectiveTime xsi:type='PIVL_TS' alignment='DW' operator='A'>
 <!-- every other Tuesday -->
 <phase>
  <low value='20001202' inclusive='true'/>
  <high value='20001203' inclusive='false'/>
 </phase>
 <period value='2' unit='wk'/>
</effectiveTime>
<effectiveTime xsi:type='IVL_TS' operator='A'>
 <!-- from 2002 and 2003 -->
 <low value='20020101' inclusive='true'/>
 <high value='20040101' inclusive='false'/>
</effectiveTime>

 

Rather than having QSX, there’s a single type SXPR_TS, and it has an operator on it instead. That’s just syntactic sugar – it’s the same structure. The comp just maps to the various named attributes which are operands on the operations. And some of the attributes are renamed. So that’s not so bad.

But the real difference is the way the base operands are constructed – by tacking effectiveTime elements after each other with operators on them. This is legal, even when the cardinality on the effectiveTime attribute is 0..1, because that’s the cardinality of the GTS, not the cardinality of the effectiveTime element that builds the GTS. That’s a subtlety that not many people are aware of. Additionally, the GTS would mean the same thing if there was only one effective time with 3 components like this:

<effectiveTime xsi:type='SXPR_TS'>
 <comp xsi:type='SXPR_TS'>
  <!-- memorial day -->
  <comp xsi:type='SXPR_TS'>
   <comp xsi:type='PIVL_TS'>
    <phase>
     <low value='19870525'/>
     <high value='19870601' inclusive='false'/>
    </phase>
    <period value='1' unit='a'/>
   </comp>
   <comp xsi:type='PIVL_TS' operator='A'>
    <phase>
     <low value='19870105'/>
     <high value='19870106' inclusive='false'/>
    </phase>
    <period value='1' unit='wk'/>
   </comp>
  </comp>
  <!-- labor day -->
  <comp xsi:type='SXPR_TS' operator='P'>
   <comp xsi:type='PIVL_TS'>
    <phase>
     <low value='19870901'/>
     <high value='19870908' inclusive='false'/>
    </phase>
    <period value='1' unit='a'/>
   </comp>
   <comp xsi:type='PIVL_TS' operator='A'>
    <phase>
     <low value='19870105'/>
     <high value='19870106' inclusive='false'/>
    </phase>
    <period value='1' unit='wk'/>
   </comp>
  </comp>
 </comp>
 <comp xsi:type='PIVL_TS' alignment='DW' operator='A'>
  <!-- every other Tuesday -->
  <phase>
   <low value='20001202' inclusive='true'/>
   <high value='20001203' inclusive='false'/>
  </phase>
  <period value='2' unit='wk'/>
 </comp>
 <comp xsi:type='IVL_TS' operator='A'>
  <!-- from 2002 and 2003 -->
  <low value='20020101' inclusive='true'/>
  <high value='20040101' inclusive='false'/>
 </comp>
</effectiveTime>
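The equivalence of the two R1 forms can be sketched as a left-to-right fold over operator-tagged components. This is an illustration under assumptions, not the normative R1 algorithm: it handles only the I (include), A (intersect) and E (exclude) operators, treats a missing operator as I, and uses plain integers in place of timestamps. The `combine` function and the three toy sets are hypothetical.

```python
from typing import Callable, List, Tuple

Member = Callable[[int], bool]  # membership test; ints stand in for timestamps


def combine(components: List[Tuple[str, Member]]) -> Member:
    """Fold operator-tagged components, left to right, into one membership test."""
    def contains(t: int) -> bool:
        result = False  # start from the empty set
        for operator, member in components:
            if operator == "I":    # include: union with what we have so far
                result = result or member(t)
            elif operator == "A":  # intersect with what we have so far
                result = result and member(t)
            elif operator == "E":  # exclude: difference
                result = result and not member(t)
            else:                  # H and P (hulls) are not sketched here
                raise NotImplementedError(f"operator {operator!r} not sketched here")
        return result
    return contains


# Toy stand-ins for the three operands in the example.
season = lambda t: 10 <= t < 20
alternate_days = lambda t: t % 2 == 0
bounds = lambda t: t < 16

# Form 1: three sibling effectiveTime elements.
flat = combine([("I", season), ("A", alternate_days), ("A", bounds)])

# Form 2: one effectiveTime wrapping the same three components.
wrapped = combine([("I", combine([("I", season), ("A", alternate_days), ("A", bounds)]))])
```

Both `flat` and `wrapped` give the same answer for every value, which is the point: the wrapping element adds no meaning of its own.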

Requirements for GTS

This doesn’t really stand as a full explanation of the GTS data type. I can’t be bothered providing one of those. The question I’m interested in is, what are we trying to do here? What are the real world requirements we need here? Here’s my list:

  • We need to support common things for medications – bid, etc, and simple bounded periodic intervals
  • Whatever we do has to be human readable as a fall back for when computing it is impossible
  • We need to be able to make repeating appointments in a form that calendars can process

Anything else?

Data types and irrelevant features

There’s been quite a bit of criticism of the ISO 21090 data types because they include features that aren’t relevant in every case where they are used, and it’s annoying to have to deal with the features when they don’t apply to the use case at hand. See here for an example, or the comments here, and there was more less-informed criticism of this at the CIMI meeting yesterday.

The trouble with this criticism is that it doesn’t make sense in principle.

The data types are nothing more than basic re-usable patterns that occur throughout healthcare. The whole point of defining a re-usable type is that you choose a set of features that commonly re-occur, and have some inherent behavioral complexity. Then you define a model that represents these things and use it everywhere else. The cost of using the more complex re-usable type is saved many times over by the fact that you get to re-use it nearly for free everywhere.

So it follows that there’s going to be a trade-off between the cost of using the type, and the number of times you can re-use it – as the features on the type grow, it becomes more useful. So the fact that a type carries features that aren’t used in *all* cases is evidence that it’s worth spending more time getting it right. If you only defined types that perfectly fit their re-use context, then you would hardly get any reuse at all – and people have said this to me (“Death to datatypes.  Stick to UML primitives” to quote someone from yesterday).

Data types that include features that aren’t applicable everywhere are a good thing to have.

Of course, you can take it too far, and create a monster that no longer is in a sweet spot because there’s just too much in it. So the question isn’t “why do data types have this extra stuff that I don’t need in my context?”, but “How do you judge the sweet spot when trading between usability and re-use?”. We actually have some consensus on that across the various frameworks (hl7 v2, 13606, openEHR, v3/CDA, etc), but some people seem to think that this is part of the problem, not the solution, and want to revisit the whole question.

p.s. Tom’s FOPP principle, which I referenced above, is actually trying to answer my right question (what’s the sweet spot?), but I couldn’t find good examples of the less-informed thinking to link to.

Data quality requirements in v3 data types are both necessary and spurious

There’s three design features in the v3 data types that help make v3 very hard to implement. And they’re so low level, they undercut all the attempts at simplification by greenCDA, etc, and none of that stuff makes much difference.

There’s 3 features I have in mind:

  • CD(etc).codeSystem must be an OID or a UUID
  • II.root must be an OID or a UUID
  • PQ.unit must be a UCUM unit

Along with these features come the requirements around the interaction between the codeSystem/root values and the HL7 OID Registry.
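To make the first two constraints concrete, here is a rough sketch of checking whether a candidate root or codeSystem value is shaped like an OID or a UUID. The regular expressions are approximations written for this example (the schema you validate against is the authority), and `classify_root` is a hypothetical helper. Note that a well-formed value says nothing about whether it is registered or means what you think it means.

```python
import re

# Approximate patterns, assumed for illustration only.
OID_PATTERN = re.compile(r"[0-2](\.(0|[1-9][0-9]*))+")
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def classify_root(value: str) -> str:
    """Return 'oid', 'uuid', or 'invalid' for a candidate root/codeSystem value."""
    if OID_PATTERN.fullmatch(value):
        return "oid"
    if UUID_PATTERN.fullmatch(value):
        return "uuid"
    return "invalid"
```

A local name such as `LOINC` comes back as `invalid`, which is exactly the situation a secondary user is in when the source system only supplies local names.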

They really make it hard to implement v3, particularly if you are in a secondary use situation – you’re getting codes, units, or identifiers from somewhere else, and you don’t really know authoritatively what their scope and/or meaning is, or in the case of units, you can’t change them, and they’re not UCUM units. You can’t register OIDs on someone else – or if you do, the HL7 OID registry is so far out of control that no one will notice or know (on that subject, 200+ OIDs are registered daily, and any curation is on volunteer time, i.e. it doesn’t happen).

I’ve spent an inordinate amount of time this year working on the problems caused by these 3 features – they just consume so much time when generating proper CDA content. And when I look at the CDA documents that I get sent for review, these are beyond the average implementer who knows v2 well.

And often, we just have to fold on the units, because this is not resolvable until the primary sources can adopt UCUM – and they have their own standards that work to prohibit UCUM adoption. For example, the Australian prescribing recommendations – which are followed directly by many people – prohibit using ug for micrograms, since it is easily confused with mg. Instead, mcg is required. That’s a handwriting-based recommendation, but the recommendation doesn’t make that differentiation. I think that this is resolvable, but it’s going to take years of work with the various communities before they’ll go to UCUM.
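Where the source units are known, the usual workaround is a translation table applied on the way out. A minimal sketch follows; the table contents and the `to_ucum` helper are hypothetical, and a real table would have to come from agreement with whoever owns the source data.

```python
from typing import Optional

# Hypothetical local-to-UCUM table for illustration.
LOCAL_TO_UCUM = {
    "mcg": "ug",  # micrograms as written under the prescribing recommendations
    "mg": "mg",   # already a UCUM unit
}


def to_ucum(local_unit: str) -> Optional[str]:
    """Return the UCUM unit for a local unit, or None when there is no known mapping.

    UCUM units are case sensitive, so the input is not case-folded.
    """
    return LOCAL_TO_UCUM.get(local_unit.strip())
```

The `None` case is the one that matters: when a unit has no mapping, there is no valid PQ.unit to send.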

Necessary Requirements

The problem is that the requirements are thoroughly based on requirements that are necessary to establish an interoperable healthcare record. If you don’t consistently identify codes and identifiers, then you can’t collate all the health information into a single big logical repository (no matter how distributed it is architecturally). If you don’t use UCUM, then units are not computable or reliable – and this is important. So these are necessary requirements. Here in Australia, we are using CDA to build the distributed (pc)EHR. That’s been controversial – there’s still people claiming that we should have used v2. Well, if we had used v2, then we’d still have to solve the data quality requirements somehow – in fact, several other posts I’ve made are about that, because fixing the data quality in v2 messages is worthwhile anyway.

So these requirements for base data quality are necessary – but they sure add hugely to the project cost. And the costs aren’t particularly visible. And there’s a huge amount of legacy data out there for which it is difficult to bring the base data up to the required level.

Spurious Requirements

The problem is that the requirements are also spurious in a point to point messaging context. In this context, it’s easier to resolve the data quality issues retrospectively, by local agreement, instead of having to sort these things out unambiguously in advance. But v3 imposes these costs anyway, even when the requirements are spurious. I wonder how much this data quality issue – which I haven’t really heard a lot about – contributes to the resistance to migrate to v3 messaging from v2 messaging, since the benefits aren’t there in the short term.

In particular, these data quality requirements are part of ISO 21090, and when that gets used for internal exchange within a very focused community (my Peter and George example), these data quality requirements are just tax.

RFH

In the RFH data types, I’m going to back off the pressure – it will be possible to represent data with less quality than the v3 data types allow (though they will also allow the same high quality data as well).

 

Question: Intervals and Boundary Imprecision

Introduction

This page addresses a long standing issue in the HL7 v3/CDA community about the impact of imprecision on boundaries on the meaning of an interval. Specifically, if an interval is given as from 20100404 to 20100406, is 10:30 am on 6-Apr 2010 in the interval or not?

Some people claim it should be, that 201004061030 is “in” the value of 20100406, and as long as 20100406 is in the boundary, so is 201004061030. Other people claim that no, although the boundary does have imprecision, it has to be ignored when determining what values are in the set specified by the interval.

Executive Summary: The answer is the second – imprecision is not considered on the boundaries of intervals, and 201004061030 is not in the interval from 20100404 to 20100406.

This page explains the reasoning in some detail, and clarifies some apparent ambiguity in the specifications. This discussion applies equally to R1 and R2. In addition, this page documents a discovered issue in the R2 abstract specification which will probably result in a technical correction by HL7.

Background

The type IVL<T> is defined in the V3 Abstract data types as a specialization of QSET<T> where T can be any kind of quantity. The two kinds of quantities normally encountered in the real world are PQ (physical quantity – a floating point value with a coded unit) and TS (Timestamp – an instant in time with specified imprecision).

A QSET<T> is some specification of an ordered set of values that specifies which values are in the set, and which are outside the set. One simple way to specify a QSET<T> is to specify it as a simple interval – all the values between [low] and [high] are included in the set, and values outside that range are not included.

IVL<T> has other properties than low and high. The properties lowClosed and highClosed specify whether the boundaries themselves are actually included in the set of values that are in the interval. For instance, you can specify that the interval includes all the values from 2 to 5, but not including 5. Of course, if the interval can only contain integers, that’s not tremendously useful – it’s not different from the interval from 2 to 4. But if the type that the interval is describing has a continuous distribution range – floating point numbers and times – then this is useful and important.

The Abstract data types specification also describes a literal form, which is a textual presentation of the interval. Multiple literal forms are defined; in this discussion we only use the simple first form, the interval form using square brackets, e.g., “[3.5; 5.5[” (where the direction of the square brackets denotes whether the interval is closed or not: pointing in means closed, pointing out means not closed). That is, [3.5; 5.5[ means all the numbers from 3.5 to 5.5, not including 5.5 itself. Note that we also use the hull form below (discussed later).

Note: The rest of the details of IVL are not explored further here. The rest of this discussion assumes that the features and usage of IVL are relatively well understood by the reader. See where can I get information about the datatypes?

Discussion

Although IVL<INT> and IVL<REAL> are not often encountered in real world usage, they are the easiest place to start the discussion.

IVL<INT>

The simplest case is an interval of integers. The meaning of [3; 5] is very clear: the numbers 3, 4 and 5 are included in the interval. There is no question of the imprecision of the boundary, since integers are discretely separated from each other.

In the abstract specification, formal invariants are used to establish meaning – they are the master definition of meaning. The meaning of the boundary of an interval is defined this way. For the simple case of integer, we’ll illustrate how this works, since we’ll be relying on these later.

invariant(IVL<T> x; T e) where x.nonNull.and(x.contains(e)) {
  x.low.lessOrEqual(e);
  x.low.nullFlavor.implies(NullFlavor.PINF).not;
};

Note: In this discussion we’ll focus exclusively on the low boundary; the exact argument applies to the high boundary (the invariant chain is simpler for the low boundary).

This invariant says that if the interval is not null, then it can contain a value e only if low is less than or equal to e. If the interval is null – well, we make no rules. Note that we haven’t said that a non-null interval must have a non-null low property – only that if low is null, we cannot know whether the interval contains any particular value: since x.low.lessOrEqual(e) cannot be true for any value of e, neither can x.contains(e) (though we may be able to establish on other grounds (i.e. high boundary) that the interval does not contain e).

Note: the invariant says that if x contains e, then x.low <= e. It doesn’t say that if x.low <= e, then x contains e – it’s important to keep track of what implies what.

The meaning of lessOrEqual for integer is defined on QTY:

invariant (QTY x, y, z) where x.nonNull.and(y.nonNull).and(z.nonNull) {
  x.lessOrEqual(x); /* reflexive */
  x.equal(y).not.implies(x.lessOrEqual(y).implies(y.lessOrEqual(x)).not); /* asymmetric */
  x.lessOrEqual(y).and(y.lessOrEqual(z)).implies(x.lessOrEqual(z)); /* transitive */
};

The lessOrEqual operation must be reflexive, asymmetric, and transitive (follow the links from this page on wikipedia for reasoning). This invariant doesn’t define how you determine what <= is (that’s done in text), but it does define how it behaves, and therefore what it means. The most interesting part for the rest of this discussion is the second one: if x != y, then if x < y then y > x. Note that we use implies. If x = y, we say nothing here (that’s said elsewhere). If x != y and not (x < y) then we don’t say whether x > y – why? Because x and y may not be “comparable”. Obviously integers, reals, etc, always are, but you can’t talk about whether 12g is less than 14m or not. However if we can compare them, and x != y and x < y, then it also must be true that y > x.

So, in an interval of [3; 5], 3 is in the interval, because 3 <= 3, but 2 is not in the interval because 3 <= 2 is not true.

Well, wow, you say, that bit about invariants was a waste of time. And for integer, it pretty much was – they’re simple beasts. But don’t skip it – we’ll be coming back to these below, and then they will start to become useful.

IVL<REAL>

An interval of real introduces two new considerations:

  • Unlike integers, which are discrete (you can always tell them apart), real numbers do not behave like this. What’s the next value after 4? This has no answer.
  • In addition, real numbers have a precision, which specifies the number of significant digits to which the actual value is represented. The inherent notion of precision is that the actual value may differ slightly from the represented value beyond the specified precision.

Operations and precision

Given that real numbers have precision, what impact does this have on operations? In mathematical operations, the precision of the numbers is combined. In multiplication/division, the precision of the outcome is generally the lower of the two precisions. For instance, 4.0 * 2.000 is 8.0, not 8.000. With addition, it’s more complicated: 4.0 + 0.200000 is 4.2, not 4.200000. But what is 4.0 + 0.000001? Intuitively, it’s 4.0, so that x + y = x… so actually, the precision isn’t part of the answer: 4.0 + 0.000001 is 4.000001 but the precision is still 2. (todo: follow up on this)

What about comparison? Is 4.0 = 4.0000? Clearly, as stated, these numbers are different in intent. 4.0 represents an implicit boundary from 3.95 to 4.05, while 4.0000 represents an implicit boundary from 3.99995 to 4.00005. But are they equal? Well, the specification says:

Two nonNull REAL are equal if they have the same value and precision.

This text was added as part of defining equality unambiguously for all data types (wiki page with discussion).

Firstly, a clarification: the correct inference from the rule “Two nonNull REAL are equal if they have the same value and precision” is that

 (4.0).equals(4.000).isNull

That was certainly my intent when I wrote that rule, but it didn’t get stated.

But is this notion that REAL values with different precision are not equal actually right?

Unfortunately, no.

Let’s start with an invariant associated with isComparableTo:

invariant (QTY x, y, z) where x.nonNull.and(y.nonNull) { x.isComparableTo(y).equal(x.lessOrEqual(y).or(y.lessOrEqual(x))); };

So if x and y can be compared, then they must be equal, or one less than the other. Therefore either 4.0 and 4.0000 are comparable and equal, or not comparable. And note that this invariant is equals, not implies, so that it follows that if x < y, then x.isComparableTo(y) is true. So if 4.0 != 4.0000, (3.8 < 4.00).not, since they cannot be compared – but no, 3.8 is definitely less than 4.00. Clearly there’s a tension here, and one of those invariants is wrong, or the rule that REALS must have the same precision to be equal is wrong.

To add to this, when we go back to the invariants for QSET, we have this:

invariant(QSET<T> s) where s.nonNull {
  forall(QTY x, y) where s.contains(x).and(s.contains(y)) {
    x.isComparableTo(y);
  };
};

This is relatively simple, and perfectly reasonable: all members of a nonNull QSET must be comparable. You can’t have a valid QSET that contains 5 m and 4 g. It doesn’t make sense, and it’s not on. So, if 4.0 != 4.0000, then an interval [3.5; 5.5] cannot contain the value 4.00. But it obviously does and must. So the inevitable conclusion is that 4.0 = 4.0000, and that precision cannot be a factor in testing the equality of REAL values – and therefore the rule is wrong.

Note: we could alternately claim that the correct interpretation of equality for a REAL is to consider precision, and to say that 4.0 implies an implicit interval of 3.95 to 4.05, and that the implicit interval implied by 4.0000 is clearly within that boundary, so clearly 4.0000 is equal to 4.0. The problem is that under this scheme, 4.0 is not equal to 4.0000, since 4.0 implies a possible value outside the boundaries of that implied by 4.0000. And equality must be symmetric (follow the links from this page on wikipedia for reasoning). So this can’t be the answer (though an equivalent of “implies” would be a logical addition to REAL, because (4.0).implies(4.0000) and (4.0000).implies(4.0).not, and this is perfectly sensible).

This will be brought to HL7 as a technical correction to the R2 specification, to wit, that the equality rule should say: “Two nonNull REAL are equal if they have the same value irrespective of precision”. (Some additional example and discussion material should also be added)
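Python’s `decimal` module happens to behave the way the corrected rule says REAL should: it remembers how many digits were stated, but ignores them when testing equality. The sketch below uses that to show both the symmetric equality and the one-way “implies” relation discussed above. The helpers `implied_interval` and `implies` are made up for this illustration and are not part of any HL7 specification.

```python
from decimal import Decimal


def implied_interval(x: Decimal):
    """Half a unit in the last stated digit on either side of x."""
    half_step = Decimal(5).scaleb(x.as_tuple().exponent - 1)
    return (x - half_step, x + half_step)


def implies(x: Decimal, y: Decimal) -> bool:
    """True when the interval implied by y lies within the interval implied by x."""
    x_low, x_high = implied_interval(x)
    y_low, y_high = implied_interval(y)
    return x_low <= y_low and y_high <= x_high


coarse = Decimal("4.0")   # implies 3.95 to 4.05
fine = Decimal("4.0000")  # implies 3.99995 to 4.00005

values_equal = coarse == fine                             # precision plays no part
one_way = implies(coarse, fine) and not implies(fine, coarse)
```

Equality holds in both directions, while `implies` holds in only one, which is why it cannot serve as the definition of equality.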

Having established that precision cannot count for equality, it’s a straightforward conclusion that it can’t count on the border of an interval either. Given the rule:

invariant(IVL<T> x; T e) where x.nonNull.and(x.contains(e)) {
  x.low.lessOrEqual(e);
  x.low.nullFlavor.implies(NullFlavor.PINF).not;
};

Value e can only be in interval x if x.low is less than or equal to e. 2.99995 < 3.0, so 2.99995 is not in an interval whose low boundary is 3.0. We can say this with confidence because if the comparison of e and low cannot be null just because they are equal with different precisions, then the comparison cannot be null if their values are close with different precisions. (And even if the comparison was null, all we could say is “we don’t know whether they are in the interval”)

So, the interval [3.5;5.5[ does contain the values 4, 4.0, 4.0000000000000000000000000, 3.5, 3.5000000, 3.500000000000000001, 5.49, and 5.49999999999999999999999999999999999, but not the values 3.49999999999999999999, 5.51 or 5.50000000000000000000000.
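The list above can be checked mechanically. This is a small sketch using `Decimal` so that the long literals are compared exactly; `ivl_contains` is a hypothetical helper, with the closed flags playing the role of lowClosed and highClosed.

```python
from decimal import Decimal


def ivl_contains(low: str, high: str, e: str,
                 low_closed: bool = True, high_closed: bool = True) -> bool:
    """Membership by value only; the stated precision of e and the boundaries is ignored."""
    lo, hi, value = Decimal(low), Decimal(high), Decimal(e)
    above_low = lo <= value if low_closed else lo < value
    below_high = value <= hi if high_closed else value < hi
    return above_low and below_high


inside = ["4", "4.0", "4.0000000000000000000000000", "3.5", "3.5000000",
          "3.500000000000000001", "5.49", "5.49999999999999999999999999999999999"]
outside = ["3.49999999999999999999", "5.51", "5.50000000000000000000000"]

# The interval [3.5;5.5[ has a closed low boundary and an open high boundary.
results_inside = [ivl_contains("3.5", "5.5", e, high_closed=False) for e in inside]
results_outside = [ivl_contains("3.5", "5.5", e, high_closed=False) for e in outside]
```

Every entry in `results_inside` is true and every entry in `results_outside` is false, however many trailing digits the value carries.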

IVL<PQ>

The situation is the same for IVL<PQ> - other than the fact that the units must all have the same canonical form in UCUM (to make x.isComparableTo(y) true), the behaviour of IVL<PQ> with regard to boundaries is based on the value of PQ, which is a REAL.

IVL<TS>

TS differs from REAL in that the precision is not equally distributed around the stated value. Instead, it starts at the stated value, and goes to the end of the implied period. To illustrate this, a value of 5.1 implies 5.05 to 5.15, equally distributed around 5.1. On the other hand, the TS value of 20100404 implies the day 4-Apr 2010, and the implicit time is from 00:00 to 23:59 on that day (actually, [201004040000;201004050000[).

Other than this fact, the situation with regard to TS is the same as that with regard to REAL, and for exactly the same reasons: precision is not counted.

Of course, because of the way that the TS imprecision is distributed, the low boundary is not the interesting case, it’s the high boundary. Given an interval of [20100404;20100406], is 10pm on the 6-April in that interval? A careless reading of the interval – from the 4th to the 6th of April – would imply that it is. But it isn’t, for the reasons described above. The interval [20100404;20100406] is not from the 4th to the 6th, but from the start of the 4th to the start of the 6th.
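Here is the same rule as a sketch in Python. It makes some simplifying assumptions: TS literals are plain digit strings with no time zone or fractional seconds, and each literal is read as the instant at the start of its implied period. The names `ts_instant` and `ivl_ts_contains` are invented for the example.

```python
from datetime import datetime

# strptime formats keyed by the length of the TS literal (assumed digits only).
FORMATS = {4: "%Y", 6: "%Y%m", 8: "%Y%m%d", 10: "%Y%m%d%H",
           12: "%Y%m%d%H%M", 14: "%Y%m%d%H%M%S"}


def ts_instant(ts: str) -> datetime:
    """The instant a TS literal denotes: the start of its implied period."""
    return datetime.strptime(ts, FORMATS[len(ts)])


def ivl_ts_contains(low: str, high: str, e: str, high_closed: bool = True) -> bool:
    """Interval membership by instant; boundary imprecision is not considered."""
    point = ts_instant(e)
    if point < ts_instant(low):
        return False
    limit = ts_instant(high)
    return point <= limit if high_closed else point < limit


midmorning_in = ivl_ts_contains("20100404", "20100406", "201004061030")
boundary_in = ivl_ts_contains("20100404", "20100406", "20100406")
whole_day_in = ivl_ts_contains("20100404", "20100407", "201004061030", high_closed=False)
```

`midmorning_in` is false and `boundary_in` is true. To cover all of the 6th, the sender has to say so, for instance with an open high boundary on the 7th, as in `whole_day_in`.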

TS must be the same as REAL because precision cannot count towards the comparisons, either the equality, or the lessOrEqual, or the greaterOrEqual. So when the R2 abstract Specification says that for TS:

“Two nonNull TS are only equal if they have the same precision”

invariant(TS x, y) where x.nonNull.and(y.nonNull) { x.equal(y).equal(x.offset.equal(y.offset)).and(x.precision.equal(y.precision)); };

This is the same error as for REAL, and will be part of the technical correction discussed above. The invariant should say:

invariant(TS x, y) where x.nonNull.and(y.nonNull) { x.equal(y).equal(x.offset.equal(y.offset)); };

and therefore 20100404 = 20100404000000.000

TS redefines lessOrEqual in R2. I’m the editor, and I can’t say that there’s any coherence in that redefinition at all. The definition is nonsensical in parts – a copy/paste error, and wrong where it differs from the definition of lessOrEqual on QTY, in that it says:

“The outcome of lessOrEqual between two TS is NULL unless they have the same precision”

This is wrong, for the same reasons as the equality tests on REAL and TS as discussed above.

Even worse is this invariant:

invariant(TS x, y) where x.nonNull.and(y.nonNull) { x.lessOrEqual(y).nonNull.implies(x.offset.equal(y.offset)); };

This is an outright typo. It should say, x.lessOrEqual(y).nonNull.implies(x.precision.equal(y.precision)), but as we have discussed, even that would be wrong. This whole section (QTY.lessOrEqual) should be removed in the technical correction – it doesn’t say anything useful at all, even when corrected.

The Hull Literal Form

Much of the confusion around this area comes from the existence of the hull literal form, and some careless language associated with its definition. Quoting from the Abstract specification (same in R1 and R2):

Example: May 12, 1987 from 8 to 9:30 PM is “[198705122000;198705122130]”.

NOTE: The precision of a stated interval boundary is irrelevant for the interval. One might wrongly assume that the interval “[19870901;19870930]” stands for the entire September 1987 until end of the day of September 30. However, this is not so!, The proper way to denote an entire calendar cycle (e.g., hour, day, month, year, etc.) in the interval notation with is to use an open high boundary. For example, all of September 1987 is denoted as “[198709;198710[”.

The “hull-form” of the literal is defined as the convex hull (see IVL.hull) of interval-promotions from two time stamps. For example, “19870901..19870930” is a valid literal using the hull form. The value is equivalent to the interval form “[19870901;19871001[”.

Though the note in the quote above agrees with this document in regard to the interpretation of an interval, it is not clear whether the note concerns the interpretation of intervals in general, or just that particular literal form. The waters are further muddied by the comment immediately after it regarding the definition of the hull form, where the interpretation of the literal form depends on the boundary precision.

So, to clarify: the hull literal form is not a simple interval; it’s the convex hull of two intervals implied by the imprecision of the stated boundaries. The literal hull is not actually an interval. It’s a QSCH<IVL<T>>, where QSCH is QSetConvexHull, a type that we missed defining in R2 (and will add in R3), and which will (probably) have a DSET of sets as its operands.
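
The difference from a plain interval can be sketched in Python. This is only an illustration under stated assumptions: `promote` and `hull_literal` are hypothetical helpers, only date-only literals (YYYY, YYYYMM, YYYYMMDD) are handled, and a half-open interval is modelled as a plain `(low, high)` tuple with the high boundary excluded.

```python
from datetime import datetime, timedelta
from typing import Tuple

Interval = Tuple[datetime, datetime]  # half-open: low included, high excluded


def promote(literal: str) -> Interval:
    """Promote a date-only TS literal to the interval its precision implies."""
    n = len(literal)
    year = int(literal[0:4])
    month = int(literal[4:6]) if n >= 6 else 1
    day = int(literal[6:8]) if n >= 8 else 1
    low = datetime(year, month, day)
    if n == 4:    # whole year
        high = datetime(year + 1, 1, 1)
    elif n == 6:  # whole month, rolling over to January after December
        high = datetime(year + month // 12, month % 12 + 1, 1)
    elif n == 8:  # whole day
        high = low + timedelta(days=1)
    else:
        raise ValueError("only YYYY, YYYYMM and YYYYMMDD are handled here")
    return low, high


def hull_literal(literal: str) -> Interval:
    """Convex hull of the two promoted intervals in an 'a..b' hull literal."""
    first, second = (promote(part) for part in literal.split(".."))
    return min(first[0], second[0]), max(first[1], second[1])


# The hull form covers all of September 1987, up to the start of 1 October.
print(hull_literal("19870901..19870930") == promote("198709"))  # True
```

By contrast, reading the same two timestamps as the plain interval [19870901;19870930] stops at the start of 30 September, which is why the two literal forms must not be treated as interchangeable.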

Since we are having a technical correction, we will clarify the uncertainty introduced by this definition of the literal hull at the same time, by being more explicit that the note concerns the definition of Interval, not the literal, and making more of the fact that the hull form is a convex hull of intervals, not an interval itself.

Status

This page is awaiting final approval by MnM (HL7 committee).