Monthly Archives: November 2011

Data quality requirements in v3 data types are both necessary and spurious

There are three design features in the v3 data types that help make v3 very hard to implement. They’re so low level that they undercut all the attempts at simplification by greenCDA and the like; none of that work makes much difference.

The three features I have in mind are:

  • CD(etc).codeSystem must be an OID or a UUID
  • II.root must be an OID or a UUID
  • PQ.unit must be a UCUM unit
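To make these three constraints concrete, here is a minimal sketch of what a validator has to check (my own illustration, not HL7 tooling; the UCUM check is reduced to a toy whitelist, since a real check needs the full UCUM grammar):

```python
import re

# Dotted-decimal OID: at least two numeric arcs separated by dots
OID_RE = re.compile(r'^\d+(\.\d+)+$')
# UUID in the standard 8-4-4-4-12 hex form
UUID_RE = re.compile(r'^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$')

def valid_root(value):
    """True if value is acceptable as II.root or CD.codeSystem: an OID or a UUID."""
    return bool(OID_RE.match(value) or UUID_RE.match(value))

# Toy stand-in for UCUM validation: real validation must parse the UCUM
# grammar; this whitelist only illustrates the shape of the check.
KNOWN_UCUM_UNITS = {'mg', 'ug', 'g', 'kg', 'mL', 'L', 'mmol/L', '1'}

def valid_unit(unit):
    return unit in KNOWN_UCUM_UNITS
```

Note that a local code like “SVH” fails the root check, and “mcg” fails the unit check – which is exactly where the implementation pain described below comes from.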

Along with these features come the requirements around the interaction between the codeSystem/root values and the HL7 OID Registry.

They really make it hard to implement v3, particularly if you are in a secondary use situation – you’re getting codes, units, or identifiers from somewhere else, and you don’t authoritatively know their scope and/or meaning; or, in the case of units, you can’t change them and they’re not UCUM units. You can’t register OIDs on someone else’s behalf – or if you do, the HL7 OID registry is so far out of control that no one will notice or know (on that subject, 200+ OIDs are registered daily, and any curation is on volunteer time, i.e. it doesn’t happen).

I’ve spent an inordinate amount of time this year working on the problems caused by these 3 features – they just consume so much time when generating proper CDA content. And when I look at the CDA documents that I get sent for review, these are beyond the average implementer who knows v2 well.

And often, we just have to fold on the units, because this is not resolvable until the primary sources can adopt UCUM – and they have their own standards that work to prohibit UCUM adoption. For example, the Australian prescribing recommendations – which are followed directly by many people – prohibit using ug for micrograms, since it is easily confused with mg. Instead, mcg is required. That’s a handwriting-based recommendation, but the recommendation itself doesn’t make that differentiation. I think that this is resolvable, but it’s going to take years of work with the various communities before they’ll go to UCUM.
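Until the primary sources adopt UCUM, the pragmatic bridge is a translation table at the interface boundary – a sketch, with illustrative entries only (the mcg→ug mapping follows the discussion above; the other entries are my own examples, and a production table would need clinical review):

```python
# Illustrative translation table from locally mandated unit strings to UCUM
# codes. Entries are examples only, not an endorsed mapping.
LOCAL_TO_UCUM = {
    'mcg': 'ug',           # Australian prescribing convention for micrograms
    'micrograms': 'ug',
    'IU': '[iU]',          # international unit is bracketed in UCUM
    'mL': 'mL',            # already valid UCUM
}

def to_ucum(local_unit):
    """Return the UCUM unit for a local unit string, or None if unmapped."""
    return LOCAL_TO_UCUM.get(local_unit)
```

Returning None rather than passing the unit through unchanged makes the unmapped cases visible, which is where the real work is.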

Necessary Requirements

The problem is that these requirements are firmly grounded in what is necessary to establish an interoperable healthcare record. If you don’t consistently identify codes and identifiers, then you can’t collate all the health information into a single big logical repository (no matter how distributed it is architecturally). If you don’t use UCUM, then units are not computable or reliable – and this is important. So these are necessary requirements. Here in Australia, we are using CDA to build the distributed (pc)EHR. That’s been controversial – there are still people claiming that we should have used v2. Well, if we had used v2, then we’d still have to solve the data quality requirements somehow – in fact, several other posts I’ve made are about that, because fixing the data quality in v2 messages is worthwhile anyway.

So these requirements for base data quality are necessary – but they sure add hugely to the project cost. And the costs aren’t particularly visible. And there’s a huge amount of legacy data out there for which it is difficult to bring the base data up to the required level.

Spurious Requirements

The problem is that the requirements are also spurious in a point to point messaging context. In this context, it’s easier to resolve the data quality issues retrospectively, by local agreement, instead of having to sort these things out unambiguously in advance. But v3 imposes these costs anyway, even when the requirements are spurious. I wonder how much this data quality issue – which I haven’t really heard a lot about – contributes to the resistance to migrating from v2 messaging to v3 messaging, since the benefits aren’t there in the short term.

In particular, these data quality requirements are part of ISO 21090, and when that gets used for internal exchange within a very focused community (my Peter and George example), these data quality requirements are just tax.

RFH

In the RFH data types, I’m going to back off the pressure – it will be possible to represent data with less quality than the v3 data types allow (though they will also allow the same high quality data as well).

Representation of Common Australian Identifiers in v2 and CDA

This is a wrap up of presentations made last week in the Australian HL7 meeting, along with some additional late breaking information: Medicare OIDs have been finalised.

This post iterates through a list of common Australian identifiers, and shows how to represent each of them in both v2 messages and CDA documents. If there’s any interest, I’ll provide a DICOM mapping later (note interest in the comments). Some of what I have to say I’m a little unsure of (the v2 issues), and I’ll make corrections based on the comments if necessary. The CDA representation is based on the common NEHTA extension for identifiers defined through all the NEHTA CDA implementation guides and also clarified here (vendor firewall, sorry, I’m trying to get it into public), and the CDA implementation guides provide the context for understanding where the asEntityIdentifier goes. Note that the asEntityIdentifier extension is expected to be part of any future CDA version.

Institution Medical Record Number

This is also commonly referred to as “UR” for Unit Record Number. There’s no differentiation between “MR” and “UR” (or “MRN”), though a patient may have multiple such numbers due to organisational restructures, and at least one hospital uses “MR” and “UR” to refer to different numbers. However in an HL7 message or a CDA document, they are all “MR” numbers and are differentiated by the assigning authority/scope information.

This example is for a UR number at St Vincent’s Hospital, Melbourne

v2:

PID-3: |123456^^^SVH^MR|

CDA:

<ext:asEntityIdentifier classCode="IDENT">
  <ext:id assigningAuthorityName="SVH" root="[OID]" extension="123456"/>
  <ext:code code="MR" codeSystem="2.16.840.1.113883.12.203"/>
</ext:asEntityIdentifier>

Notes:

  • In the CDA rendering, the value of [OID] has to be filled in with an OID specific to the particular identifier as used by SVH. Generally someone associated with SVH should define the OID and register it on the HL7 OID registry. The OID allows for unambiguous identification of St Vincent’s Hospital Melbourne. Actually, it just identifies the registry record for the identifier, and the usefulness of this partly depends on the quality of the registration record – though at least there’ll be no duplication, unlike with the code “SVH”. As of today (28 Nov 2011), I can find no registered OIDs for any Australian hospitals.
  • In the CDA rendering, the “Assigning Authority Name” gives the informal code, but systems are not allowed to take meaning from that; the meaning should come from the OID instead.
  • The CDA identifier is given a type using the code for the identifying (IDENT) entity, which is “MR” from the v2 identifier type table. The v2 identifier type table is shockingly messy, but still the best source available.
  • In the v2 rendering, all we know is that the identifier “123456” is a medical record number used by the hospital identified by the code “SVH”. It’s assumed that there’ll be local logic to make sense of “SVH” (which St Vincent’s Hospital?). Sometimes this local logic is dependent on the known source of the message.
  • Technically, component 4 is an identifier, and you could do something like this: |123456^^^SVH&[OID]&ISO^MR| – this would make the v2 identifier as rigorous as the CDA identifier, though there’s generally no point, since it’s easier and more cost-effective to do local logic. (This might not be true in the case of inter-institutional clinical messaging, and I’ll be proposing something there when I write the MSIA clinical messaging profile)
  • You’d use different OIDs/assigning authorities for cases where a patient has multiple local identifiers due to organisational restructuring.
  • There’s an additional HL7 identifier type called “LR” (Local Registry). I’m not at all clear on when it would be appropriate to use “LR” instead of “MR”. Perhaps this would be appropriate for an area wide health service identifier that isn’t used locally in the actual institutions – but usually these are in the process of being introduced anyway. The same applies to the HL7 identifier types “PI” and “PE” – these are not well differentiated (comments please…)
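For reference, the v2 parsing discussed in these notes can be sketched like this (my own illustration of splitting a CX field into its components, including the HD assigning authority in component 4):

```python
def parse_cx(field):
    """Split a v2 CX field (as in PID-3) into named components:
    id ^ check digit ^ check digit scheme ^ assigning authority (HD) ^ id type.
    The assigning authority is itself an HD: namespace & universal id & id type."""
    parts = (field.split('^') + [''] * 5)[:5]
    hd = (parts[3].split('&') + [''] * 3)[:3]
    return {
        'id': parts[0],
        'assigning_authority': hd[0],
        'universal_id': hd[1],
        'universal_id_type': hd[2],
        'type_code': parts[4],
    }
```

For ‘123456^^^SVH^MR’ this gives id ‘123456’, assigning authority ‘SVH’ and type ‘MR’; when the HD carries an OID, the universal_id component makes the v2 identifier as rigorous as the CDA one.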

National IHI

This is the new nationwide patient identifier being introduced by NEHTA at the moment. They’re starting to be exchanged.

v2:

PID-3: |8003601234512345^^^AUSHIC^NI|

CDA:

<ext:asEntityIdentifier classCode="IDENT">
 <ext:id assigningAuthorityName="IHI"
    root="1.2.36.1.2001.1003.0.8003601234512345"/>
 <ext:assigningGeographicArea classCode="PLC">
  <ext:name>National Identifier</ext:name>
 </ext:assigningGeographicArea>
</ext:asEntityIdentifier>

Notes:

  • The inclusion of the assigning geographic area is for round-tripping information with the HI service interface; it’s based on AS 5017, and has no practical utility that I know of
  • The way the IHI is coded in the CDA document – with the IHI in the OID – frustrates people, but isn’t technically wrong. I don’t know why it was chosen
  • There’s a NEHTA document out that says that this is the correct OID: 1.2.36.1.2001.1003.0.8003.6012.3451.2345. I don’t know what happened here (and no, I’m not going to link to it), but it’s not the right one
  • There’s been some debate about the v2 representation since AUSHIC (Australian HIC) has been renamed several times since this code was defined. But it’s a logical code, not a name, and it doesn’t get updated as the name of the logical entity changes
  • There’s no code with the id to specify the type. This is related to legacy requirements inside the NEHTA process. It wouldn’t be wrong to include <ext:code code="NPI" codeSystem="2.16.840.1.113883.12.203"/>, and it might become required at some point in the future.
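Since the IHI is carried inside the OID itself, a receiver that wants the bare 16-digit identifier has to strip the prefix – a sketch (my own illustration, using the root shown in the example above):

```python
# Root under which national healthcare identifiers are coded in the CDA
# examples in this post.
IHI_ROOT_PREFIX = '1.2.36.1.2001.1003.0.'

def extract_identifier(root):
    """If the root carries the identifier inside the OID (as in the example
    above), return the bare 16-digit identifier; otherwise None."""
    if root.startswith(IHI_ROOT_PREFIX):
        candidate = root[len(IHI_ROOT_PREFIX):]
        if candidate.isdigit() and len(candidate) == 16:
            return candidate
    return None
```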

Medicare Card Number

Australian Medicare card number. This is sometimes used as a patient identifier, and sometimes as an account identifier. This section describes its use as a patient identifier.

v2:

PID-3: |2296818481^^^AUSHIC^MC|

CDA:

<ext:asEntityIdentifier classCode="IDENT">
  <ext:id assigningAuthorityName="Medicare Card Number"
    root="1.2.36.1.5001.1.0.7.1" extension="2296818481"/>
  <ext:code code="MC" codeSystem="2.16.840.1.113883.12.203"/>
</ext:asEntityIdentifier>

Notes:

  • The OID 1.2.36.1.5001.1.0.7.1 is for the 10-digit card number. If the actual identifier is 11 digits, because it includes the line number as well, then the OID 1.2.36.1.5001.1.0.7 should be used instead
  • I don’t think the v2 metadata changes whether you use the 10- or 11-digit identifier
  • For further information about the Medicare card number, see here

When the Medicare number is used to identify a patient account, this is how it’s done:

v2:

PID-3: |2296818481^^^AUSHIC^MC|

CDA:

<ext:coverage2 typeCode="COVBY">
  <ext:entitlement classCode="COV" moodCode="EVN">
    <ext:id assigningAuthorityName="Medicare Card Number"
      root="1.2.36.1.5001.1.0.7.1" extension="2296818481"/>
    <ext:code code="1" displayName="Medicare Benefits"
      codeSystem="1.2.36.1.2001.1001.101.104.16047"/>
  </ext:entitlement>
</ext:coverage2>

Notes:

  • According to AS 4700.1, the Medicare number goes in PID-3, and not anywhere else

Healthcare Providers – Individual (HPI-I)

New identifier being introduced by NEHTA at the moment.

v2:

PRD-7: |8003610537409456^NPI^AUSHIC|
XCN: |8003610537409456^[surname]^[given]^[etc]^^[title]^^^AUSHIC^^^^NPI|

CDA:

<ext:asEntityIdentifier classCode="IDENT">
  <ext:id assigningAuthorityName="HPI-I"
    root="1.2.36.1.2001.1003.0.8003610537409456"/>
  <ext:assigningGeographicArea classCode="PLC">
    <ext:name>National Identifier</ext:name>
  </ext:assigningGeographicArea>
</ext:asEntityIdentifier>

Notes:

  • There are two v2 methods, depending on whether the identifier appears in PRD-7 or elsewhere.
  • In CDA, HPI-I, HPI-O and IHI are differentiated by the leading digits of the identifier itself. They all have the same OID root.
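Based on the example identifiers in this post, the differentiation looks like this (the prefix values are inferred from those examples, so treat them as an assumption to verify against the HI service documentation):

```python
# Prefixes inferred from the example identifiers in this post; verify them
# against the official HI service documentation before relying on them.
HI_PREFIXES = {
    '800360': 'IHI',
    '800361': 'HPI-I',
    '800362': 'HPI-O',
}

def classify_hi(identifier):
    """Classify a 16-digit national healthcare identifier by its leading digits."""
    return HI_PREFIXES.get(identifier[:6], 'unknown')
```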

Medicare Provider Number

Medicare Provider number – a location-specific identifier conferring the right to bill Medicare. Commonly used as a healthcare identifier for providers.

v2:

PRD-7: |049960CT^^AUSHICPR|
XCN: |049960CT^[surname]^[given]^[etc]^^[title]^^^AUSHICPR|

CDA:

<ext:asEntityIdentifier classCode="IDENT">
  <ext:id assigningAuthorityName="Medicare Provider number"
   root="1.2.36.174030967.0.2" extension="049960CT"/>
 <ext:code code="PRN" codeSystem="2.16.840.1.113883.12.203"/>
</ext:asEntityIdentifier>

Notes:

  • For some strange reason, the accepted v2 representation doesn’t include a type

Medicare Prescriber Number

Medicare Prescriber number – a location-specific identifier conferring the right to prescribe under the PBS.

v2:

PRD-7: |049960CT^PRES^AUSHIC|
XCN: |049960CT^[surname]^[given]^[etc]^^[title]^^^AUSHIC^PRES|

CDA:

<ext:asEntityIdentifier classCode="IDENT">
  <ext:id assigningAuthorityName="Medicare Prescriber number"
   root="1.2.36.174030967.0.3" extension="049960CT"/>
 <ext:code code="PRES" codeSystem="2.16.840.1.113883.12.203"/>
</ext:asEntityIdentifier>

Notes:

  • PRES seems to be an Australian extension to table 203 defined in AS 4700.3

Healthcare Providers – Organisation (HPI-O)

New identifier being introduced by NEHTA at the moment.

v2:

XON: |[name]^L^8003621771167888^^^AUSHIC^NOI|

CDA:

<ext:asEntityIdentifier classCode="IDENT">
  <ext:id assigningAuthorityName="HPI-O"
    root="1.2.36.1.2001.1003.0.8003621771167888"/>
  <ext:assigningGeographicArea classCode="PLC">
    <ext:name>National Identifier</ext:name>
  </ext:assigningGeographicArea>
</ext:asEntityIdentifier>

Notes:

  • NOI is an extension to table 203 defined for this use.

Laboratory NATA Identifier

Used to identify pathology services in clinical messages. A typical pathology service may have many identified labs – there seems to be some arbitrariness about which to choose.

v2:

HD (i.e. MSH-5): |QML^2184^AUSNATA|

CDA:

<ext:asEntityIdentifier classCode="IDENT">
  <ext:id assigningAuthorityName="NATA"
    root="1.2.36.1.2001.1005.12" extension="2184"/>
 <ext:code code="XX" codeSystem="2.16.840.1.113883.12.203"/>
</ext:asEntityIdentifier>

Notes:

  • XX is “organisation identifier”

Australian registered company

Every Australian company is assigned an ACN with a matching ABN. (Some other entities are assigned only an ABN, but this coding is for the ACN.)

v2:

XON: |[name]^L^087493897^^^ASIX^XX|

CDA:

<ext:asEntityIdentifier classCode="IDENT">
  <ext:id assigningAuthorityName="ACN"
    root="1.2.36.87493897"/>
 <ext:code code="XX" codeSystem="2.16.840.1.113883.12.203"/>
</ext:asEntityIdentifier>

Notes:

  • Part of the root Australian OID (1.2.36) registration is that every ACN defines an OID delegated to the company to manage

Pharmacy Approval Number

Once a pharmacy is registered as a business and has approval from the pharmacy board, it may ask Medicare (DHS) for an approval number to become part of the PBS. DHS grants the approval number, and it remains valid as long as the pharmacy’s ownership and location remain the same. The number allows the pharmacy to do PBS scripts and claim.

v2:

Candidate: ORC-17: |123456789^^AUSPAN|

CDA:

<ext:coverage2 typeCode="COVBY">
 <ext:entitlement classCode="COV" moodCode="EVN">
  <ext:id assigningAuthorityName="Pharmacy Approval Number"
    root="1.2.36.174030967.1.3.2.1" extension="123456789"/>
  <ext:code code="11" displayName="Medicare Pharmacy Approval Number"
    codeSystem="1.2.36.1.2001.1001.101.104.16047"/>
 </ext:entitlement>
</ext:coverage2>

Notes:

  • The candidate v2 mapping is to a CE. v2 is like that – identifiers and codes are somewhat confused. Per ISO 704, codes in a coding system for which there is only one instantiation of code are called “appellations”. The classic case is country codes. These may be either codes or identifiers, and this is what we have here. Note that the v2 mapping here is the only thing I made up for this post (and it’s crap, but what else can you do?)
  • Note that this approval number is not the same as the authority and approval numbers for a specific prescription

Conclusion

That’s the common identifiers I know of. Suggest new ones to provide examples for in the comments below.

Acknowledgements: The information here is taken first from Vince’s Presentation at the HL7 Australia meeting, and then from the NEHTA implementation guides and Australian standards AS 4700.1, 4700.2, 4700.3 and 4700.6

How to identify AMT in CDA documents and HL7 messages

One of the more controversial subjects that came up yesterday at the HL7 Australia meeting was how to represent AMT in CDA documents, and HL7 messages. The fundamental question is whether AMT is identified as the same coding system as SNOMED-CT or not.

For CDA, this means using the same OID (2.16.840.1.113883.6.96) in the CD.codeSystem attribute. For v2, this means using the same code in component 3 of the CE or CWE data type, which is used in many fields through the v2 message. (In terms of the actual code, some existing implementers are using “Snomed-CT” or “SNOMED-CT” for this, but in v2.6, HL7 settled on “SCT”. Our usual practice in IT-14-x committees would be to pre-adopt the SCT code, but no decision has been made on this one.)

There are several reasons why AMT should be identified as the same code system as Snomed-CT:

  • AMT uses the same logical infrastructure as Snomed-CT
  • AMT has the same root concept, and also the is-a concept is the same
  • AMT is distributed using the Snomed-CT distribution format
  • AMT uses an allocated Snomed-CT extension namespace, and IHTSDO has ruled that codes in the extension namespace of Snomed-CT are identified as part of the Snomed-CT code system. HL7 has agreed with this
  • At some point in the future, AMT will be distributed as part of Snomed-CT AU

For that reason, when AMT codes are exchanged in CDA documents they should have a codeSystem of 2.16.840.1.113883.6.96, and when they are exchanged in v2 messages, they should have a codeSystem of “SCT”. (the same applies to Snomed-CT AU, btw – it’s still Snomed)

That raises the obvious question, “hang on, how do you know that this is an AMT code as opposed to a SNOMED-CT code?” – and you do need to know this, since AMT is distributed separately from Snomed-CT (AU). The correct answer from HL7’s perspective is that this is the job of the codeSystemVersion attribute on CD, or components 7 & 8 on the CWE data type – which really implies the same for CE. HL7 never added codeSystemVersion components to CE because it was deprecated and split into CWE and CNE, but it would make sense for us to pre-adopt components 7 & 8 on the CE data type.

Actually, that obvious question from the last paragraph isn’t that obvious – why do you need to know what kind of code it is? If you can’t figure out what kind of code it is for yourself, why would you care? It sounds like bad system design driving interface specifications to me.

The problem with that is that IHTSDO is still to finalise the version string that should be used. The current candidate mapping is:

http://snomed.info/{?m,v}

where:

  • m is a Module Id
  • v is an effectiveTime

The syntax is a URI Template (http://tools.ietf.org/html/draft-gregorio-uritemplate), and it unfolds to either http://snomed.info/?m=32506021000036107&v=20110531 or http://snomed.info/?v=20110531&m=32506021000036107
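A minimal sketch of that expansion (hand-rolled for this one template, not a general URI Template implementation):

```python
def snomed_version_uri(module_id, effective_time):
    """Expand the candidate template http://snomed.info/{?m,v} for a given
    module id (m) and effectiveTime (v). In the {?m,v} form the parameter
    order is not significant, so either ordering names the same version."""
    return 'http://snomed.info/?m=%s&v=%s' % (module_id, effective_time)
```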

Even if this is finalised anytime soon, it’s hardly an obvious way to do the version references, and it would be nice to get something more directly applicable (i.e. which version is that referring to?).

Arguments people have given for not using the same ID:

  • the problems with the version attribute above,

Yes, that’s a problem. It was hoped that IHTSDO would have resolved this by now, but it hasn’t.

  •   AMT uses longer descriptions than Snomed-CT allows.

This doesn’t seem to be true – the length limit for RF1 releases is 255 chars.

  • AMT hasn’t been good at following the guidelines for Snomed-CT extensions, so it isn’t an extension

Well, even if that were true, Snomed-CT still thinks of it as an extension. It seems like a bad argument that because AMT hasn’t followed some rule or other of IHTSDO, we’ll break another one. Note that by the terms of the SNOMED CT licence, AMT *is* an “Extension” (this concept is *defined* in the licence). If anyone wants to argue in the comments that it is not an extension (you know who you are!), then you’ll need to be more specific about why.

  • It’s not practical to use the codeSystemVersion to pick AMT releases like that – hardly anyone uses a code system version

Well, that’s kind of like saying, we’re doing it wrong, let’s keep doing it wrong.

I’m not really happy with the approach of using the SNOMED-CT id for AMT, mostly because of the mess with the coding system version. But it’s not up to me – IHTSDO has ruled that Snomed extensions – codes defined in an IHTSDO-issued namespace – are represented as part of Snomed-CT.

Thanks to Michael Lawley for assistance with this post.

p.s. There’s another argument advanced that the different parts of AMT need to be identified. When I understand the rationale for that better, I’ll make another post about that

Coding System Representation in v2 messages

At the HL7 Australia meeting today, it became clear that we need to improve the way that coding systems are represented in HL7 v2 messages.

This problem arises because components 3 and 6 are ST or IS types (in the versions of the standards that are used here in Australia), taken from table 0396. The list of defined code values in table 0396 doesn’t overlap much with the code systems used here in Australia, and many systems seem to choose the string that represents the code system somewhat at random, which certainly doesn’t help interoperability.

Here’s my list of coding systems that are used in Australia and will probably need to be represented in v2 messages, along with a candidate string where I know one:

  • Snomed-CT (“SCT” defined in v2.6)
  • AMT (should be “SCT” following the practices defined for CDA)
  • MIMS
  • Docle
  • ICPC2+
  • ICD-10-AM
  • Meteor/AIHW tables
  • ANZSCO occupations
  • PBS Code, PBS Manufacturer Code, MBS Code
  • various codes from AS 4590 and 5017
  • Some private codes from GP system vendors (i.e. GP Best Practice)

These code systems also need defined policies for specifying the version appropriately.

At the meeting we agreed that I would survey HL7 Australia members for their current practices with regard to this, and then I’d collate a candidate list of strings and version policies which would then be posted to the HL7 Australia wiki for refinement.

Please send any contributions for the list to John Carey at jcarey11 at bigpond.net.au. He’ll collate the list for me, and we’ll post it to the HL7 Australia wiki.

Australian HL7 Meeting Tomorrow

Tomorrow is an Australian HL7 meeting. The subject of the meeting is around using codes and identifiers in both HL7 v2 messages and CDA documents. I’ve observed these things being done badly in v2 messages across the country in many contexts, and the step up in rigour that CDA represents is proving challenging for Australian implementers. Hopefully tomorrow will help. I’m making a presentation on coding in CDA documents – the source is here.

In addition, it’s the HL7 Australia AGM tomorrow. See you all there!

Question: Intervals and Boundary Imprecision

Introduction

This page addresses a long standing issue in the HL7 v3/CDA community about the impact of imprecision on boundaries on the meaning of an interval. Specifically, if an interval is given as from 20100404 to 20100406, is 10:30 am on 6-Apr 2010 in the interval or not?

Some people claim it should be: that 201004061030 is “in” the value of 20100406, and as long as 20100406 is in the boundary, so is 201004061030. Other people claim that no, although the boundary does have imprecision, it has to be ignored when determining what values are in the set specified by the interval.

Executive Summary: The answer is the second – imprecision is not considered on the boundaries of intervals, and 201004061030 is not in the interval from 20100404 to 20100406.

This page explains the reasoning in some detail, and clarifies some apparent ambiguity in the specifications. This discussion applies equally to R1 and R2. In addition, this page documents a discovered issue in the R2 abstract specification which will probably result in a technical correction by HL7.
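One way to picture the “imprecision is ignored” rule is to treat each partial timestamp as the instant at the start of the period it names – a sketch (my own illustration, using plain string comparison on padded HL7 timestamps):

```python
def ts_to_instant(ts):
    """Pad a partial HL7 timestamp (YYYY[MM[DD[HH[MM]]]]) to YYYYMMDDHHMM,
    reading the stated precision as the instant at the start of the period.
    (Zero-padding gives strings that order correctly, even though e.g.
    a padded year is not itself a calendar-valid instant.)"""
    return ts.ljust(12, '0')

def interval_contains(low, high, value):
    """Closed-interval containment under the 'imprecision ignored' rule."""
    v = ts_to_instant(value)
    return ts_to_instant(low) <= v <= ts_to_instant(high)
```

Under this reading, interval_contains('20100404', '20100406', '201004061030') is false, which is the executive summary’s answer.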

Background

The type IVL&lt;T&gt; is defined in the V3 Abstract data types as a specialization of QSET&lt;T&gt;, where T can be any kind of quantity. The two kinds of quantities normally encountered in the real world are PQ (physical quantity – a floating point value with a coded unit) and TS (timestamp – an instant in time with specified imprecision).

A QSET&lt;T&gt; is some specification of an ordered set of values that specifies which values are in the set, and which are outside the set. One simple way to specify a QSET&lt;T&gt; is as a simple interval – all the values between [low] and [high] are included in the set, and values outside that range are not.

IVL&lt;T&gt; has other properties than low and high. The properties lowClosed and highClosed specify whether the boundaries themselves are actually included in the set of values that are in the interval. For instance, you can specify that the interval includes all the values from 2 to 5, but not including 5. Of course, if the interval can only contain integers, that’s not tremendously useful – it’s no different from the interval from 2 to 4. But if the type that the interval is describing has a continuous distribution range – floating point numbers and times – then this is useful and important.

The Abstract data types specification also describes a literal form, which is a textual presentation of the interval. Multiple literal forms are defined; in this discussion we only use the simple first form, the interval form using square brackets, e.g. "[3.5; 5.5[" (where the square brackets denote whether the interval is closed or not: pointing in means closed, pointing out means not closed). So [3.5; 5.5[ means all the numbers from 3.5 to 5.5, not including 5.5 itself. Note that we also use the hull form below (discussed later).

Note: The rest of the details of IVL are not explored further here. The rest of this discussion assumes that the features and usage of IVL are relatively well understood by the reader. See where can I get information about the datatypes?

Discussion

Although IVL<INT> and IVL<REAL> are not often encountered in real world usage, they are the easiest place to start the discussion.

IVL<INT>

The simplest case is an interval of integers. The meaning of [3; 5] is very clear: the numbers 3, 4 and 5 are included in the interval. There is no question of the imprecision of the boundary, since integers are discretely separated from each other.

In the abstract specification, formal invariants are used to establish meaning – they are the master definition of meaning. The meaning of the boundary of an interval is defined this way. For the simple case of integer, we’ll illustrate how this works, since we’ll be relying on these invariants later.

invariant(IVL&lt;T&gt; x; T e)
    where x.nonNull.and(x.contains(e)) {
  x.low.lessOrEqual(e);
  x.low.nullFlavor.implies(NullFlavor.PINF).not;
};

Note: In this discussion we’ll focus exclusively on the low boundary; the exact argument applies to the high boundary (the invariant chain is simpler for the low boundary).

This invariant says that if the interval is not null and contains a value e, then low is less than or equal to e. If the interval is null – well, we make no rules. Note that we haven’t said that a non-null interval must have a non-null low property – only that if low is null, we cannot know whether the interval contains any particular value: since x.low.lessOrEqual(e) cannot be true for any value of e, neither can x.contains(e) (though we may be able to establish on other grounds (i.e. the high boundary) that the interval does not contain e).

Note: the invariant says that if x contains e, then x.low <= e. it doesn’t say that if x.low <= e, then x contains e – it’s important to keep track of what implies what.

The meaning of lessOrEqual for integer is defined on QTY:

invariant (QTY x, y, z)
    where x.nonNull.and(y.nonNull).and(z.nonNull) {
  x.lessOrEqual(x);                                    /* reflexive */
  x.equal(y).not.implies(
      x.lessOrEqual(y).implies(y.lessOrEqual(x)).not); /* asymmetric */
  x.lessOrEqual(y).and(y.lessOrEqual(z))
      .implies(x.lessOrEqual(z));                      /* transitive */
};

The lessOrEqual operation must be reflexive, asymmetric, and transitive (follow the links from this page on wikipedia for reasoning). This invariant doesn’t define how you determine what &lt;= is (that’s done in text), but it does define how it behaves, and therefore what it means. The most interesting part for the rest of this discussion is the second one: if x != y, then if x &lt;= y, it is not the case that y &lt;= x. Note that we use implies. If x = y, we say nothing here (that’s said elsewhere). If x != y and not (x &lt;= y), then we don’t say whether y &lt;= x – why? Because x and y may not be “comparable”. Obviously integers, reals, etc. always are, but you can’t talk about whether 12g is less than 14m or not. However, if we can compare them, and x != y and x &lt; y, then it also must be true that y &gt; x.

So, in an interval of [3; 5], 3 is in the interval, because 3 <= 3, but 2 is not in the interval because 3 <= 2 is not true.
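The integer containment test can be written out directly, including the lowClosed/highClosed flags described above (a trivial sketch):

```python
def ivl_contains(low, high, e, low_closed=True, high_closed=True):
    """Containment test for an interval, honouring the lowClosed/highClosed
    flags: [3; 5] contains 3, 4 and 5; [3; 5[ excludes 5."""
    above_low = (low <= e) if low_closed else (low < e)
    below_high = (e <= high) if high_closed else (e < high)
    return above_low and below_high
```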

Well, wow, you say, that bit about invariants was a waste of time. And for integer, it pretty much was – they’re simple beasts. But don’t skip it – we’ll be coming back to these below, and then they will start to become useful.

IVL<REAL>

An interval of reals introduces two new considerations:

  • unlike integers, which are discrete (you can always tell them apart), real numbers do not behave like this. What’s the next value after 4? This has no answer.
  • In addition, real numbers have a precision, which specifies the number of significant digits to which the actual value is represented. The inherent notion of precision is that the actual value may differ slightly from the represented value beyond the specified precision

Operations and precision

Given that real numbers have precision, what impact does this have on operations? In mathematical operations, the precisions of the numbers are combined. In multiplication/division, the precision of the outcome is generally the lower of the two precisions. For instance, 4.0 * 2.000 is 8.0, not 8.000. With addition, it’s more complicated: 4.0 + 0.200000 is 4.2, not 4.200000. But what is 4.0 + 0.000001? Intuitively, it’s 4.0, so that x + y = x… so actually, the precision isn’t part of the answer: 4.0 + 0.000001 is 4.000001, but the precision is still 2. (todo: follow up on this)

What about comparison? is 4.0 = 4.0000? Clearly, as stated, these numbers are different in intent. 4.0 represents an implicit boundary from 3.95 to 4.05, while 4.0000 represents an implicit boundary from 3.99995 to 4.00005. But are they equal? Well, the specification says:

Two nonNull REAL are equal if they have the same value and precision.

This text was added as part of defining equality unambiguously for all data types (wiki page with discussion).

Firstly, a clarification: the correct inference from the rule “Two nonNull REAL are equal if they have the same value and precision” is that

 (4.0).equals(4.000).isNull

That was certainly my intent when I wrote that rule, but it didn’t get stated.

But is this notion that REAL values with different precision are not equal actually right?

Unfortunately, No.

Let’s start with an invariant associated with isComparableTo:

invariant (QTY x, y, z)
    where x.nonNull.and(y.nonNull) {
  x.isComparableTo(y).equal(
      x.lessOrEqual(y).or(y.lessOrEqual(x)));
};

So if x and y can be compared, then they must be equal, or one must be less than the other. Therefore either 4.0 and 4.0000 are comparable and equal, or they are not comparable at all. And note that this invariant is equals, not implies, so it also follows that if x < y, then x.isComparableTo(y) is true. So if REALs with different precision cannot be compared, then (3.8 < 4.00) cannot be true, since 3.8 and 4.00 have different precisions – but 3.8 is definitely less than 4.00. Clearly there’s a tension here: either one of those invariants is wrong, or the rule that REALs must have the same precision to be equal is wrong.

To add to this, when we go back to the invariants for QSET, we have this:

invariant(QSET<T> s) where s.nonNull { forall(QTY x, y) where s.contains(x).and(s.contains(y)) { x.isComparableTo(y); }; };

This is relatively simple, and perfectly reasonable: all members of a nonNull QSET must be comparable. You can’t have a valid QSET that contains 5 m and 4g. It doesn’t make sense, and it’s not on. So, if 4.0 != 4.0000, then an interval [3.5; 5.5] cannot contain the value 4.00. But it obviously does and must. So the inevitable conclusion is that 4.0 = 4.000, and that precision cannot be a factor in testing the equality of REAL values – and therefore the rule is wrong.

Note: we could alternatively claim that the correct interpretation of equality for a REAL is to consider precision, and to say that 4.0 implies an implicit interval of 3.95 to 4.05, and that the implicit interval implied by 4.0000 is clearly within that boundary, so clearly 4.0000 is equal to 4.0. The problem is that under this scheme, 4.0 is not equal to 4.0000, since 4.0 implies a possible value outside the boundaries of that implied by 4.0000. And equality must be symmetric (follow the links from this page on wikipedia for reasoning). So this can’t be the answer (though an equivalent of “implies” would be a logical addition to REAL, because (4.0000).implies(4.0) and (4.0).implies(4.0000).not, and this is perfectly sensible).

This will be brought to HL7 as a technical correction to the R2 specification, to wit, that the equality rule should say: “Two nonNull REAL are equal if they have the same value irrespective of precision”. (Some additional example and discussion material should also be added)

Having established that precision cannot count for equality, it’s a straightforward conclusion that it can’t count on the border of an interval either. Given the rule:

invariant(IVL<T> x; T e) where x.nonNull.and(x.contains(e)) { x.low.lessOrEqual(e); x.low.nullFlavor.implies(NullFlavor.PINF).not; };

Value e can only be in interval x if x.low is less than or equal to e. So given an interval with a low boundary of 3.0, the value 2.99995 is not in the interval: 2.99995 < 3.0, irrespective of precision. We can say this with confidence because if the comparison of e and low cannot be null just because they are equal with different precisions, then it cannot be null because their values are merely close with different precisions. (And even if the comparison was null, all we could say is “we don’t know whether they are in the interval”.)

So, the interval [3.5;5.5[ does contain the values 4, 4.0, 4.0000000000000000000000000, 3.5, 3.5000000, 3.500000000000000001, 5.49, and 5.49999999999999999999999999999999999, but not the values 3.49999999999999999999, 5.51 or 5.50000000000000000000000.
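Python’s decimal.Decimal happens to compare by numeric value, ignoring stated precision, so it can be used to sketch the corrected behaviour (the contains helper is illustrative):

```python
from decimal import Decimal

# Decimal compares by numeric value, ignoring trailing zeros, which matches
# the corrected rule that precision plays no part in equality.
assert Decimal("4.0") == Decimal("4.0000")

def contains(low: str, high: str, value: str) -> bool:
    """Membership test for the half-open interval [low;high[."""
    return Decimal(low) <= Decimal(value) < Decimal(high)

assert contains("3.5", "5.5", "4.0000000000000000000000000")
assert contains("3.5", "5.5", "3.5000000")
assert contains("3.5", "5.5", "5.49999999999999999999999999999999999")
assert not contains("3.5", "5.5", "3.49999999999999999999")
assert not contains("3.5", "5.5", "5.50000000000000000000000")
```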

IVL<PQ>

The situation is the same for IVL<PQ> – other than the fact that the units must all have the same canonical form in UCUM (to make x.isComparableTo(y) true), the behaviour of IVL<PQ> with regard to boundaries is based on the value of PQ, which is a REAL.

IVL<TS>

TS differs from REAL in that the precision is not equally distributed around the stated value. Instead, it starts at the stated value, and goes to the end of the implied period. To illustrate this, a REAL value of 5.1 implies 5.05 to 5.15, equally distributed around 5.1. On the other hand, the TS value of 20100404 implies the day 4-Apr 2010, and the implicit time runs from 00:00 on that day up to (but not including) 00:00 the next day (that is, [201004040000;201004050000[)

Other than this fact, the situation with regard to TS is the same as that with regard to REAL, and for exactly the same reasons: precision is not counted.

Of course, because of the way that the TS imprecision is distributed, the low boundary is not the interesting case – it’s the high boundary. Given an interval of [20100404;20100406], is 10pm on the 6-April in that interval? A careless reading of the interval – from the 4th to the 6th of April – would imply that it is. But it isn’t, for the reasons described above. The interval [20100404;20100406] runs not from the 4th to the end of the 6th, but from the start of the 4th to the start of the 6th.
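A minimal sketch of this interpretation, treating each TS boundary as the instant at the start of its implied period (the helper is illustrative, not a full TS parser):

```python
from datetime import datetime

def ts_start(literal: str) -> datetime:
    """Interpret an HL7 TS literal as the instant at the start of its
    implied period (illustrative helper; handles only a few precisions)."""
    fmt = {"4": "%Y", "6": "%Y%m", "8": "%Y%m%d"}[str(len(literal))]
    return datetime.strptime(literal, fmt)

# [20100404;20100406] runs from the start of the 4th to the start of the 6th,
# so 10pm on the 6th falls outside it...
low, high = ts_start("20100404"), ts_start("20100406")
assert not (low <= datetime(2010, 4, 6, 22, 0) <= high)
# ...while 10pm on the 5th is inside.
assert low <= datetime(2010, 4, 5, 22, 0) <= high
```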

TS must be the same as REAL because precision cannot count towards the comparisons – neither the equality, nor the lessOrEqual, nor the greaterOrEqual. So when the R2 abstract specification says that for TS:

“Two nonNull TS are only equal if they have the same precision”

invariant(TS x, y) where x.nonNull.and(y.nonNull) { x.equal(y).equal(x.offset.equal(y.offset)).and(x.precision.equal(y.precision)); };

This is the same error as for REAL, and will be part of the technical correction discussed above. The invariant should say:

invariant(TS x, y) where x.nonNull.and(y.nonNull) { x.equal(y).equal(x.offset.equal(y.offset)); };

and therefore 20100404 = 20100404000000.000

TS redefines lessOrEqual in R2. I’m the editor, and I can’t say that there’s any coherence in that redefinition at all. The definition is nonsensical in parts – a copy/paste error – and wrong where it differs from the definition of lessOrEqual on QTY, in that it says:

” The outcome of lessOrEqual between two TS is NULL unless they have the same precision”

This is wrong, for the same reasons as the equality tests on REAL and TS as discussed above.

Even worse is this invariant:

invariant(TS x, y) where x.nonNull.and(y.nonNull) { x.lessOrEqual(y).nonNull.implies(x.offset.equal(y.offset)); };

This is an outright typo. It should say, x.lessOrEqual(y).nonNull.implies(x.precision.equal(y.precision)), but as we have discussed, even that would be wrong. This whole section (QTY.lessOrEqual) should be removed in the technical correction – it doesn’t say anything useful at all, even when corrected.

The Hull Literal Form

Much of the confusion around this area comes from the existence of the hull literal form, and some careless language associated with its definition. Quoting from the abstract specification (same in R1 and R2):

Example: May 12, 1987 from 8 to 9:30 PM is “[198705122000;198705122130]“.

NOTE: The precision of a stated interval boundary is irrelevant for the interval. One might wrongly assume that the interval “[19870901;19870930]” stands for the entire September 1987 until end of the day of September 30. However, this is not so! The proper way to denote an entire calendar cycle (e.g., hour, day, month, year, etc.) in interval notation is to use an open high boundary. For example, all of September 1987 is denoted as “[198709;198710[“.

The “hull-form” of the literal is defined as the convex hull (see IVL.hull) of interval-promotions from two time stamps. For example, “19870901..19870930” is a valid literal using the hull form. The value is equivalent to the interval form “[19870901;19871001[“.

Though the note in the quote above agrees with this document about the interpretation of an interval, it’s ambiguous: the statement doesn’t make clear whether the note concerns the interpretation of intervals in general, or just that particular literal form. The waters are further muddied by the comment immediately after it, defining the hull form, where the interpretation of the literal form is dependent on the boundary precision.

So, to clarify: the hull literal form is not a simple interval: it’s the convex hull (see IVL.hull) of the two intervals implied by the imprecision of the stated boundaries. The literal hull is not actually an interval at all – it’s a QSCH<IVL<T>>, where QSCH is QSetConvexHull, a type that we missed defining in R2 (and will add in R3), and which will have a DSET of sets as its operands (probably).
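The hull form can be sketched as the convex hull of the two intervals implied by the day-precision boundaries (helper names are mine):

```python
from datetime import datetime, timedelta

def implied_interval(day_literal: str):
    """Half-open interval implied by a day-precision TS, e.g. '19870901'."""
    start = datetime.strptime(day_literal, "%Y%m%d")
    return (start, start + timedelta(days=1))

def hull(a, b):
    """Convex hull of two intervals: the smallest interval containing both."""
    return (min(a[0], b[0]), max(a[1], b[1]))

# "19870901..19870930" is the hull of the two implied day intervals,
# i.e. [19870901;19871001[ - all of September 1987.
value = hull(implied_interval("19870901"), implied_interval("19870930"))
assert value == (datetime(1987, 9, 1), datetime(1987, 10, 1))
```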

Since we are issuing a technical correction anyway, we will clarify the uncertainty introduced by this definition of the literal hull at the same time, by being more explicit that the note concerns the definition of Interval, not the literal form, and by making more of the fact that the hull form is a convex hull of intervals, not an interval itself.

Status

This page is awaiting final approval by MnM (HL7 committee).

Question: What content can ED contain?

One of the most frequently asked questions is what content the V3 data type ED (“Encapsulated Data”) can contain.

There’s a simple answer. ED can contain the following types of data:

  • plain text
  • base64 encoded data
  • XML – a CDA document, a V3 Message, or any other kind of XML
  • CDA structured narrative
  • A reference to a URL from which the data can be obtained

But when we start looking at the details, it’s not quite so simple, which is why it’s such a common question.

ED Abstract Definition

The abstract definition (R2) is as follows:

So an ED has data, which is a list of boolean values (bits). The “abstract” data type definition is an in-principle definition of the meaning of the data types, without considering any implementation details. And this is the place where its abstractness is most evident: binary data is considered to be a list of bits. I’ve never handled data like that, and I doubt you have either. (OK, if you’re doing huffman… )

Note: the R1 definition differs a little, in that ED specializes BIN rather than having a data property, but this is syntactic sugar: the meaning is no different.

Anyhow, the only thing we need to learn from the abstract spec in this regard is that the data is a series of bytes, and the form in which the data is provided is neither here nor there – we just break it down to a series of bytes. In principle.

The abstract data types specification also notes that the ED can carry a reference to the data instead of the data itself. In fact, the abstract specification only introduces the reference property in order to make some rules about the reference – principally that the reference can never be used for any other data. It doesn’t really matter whether the data is provided as binary data directly, or by a reference to some URL – it’s just a stream of bytes.

An instance can provide the data directly, or it can provide a reference to the data, or it can provide both. If it provides both, they must be the same – so there’s not really a lot of utility in providing both. The normal use for a reference is to provide an image, a thumbnail of the image, and a reference to where the whole (big fat large) image can be retrieved from if the user desires.

XML Representation

When it comes to the XML representation, we can, as the abstract spec describes, provide the data directly, a reference, or both. If we are going to provide a reference, then the XML looks like this:

<xx mediaType="image/png">
 <reference value="http://temp.myurl.com/images?id=23..."/>
</xx>

<xx> is the name of the element. We don’t know what that is – ED is a type, and the name of the element is assigned in the context in which it is used (this will become relevant later). The ED contains a single element “reference” in the standard v3 XML namespace, which carries the reference. Remember that a reference may always be provided, whether the data is provided or not.

In the XML representation, there are three different ways to represent the data directly in the instance. The first is as simple plain text:

<xx>This is some plain text</xx>

The plain text is the simple text content of the element itself. This is pretty easy, but only suitable when

  1. the character encoding of the text is the same as the character encoding of the XML data, or you can make it so (this is usually the case)
  2. you don’t really care about whitespace at the start and the end of the text – or you are sure there isn’t any (not an unusual condition)
  3. the plain text won’t need lots of escaping for characters that XML can’t carry directly (i.e. it isn’t binary data like a PDF file)

Note that according to the data types XML specification, if you have thumbnail or reference elements, they come first, before the text – though there’s probably no meaningful use of a thumbnail or reference for plain text anyway. Also note that the plain text may contain special characters such as tabs, line feeds etc; but this is usually a bad idea – implementers generally do not handle these characters well or consistently, whether they are represented directly in the XML or as character entity references. If you have to exchange these characters, use base64 – this encourages the use of non-XML tools for handling the data that contains them.

If the content doesn’t meet the conditions above – which usually means it’s a PDF, a Word document, an image or a video, but anything else is possible and allowed – then the usual way to include the data is as base64 encoded data:

<xx representation="B64" mediaType="image/jpeg">MNYD83jmMdomSJUEdmde9j44zmMir....</xx>

You can always tell when the data is base64 encoded this way, because the representation=”B64″ attribute must be present.

Note: Base64 encoding is not the most dense representation of the data. You don’t have to do it this way. You can embed the XML that contains the ED in a MIME package, add the binary content as a MIME section, and put a reference in the ED instead. Or you can use DIME (shudder!). But whatever you do requires that the recipient expects to receive this; you can’t be sure about that, whereas you can be sure that they can accept base64 encoded data. And base64 encoded data compresses down to about the same size as the same compressed binary.
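For illustration, here’s how inline base64 content is typically produced and consumed (the element name and byte content are invented for the example):

```python
import base64

# Illustrative sketch: embedding binary content in an ED as base64
# (the element name "text" and the payload are made up for the example).
payload = bytes(range(256))  # arbitrary binary data, e.g. image bytes
b64 = base64.b64encode(payload).decode("ascii")
ed = f'<text representation="B64" mediaType="image/jpeg">{b64}</text>'

# The receiver reverses the encoding to recover the original byte stream.
assert base64.b64decode(b64) == payload
```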

There’s a third option for representing the data: if it’s XML, HTML, or SGML, and it’s well formed and in the same character encoding as the document, you can stick it straight in as XML. In particular, you could put another CDA document, or a v3 message. Here’s an example:

<xx mediaType="text/html">
 <html>
 <!-- etc --> 
 </html> 
</xx>

The specification says that in this case, the XML fragment must be well formed, and that it must be contained in a single element in the ED. So you couldn’t have this, which would be just confusing:

<xx mediaType="text/html">
 <html/> <!-- etc --> 
 <html/> <!-- etc --> 

</xx>

There is, however, a special case, which is CDA structured narrative (also appears in SPL and will be used more widely, I think). Here’s an example:

<text> <content styleCode="Bold">Henry Levin, the 7<sup>th</sup></content> is a 67 year old male referred for further asthma management.  Onset of asthma in his <content revised="delete">twenties</content> <content revised="insert">teens</content>. He was hospitalized twice last year, and already twice this year.  He has not been able to be weaned off steroids for the past several months. </text>

So while in general, an ED carrying XML contains a single well formed element, in the case of CDA, it can carry a mix of text and other elements as described by the CDA structured narrative schema. (and the structured narrative cannot have a reference or a thumbnail).

Irrespective of how the data is provided – as plain text, as base64 encoded content, as a reference to an attachment or some other source, or as XML – it can be stripped down to a plain old sequence of bytes. In principle. In practice, due to XML handling techniques, character set and character encoding issues, and reference resolution, it’s not always so easy to do this, and it’s not really required very often. For instance, the theoretical definition of equality says that you derive the sequence of bytes for two EDs and compare these, but it’s extremely rare to compare two ED values, except in the case of plain text data in names.

Media Type

The media type (or “mime type”) of the content must be known and stated in the instance. It has a default value: text/plain. If the media type is something different, you have to say so – even if you’re providing the data as a reference.

The only exception to this rule is in CDA structured narrative, where the media type is fixed and defaulted to “text/x-hl7-text+xml”.
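A reader therefore resolves the effective media type like this (a sketch; the helper name is mine):

```python
import xml.etree.ElementTree as ET

# Sketch of the default rule: the mediaType is text/plain unless stated.
# (The element content here is invented for the example.)
def effective_media_type(ed_element: ET.Element) -> str:
    return ed_element.get("mediaType", "text/plain")

assert effective_media_type(ET.fromstring('<xx>some text</xx>')) == "text/plain"
assert effective_media_type(ET.fromstring('<xx mediaType="image/png"/>')) == "image/png"
```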

One interesting result of the way the structured narrative is defined is that you can’t use it as-is in a general ED; you can only use it directly in a CDA section, or wherever else the applicable specification explicitly allows its use. If you want to use the structured narrative in a normal ED (i.e. an Act.text in a v3 message), you have to push it down into a single child element and fill out the mediaType:

<xx mediaType="text/x-hl7-text+xml"> <text> <content styleCode="Bold">Henry Levin, the 7<sup>th</sup></content> is a 67 year old male referred for further asthma management. Onset of asthma in his <content revised="delete">twenties</content> <content revised="insert">teens</content>. He was hospitalized twice last year, and already twice this year. He has not been able to be weaned off steroids for the past several months. </text> </xx> 

Note that the most likely name for xx in this context is actually “text”. The name of the inner element is arbitrary and not specified anywhere, but “text” seems like the most reasonable name to use.

R1 ED Schema

The R1 ED schema content model is:

  <xs:complexType name="ED" mixed="true">
    <xs:complexContent>
      <xs:extension base="BIN">
        <xs:sequence>
          <xs:element name="reference" type="TEL" minOccurs="0" maxOccurs="1"/>
          <xs:element name="thumbnail" type="thumbnail" minOccurs="0" maxOccurs="1"/>
        </xs:sequence>
        <xs:attribute name="mediaType" type="cs" use="optional" default="text/plain"/>
        <!-- other attributes not of interest here-->
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

and BIN is:

  <xs:complexType name="BIN" abstract="true" mixed="true">
    <xs:complexContent>
      <xs:extension base="ANY">
        <xs:attribute name="representation" use="optional" type="BinaryDataEncoding" default="TXT"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

The really fun thing about these schema fragments is, where’s the data we’ve been talking about? There’s the “representation” attribute, and a mediaType attribute with the default value, but the data is not actually described…

Well, no, it’s not described. And it’s a great source of confusion for newbie implementors, particularly those not well versed in XML and schema.

In addition to the reference and thumbnail elements, the element for the complexType ED can contain text, which may or may not be base64 encoded. There’s no way to explicitly describe this text: all we can do is say that the type has mixed content (mixed=”true”). Unfortunately, simply indicating mixed content doesn’t convey what the intent is – that you can have text after the reference and thumbnail elements – so the schema is basically useless here.

Note: I think this is a major limitation of schema; there should be a text type, so that we can be specific about the contents of mixed content, instead of simply yielding control like that. However, that’s what we have to deal with.

If that problem isn’t bad enough, the schema makes no mention at all of the other things that are allowed in the ED. We can have any additional single element – instead of text – and that element can have any name in a namespace other than the v3 namespace, or it can be in the v3 namespace and be a valid v3 instance, with the appropriate name.

Schema can’t describe this content model completely. For some reason lost in the mists of time, HL7 (we, I) didn’t describe the content as well as we could have, but simply shipped a schema that doesn’t describe the XML feature at all. This causes real problems when it comes to conformance, because the schema is wrong: it wrongly rejects valid instances.

Here’s an improved schema (courtesy of Keith Boone):

  <xs:complexType name="ED" mixed="true">
    <xs:complexContent>
      <xs:extension base="BIN">
        <xs:sequence>
          <xs:element name="reference" type="TEL" minOccurs="0" maxOccurs="1"/>
          <xs:element name="thumbnail" type="thumbnail" minOccurs="0" maxOccurs="1"/>
          <xs:any minOccurs="0" namespace="##other" processContents="skip"/>
        </xs:sequence>
        <xs:attribute name="mediaType" type="cs" use="optional" default="text/plain"/>
        <!-- other attributes not of interest here-->
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

This one allows the additional element, though only in a namespace other than the v3 namespace. Keith also defined a variant to allow the incorporation of elements from the v3 namespace itself:

  <xs:complexType name="ED" mixed="true">
    <xs:complexContent>
      <xs:extension base="BIN">
        <xs:sequence>
          <xs:element name="reference" type="TEL" minOccurs="0" maxOccurs="1"/>
          <xs:element name="thumbnail" type="thumbnail" minOccurs="0" maxOccurs="1"/>
          <xs:any minOccurs="0" namespace="##other" processContents="skip"/>
          <xs:element minOccurs="0" ref="abstractInteraction"/>
        </xs:sequence>
        <xs:attribute name="mediaType" type="cs" use="optional" default="text/plain"/>
        <!-- other attributes not of interest here-->
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="abstractInteraction"/>
  <xs:complexType name="foo">
    <xs:complexContent>
      <xs:extension base="abstractInteraction">
        <xs:sequence>
          <xs:element name="bar"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="abstractInteraction" abstract="true" type="abstractInteraction"/>
  <xs:element name="foo" substitutionGroup="abstractInteraction" type="foo"/>

This schema needs to be hand coded to allow whatever v3 contents are appropriate.

Changes in R2 (ISO 21090)

This is a considerable pain point for implementers – and by far the most common FAQ for the editors of the specification – so after much debate, we elected to change the XML form in Release 2 (which is ISO 21090). Briefly, the changes are:

  • ED is no longer a mixed content type
  • plain text is moved into an attribute “value” instead of being represented as text in the element
  • base64 encoded content is moved into a “data” element, which is explicitly assigned a type of base64Binary in the schema
  • XML content is moved into an “xml” element. The xml element contains the same single element that the ED would have contained previously
  • the Structured Narrative type is unchanged
  • You can only have one of (value attribute | data element | xml element)

Though there is no functional change, we have drawn apart the three kinds of data; this allows the content model to be properly described in schema, and much simpler parsers to be written that do not have to indulge in speculative logic to read all the valid contents of an ED data type (say, when using SAX). In addition, the new ED definition makes the three forms of data explicit:

  <xsd:complexType name="ED">
    <xsd:complexContent>
      <xsd:extension base="ANY">
        <xsd:sequence>
          <xsd:element name="data" type="xsd:base64Binary" minOccurs="0" maxOccurs="1"/>
          <xsd:element name="xml" type="xsd:anyType" minOccurs="0" maxOccurs="1"/>
          <xsd:element name="reference" type="TEL" minOccurs="0" maxOccurs="1"/>
          <xsd:element name="thumbnail" type="ED" minOccurs="0" maxOccurs="1"/>
          <!-- other elements not of interest here-->
        </xsd:sequence>
        <xsd:attribute name="value" type="xsd:string" use="optional"/>
        <xsd:attribute name="mediaType" type="xsd:string" default="text/plain" use="optional"/>
        <!-- other attributes not of interest here-->
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

Compression and Compressed Data

One last issue to cover about ED. In the ED, you can indicate that the data is compressed using deflate, gzip, zlib, or compress.

If you provide the data in the instance as plain text or XML, then you cannot use the compression attribute – you can’t compress that kind of data. If you provide the data as base64, you can indicate that the data has been compressed using one of these methods.

We need to carefully differentiate between “has been compressed” and “is compressed”. The problem is that some data is inherently compressed using one of these methods in its “native form”. In that case, should you mark it as compressed? What if you compressed it again – what does that mean now? The problem is worsened by the fact that some transfer protocols – such as http – provide their own internal compression. What should you do if the data is provided as an http: reference, and the web server that serves the request will automatically use gzip compression on the answer?

After debate, the answer the HL7 committee agreed to is this:

  • Any compression that is defined in a reference protocol – such as http: – is not described by the ED compression attribute
  • Whether the data is provided in-line or as a reference, compression is only specified if (and must be specified if) decompression using the specified algorithm is required to obtain the specified mediaType.

So:

  • If the mediaType is application/gzip, and the content is gzipped in the appropriate form, then you don’t say that it’s compressed.
  • If the mediaType is text/html, and the content is provided as a reference, and is still gzipped after resolving any compression specified by the web server in its response, then you do say it’s compressed with gzip. Note that a smart web server might see or know that the content of a gzipped file is html, and send the content with a mediaType of text/html, gzipped by protocol – so just because the web server content is gzipped doesn’t mean you should assume that the ED reference should be marked as gzip.

This extra complexity is the price of redundancy in protocols.
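A sketch of applying the rule (the element names follow the ED XML form; the content is invented for the example):

```python
import base64
import gzip

# Sketch of the agreed rule: the compression attribute is set exactly when
# decompression is needed to obtain the stated mediaType.
html = b"<html><body>report</body></html>"
packed = gzip.compress(html)
b64 = base64.b64encode(packed).decode("ascii")

# mediaType is text/html, but the bytes are gzipped: compression must be stated.
ed = f'<text mediaType="text/html" compression="GZ" representation="B64">{b64}</text>'

# A receiver must gunzip to obtain the text/html content.
assert gzip.decompress(base64.b64decode(b64)) == html

# By contrast, if the mediaType itself were application/gzip, the same bytes
# would be sent with no compression attribute at all.
```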

Clinical Informatics Standards

While I was in Singapore, there was a panel discussion of the degree to which clinicians need to be involved in the formation of healthcare IT standards. I was somewhat surprised to hear that the outcome of the discussion was that there is no need for clinicians to be involved in them at all.

Now while there were particularly local factors involved in the context of the discussion, and its resolution, I’ve been thinking about that a lot since. If, by healthcare IT standards, you mean exchange and persistence infrastructure and base-level logical models, then there is no particular reason for clinical users to be involved in the standards development process. Obviously, you need to properly gather requirements from clinically knowledgeable users – and that includes, but is not confined to, clinical users. But these standards are primarily engineering constructs, and clinical users bring no value, or negative value, to this process because they do not understand the nature of the thinking required at this level. (On the other hand, clinical users who have also learnt to think this way are more useful – it’s not the clinical knowledge that is negative, but the lack of knowledge of how to build systems.)

I’m watching the price of giving clinical users too much influence over the exchange and system standards in a couple of contexts right now, and it’s not pretty – they are standing in the way of their own goals.

But there is a real place for real clinical users in healthcare IT standards, and that’s in Clinical Informatics Standards. In this context, clinical informatics standards means things such as which coding systems are used, how clinical concepts such as blood pressure are used, how clinical obligations fit into the workflow. As long as clinical users don’t agree about these things, then the lower level implementation standards will have to cater for the higher level clinical disagreement, and they’ll be looser, more open to interpretation, and harder to implement. Which will reduce their clinical utility.

So the message for clinicians regarding involvement in standards is relatively straightforward: the more you all agree on clinical informatics standards, the more bang for your buck you’ll get from the supporting exchange and system standards.

HL7 saw this a long time ago, and has been reaching out to clinician user groups (colleges, professional associations etc) for several years, but this is a relatively new and slow process. And openEHR has been doing this for a long time. The new CIC process from HL7/IHTSDO/etc is trying to address this problem as well. But none of these things can really achieve their goals until general clinical users are prepared to buy into the value proposition of standards: if I give up something over here, then I’ll gain something over there. Clinical informatics standards aren’t meaningful unless they constrain how a clinical user operates.

 

Semantic Interoperability #2

I’ve been chatting to Stephen Lynch from CSC (here in Australia) about the holy grail of healthcare interoperability, “Semantic Interoperability”, which is a follow up to this post on my blog. Stephen sent me this link:

http://plato.stanford.edu/entries/information-semantic/

Enjoy reading that link – it’s just the right thing to read if you can’t sleep some night ;-). Stephen says:

Interoperability isn’t about “semantic interoperability”, it’s about shipping around “the facts” for the end-user to make their own interpretation and value judgement on what’s presented in front of them, in as efficiently accessible and user friendly manner as possible to facilitate and enable “decisions”… While so much energy is bogged down and caught up in this “semantic interoperability” futile quest, the longer e-health will be caught in its present groundhog day mode and glacial progress

Well, I think that all we want to do is ship around the facts, but we don’t really even know what they are. My response was:

I kind of feel as though I’m watching one of those human powered flight competitions, and everyone knows it’s a hilarious joke except the dreamers making the flying machines

Stephen responds:

And I think the meaningful insight here around “semantic interoperability” is that the current advocates do not appreciate and understand the equivalent “principles of flight” (weight, lift, thrust, drag) in their flying competitions and disastrous/humorous flight attempts around “semantics”, and are therefore destined to continue to fall off the end of the pier, while the pragmatic and insightful Wright Brothers realise the dream with their very pragmatic, incremental and truly scientific approach to the mastery of the principles of flight.

And gives another reference:

http://www.amazon.com/Stuff-Thought-Language-Window-Nature/dp/0143114247/ref=sr_1_1?ie=UTF8&qid=1318401961&sr=8-1

This is a subject I’m going to pick away at slowly – like I said in my last post, I think we’re actually trying to get to un-semantic interoperability, and we’re deeply confused about our goals. Partly that’s because of the lack of clinical standards, and I’m going to take that up in my next post.

Question: Titles in Names in ISO 21090

Question:

I just wanted to clarify with you if ENXP data type handles salutation (e.g. Mr,Mrs,Dr,Ms,etc) through setting ENXP.type attribute to “TITLE”.

Answer

Yes, you set the type to title, and put the salutation in.

<name>
  <part type="TITLE" value="Mr"/>
  <part type="GIV" value="Grahame"/>
  <part type="FAM" value="Grieve"/>
</name>