Monthly Archives: July 2011

Straw Poll: Choice of a CDA packaging specification

One of the more difficult issues facing the NEHTA CTI team is what to choose for a technical CDA packaging strategy. We need a coherent strategy because CDA documents have assorted attachments. These attachments can include things like:

  • Images
  • Alternative document format representations (such as pdf / rtf)
  • Digital Signatures

The CDA specification itself says (section 3) that when a CDA document is transferred from one place to another, all components of a CDA document must be able to be included in a single exchange package, including if the transfer is across a firewall, and that there is no need to change any of the references in the CDA document. In practice, in NEHTA specifications, this leads to the following rules:

  1. All ED.references to simple images or data must be relative URLs
  2. The reference name must be a GUID followed by a standard file extension where one exists (i.e. jpg, gif, png, etc)
  3. The location of the document must be the same location as the attachments
  4. When a document is moved from one location to another, the attachments must be moved with it
  5. If an image is too big to be moved around with the document, it cannot be part of the attested content, and other referencing methods will have to be used instead
  6. The digital signature(s) is part of the document package

The second rule – GUIDs – is for a different reason, and is discussed below. Example for #2:

      <value xsi:type="ED" mediaType="image/jpeg">
        <reference value="0E7AD252-9C55-4499-8F11-75B2F4F4E584.jpg"/>
      </value>

These rules are fine – and relatively uncontroversial once the full breadth of the NEHTA specifications is understood (for techie readers, the above is the abstract CDA package syntax). But what is controversial is the choice of technology to use for the package form (for techies, the concrete syntax).
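As a concrete sketch of rules 1 and 2 (illustrative Python only – the function names are mine, not from any NEHTA artefact):

```python
import uuid

def attachment_part_name(original_name: str) -> str:
    """Rule 2: the part name is a GUID plus the file's standard extension."""
    ext = original_name.rsplit(".", 1)[-1].lower()
    return f"{str(uuid.uuid4()).upper()}.{ext}"

def ed_reference(part_name: str, media_type: str) -> str:
    """Rule 1: the ED reference is a relative URL - just the part name,
    so the package can be moved as a whole without rewriting references."""
    return (f'<value xsi:type="ED" mediaType="{media_type}">'
            f'<reference value="{part_name}"/></value>')

name = attachment_part_name("scan.JPG")
xml = ed_reference(name, "image/jpeg")
```

Because the reference is just a bare file name, rules 3 and 4 follow for free: the document and its attachments travel together, and nothing inside the document needs to change when they move.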

Broadly speaking, we have the following choices for the package format:

  1. Mime packages
  2. Zip archives

combined with one of the following choices for the metadata:

  A. Implicit in part names (very lightweight)
  B. Custom light metadata
  C. The full IHE XDS metadata

1-A is what the v2 standard describes. 2-C is IHE XDM. 1+2-B / 1+2-C is what the DIRECT project is using (along with S/MIME).

We’ve been churning internally in NEHTA on what to choose – it’s not an obvious choice. And requirements don’t really seem to help. This is my list of requirements:

  • Easy / ubiquitous implementations
  • Acceptable to implementers & standards committees
  • The contents can be determined from outside the package
  • Can be represented as base64 in an xml document for SMD
  • Can easily be inserted into an HL7 v2 message
  • Can contain multiple xml documents without having to remediate unique ids or escape xml tags
  • Doesn’t load up the structure with needless features that cost effort to implement and aren’t relevant to implementers
  • Not overly inefficient (speed over storage requirements)

For me, the single most important requirement of these is that whatever we choose has to be palatable as a solution when it comes forward to the IT-14 standards committees for approval. But these committees are committees, and we get multiple different opinions from influential members, as we do from implementers.

Hence this straw poll. Below, after the analysis, in the comments, please vote your preference. Along with your vote, please include your role. Are you a programmer? An analyst? A standards person? Something else?


Mime vs Zip

Mime is the classic packaging format, used most of all in email (for attachments). It’s also used in some other internet protocols, including MHTML web archives in IE, and by the SOAP protocol for attachments (sometimes). It’s also used in the DIRECT project, which is widely regarded as the bee’s knees at the moment.

Mime has problems though:

  • It’s common to encounter broken mime packages (any regular email user will have seen them)
  • Mime is very flexible – too flexible for this task, which only uses a subset of its functionality
  • The important development platforms (DotNet, Java, Delphi, 4G, foxpro – list based on my knowledge of what is being used in Australia for clinical systems) don’t have libraries that support mime.

Zip has a different set of pros and cons. It’s almost ubiquitous for transfer of groups of files between people (though we’re all getting good at renaming file extensions). There are standard libraries for zip on many development platforms, including the common languages listed above.

IHE XDM uses .zip format, as does the docx file format. IHE chose it because of problems with consistent support for mime packages, though I’ve also had problems with zip file consistency across the libraries as well.

Zip has problems:

  • Zip is very flexible – too flexible for this task, which only uses a subset of its functionality
  • Zip doesn’t have any real name-value headers for simple metadata (if that matters, see below)
  • The only relevant standard using Zip is XDM – see metadata discussion below
  • Zip imposes a fixed compression/decompression cost on the software, even when compression isn’t justified (i.e. it’s not delegated to the hardware) (and I bet some readers thought I was going to call zip compression an advantage ;-))

So you can see the question with zip is about metadata – but actually, the problem with Mime is kind of the same. And the choice of mime vs zip is sort of dependent on the metadata choice (even though it isn’t strictly technically linked).


There are two kinds of metadata that might be in the package. One set of metadata which is pretty much needed is which of the parts is the CDA Document, and which is the digital signature. Those two parts independently point to all the other parts and give them meaning, and so they need to be clearly identified somehow. One way to do this is to just fix the part names (ie. Cda.xml and digsig.xml or something like that). Alternatively you can just add a light XML manifest file that represents these things more explicitly (and it turns out that there is a custom built zip profile with digsig and content support called OPC, used in office documents, which is an ISO Spec, but wasn’t a popular choice).
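To make the “light XML manifest” option concrete, here’s a sketch (the manifest format is hypothetical, invented for illustration – it’s not OPC and not any NEHTA format):

```python
import io
import zipfile

def build_package(cda_xml: bytes, digsig_xml: bytes,
                  attachments: dict[str, bytes]) -> bytes:
    """Zip the document, signature and attachments together with a tiny
    manifest (invented format) that says which part plays which role."""
    parts = [("cda.xml", "document", cda_xml),
             ("digsig.xml", "signature", digsig_xml)]
    parts += [(name, "attachment", data) for name, data in attachments.items()]
    manifest = "<manifest>" + "".join(
        f'<part name="{name}" role="{role}"/>' for name, role, _ in parts
    ) + "</manifest>"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("manifest.xml", manifest)
        for name, _, data in parts:
            z.writestr(name, data)
    return buf.getvalue()

pkg = build_package(b"<ClinicalDocument/>", b"<Signature/>",
                    {"0E7AD252-9C55-4499-8F11-75B2F4F4E584.jpg": b"\xff\xd8"})
```

A receiver reads manifest.xml first, then pulls the named parts out of the archive – the fixed-part-names option is the same picture with the manifest deleted and the names cda.xml/digsig.xml agreed by convention.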

The other kind of metadata is to reproduce the stuff from inside the CDA document and put it in the metadata – patient, author, document type etc. This is what IHE does in the XDS metadata. The problem with this is the metadata can get quite extensive, and quite onerous. Some of the obligatory metadata may not even be contained in the document (such as practiceTypeCode).

This creates problems with synchronisation – is it wrong for the metadata to disagree with the document? That might be legitimate because the patient’s details have changed between writing the document (and freezing it with a digital signature) and actually submitting the document. But it might also be a mistake – the fields differ in error. What do you do about that?

IHE adds a further problem to this because the metadata is controlled by an affinity domain –  which may differ for each XDS repository. Someone has to define the affinity domain. If we aren’t careful, we might end up with multiple different affinity domains specifying different metadata in Australia – which would mean different metadata packages for different repositories – definitely a situation we want to avoid.

Finally, the XDS metadata format is frankly horrifying. CDA is a bit yuck, but that’s nothing compared to the metadata – it’s nearly the worst xml I’ve seen (nothing will catch up with XMI though in the worst XML stakes!).

XDM – Zip + XDS metadata – assumes an affinity domain (the IHE spec is confusing – it claims it doesn’t need one, but refers to XDS metadata that does need one). The DIRECT project defines an affinity domain based on the HITSP code sets fixed in C64 (which is US specific). XDM also has the interesting property of allowing multiple different documents in the package and/or the repository (obviously) – as long as there are no name clashes, which is why one of the rules above is that all names must be GUIDs: packages may then include multiple documents and their attachments if required.

Making the base package use XDM is great for an XDS-based central repository provider and gateways to that – the metadata they want is already populated (especially if the metadata includes things not in the document). It’s quite onerous for all the perimeter end user systems, who populate it whether the document is going to a repository or not.

So, we can:

  • Just use parts with fixed names. (doesn’t scale into XDS – though we’re not sure whether this is an issue yet)
  • Define a simple XML form for a set of metadata that may or may not be the XDS metadata
  • Just simply bite the bullet and use the XDS metadata, and agree on an affinity domain for all Australia (at least the parts that affect the metadata)

Confused? That’s why we’ve been churning.

We need to pick an approach very soon, and whatever we pick has to be acceptable to the relevant standards committees when it comes forward. ETP used OPC-Zip, but this was badly received, and it’s probably not the best of the options described above.

What is the best choice? Vote in the comments. I will approve all non-abusive comments, anonymous or otherwise, but I’d prefer non-anonymous comments. Please indicate in the comments what type of interest you have (implementer, analyst, standards, etc). The results of this straw poll won’t be binding, but will genuinely make a difference. p.s. I understand that many of the interested parties here aren’t allowed to comment on blogs, so I’ll also accept contributions by email at



Question: Why do I need an HL7 license? How do I get one?

The comments in the thread on a prior post lead to someone submitting this question:

I have been developing HL7 based applications without being a member of an HL7 organization. This week I have just realised that I may be in breach of some sort of license! Oops!
My questions are: Where does one find out that this is a requirement? Why is it a requirement? I thought the HL7 standard was open. There are enough books and other resources one can obtain to be able to build solutions. Compliance checking through AHML is free (if one chooses to do so). Opensource solutions such as HAPI and NHAPI can be obtained free (and I have used these as well as rolling my own).
I can understand the need to pay for the official HL7 standards and the AS documentation but not why it is necessary to pay to be able to develop an application (even though the cost is relatively low).

HL7 is not an open standard. (note that many HL7 members – including me – think this is appalling, and HL7 is once again reconsidering this position). So, technically, you need to pay for the official standards. The books and other resources are kind of in a grey zone.

I don’t know what the status of HAPI is. I expect that all the developers are members, as are the developers of any software that uses HAPI.

In general, HL7 figures that you can’t develop conformant applications without actually needing to buy the standards. So it’s the IP that’s licensed, not the pages of the standards.

A few relevant links:

Note that HL7 used to give older versions away, but no longer does so

Actually, after reading that, I have a better idea. Health Intersections Pty/Ltd has an Organizational Membership of HL7 International. I’m prepared to employ anyone in the world who is writing open source HL7 solutions for A$1/year. That will fully license their open source development. Anyone who’s interested in that, please contact me privately at


v2.7 Analysis from Ringholm

Recommended reading for V2 people:

New features of HL7 version 2.7

It’s weird that in Australia we’re proudly advancing to HL7 v2.4…

Actually, while I’ve always felt that the changes from v2 version to v2 version were pretty minimal, there are two really big changes in v2.7 that are thoroughly worthwhile: the changes to length (which finally make sense now) and the introduction of the PRT segment, which solves a series of practical problems related to the limitations of XCN. It’d be nice to think that we could advance to v2.7 in the v2 space, but I suspect not.


Response to comments on NEHTA ETP specification by Medical Objects

Yesterday at the HL7 Australia meeting, Jarrod Davison from Medical Objects made a presentation about the NEHTA ETP (Electronic Transfer of Prescriptions) that was highly critical of the specification.

Unfortunately it was a little misinformed, misleading, and in some respects, simply wrong.

Due to timing constraints, there was no opportunity to make comments at the appropriate time at the meeting, so none of what Jarrod said was challenged. Given that the event was sponsored by NEHTA, and that NEHTA asked many of its partners to attend the meeting, I have been asked to respond to the most pressing issues here, lest people think that NEHTA is not in a position to make any response.

Before I go on, I want to make it clear that I contract part time to NEHTA, and that part of ETP is my work, though I’m only a small part of a large team (and, btw, this post is my work, not an official response from NEHTA). Now, I’m not against criticising NEHTA – even publicly; if NEHTA gets it wrong, there’s no reason not to say so. But if you’re going to criticise NEHTA at an industry event, you’d better get it right.

Jarrod’s criticisms of ETP fell into several categories. I’ll deal with the most serious issue first.

Where’s the atomic data?

Jarrod showed this slide:

In this discussion, Jarrod talked about having to try and extract the atomic data out of this tabular form, and criticised ETP for not containing atomic data.

“Where is the atomic data in the example?” The answer is “60 lines further down in the example”:

(btw, the source for this example is inside

If Jarrod had only looked… in fact, anyone who understood CDA would have known where to look.

One CDA document per prescription item

Jarrod showed this slide:

This is from the data hierarchy summary of the document: one prescription item per ETP prescription document. When Jarrod put this slide up, I was pretty taken aback. It doesn’t make sense – a prescription is a document that may contain multiple line items (this is standard clinical practice here in Australia), and that’s exactly what Jarrod had to say. I thought that there must have been a typo somewhere – it was certainly multiple prescription items when I worked on the CDA mappings. But no, it’s not a typo; it’s deliberate, as I found out after digging around.

Having one item per prescription document simplifies both system design and pharmacy workflow in NEHTA’s ETP solution. The pharmacy workflow is the primary issue. Where a prescription contains multiple items, the pharmacist can choose to dispense one, some, or all of them. In the ETP solution, the pharmacy dispensing process is driven by the pharmacist scanning the DAK off the paper form. The R1 design was heavily criticised by pharmacy stakeholders because, after scanning the DAK, they still had to choose which item was to be dispensed. They demanded one DAK per item. This could have been achieved by extending the DAK with an item number, but we would have sacrificed precious bits of encryption (given hard limits on barcode length). Multiple items also complicate repeat management, as there is a single item per dispense record and a need to manage separate repeat counts per item.

All of the above challenges are manageable, and ETP could have defined a more complicated solution that allowed multiple items per prescription. However, through every discussion with every stakeholder, no reason was found to need more than one item per prescription. There were of course perceptions that multiple items were a good thing, but the prescribing system can still prescribe multiple items on a single screen and in a single medical record entry, and then generate individual prescription documents for each one behind the scenes.

So multiple items come down to a simple equation: some additional repeat management complexity and the sacrifice of precious encryption bits, for zero benefit. But in the underlying clinical model the prescription item is still repeating, and in general models where there does need to be a single document with multiple prescriptions, there can be (and would be).

So there’s a good reason why ETP has one prescription item per document. People might disagree (for instance, as Jarrod said, it means that you’ll have to move around a set of documents), but I think it’s important to understand the reasoning before criticising the specification (especially in public) – you just have to ask. (It would be better if the reasoning was explicit in the standard – I haven’t got around to checking whether it is or not.)

Use of OPC Zip

CDA documents are actually bundles or packages. This post is too long to take that up, and it’s a subject in its own right, so I’m going to do that in a later post. The ETP specification as presented to Standards Australia says that the CDA package is contained in an OPC Zip package. The OPC zip package specification is the same storage format used in a .docx file, and is an ISO standard (ISO/IEC 29500-2).

Jarrod was critical of this – why not choose a domain-relevant standard? (i.e. something that has actually been used in healthcare before). Why not use MIME, as in the MHTML standard, or as recommended in the CDA specification itself in section 3? (Again, instead of asking us why, Jarrod speculated that we chose OPC/zip because of concerns about data size.)

Well, our natural first instinct was to use a MIME package – it’s an obvious thing to do. But we didn’t stop there. IHE didn’t use MIME – they used .zip. So we asked them why not, and the relevant experts said that they had tried MIME, but had too many tooling consistency issues when they tested implementations (private communication, I can’t find any reference to this in public).

So for XDM, IHE used zip, not mime. So we considered using XDM. There are a couple of reasons not to use XDM: it’s really a solution for packaging multiple CDA documents together – and that carries complexity we didn’t need. Also, it includes the XDS metadata in the package, which is a great deal of complexity that wasn’t relevant, and we didn’t want to impose that on the implementers.

So we wanted a .zip file, consistent with IHE, and just a really light manifest that describes which parts are the document, digital signature and attachments. Well, that’s what OPC is. And it’s an ISO standard, widely adopted, not too hard to implement, and it comes with pre-built open source implementations in java and dotnet. So it seems like an obvious choice.

But it’s not one that’s been used in healthcare before. The great thing about standards is that there’s so many to choose from. And the thing about being NEHTA is that if you choose a healthcare specific standard, people will criticise you for choosing a weird healthcare specific one, and if you choose one that isn’t, people will criticise you for that too.

Though in this case, I don’t think this OPC Zip thing is the final word. The standards community is still pushing for XDM, in spite of the fact that this is much harder for implementers (well, parts of the community are, and I suspect they’ve got the numbers to win the day on the IT-14 committees). And it’s probably going to be mime packages in v2 in spite of the spectre of tooling difficulties.

Using CDA at all

I’m going to stop there, though I could go on. I think I can reasonably distil the rest of Jarrod’s comments down to a simple question:

“We have v2 messaging working now using an Australian standard, and it works sort of well – why not just use that for ETP?”

That’s actually a reasonable question when phrased like that, and one on which there has been much debate.

All I’m going to say here is that in my experience of v2, achieving meaningful conformance is hard, and the way v2 messages confuse content and transport makes doing meaningful digital signatures really hard (a quick scan of the forthcoming ATS for signing HL7 v2 content makes that quite obvious). Using CDA documents brings many advantages to the picture. Only time will tell whether the ETP package has made the correct choice.


Free V2 to CDA Tool Release


Today, HL7 Australia held a one day meeting here in Melbourne, entitled “Implementing CDA in a v2 world”. The meeting was keynoted by Ken Rubin, who gave a very interesting and insightful discussion of the relationship between v2 messages, CDA documents, and SOA (and big thanks to Ken for making the long trip over from Washington for a few days – instead of taking interesting photos).

Australia has a deep investment in V2, with a widely deployed distribution network that distributes v2 reports from diagnostic services, along with a mix of referrals/discharge summaries/letters between GPs. Just about all GP practices and many specialists are hooked up to the network, though the connection between the network and the tertiary referral hospitals is a bit spotty. It’s not really a network in the classic sense, though; distribution is performed by private carrier companies that charge for their services (though the services they offer – clinical desktop support, administration etc – are not cheap to provide).

HL7 v2 is deeply enmeshed in this world – and now we are talking about introducing CDA documents. Why? Well, for two reasons:

  • CDA is actually a much better fit to the problem than HL7 v2. I’ve written about this before, but since I wrote that, it’s become clear that we’ve taken HL7 v2 as far as it can go. I’ll make another post about this later
  • The pcEHR is just over the horizon, and CDA is a perfect fit to it, and also the express choice of the pcEHR architecture

My subject for today was how to transition to providing CDA documents for diagnostic reports in this scenario. The first problem is distribution – NEHTA has developed the specifications to support true network distribution that is self-administering in the way the internet is – secure message transport, with public identifier and certificate registries. But while the infrastructure for this is in place or nearly in place, a transition to this infrastructure is costly (mainly in administration costs), and even though the end state will be cheaper and better, there’s no enthusiasm for making this transition yet. And we don’t need to – we already have a working distribution mechanism for CDA: putting them in HL7 v2 messages. There are some open questions about exactly how that works – I’ll take that up in a separate post. But we know how to transmit CDA documents – in v2 messages.
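For illustration, carrying a CDA document in a v2 message might look something like this (a hedged sketch: the OBX-3 code and the exact ED component values here are placeholders I invented, not drawn from AS 4700.2 or any NEHTA spec):

```python
import base64

def cda_to_obx(cda_xml: bytes, set_id: int = 1) -> str:
    """Wrap a CDA document in an OBX segment as an ED (encapsulated data)
    value. OBX-5 components: source ^ type ^ subtype ^ encoding ^ data.
    The OBX-3 identifier is invented for this example."""
    b64 = base64.b64encode(cda_xml).decode("ascii")
    return f"OBX|{set_id}|ED|DOC^CDA Document||^application^xml^Base64^{b64}||||||F"

segment = cda_to_obx(b"<ClinicalDocument/>")
```

Base64 keeps the document opaque to the v2 layer: no embedded `|` or `^` delimiters, and no need to escape XML against the v2 encoding characters.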

So how do you generate CDA documents? The best way to do it is to generate them at the source, where the HL7 v2 message is generated. But I’m hearing the message loud and clear that due to the prevailing business operating conditions here in Australia, this is just not feasible (more for business reasons than technical reasons). So people are left trying to generate CDA documents from the v2 messages downstream. That’s actually harder in a technical sense, I think.

The rest of this discussion is very specifically about diagnostic reports in HL7 v2 messages following the AS 4700.2 standard. Not electronic ordering – I wouldn’t use CDA for that – but for diagnostic reports, it’s the way to go. And the AS 4700.2 standard lays down enough rules to make it possible to convert from v2 to CDA, and to do that in an off-the-shelf tool. When writing such a tool, there are several problems to deal with:

  • Multiple documents per message?
  • Filling out missing identifiers & details
  • Ignoring things in v2 not represented in CDA
  • Converting Data types / Data Quality issues
  • Constructing Narrative (& format conversions)
  • Attachments – images and digital signatures
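As a tiny example of the narrative-construction and tag-escaping problems together (a minimal sketch only – real AS 4700.2 conversion also has to handle FT formatting escapes, highlighting, and alternative display formats):

```python
from xml.sax.saxutils import escape

def obx_lines_to_narrative(lines: list[str]) -> str:
    """Turn the text lines of a v2 report into a minimal CDA narrative
    block, escaping anything that would otherwise read as XML markup."""
    paragraphs = "".join(f"<paragraph>{escape(line)}</paragraph>"
                         for line in lines)
    return f"<text>{paragraphs}</text>"

narrative = obx_lines_to_narrative(["Na+ 140 mmol/L", "Result <pending>"])
```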

People are pretty interested in these subjects, so I’m going to make a series of posts describing these issues. Though I’ll be writing specifically about AS4700.2, much of what I have to say will be useful in a wider context.

But the sum of all this is that it’s hard – hard enough to be too much of a challenge for most programmers. Well, a solution is at hand.

Free V2 to CDA Conversion Tool

From today, I am giving away a free v2 to CDA conversion tool. It will convert a conformant AS 4700.2 message to a CDA document that is conformant with the pcEHR specifications. It’s available as a testing GUI, and the engine will also be provided in various forms that make it suitable for inclusion in production code (web services, file transfer, TCP/IP, COM object, others). It will continue to be free for use with AS 4700.2 messages, and I’ll provide free support to fix issues related to conformant messages. And the code is proven reliable – it’s based on the code libraries from HL7Connect.

You can get further details and download it from The CDA Tools page.

The tool itself is currently in beta. The CDA document it produces is still subject to change (the specification is only partially written so far). But the infrastructure (i.e. APIs) will be stable.

Note that a couple of people asked why I’m giving it away. The answer is that I’m being paid by NEHTA to write these specs, and I didn’t see how it was reasonable to charge for a tool that implements specs I get paid to write. Also, I’m frustrated with the slow adoption of CDA, and the pathology industry should be doing better  – this is my small contribution to bringing things forward.


Because everybody hates OIDs

In a recent post, John Halamka says:

For patient ID, we considered many options but selected a very simple XML construct based on a streamlined CDA R2 header.  This XML has nothing healthcare specific such as OIDs in it.

John H, like many other implementors, doesn’t like OIDs. John Moehrke took him to task:

OIDs are not healthcare specific and should not be frightening. Yes, they should never be shown to a human

John M’s response made me laugh – I regularly say that things about HL7 shouldn’t be shown to normal humans, and that’s a category I clearly don’t belong in. But shouldn’t be shown to any human? John M is right, though – OIDs aren’t healthcare specific. OIDs are a standard defined jointly by ITU-T and ISO (source), and beloved of big system administrators. But HL7 is a pretty enthusiastic adopter (hence the mention on the wikipedia page here). However, what John H is alluding to is that people don’t like OIDs. It’s a fact that I hear on a regular basis.

It’s not that people don’t understand what OIDs achieve (globally unique meaningless identifiers); it’s just that they don’t see that as an outcome that makes it worth putting up with OIDs. And putting up with OIDs is… tiresome. For instance, here’s a few OIDs:

  • 2.16.840.1.113883.6.1
  • 2.16.840.1.113883.12
  • 2.16.840.1.113883.6.96

These are probably among the most commonly used HL7 OIDs in the world – but I reckon that very few people instinctively recognise them (they are the OIDs for LOINC, HL7 v2 tables, and SNOMED CT, respectively). (Obviously the DICOM OIDs for abstract and transfer syntax are the most widely used of all.)

Why do we make people use OIDs for these standard, commonly used concepts? Is it that we want to make it hard to do v3? Is there some value proposition in making people use OIDs for these fixed concepts? The thing is, it’s not as if we have no choice but to make them use OIDs. Where the base identifier type is defined, we define three types: UUID (like so: 8E485717-26F0-4F15-A1A8-DA163032EB7E), OID, and RUID, about which the standard says:

HL7 also reserves the right to assign RUIDs, such as mnemonic identifiers for code systems

An RUID is defined as:

A globally unique string defined exclusively by HL7. Identifiers in this scheme are only defined by balloted HL7 specifications. Local communities or systems must never use such reserved identifiers based on bilateral negotiations.

And HL7 has never actually gone and defined one of these. Why not? Thinking back on the few times we have talked about it, there are two reasons:

  • We want people to get used to OIDs
  • We’re concerned about spending committee time on defining RUIDs.

Well, I think it’s high time we defined some, and let people use easy-to-use strings like “snomed-ct”, “loinc”, “v3-model”, “us.ssn” etc. It would make instances so much easier to work with. Anything we can do to make the instances easier to work with is not only a good thing, it’s a necessary thing (and it’s starting to become urgently necessary).
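For illustration, this is the kind of mapping balloted RUIDs would make possible (the mnemonic strings here are hypothetical, not actual HL7-assigned RUIDs; the OIDs are the well-known ones):

```python
# Hypothetical RUID mnemonics for some well-known HL7 OIDs
RUIDS = {
    "loinc": "2.16.840.1.113883.6.1",
    "v2-tables": "2.16.840.1.113883.12",
    "snomed-ct": "2.16.840.1.113883.6.96",
}
OID_TO_RUID = {oid: name for name, oid in RUIDS.items()}

def display_root(root: str) -> str:
    """Show the mnemonic where one is defined, else fall back to the raw OID."""
    return OID_TO_RUID.get(root, root)
```

The point is readability in instances: a codeSystem of “snomed-ct” is self-explanatory in a way that 2.16.840.1.113883.6.96 never will be.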

One of the intriguing aspects of defining RUIDs is that they are retrospectively active. In other words, if HL7 defines an RUID today, it can be used in any existing v3 instance, in particular, any CDA documents. That’s kind of cool – and also a bit dangerous for all the existing implementations out there that have almost certainly not added RUID support even though it’s part of the specification. So I think that if we went ahead and defined a set of RUIDs, we’d have to say that they can only be adopted by trading partner agreement.

Of course, defining RUIDs for the commonly used OIDs isn’t going to get rid of them completely. Nor is it going to make people completely happy. But we have to do something.

The other thing that a set of carefully chosen RUIDs would do is re-introduce meaning to the identifiers. I’ve recently been spending a lot of time looking at converting v2 messages to CDA documents, and CDA badly needs meaningful identifiers. Defining RUIDs and letting the affiliates ballot their own RUIDs would go a long way to solving this problem. (Of course, letting affiliates ballot and use their own RUIDs would make it impossible to share documents across different affiliates – but why worry about closing the door after the horse has long bolted?)


ISO 21090: Underlying design propositions #2

This is a follow up for ISO 21090: Underlying design propositions #1


The underlying notional data type definitions on which ISO 21090 is based (the HL7 v3 Abstract Data Types) include several mix-ins, specifically HXIT, URG, and EXPR. Mixins are a strange type: a generic class that extends the parameter class, instead of expressing properties of the type of the parameter class. That’s a hard idea to get your head around; it’s much easier to understand when you express it like this:

mixin<T> means to use T, and also to use whatever features mixin has as well

Using mixins like this leads to lovely clean designs. But very few languages can actually implement them (Eiffel can; Ruby supposedly can; Ada supposedly can) – and none of the mainstream implementation technologies can (and I think that’s not merely coincidence). If the architectural design uses mixins, there are three different bad choices for how mainstream implementations can make them work:

  1. Create a wrapping type Mixin<T: X> where it has a property base : T
  2. Create a whole slew of types Mixin_T – basically pre-coordinate the Mixin type by hand
  3. Push the attributes of Mixin up to X (the base type for T)
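A sketch of what options 1 and 3 look like in practice (illustrative Python; the property names are simplified stand-ins for the HXIT history/version properties, not the exact ISO 21090 names):

```python
# Option 1: a generic wrapper type - the mixin holds its base value
class HxitWrapper:
    def __init__(self, base, valid_time_low=None, valid_time_high=None):
        self.base = base                      # the wrapped value of type T
        self.valid_time_low = valid_time_low  # history/version context
        self.valid_time_high = valid_time_high

# Option 3 (the ISO 21090 choice): push the mixin's properties up to the
# common base type, so every value carries them whether relevant or not
class ANY:
    def __init__(self):
        self.valid_time_low = None
        self.valid_time_high = None

class PQ(ANY):  # a physical quantity inherits the pushed-up properties
    def __init__(self, value, unit):
        super().__init__()
        self.value = value
        self.unit = unit

dose = PQ(5.0, "mg")
wrapped = HxitWrapper(dose)
```

Option 1 keeps the type model clean but forces an extra indirection on every access; option 3 gives flat, code-generatable classes at the cost of properties that are mostly irrelevant.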

Mixins are also discussed here (in a little bit more detail).

ISO 21090 had a base value proposition: that implementers can take the UML model or the schema, and implement them as is with no fancy shenanigans. So we had to make a choice – one of the three above. We chose option #3 for most of the mixins, and pushed their properties up to X, which in practice means that ANY has the HXIT properties, and QTY has the URG and EXPR properties. This, after all, mapped directly to the actual implementations that existed for the v3 data types at the start of the project (at least, those that people would own up to).  Btw, it works too – my data type implementation is code generated from the UML (manually, by keyboard macros, but still generated). Other implementations are code generated from the UML or schema too.

But option #3 doesn’t lead to good theoretical definitions. The mixin-derived properties are everywhere, so that they can be used where appropriate, but mostly they aren’t relevant, or aren’t allowed to be used.

To my fascination, the choice to have a practical implementation instead of a theoretically better specification that requires much mucking around to implement particularly angered two VIPs in the healthcare data type world. I understand that, but tough. Again and again we looked at it in committee and chose to have a spec that worked for most implementors.

Design by Condensation

This is a bit harder to explain, but it’s an important plank of the way the data types are defined. It’s a design pattern I’ve never seen discussed anywhere else (though it probably is – there’s a myriad of design patterns out there). (btw, it wasn’t me who invented this one, and it took me many years to come to feel the weight of its advantages, whereas its disadvantages are quickly obvious.)

Let’s take the measurement related notions as an example. The basic notion is a value with a unit. Sometimes you might want to track how the data was estimated. Sometimes you might want to use some human text instead of, or as well as, the measurement (it might say “About 6 foot”, for instance).

Here’s a simple standard O-O model where this is done by adding features in specialist classes.

The problem with this is that while it’s evident what this means, it’s not at all evident how it actually works in practice – it’s just fine and dandy when you have a simple measurement, but when do you have to check whether you have a particular subtype? There’s also a permutational explosion of classes as different aspects compete to pollute the specialisation tree. In a moderately complicated O-O system, this starts to become pretty difficult to manage (consequently, there’s all sorts of lore, design patterns etc about how to manage it. The specific model above might or might not be a good one by these patterns, and I don’t want to argue about the specific model in the comments, please).

Instead of inventing ever more elegant and difficult-to-comply-with rules about class hierarchies, we could abandon specialisation altogether, and do the design by composition:


Designing models by composition like this is growing in popularity. It just pushes the same complexity around to different places, but it’s less… ethereal. Large compositional models aren’t hard to understand, just hard to navigate. And they’re still dominated by lots of type testing and casting and various kinds of switch statements.

Instead of either of these two approaches, we could define as few classes as we can get away with:

This is what ISO 21090 does (btw, this UML model includes heaps more features than the other ones above). I call it “design by condensation” since we condense everything (it’s just a name I made up, I’m not particularly attached to it). This design pattern has the following features:

  • Fewer types (a lot fewer)
  • The types themselves are not simple
  • But all the logic is done up front
  • Your investment will pay off with much easier leverage in the long run (there’s that up-front investment again)
  • The classes do have more “dangling appendages” – properties that won’t get used very often, and perhaps not at all by particular implementations (See Tom Beale on “FOPP”)
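To make the pattern concrete, here’s a minimal sketch of a condensed quantity class, where the simple case and the “About 6 foot” case are both handled by the one type (invented names – the real ISO 21090 quantity types carry many more properties than this):

```python
# A minimal sketch of "design by condensation" (invented names – the real
# ISO 21090 quantity types carry many more properties than this).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Quantity:
    value: Optional[float] = None
    unit: Optional[str] = None
    # "dangling appendages": carried by every instance, used by few
    estimation_method: Optional[str] = None  # how the value was estimated
    uncertainty: Optional[float] = None      # how precise the value is
    original_text: Optional[str] = None      # e.g. "About 6 foot"

# the simple case: just a value with a unit
height = Quantity(value=1.83, unit="m")
# the "About 6 foot" case: same type, extra properties filled in
rough = Quantity(value=1.83, unit="m", original_text="About 6 foot")
```

There’s no subtype testing or casting: code that only cares about value and unit ignores the rest, and code that needs the appendages checks whether they’re populated.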

That’s how ISO 21090 works. Is it the right way? I don’t know, but it does have real advantages. But the complexity hasn’t gone away, because you can only move it around (law #2).


These are the big four design propositions of ISO 21090. I think that understanding these will significantly help people who have to “implement” ISO 21090 in their systems. I put “implement” in quotes because it means so many different things in different contexts – which is one reason why ISO 21090 is “for exchange”, not for systems. The problem for system implementors is that this knowledge has never been written down anywhere (until now), and even knowing this stuff still raises a “so what?”: how would you actually implement ISO 21090 in a system then?

Tom Beale and I are going to co-author a system implementation guide for ISO 21090. It’ll include a different, simpler model for the types, and describe how to handle other features, as well as discussing issues like how much to normalise the concept data types. We agreed to do this a long time ago, and I was supposed to do something about it – it will happen one day.


ISO 21090: Underlying design propositions #1

In a previous post (see here), I promised to talk about the underlying design issues that are implicit in ISO 21090. ISO 21090 has attracted some strenuous criticism because of its underlying design characteristics. The primary critic is Tom Beale, though he’s not the only one.


The first design characteristic of ISO 21090 is that a huge requirements gathering network leads into it: it’s very comprehensive. A lot of people struggle with this – “But it’s sooo complex”. And it is. This is a value proposition: if you invest in a solid implementation of the data types library, you’ll be able to reuse it again and again. This is a very evident aspect of ISO 21090: it was designed to be something you have to invest in to use – a heavyweight standard. If you run into ISO 21090 on the fly from some wider perspective, and have to implement a little bit of it, then it’s not going to make you happy; the density of the standard (particularly the ‘design by condensation’ discussed in the next post) is mostly going to be painful, and its comprehensiveness, along with the value that can be leveraged from a solid implementation, is going to be irrelevant to you.

So a lot of criticism of ISO 21090 is driven by this. It’s not usually well expressed, but does that make it invalid? Well, it comes back to the value proposition: which is better? For invested healthcare developers, particularly healthcare enterprise information system developers, who work slowly and thoroughly, the ISO 21090 value proposition is a winner.

Worst-Case Interoperability

ISO 21090 starts with the following words:

This International Standard provides a set of datatype definitions for representing and exchanging basic concepts that are commonly encountered in healthcare environments in support of information exchange in the healthcare environment.

Also, it says:

This International Standard can offer a practical and useful contribution to the internal design of health information systems but it is primarily intended to be used when defining external interfaces or messages to support communication between them

It’s become evident to me that we didn’t say enough about this – about how the for-exchange aspects of the design are not the same as how you design things for a system. One of the underlying presumptions of ISO 21090 is Worst Case Interoperability: the premise that the systems exchanging data don’t share anything other than the data being exchanged right here and now. ISO 21090 is designed for that, and that’s what I meant when I said “for exchange”. In committee discussions, Dipak Kalra picked up that there was an issue and extended the language, but we didn’t go far enough.

The problem is that “for exchange” can mean for use between best mates Peter and George, who share nearly everything, as much as between Alice and Bob, who can barely be civil to each other (see Worst Case Interoperability). ISO 21090 is built to allow Alice and Bob to work together, but at the cost of not being well designed for Peter and George. They can still make it work, but they pay a tax: the presence of things that, from their point of view, are in the wrong place – things they’d rather normalise out of every exchange and put where they belong.

A good example of this is the audit trail attributes built into the HXIT type. In a properly designed system, you don’t exchange references to the past history of things; you’d have an audit trail (probably a 3rd party one, maybe based on IHE ATNA). But we didn’t make ISO 21090 for properly designed systems like that. We made it for systems that don’t share audit trails.

I think that this is the heart of Tom Beale’s criticisms of ISO 21090. He pretty much claims that worst case interoperability doesn’t work. And though I know it “works” (for a given value of work), it sure ain’t pretty. It may be that it won’t scale, and it certainly won’t scale as well as it might’ve if we’d designed it for a cleaner architecture (a la openEHR). But we designed it to work in the worst case. I wish now that we had extended the wording in the introduction to make this clear, because some enterprises are pushing system designers to use ISO 21090 directly inside their systems. It should be clear that I don’t think this is a good idea – ISO 21090 lives on the perimeter; I’d never have my internal application objects be actual ISO 21090 types, though they’d be based on it, and indirectly conformant.

This is part #1 of a series – I’ve run out of time for now. Part #2 covers Normalisation of Mix-Ins and Design by Condensation, and describes a System Implementation Guide for ISO 21090 that Tom and I propose to write.

We need a standard web / sms gateway protocol

A quick internet search (try this) shows that there are hundreds of web <-> SMS gateways out there. But so far as I can tell, they all use their own protocols – there’s no standard protocol.

That’s just stupid – more demonstration that interoperability in telecommunications isn’t as good as people keep telling me it is.

In the meantime, customers pay for integration with a service that should just be an outright replaceable commodity. Hopefully it will happen soon.

Version 2 and character sets and encoding

I’ve been rewriting my v2 parser and trying to make it fully conformant to the v2 specification with regard to character sets. It’s a tough problem, and there are several parts to it that make it tough.

Finding the character set/encoding

The character set is embedded many bytes into the message content, at MSH-18. So you need to turn the first 40-100 bytes or so into characters before you know how to turn them into characters… sounds like fun. Actually, it’s a pretty manageable problem, because there’s no need to use characters with value >127 before MSH-18 (note that there’s no need, but it’s possible to use them). Given that the message starts with ‘MSH’, you can tell by inspecting the first 6 bytes whether you have single- or double-byte encoding, and if it’s double-byte encoding, what the endianness is. Note that you can also tell that from a byte order mark (BOM), if there is one. Given this, provided the sender didn’t send any characters >127 before MSH-18, you can reliably find and read MSH-18. Once I’ve read that, I reset the parser and start again with the specified encoding.
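The sniffing logic described above can be sketched like this (a simplified illustration – the function name, return values and the set of layouts handled are my own choices, not part of the v2 spec):

```python
# A sketch of sniffing a v2 message's physical layout from its leading
# bytes, relying on the message starting with 'MSH'.

def sniff_encoding(data: bytes) -> str:
    """Best-guess decoding to use while hunting for MSH-18."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"                      # UTF-8 byte order mark
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"                  # UTF-16 little-endian BOM
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"                  # UTF-16 big-endian BOM
    # no BOM: look at how 'M', 'S', 'H' lie in the first 6 bytes
    if data.startswith(b"MSH"):
        return "ascii"                      # some single-byte encoding
    if data.startswith(b"M\x00S\x00H\x00"):
        return "utf-16-le"                  # double-byte, little-endian
    if data.startswith(b"\x00M\x00S\x00H"):
        return "utf-16-be"                  # double-byte, big-endian
    raise ValueError("doesn't look like a v2 message")
```

Once MSH-18 has been read under this provisional decoding, the parser restarts with the declared encoding.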

Of course, it’s always possible that the character set as specified by the BOM, or made clear by inspecting the first 6 bytes, differs from what the value of MSH-18 implies… I ignore MSH-18 if it doesn’t match.

Note that v2 doesn’t say anything about the BOM – I think it should in a future version.

Understanding the character set/encoding

The second part of the problem is that MSH-18 is sometimes character set and sometimes character encoding (see here for discussion) – the values are an unholy mix of the two. In addition, the list of values matches precisely the list of values in DICOM, and as far as I can tell, no other list at all. Here’s a list of the possible values for MSH-18 (v2.6):

  • ASCII – The printable 7-bit ASCII character set
  • 8859/1 – The printable characters from the ISO 8859/1 character set
  • 8859/2 – The printable characters from the ISO 8859/2 character set
  • 8859/3 – The printable characters from the ISO 8859/3 character set
  • 8859/4 – The printable characters from the ISO 8859/4 character set
  • 8859/5 – The printable characters from the ISO 8859/5 character set
  • 8859/6 – The printable characters from the ISO 8859/6 character set
  • 8859/7 – The printable characters from the ISO 8859/7 character set
  • 8859/8 – The printable characters from the ISO 8859/8 character set
  • 8859/9 – The printable characters from the ISO 8859/9 character set
  • 8859/15 – The printable characters from the ISO 8859/15 (Latin-15) character set
  • ISO IR14 – Code for Information Exchange (one byte) (JIS X 0201-1976)
  • ISO IR87 – Code for the Japanese Graphic Character set for information interchange (JIS X 0208-1990)
  • ISO IR159 – Code of the supplementary Japanese Graphic Character set for information interchange (JIS X 0212-1990)
  • GB 18030-2000 – Code for Chinese Character Set (GB 18030-2000)
  • KS X 1001 – Code for Korean Character Set (KS X 1001)
  • CNS 11643-1992 – Code for Taiwanese Character Set (CNS 11643-1992)
  • BIG-5 – Code for Taiwanese Character Set (BIG-5)
  • UNICODE – The world wide character standard from ISO/IEC 10646-1-1993
  • UNICODE UTF-8 – UCS Transformation Format, 8-bit format
  • UNICODE UTF-16 – UCS Transformation Format, 16-bit format
  • UNICODE UTF-32 – UCS Transformation Format, 32-bit format

That’s a fun list. The default is ASCII, btw. Now, I’m not going to write my own general character encoding engine – who would? I’m going to use the inbuilt functions in windows to convert everything to unicode. That means I have to map these values to windows code pages to pass to the windows conversion routines. But mapping between these values and the windows code page values is a problem. Here’s my mapping list.

  • ASCII = 20127 or 437
  • 8859/1 = 28591 : ISO 8859 : Latin Alphabet 1
  • 8859/2 = 28592 : ISO 8859 : Latin Alphabet 2
  • 8859/3 = 28593 : ISO 8859 : Latin Alphabet 3
  • 8859/4 = 28594 : ISO 8859 : Latin Alphabet 4
  • 8859/5 = 28595 : ISO 8859 : Cyrillic
  • 8859/6 = 28596 : ISO 8859 : Arabic
  • 8859/7 = 28597 : ISO 8859 : Greek
  • 8859/8 = 28598 : ISO 8859 : Hebrew
  • 8859/9 = 28599 : ISO 8859-9 Turkish
  • 8859/15 = 28605 : ISO 8859-15 Latin 9
  • ISO IR14 = ??
  • ISO IR87 = ??
  • ISO IR159 = ??
  • GB 18030-2000 = 54936 : GB18030 Simplified Chinese (4 byte); Chinese Simplified (GB18030)
  • KS X 1001 = ??
  • CNS 11643-1992 = ??
  • BIG-5 = 950 : ANSI/OEM Traditional Chinese (Taiwan; Hong Kong SAR, PRC); Chinese Traditional (Big5)

As you can see, it’s incomplete. I just don’t know enough to map between the HL7/DICOM codes and the windows code pages. Searching the internet didn’t quickly resolve them either; all the links I found pointed to either the HL7 or DICOM standards, or copies thereof.

If you know what the mappings are, please let me know, and I’ll update the list.
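For what it’s worth, the resolved part of the mapping can also be expressed against Python’s codec names rather than windows code pages (a sketch; the dictionary and function names are my own, and the entries I couldn’t resolve are simply left out):

```python
# The resolved part of the MSH-18 mapping, expressed with Python codec
# names. Unresolved entries (ISO IR14, ISO IR87, ISO IR159, KS X 1001,
# CNS 11643-1992) and the bare "UNICODE" value are omitted.
MSH18_TO_CODEC = {
    "ASCII": "ascii",
    "8859/1": "iso8859-1",
    "8859/2": "iso8859-2",
    "8859/3": "iso8859-3",
    "8859/4": "iso8859-4",
    "8859/5": "iso8859-5",
    "8859/6": "iso8859-6",
    "8859/7": "iso8859-7",
    "8859/8": "iso8859-8",
    "8859/9": "iso8859-9",
    "8859/15": "iso8859-15",
    "GB 18030-2000": "gb18030",
    "BIG-5": "big5",
    "UNICODE UTF-8": "utf-8",
    "UNICODE UTF-16": "utf-16",
    "UNICODE UTF-32": "utf-32",
}

def codec_for(msh18: str) -> str:
    # falling back to ASCII for empty/unknown values is my choice here;
    # the spec only says the default is ASCII when MSH-18 is absent
    return MSH18_TO_CODEC.get(msh18, "ascii")
```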

The character set can change

If that’s not enough, the character set is allowed to change mid-message: there’s a couple of escape sequences (\C..\ and \M….\) that allow the stream to switch character sets mid-stream. This makes for a slow parser because of the way windows does character conversion – you can’t ask for x characters to be read off the stream, only for x bytes to be read into characters (and how do you tell how many bytes were actually read? Convert the characters back to bytes – I suspect that this isn’t deterministic, and that there are some valid unicode sequences that some windows applications will fail to round-trip, but I don’t know how to test that). So you have to keep reading a byte or two at a time until you get a character back, because you can’t let an encoder read ahead on the stream – you might have to switch encoders.
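The byte-at-a-time reading described above looks like this in Python, whose incremental decoders buffer partial byte sequences until a whole character arrives (a sketch of the technique only – a real v2 parser would also have to watch for the \C..\ and \M..\ escapes as the characters emerge):

```python
import codecs

def chars_one_at_a_time(data: bytes, encoding: str):
    """Feed bytes to the decoder one at a time; yield characters as they complete."""
    dec = codecs.getincrementaldecoder(encoding)()
    for i in range(len(data)):
        # returns "" until enough bytes have arrived to form a character
        ch = dec.decode(data[i:i + 1])
        if ch:
            yield ch

# a two-byte UTF-8 character only appears once its second byte is fed
assert list(chars_one_at_a_time("é!".encode("utf-8"), "utf-8")) == ["é", "!"]
```

Because the decoder is only ever fed the bytes already consumed, it can be thrown away and replaced when an escape sequence switches the character set.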

Having said that, I’ve never seen these escape sequences in the wild, and changing character set mid-message seems like a sensationally dumb idea to me (however, I’ll make a post about unicode and the Japanese in the future).

If I have any Japanese readers, how does character encoding in v2 actually work in Japan?

Mostly, implementers get this wrong

This stuff is sufficiently poorly understood that most implementers assume they’re working in ANSI, use characters from their local code page, put them in the message, and claim they’re using something else. The windows character conversion routines fail in some of these cases. I don’t know what to do about that.

There. That’s enough. We really, really need to retire v2. Its time has passed.