Monthly Archives: June 2011

Not getting the Concept

This morning I wake up to a new post from Barry Smith at HL7-Watch entitled “HL7 attempts to get things clear about its own use of the word ‘concept’“, in which he criticises HL7 for its definition of Concept as found in the V3 Core Principles.

There’s a lot to criticise in the definition of concept – it’s a tough thing to define. But Barry’s criticisms are not well informed. In fact, they look like he didn’t want to engage, only to lampoon. I’m not going to make a point-by-point refutation, but I will note the following things:

  • The point of the terminology work is to impose standards on the way people think. As a goal/method, this has obvious problems, but that’s exactly what a grammar does
  • When you provide 3 aspects to a definition for something, that would be because they are all different, and all need to be true
  • The difference between “real” and “abstract” is a false dichotomy in language, which is all symbolic
  • It is hard to differentiate between use and mention – Barry demonstrates that several times in his own comments

But the Core Principles document (which I sometimes edit, though not this section) has a tough gig: its mission is to describe what we already do, which in this context is use defined terms. It’s actually a ubiquitous practice in all of IT – when a programmer knocks up an enumeration, they’re doing it too. It’s a smooth continuum up to the nightmare of working with post-coordinated SNOMED CT.
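The enumeration point can be made concrete. A minimal sketch (the class and codes are invented for illustration, not from any HL7 vocabulary): every programmer who writes an enumeration is creating a tiny terminology of defined terms.

```python
from enum import Enum

# A programmer's enumeration is a small private terminology: each
# member is a defined term - a symbol bound to an agreed meaning.
# (These codes are illustrative, not from any HL7 vocabulary.)
class MaritalStatus(Enum):
    MARRIED = "M"    # currently legally married
    SINGLE = "S"     # never married
    DIVORCED = "D"   # marriage legally dissolved

# "M" means "currently legally married" only by prior agreement -
# exactly the kind of agreement a terminology standard scales up.
status = MaritalStatus("M")
```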

If Barry really wanted to help people out, he would offer a better way to describe what people are already doing. But no….


What makes healthcare special?

An insightful and blunt take-down of Google from Mr HISTalk:

Google predictably did what its know-it-all technology company predecessors have done over the years: dipped an arrogant and half-assed toe into the health IT waters; roused a loud rabble of shrieking fanboy bloggers and reporters (many of them as light on healthcare IT experience as Google) who instantly declared it to be the Second Coming that would make all decades-old boring vendors instantly obsolete or subservient to the Googleplex; and then turned tailed and slunk off at the first sign of lackluster ROI, leaving the few patients and providers who actually cared high and dry except for those same old boring vendors who have stuck it out for decades instead of chasing whatever sector looked juicy at the moment.

I’m skeptical that Microsoft will last it out either. Sooner or later, Microsoft is going to start ringing in the changes to respond to the ongoing success of Apple and Google at sucking the profit out of their mainstream business. How can they continue to justify spending money on stuff like HealthVault then?

It’s not just vendors, btw, who act like this. Again and again I’ve met “experts” who’ve come out of banking or telecommunications, looked at healthcare, and said “You guys have no idea what you’re doing. We’ll just solve your problems like we did in [x]”. Give them 9 months, and they’ll be gone, because while I’m never sure whether we know what we’re doing, I’m sure that they don’t know what we’re doing when I hear something like that.

Healthcare long ago integrated the stuff that is easy. So long ago that the two mainstream standards – HL7 v2 and DICOM – are seriously showing their age. But they’ve got incredible momentum too. What we’re trying to do now is hard – both technically and socially.

Reader vs Writer

There’s a trade-off between reader and writer in most interoperability specifications, though it’s not always widely appreciated. What suits the reader is sometimes the exact opposite of what suits the writer.

As an example, let’s consider exchanging a patient address. Generally speaking, in English-language countries, an address has two lines of text, a city, a postcode (usually), a country, and sometimes a state or province. Around the world, some countries do not have post codes, some have states or provinces, and the amount of structure in the other lines and the usual order of the parts varies wildly. There are a variety of approaches to handling addresses across different systems. Some systems simply treat the address as four lines of text; many systems pull out the city, state and postcode separately in order to either validate them or allow their use for statistical analysis. Usually such systems have to create a field for country too, though this does not always happen. Recently the postal services have started focusing on improved data handling to support high-volume automated solutions for postal delivery, and different countries have created different structured definitions for the street and delivery details. Given the amount of mail that healthcare organizations send, there has been strong interest in adopting these solutions.

Now, consider the general concept of defining a format so that a variety of systems can exchange addresses with each other. The first question is, what are we trying to achieve? Briefly, there are five main reasons for exchanging addresses:

  1. To allow the use of address to assist with identifying persons
  2. So physical mail can be sent to the person/organization associated with the address
  3. So physical mail can be sent to the person/organization associated with the address using high volume mailing methods
  4. To allow a correct billing address to be used as part of a financial exchange
  5. To allow for address-based data analysis (usually by post code, but there are public-health monitoring applications that do more fine-grained analysis – though GPS coordinates are more useful for something like this, they are mostly not available)

Here are a few example choices for how we could represent an address:

  1. As plain text with line breaks. Suitable for uses #1 – #2, maybe #4 (depending on the financial standard)
  2. 2 lines of text, city, state, post code and country. Suitable for uses #1, #2, #4 and #5
  3. A data element for each known part. Suitable for all uses
  4. A series of parts with a type code. Suitable for all uses

Each of these approaches has advantages and disadvantages for the different use cases, but this is not the focus of this post. Now, imagine that we have three different systems:

  • System A: stores an address as 4 lines of plain text
  • System B: stores an address as 2 lines of text, with city, state and postcode
  • System C: stores an address as a set of data elements

This table summarizes how easy it is for each system to be the reader and writer for each structure:

|                    | Structure #1 | Structure #2 | Structure #3 | Structure #4 |
|--------------------|--------------|--------------|--------------|--------------|
| System A as writer | Easy: straight in | Very hard: try to parse city/state/code | Too hard? | Too hard? |
| System B as writer | Easy: convert to text | Easy: straight in | Very hard: try to parse lines of text? | Very hard: try to parse lines of text? |
| System C as writer | Easy: convert to text | Easy: convert parts to text | Easy: just fill data elements out (as long as parts match) | Easy: just create parts |
| System A as reader | Easy: straight in | Easy: convert to text | Easy: convert to text | Easy: convert to text |
| System B as reader | Very hard: try to parse city/state/code? | Easy: straight in | Easy: read parts and build first two lines | Easy: read parts and build first two lines |
| System C as reader | Too hard | Very hard: try to parse lines of text? | Easy: read elements into parts | Easy: read elements into parts (lose order) |

The effects of entropy are clearly visible in this table: introducing structure is far harder than removing it. A more structured model is easier to read and harder to write. Likewise, more choice about how things are written makes a format easier to write and harder to read. The key point of this table is that the different systems have different trade-offs for whether reading or writing addresses is easy.
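To make the asymmetry concrete, here’s a sketch (all field names and formats are invented for illustration) of System C handling structure #1: writing structured parts out as plain text is a trivial join, while reading plain text back into parts is fragile guesswork.

```python
import re

# System C's structured address (all field names are hypothetical)
structured = {
    "line1": "10 Smith St", "city": "Springfield",
    "state": "VIC", "postcode": "3000", "country": "AU",
}

def write_plain_text(addr):
    # Structured -> text (structure #1): a trivial join; knowledge of
    # which part is which is simply discarded.
    return "\n".join([
        addr["line1"],
        f'{addr["city"]} {addr["state"]} {addr["postcode"]}',
        addr["country"],
    ])

def read_plain_text(text):
    # Text -> structured: guesswork. This naive pattern assumes an
    # Australian-style "City STATE 9999" line and gives up on anything
    # else - which is why the table calls this direction "too hard".
    lines = text.split("\n")
    m = re.match(r"(.+) (\w+) (\d{4})$", lines[-2])
    if not m:
        return None
    return {"line1": lines[0], "city": m.group(1), "state": m.group(2),
            "postcode": m.group(3), "country": lines[-1]}
```

The round trip works only for addresses that happen to match the writer’s own layout; anything else loses the parts for good.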

If you sit representatives of these different systems down to hammer out a format they’re going to use to exchange data, they’re each going to want something different; who wins will be as much about personalities and influence as it is about the right choice (see the 1st and 2nd laws of interoperability).


If your Healthcare IT standard was a River Boat…

I spotted this: “If your programming language was a boat…”, and it got me thinking… what if your Healthcare IT standard was a river boat? (A river boat is a hat tip to the old, but wrong, bridge metaphor for interoperability.)

HL7 v2

HL7 v2 is like a river raft with a rope guideline. Anyone can build one, they’re all slightly different, and they only go where the rope takes them.


openEHR

openEHR is like a house boat – solid design from first principles, unsinkable, slowly coming along. All aboard to join the community party!

HL7 v3

What an amazing concept! Power like you wouldn’t believe. Once in flight, can carry an entire hospital across the river in milliseconds. Sadly, takes 100x the width of the river to get going.



Yep, you can do almost anything in XML. It just requires a great deal of hard work, with almost no help from your machinery


We’ll just bung a few different parts together – it’ll be fine; they’re all proper standards in their own right. There’s no risk that it will go bang later.


This boat is made up of a whole lot of little bits that each float in their own right. Nothing is ever going to sink this boat. But where’s all the bits to make it work?

Fresh Look Taskforce Follow Up #1

One of the things that people say about HL7 is that it is no longer producing specifications that are useful for implementers, that give implementers what they want.

This message was certainly given a thorough airing at the RIMBAA Q6 Wednesday meeting in Orlando recently. The Wed Q6 meeting is an odd meeting – it starts during the cocktail party; people have to give up a drink and go and sit and listen to presentations. Attendance is a bit variable. This time there were maybe 40 people there – an attendance record by a long shot (I don’t think I’ve missed any).

RIMBAA itself is an odd duck – a group of people who share an interest in using the RIM as the object model at the heart of their applications. I for one have a hard time getting my head around using a denormalised, half-ontological model designed for interoperability as the basis for a logical persistence store – these are very different goals and compromises – but I can’t argue with their enthusiasm, nor with the fact that they are building real-world, useful clinical applications. So they’re certainly a bunch of fans of the RIM (a category of people who sometimes seem to be in short supply).

I was asked to be in attendance, along with Lloyd McKenzie, to talk about PIIMs – an idea that I came up with in the first place, with Lloyd (though Lloyd named them!), and that was raised during the spirited discussion about the Fresh Look Taskforce on the RIMBAA email list. But the RIMBAA Q6 meeting was advertised as a meeting about the Fresh Look Task Force, and many people came along to talk about that, not to listen to Lloyd and me drone on about yet another weird modeling acronym.

PIIM, btw, stands for Platform Independent Information Model. It’s not unrelated to the point of this post, so I’m going to explain it. There are two main ways that HL7 implementable models (i.e. models that are ready to be part of messages, documents, or services) are described: as “RMIM” diagrams, and as an XML schema. The problem with this is that RMIM diagrams are not standard UML (a claim that I will define and defend in another post), and the schemas, as well as not having any formal standing (another subject I’ll take up in a separate post), are, well, schemas. They might describe XML, but they’re no way to develop software (though they make some people happy).

A PIIM is a proper UML model that describes the same instances as the XML schema, only at a computable level, not a technology level (i.e. a PIM, not a PSM). It’s not that RMIM diagrams aren’t useful, but that they’re several transforms away from being concrete. PIIMs are standard UML (no fancy stereotypes, no hidden properties, no abstract data types) and therefore concrete. We could produce PIIMs for all of our v3 models – it’s just a question of resourcing it – and, from my point of view, the discussion was to find out whether people thought that was worth doing: whether it would fill a hole in what HL7 delivers to people.

The answer appeared to be “just a little” – some people are excited by them, but for most people, not having a UML definition of the exchange specifications is not the problem, and HL7 making and publishing PIIMs wouldn’t make any difference to whether v3 would meet any goals or not. That makes sense to me, btw. UML is good for conceptual descriptions of object systems, and not so good but widely (ab)used for formal definitions of object systems (à la MDA), but there’s no accepted standard canonical format for exchanging object instances between systems defined by a common UML model (not that there is even such a thing as a common UML model across the UML tooling stacks).

After we talked about PIIMs, the discussion turned to the wider issues of the Fresh Look Task Force. People were pretty interested in that, because the Fresh Look Task Force is somewhat opaque (as of June 14 2011, some notes have been released to the participants, but there is no formal public report yet, which a lot of people are disappointed by).

The outcome of the discussion was that we set up a wiki page where everyone in the community can comment on what they’d like the Fresh Look Task Force to achieve, document problems, and propose courses of action. I think everyone should go and take a look and contribute. HL7 is first and foremost a community, and so we’d like to know what the community thinks.

This post is the first of a series where I’m going to pick an issue I see in (or from) that page, and seek further focused contributions.

What Implementers Want.

The question I’m going to pick is: what do implementers want? What specification could we publish that the implementors – the people who choose and use specifications – really want? One that would make them go, “yeah, that’s the thing I want”.

What, then? It seems pretty obvious to me that what v3 is isn’t rocking people’s boat. That message was given a thorough airing at the RIMBAA Q6 meeting.

But what do they want?

One stakeholder spoke of the challenge of working with implementers who don’t know the meaning of the word “serialisation”. It’s hard for me to know what to make of that. Should we not use the correct terms because some implementers don’t know them? If we didn’t, other implementers would not be happy at all (me!).

I have some pretty solid ideas that I’m working on for the fresh look taskforce. But before I start rolling these out for debate, what are the criteria by which a good specification can be measured? Comments welcome here or particularly in my Desiderata for a good specification on the HL7 wiki.

p.s. Implementer or Implementor? (beats me)

Useability vs reusability

It’s an issue that runs through IT – the trade-off between use and reuse. See, for instance, this paper (and also this). The problem with reusing code is that it creates multiple dependencies on a single piece of code, and changes for one functionality can break another that’s completely unrelated – and, more importantly, untested (because tests never cover every scenario the users are going to come up with).

It applies to standards too, but the problem manifests differently. I’m looking at a specification – it doesn’t matter which one – that defines a common interaction model for “entities” that are involved in a particular healthcare interaction. Each entity has:

  • Identifier
  • Name
  • Address
  • Electronic Communication

It’s nice, having such a simple consistent pattern – an entity library that gets reused all through the specification. The definition for address is:

“The description of a location where an entity (person or organization) is located or can be otherwise reached or found”.

That’s not a bad definition for a general address.

The downside of reusing “entity” like this is that the definition makes sense in the general case, but gives you absolutely no idea what is intended in some cases. Take, for example, a healthcare transaction carried out in an institution by an organisation, where the person who did the data entry is a required field, and, as an entity, that person has an address defined as:

“The description of a location where an entity (person or organization) is located or can be otherwise reached or found”.

Right. So, ahh, is this just because they ‘inherited’ this address as an entity, and it’s entirely useless in this case? Or is it because the person doing the data entry might be in India doing transcription? Is there a specific desire to be able to locate the person? With such a generic definition, it’s hard to say.

Reusing generic definitions is good – but they have to be contextualised.
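A sketch of what contextualising can look like in a model (the types and the contextual definition are invented for illustration): the generic definition is reused unchanged, but the use site carries its own, narrower definition.

```python
from dataclasses import dataclass

@dataclass
class Address:
    """Generic: a location where an entity can be reached or found."""
    text: str

@dataclass
class DataEntryPerson:
    name: str
    # Contextualised reuse: the type is the generic Address, but this
    # use site states what the field means *here* (a hypothetical
    # definition): "the work location of the person who entered the
    # data, recorded for audit follow-up". Without this, a reader
    # can't tell whether the field is meaningful or just inherited.
    address: Address
```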

I think one of the problems with HL7 v3 is that we model so much, and don’t take the time to contextualise the definitions. openEHR has this problem too – see this page, which provides context-specific definitions for some reference model attributes for some uses. These establish some patterns, but you’ll see archetypes out there designed without that knowledge (I own one or two myself).


Do ADT^A11 msgs cancel/discharge all messages for an acct?

Acct = account? And account = admission, I presume. On that basis, the definition of A11 is:

For “admitted” patients, the A11 event is sent when an A01 (admit/visit notification) event is cancelled, either because of an erroneous entry of the A01 event, or because of a decision not to admit the patient after all

Well, the A11 covers both cases: where the A01 was an erroneous entry, and where it was a real entry that was cancelled due to real-world events.

In the first case, the admission is gone, and will never be heard from again. In the second case… maybe. This is where it depends on the way the implementer of the sending system reads the spec, and on their problem.

I think that having sent an A11, that should be it: no more messages on the episode, and a new one created.

But what if the A11 itself is sent erroneously? There’s no “cancel a cancel” message, so it’s unpredictable whether a system will use some other event to magically bring the admission back to life, or go with a new admission. And since there’s no real clear definition of how to use PV1-19 (Visit Number), it may not even be clear whether a new episode with the same episode details is a different episode or not.
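To make the strict reading concrete, here’s a sketch of a receiver that takes the line argued above: an A11 closes the visit, full stop. The segment and field positions follow HL7 v2 conventions (MSH-9 = message type, PV1-19 = visit number), but the sample message content is invented and the handling policy is this post’s recommendation, not something the spec mandates.

```python
# A minimal ADT^A11 receiver sketch. The sample message is invented;
# field positions follow HL7 v2 conventions.
SAMPLE_A11 = "\r".join([
    "MSH|^~\\&|ADT|HOSP|EMR|HOSP|201106140930||ADT^A11|00001|P|2.3",
    "PID|1||123456||Smith^John",
    "PV1|1|I|W1^R1^B1" + "|" * 16 + "V0001",  # PV1-19 = visit number
])

def handle(message):
    # Index each segment by its ID; fields are pipe-delimited.
    segments = {s.split("|")[0]: s.split("|") for s in message.split("\r")}
    event = segments["MSH"][8].split("^")[1]   # MSH-9, trigger event code
    if event == "A11":
        visit = segments["PV1"][19]
        # Cancel admit: close the visit outright. If the patient really
        # is admitted later, expect a fresh A01 (ideally a new PV1-19).
        return ("cancelled", visit)
    return ("ignored", None)
```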

Repeating String Fields

It’s a design pattern that I come across fairly often across all sorts of modeling paradigms. It’s a simple one too: a field/property/whatever that is a list of strings of no fixed content.

In XML schema, it looks like this:
<xsd:element name="comments" type="xsd:string" maxOccurs="unbounded"/>

In UML, it looks like this: an attribute such as comments : String [0..*].

I also come across it in HL7 v2 specifications, openEHR archetypes, database schemas, application object models, etc.

I’m not talking about fields where the string is a something, where something is a list of characters with a known meaning (code, identifier, etc.) – that makes sense.

No, just a plain old list of strings. That do… something. What? The name is suggestive. So let’s stick with the name “comments” (some readers may know exactly what I’m referring to here). What does it mean to have a list of comments? Is

“The result is unexpected. I called you about it”

meaningfully different to

“The result is unexpected”, “I called you about it”

in any way? What does it achieve, offering cardinality > 1?

It seems to me that the model is effectively saying: there are some hidden semantics here that I won’t tell you about, but they’ll be able to leap out and catch you later. Maybe:

  • the different strings have different sources (and having the computer tack them together is too much work?)
  • the strings have unstated meanings
  • the strings actually have slightly different meanings depending on order
  • later on, someone will have to add more information to the model

All of these are potentially a bad thing. So one of my criteria for a good model is that it doesn’t have any string properties/fields/whatever that can repeat, where it isn’t explicit how the repeats are meaningfully differentiated.
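One way to meet that criterion is to give each repeat explicit differentiating semantics. A hedged sketch (the field names are invented for illustration):

```python
from dataclasses import dataclass

# Instead of a bare repeating string, each repeat carries the thing
# that differentiates it: who recorded it, and when. Now a list of
# two comments means something that a single concatenated string
# doesn't - and the model says so explicitly.
@dataclass
class Comment:
    text: str
    author: str     # differentiates the repeats...
    recorded: str   # ...and orders them (ISO 8601 timestamp)

comments = [
    Comment("The result is unexpected", "Dr A", "2011-06-14T09:00"),
    Comment("I called you about it",    "Dr A", "2011-06-14T09:30"),
]
```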