Monthly Archives: March 2012

Good Specifications: Ernest Hemingway vs Leo Tolstoy

How long should a good standard be? Just how many words should it have?

Short Standards

A short standard is a good standard – every word the standard contains is a word that hundreds/thousands of implementers have to read. So keep it short – say only what needs to be said in order to get the standard to work, and don’t add spurious words. Write it like Ernest Hemingway, “known for his terse prose”.

But the problem with that is that there’s no way to know what is the minimum that needs to be said, and inevitably standards don’t say enough. As an implementer, I often read a standard, try and match it to my particular requirements, and wonder just how it’s supposed to work – there seem to be many different ways… if only they’d said more, there’d be less confusion.

Long Standards

An alternative approach is to say as much as you can when writing a standard. To clarify the usage of the standard completely and systematically. Every word the standard contains is a contribution to helping people understand it, and the more descriptive you can be, the more you help the implementers. These standards aspire to a Tolstoyan level: true epics, “thought to be one of the greatest novels ever written”… oh may they say that about our standards…

Except, of course, that what they say about the long comprehensive standards is:

“x,000 pages? How on earth are we supposed to read that?”

and then the implementers don’t. Of course, when you give them a 50 page standard, they say that’s great, then later they say

“this standard is crap. How are we supposed to know what it means?”

Damned if you do, damned if you don’t.

An implementer in the hand is worth two in the bush?

Let’s say that your community has spent years building a standard. It’s really comprehensive, but the cost of adoption has proven a lot higher than anticipated. In fact, higher than is justified. But in spite of that, a bunch of early adopters have bought into the standard, and gone off and implemented it. But a much larger bunch of potential implementers won’t use it because of the adoption cost. A subgroup of those are stakeholders who are now quite disaffected.

Then there is a proposed change. The change will reduce adoption cost, and potentially increase the size of the adopters group, and increase the pool of potential implementers (yes, increase, because success begets success). But from the point of view of the adopters:

  • This is bad, because as early adopters, they’ve already paid the cost of adoption, and now there’ll be a new cost
  • This is good, because the adopters group will increase, and they’ll derive more benefit
  • This is bad, because more adopters will mean they have less influence over the standard

From the point of view of the disaffected stakeholders, it’s all good.

So whether the change gets approved depends:

  • If the early adopters are smaller than the disaffected parties, it’ll probably get up
  • But the reverse is more likely (because disaffected stakeholders rapidly become non-stakeholders), and then it will depend on whether the benefit of a bigger adopter group outweighs the cost for the early adopters
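
The two bullets above amount to a simple decision rule. A toy sketch of it follows, assuming (and it is a big assumption, given the fog described later) that group sizes, the benefit of a bigger adopter group, and the cost of re-adoption could each be boiled down to a single number. The function and parameter names are made up for illustration.

```python
def change_likely_approved(early_adopters: int,
                           disaffected: int,
                           growth_benefit: float,
                           repeat_cost: float) -> bool:
    """Toy version of the approval rule sketched in the bullets above.

    All four inputs are hypothetical single-number summaries; in practice
    nobody knows these values with any precision.
    """
    # If the early adopters are outnumbered, the change probably gets up.
    if early_adopters < disaffected:
        return True
    # Otherwise the early adopters decide: does the benefit of a bigger
    # adopter group outweigh the cost of paying for adoption again?
    return growth_benefit > repeat_cost


# Early adopters dominate and see little benefit from wider adoption.
print(change_likely_approved(20, 5, growth_benefit=1.0, repeat_cost=3.0))
```

The interesting case is the second branch: when the early adopters form a small, self-contained group, `growth_benefit` as they perceive it stays low, so the rule says no.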

If the early adopters represent a small set of relatively contained trading partners, so that the benefits of wider adoption are somewhat abstract, then the early adopters will be fiercely opposed to change. Does this ring a bell with anyone?

Anyway, putting aside the early adopters’ interest, what’s in the interest of the organization? Well, the fact is that the early adopters are committed – this time. Whatever the changes are, paying the price is less than going somewhere else (this is a variation on the theme of all-pay auctions). The early adopters are over the barrel, and the organization has nothing to lose by forcing them to pay again for change. And clearly, getting a bigger implementation group is in the interest of the organization.

First conclusion: It’s a bad thing for a standards organization if the early adopters represent one or a few relatively coherent trading blocs

Actually, of course, from an organisation’s point of view, this is the same dilemma faced by any company – should we value our existing customers, who’ve already paid us, or possible new customers, who still might actually pay us? From a short term view, the latter, of course, but the more you do that, the more your potential future customers will not trust you. How much trust do you have, and what’s it worth?

Beyond that, what’s best for the community? Can the specification be changed to reduce the cost of adoption to clearly less than the justified cost? What are the other alternatives? When does the community itself need to abandon past investments in favor of future ones?

It’d be nice if this kind of decision could be made easily, cleanly, on mathematical or logical grounds (and, in fact, my first draft of this blog post included mathematical formulas). But no, unfortunately, the one factor you can never afford to ignore is the Fog Of War: frankly, no one has any idea what’s going on.

Existing implementers: at least you have them, that you can be nearly sure of. But they might stop you getting more.

Does UML Tooling actually work?

UML tooling is driving me nuts. I’m really looking for something rather simple – at least I think it should be simple. I want to be able to define a class model (classes, associations, attributes, data types, cardinalities, and stereotypes and property strings), and be able to share that class model (including its visual layout) with other UML modeling tools. And since UML modeling tools are (supposed to be?) used by development teams, I want one that reliably supports a version control system (“reliably” appears to exclude Enterprise Architect). And it should be affordable (free or <$300ish per seat).

Isn’t that reasonable? As far as I can see, the answer is that I can’t have what I want.

CIMI at the Crossroads

The Clinical Information Modelling Initiative (CIMI, see here, and here) is

“an international collaboration that is dedicated to providing a common format for detailed specifications for the representation of health information content so that semantically interoperable information may be created and shared in health records, messages and documents”

CIMI is one of a number of efforts that have been started to try and define a common format for such specifications; all the previous efforts (mostly going by the name of DCM, “detailed clinical models”) have gotten bogged down in methodology questions and political games of various sorts, and they’ve failed to produce something that people might actually use.

CIMI shows every sign of following the same trail to the same dead end.

From the beginning, the CIMI initiative sought to produce a different outcome from previous efforts by trying to be agnostic on the tribal and political issues that have bedeviled the previous efforts. In particular:

  • The membership of CIMI included all the significant players in the space, not only some of them
  • The charter always included CIMI providing the capability to express the clinical models in a series of different formalisms (i.e. XML, Java, HL7 v2, EN13606, CDA, openEHR etc) by the provision of some “compiler”

The membership point was really new – and for the first time there was real hope that something might come from this. The first task for CIMI was to choose an internal methodology that would be used as the primary expression of the models. The initiative held a meeting in London in Nov 2011 to choose between the following candidate approaches:

  • UML/OCL and associated OMG standards
  • 13606-2/ADL 1.4
  • ADL 1.5
  • Semantic Web technology (OWL, RDF, Protégé, and associated tools and standards)
  • HL7 v3 approach (MIF, HL7 RIM, static models and associated artifacts and tools)

In spite of the fact that these things are not at all alike, a comparison was performed, and the group decided… well, let’s quote from the press release:

  • ADL 1.5 will be the initial formalism for representing clinical models in the repository.
  • CIMI will use the openEHR constraint model (Archetype Object Model:AOM).
  • A set of UML stereotypes, XMI specifications and transformations will be concurrently developed using UML 2.0 and OCL as the constraint language.
  • A Work Plan for how the AOM and target reference models will be maintained and updated will be developed and approved by the end of January 2012.

In other words, the group chose AOM/ADL, but it seems to me it was unable to get full consensus, hence the mention of UML/OCL. Note that the exact relationship between ADL 1.5 and UML is not spelled out.

Well, January 2012 has passed, and there is no work plan – because there still doesn’t seem to be any consensus about the methodology, let alone the reference model. As far as I can tell, the participants who favour UML/OCL have continued on as if ADL/AOM wasn’t the initial formalism. The follow-up meeting in San Antonio in Jan 2012 was characterised by continued argument about UML vs ADL. CIMI still doesn’t have consensus about the stuff already decided, let alone the hard stuff to come.

I’ve been an interested observer to CIMI from the beginning – it’s a great goal that we really need to see solved, the best group of people that we’ve got together on this subject, and there was real hope. Due to resource constraints, I’ve never been a formal member of the initiative, but I have attended the CIMI meetings and teleconferences whenever possible. But it’s never seemed to me that the participants are being realistic.

The core problem is getting compromise. This was obviously going to be difficult here – many of the participants at CIMI have many millions invested in their systems, and I never could see how CIMI would avoid the outcome I described:

…build a complicated framework that allows both solutions to be described within the single paradigm, as if there isn’t actually contention that needs to be resolved, or that this will somehow resolve it. This is expensive – but not valuable; it’s just substituting real progress with the appearance thereof.

As you can see, CIMI is well on the way to building a complicated framework, and providing only the appearance of progress.

For me, this was underscored by the decision to choose ADL/AOM as the methodology, while deferring the choice of reference model. While I understood the political reality of this decision, choosing an existing methodology (ADL/AOM) but not the openEHR reference model committed CIMI to building at least a new tooling chain, a new community, and possibly a new reference model.

Each of these is spectacularly hard and expensive. At a minimum, using semi-volunteer labour of loving experts who are building their own empire, you have to estimate the cost of tooling at greater than $2M (and doing it on a straight commercial basis, upwards of $6M). Reference models take years – as in, a decade – to build, and the blood, sweat and tears of many people. This also equates to millions of dollars one way or another. Building a community around a methodology and tool-chain is the same. So CIMI committed itself to these kinds of expenditures of $$, energy and ego, but I can’t think that any of the participants really thought that CIMI can actually call on those resources before it produces anything of value.

As for UML, the plan called for “a set of UML stereotypes, XMI specifications and transformations” – this is the same error. The point of UML is that the average implementer knows how to make it work, and has tools that can leverage the models. Each stereotype you define erodes that advantage, and as soon as you define a really important stereotype – and why bother if it’s not? – then off the shelf tools can no longer be used. As for developing XMI specifications… who’s going to support that? This is known as “snatching defeat from the jaws of victory”.

I can’t see that CIMI is on a path to producing anything, let alone a methodology that people will be happy to use, offered the choice.

So what should CIMI do? As I see it, there are two pragmatic choices. CIMI needs to pick one, or accept that it’s never going to reach consensus with the resources available:


That’s right. Just bite the bullet and pick the whole openEHR stack. They’ve got a reference model. They’ve got tooling – the archetype designer (open source!) and the CKM. They’ve got a community (using the CKM). They’ve got runs on the board with published models. It’s there, waiting to go.

I recognise that simply picking openEHR holus-bolus like this is extremely distasteful to many people. OpenEHR is still missing a few things across the stack, and the reference model is too EHR-specific rather than being a general clinical model – and it seems unlikely that CIMI has the resources to change these things, so we’d just have to live with the way it is and work with them. And of course, there’s a series of personal and political factors.

This is the first choice: pick the least worst established clinical modelling paradigm.


The second option is to abandon any hope of a clinical-friendly modelling tool, and bite the bullet by adopting UML. This is the IT centric solution. But if you’re going to do this, do it as simply as possible. No fancy stereotypes. In fact, no enforcing of a reference model (it’s the reference model that complicates everything). Given the fragility of the UML tools (i.e. total lack of interoperability between tools), CIMI should ban anything other than classes, attributes, and associations. No stereotypes, no properties, no profiles. That’d mean a lot of missing functionality – but we’d just have to live with that and work around it.

The real price of this isn’t that UML isn’t clinical-friendly, it’s the reference model. Given the cost of creating a reference model, and the fact the existing reference models aren’t created to be used in such a brutally simplistic way, this approach involves abandoning a serious reference model – and that’s exactly what some of the participants want, not understanding what it is that a reference model achieves.

Hybrid Models

From the beginning, CIMI has wanted to explore the auto-generation of multiple formats – CDA templates, v2, openEHR, java, xml, etc. Java, various forms of XML – that makes perfect sense. But the others? I spend quite a bit of time converting models and/or instances between the CDA, v2, and openEHR worlds, and they’re not just alternative syntaxes – they have completely different ways of understanding the world (or not, for v2). Real human input is required to effect these transforms. In the end, any auto-generation facility would become a transparent syntax conversion layer, and the CIMI models would have to contain the expression of the model in each of the target formalisms. It’s hard enough to model against one paradigm, let alone all 3 (or more). Whatever CIMI produced from this path would be a methodology that very few uber-experts could make use of. This isn’t an option.

Neither of the two choices is really palatable. But what other practical choices exist? CIMI is at a crossroads – it needs to pick something that will work.

p.s. Actually, several people have pointed out to me that FHIR might be a logical choice for CIMI – but FHIR’s got the slight problem that it doesn’t actually exist yet, so I’m not going to push that forward for CIMI. Yet.


Technical Correction in Data Types R2 / ISO 21090

According to the data types, the PostalAddressUse Enumeration is based on the code system identified by the OID 2.16.840.1.113883.5.1012. But if you look up that OID in the OID registry (or the MIF, for insiders), you see that:

Retired as of the November 2008 Harmonization meeting, and replaced by 2.16.840.1.113883.5.1119 AddressUse.

This appears to be my error – I should’ve used the OID 2.16.840.1.113883.5.1119. We’ll have to see about issuing a technical correction.
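
Until a correction is issued, implementers may want to tolerate both OIDs. Here is a minimal sketch of how that could look, assuming a hand-maintained lookup table; `RETIRED_OIDS` and `current_oid` are names invented for this example, and the table holds only the one pair mentioned above.

```python
# Hypothetical lookup of retired code system OIDs and their replacements.
# Only the pair discussed in this post is listed.
RETIRED_OIDS = {
    # PostalAddressUse (retired) -> AddressUse
    "2.16.840.1.113883.5.1012": "2.16.840.1.113883.5.1119",
}


def current_oid(oid: str) -> str:
    """Return the OID that should be used in place of a retired one.

    Follows replacement chains, and stops if it ever revisits an OID so a
    badly maintained table cannot cause an endless loop.
    """
    seen = set()
    while oid in RETIRED_OIDS and oid not in seen:
        seen.add(oid)
        oid = RETIRED_OIDS[oid]
    return oid


print(current_oid("2.16.840.1.113883.5.1012"))
```

An OID that is not in the table is returned unchanged, so the function can be applied to every code system identifier on the way in.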

Question: should you use Concept Ids or Description Ids in HL7 instances?


In my brief look at the standards, I can’t seem to find any requirement or convention in SNOMED or HL7 regarding the use of Description IDs vs Concept IDs in messages or datasets.

Is there any preference or can Description IDs and Concept IDs be used interchangeably?


It is correct to use Concept Ids as the code in both HL7 v2 messages and CDA (and other v3 instances). Description Ids should not be used.


This is, unfortunately, not documented anywhere. It is implicit in the language used by the datatypes in both v2 and v3, though this is more obvious in v3, with the explicit focus on concepts. While it’s true that description ids also uniquely identify concepts, they additionally identify display strings which are duplicated by other fields in the appropriate types.
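
A receiving system can at least detect when a Description Id has been sent by mistake. The sketch below relies on the structure of SNOMED CT identifiers rather than anything in HL7: the two digits before the final check digit are a partition identifier, where 00 or 10 marks a concept and 01 or 11 a description. The function names are made up for this example, and the check digit itself is not verified.

```python
# Partition identifiers: the two digits before the trailing check digit.
# 0x = short format, 1x = long format (with a namespace identifier).
_PARTITIONS = {
    "00": "concept", "10": "concept",
    "01": "description", "11": "description",
    "02": "relationship", "12": "relationship",
}


def sctid_kind(sctid: str) -> str:
    """Classify a SNOMED CT identifier by its partition identifier."""
    if not sctid.isdigit() or not 6 <= len(sctid) <= 18:
        raise ValueError(f"not a plausible SCTID: {sctid!r}")
    return _PARTITIONS.get(sctid[-3:-1], "unknown")


def require_concept_id(sctid: str) -> str:
    """Return the id unchanged, or raise if it is not a Concept Id."""
    kind = sctid_kind(sctid)
    if kind != "concept":
        raise ValueError(f"expected a Concept Id, got a {kind} id: {sctid}")
    return sctid


print(sctid_kind("22298006"))   # Concept Id for myocardial infarction
print(sctid_kind("37436014"))   # illustrative id with a description partition
```

This only tells you what kind of identifier you were given; finding the concept that a description belongs to still needs a terminology lookup.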

Authority is given, not taken

Real authority is not something that you can take, that you can purchase, that you can steal. It’s something that other people give you freely of their own accord. There’s no other way to get it. It’s important to distinguish power from authority – power is only ever taken, and never given. The two things are closely related – having authority in a sub-group (i.e. the armed forces, or the engineering department) can help you acquire power in a wider sphere. Authority is better than power, because having authority means that people want to do what you tell them.

In New Zealand, where I grew up, this notion is wonderfully captured in the word “mana”, borrowed and adapted from Maori:

“mana”, taken from the Maori, refers to a person or organization of people of great personal prestige and character. Sir Edmund Hillary, is considered to have great mana both because of his accomplishments and of how he gave his life to service. Perceived egotism can diminish mana…

In Australian culture, some of the few people who have attained “mana” in general society are Sirs Don Bradman, Fred Hollows, and Weary Dunlop. Politicians are generally not eligible.

Obviously there’s all sorts of applications of this concept in society, and in politics. For instance, governments that have power without authority will eventually fall, democracy or no (the longer it takes, the more people will die as it falls).

I’m interested here in this blog on how that affects standards. And what I’ve seen is that it doesn’t matter how much power is applied to get a standard to be adopted, if the standard doesn’t have any authority, it won’t make any difference. I’m not saying that power doesn’t make a difference – it does. But power is only useful to the degree that the standard itself has authority.

In practical terms, what this means is that if a government picks an unsuitable standard, it doesn’t matter how hard it pushes it, it’s not going to fly. This really frustrates people in government, used as they are to the exercise of power. But somehow – and I’m not actually exactly sure why – power evaporates and any victory is only apparent and fleeting.

Authority is given, not taken, and this is particularly true in the arena of standards.

p.s. when I re-read this, I guess some people in various places are going to feel I am implying that some particular choice of standard X by project/program/government Y for use Z is wrong. Some are, some aren’t. The really great thing is that we only really find out retrospectively 😉


Good Exchange Specifications are messy

It’s a paradox of software code: while we all strive to produce beautifully organized code, the hallmark of code that’s actually had a lot of exposure to real world use is that it’s messy. People have had to fix specific things that are wrong, but they’ve tried, as much as possible, to leave all the things that are known to be fit for purpose in place.

While we all spend time pushing back against the messiness with refactorings and design patterns and complete redesigns, we can only push back against the messiness – it’s going to keep coming back at us.

Part of this is to understand that there’s different kinds of messiness

  1. Messiness that is created by handling special cases
  2. Messiness that is infrastructural – the whole organisation of everything is a mess
  3. Messiness that is decorational – the code itself is in a mess

It seems to me that there’s a logical conclusion to this, one which people are really reluctant to draw: the fewer special cases in the code, the less certain we can be about how useful it is, on the grounds that it’s not been in regular use.

I think this applies to standards as well – the cleaner the standard, the less exposure to the real world it has.

So next time you hear people complaining about how HL7 v2 is all over the place, or that the Snomed hierarchies are a mess, stop and think: maybe, that’s because they’ve had a lot of actual use in the real world, rather than that they are poor quality standards that are not fit for use. There’s only one way to find out – try and actually use the relevant standard?

More follow up from the Senate Enquiry: Security

h/t to Bridget Kirkham from MSIA for pointing out an amazing submission to the Senate enquiry into the pcEHR from AusCERT:

The inclusion of personal identifying information (PII) in the form of PCEHR to be accessible from personal computers over the Internet which are easily compromised, is compounding a problem that has been progressively getting worse over several years and will expose more Australians to fraud and identity theft.

The AusCert argument is simple:

  • The PCEHR is claimed to be a secure system
  • But people will access it from their own computers
  • People’s own computers are hopelessly compromised
  • Criminals use compromised computers to gain access to Personal Identifying Information and access credentials
  • End-users can’t manage this
  • a breach of PCEHR confidentiality can’t be repaired, as a bank account breach can be

All of that is quite true. But what AusCERT don’t do is come to any conclusion. The logical conclusion is that we can’t give users access to their own records, though they don’t say that. They also raise some vague and ill-defined concern about prescriptions, without postulating any attack vector that makes sense…. I guess that free speech is, well, free.

AusCERT don’t differentiate between 3 different breaches that they raise as possibilities:

  • personal identifying information (PII)
  • access tokens
  • health records

With regard to personal identifying information, it’s not clear to me why access to an individual’s PII via the PCEHR is at all relevant, given that there’s so many other rich sources of PII scattered around their computer, and all over the cloud. I suppose that there’s an argument that an individual’s health record will contain more dependable PII than other sources, but I doubt that the % difference matters given the overall signal to noise ratio of that kind of data for the criminals. After reading what AusCERT wrote, I wondered whether they carry PII on their body – what’s to stop someone knocking them down and stealing it from them? How do they plan to prevent that?

As for access tokens, the only access token that a person may have for the PCEHR is the access token for the PCEHR itself, which is given to healthcare professionals to grant access to the record (if the patient wishes for this). I don’t see how stealing either this or a person’s health records makes sense in the general criminal sense. Some mafioso in Kazakhstan gets my latest shared health summary… what are they going to do with it? It’s only of interest when it’s a targeted personal attack. And if that attacker has gained access to an individual’s computer… how significant will the breach of the PCEHR itself be compared to everything else?

AusCERT do briefly mention the possibility that the person’s health data itself may be damaged, but in raising this prospect they miss the fact that the system is designed to prevent that because the patient might decide to try that on their own account.

Finally, AusCERT don’t say what to do. Is it their contention that access to the PCEHR should be confined to the new iPad (gratuitous mention), since they are the only secure computing platform available to the public? (I don’t count iPhone as a computing platform, and I doubt the iPad is actually secure or will stay that way, but it’s the nearest).

Secure computers aren’t so useful. Nor are secure lives. We’re just going to have to figure out how to mitigate the damages (mitigating the risk of someone fraudulently impersonating a person using widely available data would be a good place to start).


Question: how to represent mobile phone numbers in HL7


Is there a standard place for mobile phone numbers in HL7 V2? For patients is it PID.13?

The answer depends on the version. For the versions in use in Australia at this time (v2.3.1 -> v2.5) there are two PID fields:

PID-13 Phone Number – Home This field contains the patient’s personal phone numbers
PID-14 Phone Number – Business This field contains the patient’s business telephone numbers

The problem is that the definition doesn’t help – some people carry two mobile phones, but most people just carry one and use it for both personal and business use. But for whatever reason they have a phone, when recording a mobile phone number, you don’t ask people why they have their mobile phone.

So you can put a mobile phone in either, and it will look like this:


where CP = cellular phone, or mobile.
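
As a minimal sketch, assuming the XTN layout used in v2.3.1 to v2.5 (the number first, then the use code, then the equipment type) and a made-up phone number, a mobile number for PID-13 or PID-14 could be assembled and recognised like this. The helper names are invented for the example; PRN (primary residence number) and WPN (work number) are the use codes that go with the home and business fields.

```python
def mobile_xtn(number: str, use_code: str = "PRN") -> str:
    """Build an XTN value whose equipment type is CP (cellular phone).

    Assumes the v2.3.1-v2.5 component order: number ^ use code ^ equipment.
    Escaping of HL7 delimiters is skipped, which is fine for plain digits.
    """
    return "^".join([number, use_code, "CP"])


def is_mobile(xtn: str) -> bool:
    """True when the third component (equipment type) is CP."""
    parts = xtn.split("^")
    return len(parts) >= 3 and parts[2] == "CP"


home = mobile_xtn("0412345678")          # made-up number, for PID-13
work = mobile_xtn("0412345678", "WPN")   # the same phone, for PID-14
print(home)   # 0412345678^PRN^CP
print(work)   # 0412345678^WPN^CP
```

Because the CP flag travels with the number, a receiver can find the mobile by checking `is_mobile` on every repeat of both fields, whichever one the sender chose.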

Note that in v2.6/2.7, there are two changes:

  • PID fields 13 and 14 are deprecated, and a new field is introduced that uses codes in the 2nd component for purpose instead
  • The first component – the full telephone number – is deprecated in favour of a structured set of fields for country, area code, local number, and extension. Although it’s not clear what the area code is for a mobile in Australia, the real problem with this is that it pre-supposes enforcing higher control of clerical data entry (to get the fields right) in order to allow … computer dialing? I don’t think this is worth it.
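
To show what the structured form asks of data entry, here is a sketch that assumes the XTN component positions I believe apply (5 = country code, 6 = area/city code, 7 = local number, 8 = extension) and a made-up number. The class and method names are invented for the example, and the area code is simply left empty, since it is unclear what it should be for an Australian mobile.

```python
from dataclasses import dataclass


@dataclass
class StructuredPhone:
    """Hypothetical holder for the structured XTN telephone components."""
    country_code: str
    area_code: str        # left empty below: unclear for an Australian mobile
    local_number: str
    extension: str = ""

    def to_xtn(self, use_code: str = "PRN", equipment: str = "CP") -> str:
        # Component 1 (the deprecated full number) and component 4 (email)
        # are deliberately left empty.
        parts = ["", use_code, equipment, "",
                 self.country_code, self.area_code,
                 self.local_number, self.extension]
        # Trailing empty components are dropped, as is usual in v2.
        return "^".join(parts).rstrip("^")


phone = StructuredPhone(country_code="61", area_code="", local_number="412345678")
print(phone.to_xtn())   # ^PRN^CP^^61^^412345678
```

Every one of those separate fields has to be keyed correctly by someone at a front desk, which is the cost being questioned above.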