This guest post is written by Lloyd McKenzie. I’ve been meaning to get to it since the January WGM, but I’ve been wrapped up in other things (most recently HIMSS). However, I agree with what Lloyd says.
Question: I need to communicate a wide variety of language codes in HL7 v3 instances, but the ISO 21090 data types specification declares that ED.language (and thus ST.language) is constrained to IETF RFC 3066. This is missing many of the languages found in ISO 639-3 – which I need. Also, IETF RFC 3066 is deprecated; it’s been replaced twice. Can I just use ISO 639-3 instead?
The language in the 21090 specification was poorly chosen. It explicitly says “Valid codes are taken from the IETF RFC 3066”. What it should have said is “Valid codes are taken from IETF language tags, the most recent version of which at the time of this publication is IETF RFC 3066”. (In fact, by the time ISO 21090 was approved, the most recent RFC was 4646, but we’ll ignore that for now.) This should be handled as a technical correction, though that’s not terribly easy to do. However, implementers are certainly welcome to point to this blog as an authoritative source of guidance on ISO 21090 implementation and make use of any language codes supported in subsequent versions of the IETF Language Tags code system – including RFC 4646 and RFC 5646 as well as any subsequent version thereof.
The RFC 5646 version incorporates all of the languages found in ISO 639-3 and 639-5. However, be aware that while all languages are covered, there are constraints on the codes that can be used for a given language. Specifically, if a language is represented in ISO 639-1 (2-character codes), that form must be used. The 3-character variants found in ISO 639-2 cannot be used. For example, you must send “en” for English, not “eng”.
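The “shortest form wins” rule above can be sketched in a few lines. This is a minimal illustration, not official HL7 or IETF tooling, and the lookup table is a tiny sample rather than the full registry:

```python
# BCP 47 rule: if a language has a 2-character ISO 639-1 code, that form
# must be used as the primary language subtag; the 3-character ISO 639-2/-3
# form is only valid when no 2-character code exists.
# Sample mapping only -- the real registry has far more entries.
ISO639_2_TO_1 = {
    "eng": "en",  # English
    "fra": "fr",  # French
    "deu": "de",  # German
    "spa": "es",  # Spanish
}

def to_ietf_primary_subtag(code: str) -> str:
    """Return the code to use as the primary language subtag."""
    code = code.lower()
    if len(code) == 2:
        return code  # already an ISO 639-1 code
    # Use the 2-character form where one exists; otherwise the 3-character
    # code is itself the valid subtag (e.g. ISO 639-3 languages with no
    # 639-1 equivalent, such as "yue" for Cantonese).
    return ISO639_2_TO_1.get(code, code)

print(to_ietf_primary_subtag("eng"))  # -> en
print(to_ietf_primary_subtag("yue"))  # -> yue
```

Note that a 639-3 code with no 2-character equivalent passes through unchanged – it is already a valid IETF tag.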
Question: But I want to send the 3-character codes. That’s what my system stores. Can’t I use ISO 639-2 directly?
No. In the ISO 21090 specification, the “language” property is defined as a CS. That means the data type is fixed to a single code system. The code system used is IETF Language Tags, which is consistent with what the W3C uses for language in all of their specifications, and which encompasses all languages in all of the ISO 639 specifications plus many others (for example, country-specific dialects, as well as additional language subtags maintained by IANA).
Question: Well, OK, but what about LanguageCommunication.code in the RIM? Can I send ISO 639-2 codes there?
Yes, though with a caveat. LanguageCommunication.code is defined as a CD, meaning you can send multiple codes – one primary code and as many translations as you see fit. You are free to send ISO 639-2 codes (the 3-character ones) or any other codes as a translation. However, LanguageCommunication.code has a vocabulary assertion of the HumanLanguage concept domain, which is universally bound to a value set defined as “all codes from the ietf3066 code system”. That means the primary code within the CD must be an IETF code. So that gives you two options:
- Fill the root code with the appropriate IETF code – copying the ISO code most of the time, and translating the 3-character code to the correct 2-character code for the roughly 180 languages that have a code in ISO 639-1; or
- Omit the root code property and set the null flavor to “UNC” (unencoded), essentially declaring that you haven’t bothered to try translating the code you captured into the required code system.
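The two options can be sketched as follows. This is a hedged illustration, not official HL7 tooling: the CD is modelled as a plain dictionary, the mapping table is a sample, and the codeSystem OIDs shown (2.16.840.1.113883.6.121 for IETF language tags, 2.16.840.1.113883.6.100 for ISO 639-2) should be verified against your own OID registry before use:

```python
# Assumed OIDs -- confirm against the HL7 OID registry before relying on them.
IETF_OID = "2.16.840.1.113883.6.121"       # IETF language tags
ISO639_2_OID = "2.16.840.1.113883.6.100"   # ISO 639-2 (assumption)

# Illustrative 3-char -> 2-char lookup (sample only).
ISO639_2_TO_1 = {"eng": "en", "fra": "fr", "deu": "de"}

def language_cd(iso639_2_code: str, translate: bool = True) -> dict:
    """Build a CD for LanguageCommunication.code from a stored ISO 639-2 code."""
    # The captured ISO 639-2 code always travels as a translation.
    translation = {"code": iso639_2_code, "codeSystem": ISO639_2_OID}
    if translate:
        # Option 1: the root carries the IETF code -- the 2-character form
        # where one exists, otherwise the 3-character code itself.
        ietf = ISO639_2_TO_1.get(iso639_2_code, iso639_2_code)
        return {"code": ietf, "codeSystem": IETF_OID,
                "translation": [translation]}
    # Option 2: no attempt made to encode in the required code system,
    # so the root gets nullFlavor UNC and only the translation is populated.
    return {"nullFlavor": "UNC", "translation": [translation]}
```

Either way, the original 3-character code is preserved in the translation, so nothing your system stores is lost on the wire.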
And before you mention it, yes, the reference to IETF 3066 is a problem. The actual code system name in the HL7 specification is “Tags for the Identification of Languages”, which is the correct name. However, the short name assigned was “ietf3066”, and the description in the OID registry refers explicitly to the 3066 version. This is an error, as IETF 3066 is a version of the IETF “Tags for the Identification of Languages” code system, and the OID is for the code system, not a particular version of it. (There have actually been four versions so far – RFCs 1766, 3066, 4646 and 5646.) We’ll try to get the short name and description corrected via the HL7 harmonization process.
Question: But I don’t want to translate to 2-character codes and I don’t want to use a null flavor. Can’t we just relax the universal binding?
We can’t relax the binding because the HumanLanguage concept domain is shared by both the ED.language property in the abstract data types specification (on which ISO 21090 is based) and the LanguageCommunication.code attribute. ED.language is a CS, and so must remain universally bound.
In theory, we could split into two separate domains – one for data types and one for LanguageCommunication.code. The second one could have a looser binding. However, it’s hard to make a case for doing that, for two reasons:
First, having two different bindings for essentially the same sort of information is just going to cause grief for implementers. You could be faced with declaring what language the patient reads in one code system, but identifying the language of the documentation the patient’s supposed to read in a second code system.
Second, the IETF code system fully encompasses all languages covered by all the ISO 639-x code systems, plus thousands of others expressible using subtags – for example, country-specific dialects. In the unlikely situation that you need a language that can’t be expressed using any of those, there’s even a syntax for sending local codes (and a mechanism for registering supplemental codes with IANA if you want to be more official). So there should never be a situation where you can’t express your desired language using the IETF Language Tags code system.
Question: I don’t really care that I can express my languages in IETF. I’ve already pre-adopted using ISO 639-2 and -3 in my v3 implementation and I don’t want to change. Why are you putting constraints in place that prevent implementers from doing what they want to do?
Well, technically your implementation right now is non-conformant. And implementers always have the right to be non-conformant. HL7 doesn’t require anyone to follow any of its specifications. So long as your communication partners are willing to do what you want to do, anything goes by site-specific agreement.
However, the standards process is about giving up a degree of implementation flexibility in exchange for greater interoperability. By standardizing on a single set of codes for human language, we’re able to ensure interoperability across all systems. Natively, those systems may use other code systems, but for communication purposes, they translate to the common code system so everyone can safely exchange information.
If the premise for loosening a standard were “we won’t require any system to translate data from their native syntax”, there’d be no standards at all. Yes, translation and mapping require extra effort (though a look-up table of fewer than 200 codes with direct 1..1 correspondence is pretty easy compared to a lot of the mapping effort needed in other areas). But that’s the price of interoperability.