Monthly Archives: October 2011

CD Question: using codes in multiple related fields

This question comes from Singapore: how do you manage displayName, originalText, and most of all, translations, where you have multiple related and overlapping fields? This isn’t a simple question – just explaining the question is going to take quite a bit of content, let alone the answer (such as it is).

Note that this answer assumes ISO 21090 (Data types R2), though there’s some R1/R2 discussion at some points. Also, note that this is really a discussion about how CD works, not Snomed CT, though I don’t think you could quite have this discussion without Snomed CT.

Scenario

The scenario is about a problem list – a very common case around the world. Multiple different systems are exchanging their problem lists using this common model:

Problem
problem : CD
site : CD
status : CD

(obviously you need more fields than just those, but these are the bits I’m interested in).

All 3 fields are bound to SNOMED CT reference sets, and the reference sets are sufficiently large that their content is not of interest. Each of the sending systems uses one of these three models:

First System
code : a Snomed-CT code that spans both site and status

Second System
code : a local code that also spans status
site : a local code for site

Third System
code : a local code for problem
site : a local code for site
status : a local code for status

Problem

In all the examples, we will use the example case “suspected lung cancer”. We will also assume a smart terminology server is available on the interface engine that is able to interconvert between snomed forms and between snomed and the local code systems.

Note that this is a common problem – several different fields where codes in one of the fields may cover more than one of them.

Here’s example UIs for the three systems:

(Courtesy of Linda Bird, thanks)

Question A: How is the problem correctly represented for each of the 3 systems?

For the first system:

 <problem code="162573006" codeSystem="[sct]">
  <displayName value="Suspected lung cancer"/>
  <originalText value="Suspected lung cancer"/>
 </problem>
 <site nullFlavor="NA"/>
 <status nullFlavor="NA"/>

Whether this is correct depends on the rules around the 3 fields. NEHTA specifications usually say, in a case like this, that site and status need only have a value if they are not specified as part of the main code. Let’s assume, for now, that this is the case – it helps us for now by putting some problems off. But I do think, looking at the example above, the nullFlavor=NA isn’t quite the right nullFlavor – we kind of need NullFlavor.MadeIrrelevantByOtherValue (as opposed to, say, NullFlavor.NotPartOfSystemScope).

For the second system:

 <problem code="?cancer" codeSystem="[local1]">
   <displayName value="Query Cancer"/>
   <originalText>Investigation for lung cancer</originalText>
 </problem>
 <site code="lung" codeSystem="[local2]">
   <displayName value="Lungs"/>
   <originalText>Investigation for lung cancer</originalText>
 </site>
 <status nullFlavor="NA"/>

It’s a bit hard to say what the originalText is – in this example, I’ve assumed that both codes were assigned from the same fragment of text (perhaps some assisted code entry).

For the third system:

 <problem code="cancer" codeSystem="[local3]">
   <displayName value="Cancer"/>
   <originalText>Cancer</originalText>
 </problem>
 <site code="lung" codeSystem="[local4]">
   <displayName value="Lungs"/>
   <originalText>Lungs</originalText>
 </site>
 <status code="suspected" codeSystem="[local5]">
   <displayName value="Suspected"/>
   <originalText>Suspected</originalText>
 </status>

In this case, the original texts are the same as the codes, because the user simply looked up the codes straight off the lists.

Question B: How do you add the Snomed codes?

The previous question was straightforward – just getting us going. In the scenario, we have a terminology server that is able to translate between various snomed codes and local codes. Of course, there’s no need for this for the first system.

For the second system:

 <problem code="162572001" codeSystem="[sct]">
   <displayName value="Suspected cancer"/>
   <originalText>Investigation for lung cancer</originalText>
   <translation code="?cancer" codeSystem="[local1]">
     <displayName value="Query Cancer"/>
   </translation>
 </problem>
 <site code="39607008" codeSystem="[sct]">
   <displayName value="Lung structure"/>
   <originalText>Investigation for lung cancer</originalText>
   <translation code="lung" codeSystem="[local2]">
     <displayName value="Lungs"/>
   </translation>
 </site>
 <status nullFlavor="NA"/>

Note the order of the translations – in R2, the root code is the one that meets the conformance criteria, which is the snomed codes in this case. In R1, there was confusion about whether the root codes are those ones, or the original local codes since they are the source (R1 says both). If you really want to say which is the source, you can do this:

 <problem code="162572001" codeSystem="[sct]" codingRationale="R">
   <displayName value="Suspected cancer"/>
   <originalText>Investigation for lung cancer</originalText>
   <translation code="?cancer" codeSystem="[local1]" codingRationale="O">
     <displayName value="Query Cancer"/>
   </translation>
 </problem>
 <site code="39607008" codeSystem="[sct]" codingRationale="R">
   <displayName value="Lung structure"/>
   <originalText>Investigation for lung cancer</originalText>
   <translation code="lung" codeSystem="[local2]" codingRationale="O">
     <displayName value="Lungs"/>
   </translation>
 </site>
 <status nullFlavor="NA"/>
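
The wrapping step that an interface engine would perform here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name with_translation is made up, the "[sct]" and "[local1]" placeholders are the same stand-ins used in the fragments above, and the codes are simply passed in rather than looked up on a terminology server.

```python
import xml.etree.ElementTree as ET


def with_translation(field, sct_code, sct_display, original_text,
                     local_code, local_system, local_display):
    """Return a CD element whose root is the Snomed code and whose translation is the local code."""
    # Root code: the one that meets the conformance criteria (codingRationale "R").
    cd = ET.Element(field, {"code": sct_code, "codeSystem": "[sct]", "codingRationale": "R"})
    ET.SubElement(cd, "displayName", {"value": sct_display})
    ET.SubElement(cd, "originalText").text = original_text
    # Translation: the original local code (codingRationale "O").
    tr = ET.SubElement(cd, "translation",
                       {"code": local_code, "codeSystem": local_system, "codingRationale": "O"})
    ET.SubElement(tr, "displayName", {"value": local_display})
    return cd


problem = with_translation("problem", "162572001", "Suspected cancer",
                           "Investigation for lung cancer",
                           "?cancer", "[local1]", "Query Cancer")
print(ET.tostring(problem, encoding="unicode"))
```

The original text stays with the root code, and only the local code and its display name move into the translation.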

For the third system:

 <problem code="363346000" codeSystem="[sct]">
   <displayName value="Malignant neoplastic disease"/>
   <originalText>Cancer</originalText>
   <translation code="cancer" codeSystem="[local3]">
     <displayName value="Cancer"/>
   </translation>
 </problem>
 <site code="39607008" codeSystem="[sct]">
   <displayName value="Lung structure"/>
   <originalText>Lungs</originalText>
   <translation code="lung" codeSystem="[local4]">
     <displayName value="Lungs"/>
   </translation>
 </site>
 <status code="415684004" codeSystem="[sct]">
   <displayName value="Suspected"/>
   <originalText>Suspected</originalText>
   <translation code="suspected" codeSystem="[local5]">
     <displayName value="Suspected"/>
   </translation>
 </status>

Question C: How do you combine the codes?

In this case, the third system is sending to the first. Somewhere in the process (i.e. on some interface engine), the three fields are going to be combined. First of all, this could be done in the definitions:

Problem
problem : CD
site : CD {code | 363698007} (finding site)
status : CD {code | 408729009} (finding context)

Where this means that the site modifies the problem code with the modifier value 363698007 (and likewise for status with 408729009). Note that it doesn’t have to be done that way – someone could just code that relationship in the interface engine, but capturing it in the definitions offers much more leverage (aside: I don’t think this is possible in the RIM, but the more implementation focused the static model becomes, the more likely it is to be useful).

Given those relationships, you can easily compose this snomed expression:

363346000:363698007=39607008,408729009=415684004
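
As a minimal sketch of that composition (not part of any Snomed CT tooling; the function name compose_expression is made up for illustration), the expression is just the problem code followed by one attribute=value refinement per modifying field:

```python
# Attribute concept ids taken from the field definitions above.
FINDING_SITE = "363698007"
FINDING_CONTEXT = "408729009"


def compose_expression(focus, refinements):
    """Build 'focus:attr=value,attr=value' from a focus code and (attribute, value) pairs."""
    if not refinements:
        return focus
    parts = ",".join(f"{attr}={value}" for attr, value in refinements)
    return f"{focus}:{parts}"


expression = compose_expression(
    "363346000",  # problem
    [(FINDING_SITE, "39607008"), (FINDING_CONTEXT, "415684004")],  # site, status
)
print(expression)
```

This only concatenates strings; it makes no attempt to check that the result is a sensible Snomed expression.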

A really smart snomed system could determine that this is equivalent to 162573006 (Suspected lung cancer) and therefore just replace the expression with the single code.

Actually, of course, I lie. Snomed CT just never delivers on stuff like that. In order to make that equivalence, you’d have to determine all the defining properties of 162573006 (Suspected lung cancer), which includes subject relationship = “Subject of record”, and temporal context = “current or specified”. These you could provide through the definitions or the interface engine knowledge somehow. But Suspected Lung Cancer has an “associated finding” of “Malignant tumour of lung”. I think this is wrong – once you have a definite finding of a malignant tumour of the lung, I think you have a definite lung cancer, though I suppose someone’s going to point out some edge case where this isn’t true. But is it always true that you must have a known malignant tumour before you have a suspected cancer? I don’t think so. And the finding site is only connected to “Suspected Lung Cancer” through the associated finding. So I don’t think that a smart snomed system could make that equivalence – a person is going to have to do that. So much for snomed. Anyhow, let’s imagine someone made this rule, and we know that in this context, 162573006 = 363346000:363698007=39607008,408729009=415684004.

We can represent this as a code either way:

  <problem code="162573006" codeSystem="[sct]"/>
  <problem code="363346000:363698007=39607008,408729009=415684004" codeSystem="[sct]"/>

That’s the easy part. What should the originalText and displayName be for those two?

With regard to displayName, that’s easy for the pre-coordinated case – it’s the preferred name (Suspected lung cancer, as above). But what about for the expression? What’s the displayName for an expression? Snomed CT doesn’t define a displayName for an expression, though there are various approaches around. I guess that means you could just make something up:

  "[Suspected] [Malignant neoplastic disease] in [Lung Structure]"

Where these are the preferred names for the three codes, and “in” is provided in code. But you’d have to know Snomed CT pretty well to do that – you’d have to be a world expert, in fact, to have the confidence to put that in production (it’s a good thing there are so many world experts then). And if you don’t have the display name, how is it supposed to be clinically usable? (i.e. how are the users of the first system supposed to understand an expression?)
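
To make the “just make something up” option concrete, here is a rough Python sketch. It is not an IHTSDO-endorsed method: the PREFERRED_NAMES table is a hand-typed stand-in for a terminology server lookup, and the function name made_up_display_name is hypothetical.

```python
# Preferred names as used in this post; in practice they would come from a terminology server.
PREFERRED_NAMES = {
    "363346000": "Malignant neoplastic disease",
    "39607008": "Lung Structure",
    "415684004": "Suspected",
}


def made_up_display_name(problem, site, status):
    """Join the three preferred names; the word 'in' is supplied by this code, not by Snomed CT."""
    return f"{PREFERRED_NAMES[status]} {PREFERRED_NAMES[problem]} in {PREFERRED_NAMES[site]}"


display = made_up_display_name("363346000", "39607008", "415684004")
print(display)
```

The template (status, then problem, then “in” site) is hard-wired for this one pattern of fields, which is exactly why it is hard to trust in production.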

So my first conclusion is that until IHTSDO publishes a consistent agreed method to produce display names for expressions, expressions will continue to be a research curio, not a production option.

The second problem is original text. The original text of the concept that is trying to be built is spread across three attributes:

problem: Cancer
site: Lung
status: Suspected

Given that, we can easily compose a meaningful original text:

 problem="Cancer",site="Lung",status="Suspected"

That’s very straightforward to understand for a human – with one caveat: the words for the parts (problem, etc) need to be the *UI* names that the human saw, not the interoperability names.
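
A small Python sketch of composing that text follows, assuming the UI labels happen to be “problem”, “site” and “status”; the function name compose_original_text is made up. Because the result contains double quotes, it has to be escaped before it can sit inside an XML attribute.

```python
from xml.sax.saxutils import escape


def compose_original_text(parts):
    """Join (UI label, text the user saw) pairs into a single human-readable string."""
    return ",".join(f'{label}="{text}"' for label, text in parts)


text = compose_original_text([("problem", "Cancer"), ("site", "Lung"), ("status", "Suspected")])
print(text)

# Escape &, < and > plus the embedded double quotes so the text fits in an attribute value.
attribute_value = escape(text, {'"': "&quot;"})
print(f'<originalText value="{attribute_value}"/>')
```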

We’ve never clarified this anywhere with regard to CD.originalText. I’m going to propose that we document this approach in the next version of the data types (whatever “next version” means!).

There was some concern expressed to me that this original text above isn’t computer processible – that’s right, it’s not. Original text isn’t meant to be computer processible, it’s for a human.

This does give us two CDs now:

  <problem code="162573006" codeSystem="[sct]">
    <displayName value="Suspected lung cancer"/>
    <originalText value="problem=&quot;Cancer&quot;,site=&quot;Lung&quot;,status=&quot;Suspected&quot;"/>
  </problem>
  <problem code="363346000:363698007=39607008,408729009=415684004" codeSystem="[sct]">
    <displayName value="Suspected Malignant neoplastic disease in Lung Structure"/>
    <originalText value="problem=&quot;Cancer&quot;,site=&quot;Lung&quot;,status=&quot;Suspected&quot;"/>
  </problem>

That’s still not quite it though, because now we run into the translation issue. The original set was:

 <problem code="cancer" codeSystem="[local3]">
   <displayName value="Cancer"/>
   <originalText>Cancer</originalText>
 </problem>
 <site code="lung" codeSystem="[local4]">
   <displayName value="Lungs"/>
   <originalText>Lungs</originalText>
 </site>
 <status code="suspected" codeSystem="[local5]">
   <displayName value="Suspected"/>
   <originalText>Suspected</originalText>
 </status>

and now we have to get the final form in somehow. It simplifies our approach to say that the immediate snomed translations we added in question B are only transient, and we aren’t going to add them here. So let’s just stick the translation in the problem:

 <problem code="162573006" codeSystem="[sct]">
    <displayName value="Suspected lung cancer"/>
    <originalText value="problem=&quot;Cancer&quot;,site=&quot;Lung&quot;,status=&quot;Suspected&quot;"/>
    <translation code="cancer" codeSystem="[local3]">
      <displayName value="Cancer"/>
    </translation>
 </problem>
 <site code="lung" codeSystem="[local4]">
   <displayName value="Lungs"/>
   <originalText>Lungs</originalText>
 </site>
 <status code="suspected" codeSystem="[local5]">
   <displayName value="Suspected"/>
   <originalText>Suspected</originalText>
 </status>

That particular structure gives me all sorts of problems:

  • The original text of the problem is wrong – it’s “Cancer”, and we don’t have a place for the conflated original text of the problem.
  • the problem is in tension with the site and status – should they now have a nullFlavor=UNK, with the local code as a translation? But how does that make sense?
  • it gets worse – much worse – if the transients are added back in. (and since it would simplify things to take them out, all good terminologists will insist that they are in)
  • it’s pretty hard to think that the translation in problem is valid.

My conclusion: you can’t use CD translations to capture translations across multiple data types – you’ll have to create *structure* to enable that in the model that uses the data types and contains problem, site, and status (or whatever your cases are).

Note that if I was talking about R1 data types, the order of the translation etc would be reversed, but that wouldn’t really change the problem at all.

Question D: How do you split the code?

In this case, the first system is sending to the third. Somewhere in the process (i.e. on some interface engine), the problem field is going to be split up.

This process is the reverse of the problem discussed above. Theoretically, it could be done using the snomed definitions, but in practice, this would hardly ever work because of Snomed being Snomed.

Anyhow, we start with

 <problem code="162573006" codeSystem="[sct]">
  <displayName value="Suspected lung cancer"/>
  <originalText value="Suspected lung cancer"/>
 </problem>
 <site nullFlavor="NA"/>
 <status nullFlavor="NA"/>

And now we have to split this up. Let’s assume that we have some magic terminology server that can do that, and tell us which snomed codes we are interested in – and then translate them to the local code systems used by the third system.
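
Since a person has to make the equivalence rule anyway, the “magic” can be approximated by hand-maintained lookup tables. The Python below is a sketch under that assumption only: SPLIT_RULES, LOCAL_CODES and split_problem are hypothetical names, and each table holds just the one example case from this post.

```python
# Hand-made rule: pre-coordinated Snomed code -> (problem, site, status) Snomed codes.
# Hypothetical table standing in for the "magic terminology server".
SPLIT_RULES = {
    "162573006": ("363346000", "39607008", "415684004"),
}

# Snomed code -> (local code system, local code) for the third system; also hypothetical.
LOCAL_CODES = {
    "363346000": ("[local3]", "cancer"),
    "39607008": ("[local4]", "lung"),
    "415684004": ("[local5]", "suspected"),
}


def split_problem(sct_code):
    """Return local (code system, code) pairs for problem, site and status, or None if no rule exists."""
    parts = SPLIT_RULES.get(sct_code)
    if parts is None:
        return None
    return dict(zip(("problem", "site", "status"), (LOCAL_CODES[p] for p in parts)))


print(split_problem("162573006"))
```

This yields the three local codes, but it says nothing about where the Snomed code, the original text and the translations should go in the three CDs.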

We could build this, perhaps:

 <problem code="162573006" codeSystem="[sct]">
  <displayName value="Suspected lung cancer"/>
  <originalText value="Suspected lung cancer"/>
  <translation code="cancer" codeSystem="[local3]">
    <displayName value="Cancer"/>
  </translation>
 </problem>
 <site nullFlavor="NA">
   <originalText value="Suspected lung cancer"/>
   <translation code="lung" codeSystem="[local4]">
     <displayName value="Lungs"/>
   </translation>
 </site>
 <status code="suspected" codeSystem="[local5]">
   <displayName value="Suspected"/>
   <originalText>Suspected lung cancer</originalText>
   <translation nullFlavor="NA"/>
 </status>

You’ll note that for site and status the translations are reversed. I don’t have a clue which is right there. Technically, in R2, the site example is correct. But it still gives me all sorts of other problems, and I come back to the same conclusion: you can’t use CD translations to capture translations across multiple data types – you’ll have to create *structure* to enable that in the model that uses the data types and contains problem, site, and status (or whatever your cases are).

p.s. a bonus. this is the R1 representation:

 <problem code="363346000" codeSystem="[sct]" displayName="Cancer">
  <qualifier>
    <name code="363698007" displayName="finding site"/>
    <value code="39607008" displayName="Lung"/>
  </qualifier>
  <qualifier>
    <name code="408729009" displayName="finding context"/>
    <value code="415684004" displayName="suspected"/>
  </qualifier>
 </problem>

Concerning this, the R1 specification says: “Qualifiers constrain the meaning of the primary code, but cannot negate it or change it’s meaning to that of another value in the primary coding system”. I’ll leave argument to the comments about whether the finding context of “suspected” counts as negation or not (and also whether it matters that they actually do change its meaning to that of another value in the primary coding system).

 

Complexity of Standards

I’m in Singapore this week, speaking at the 2011 Healthcare IT standards Conference. It’s a real pleasure to be in Singapore meeting with many people I’ve corresponded with over the years, but never met, and also exploring such a great city. In addition, I’ve had many deep and interesting discussions around how to progress either Singapore’s Healthcare system needs, or international standards (or both). Today, I spoke about the experience I’ve gained over many years in my many and varied roles in HL7 and other standards contexts.

Much of the content was things I have already covered in this blog, such as the 3 laws of interoperability, drive by interoperability, and requirements for interoperability, but several things were new, and I’m going to post them here.

I’ll start with this diagram that I showed. It’s a rough plot of the internal complexity of the standard (y, log) vs the complexity of content that the technique/standard describes.

Some notes to help explain this:

  • Text (bottom left) is my (0, 0) point
  • You can solve more complex problems with any of these techniques than their position on the X axis suggests – but you have to invent more protocol on top of it. (That’s what XML is for!) So the position on the x axis is the innate complexity
  • The actual position of the particular items will be a comment magnet. But I think they’re generally correct in an order of magnitude kind of way
  • What the graph doesn’t show is all sorts of other quality measures, such as breadth, tooling, integrity – there’s heaps of other criteria. This is just about complexity.

Complexity is an issue of growing importance – we know that we need to have it – the problems we are trying to solve in healthcare are complex, and we can’t make that go away. But this means that we can’t afford to choose approaches that are any more complex than they have to be – which most of the existing approaches are. I’m spending a lot of time thinking about the question of how to move to the lower right of this diagram – RFH is an answer to that, but is there more we can do?

Btw, where does Snomed CT appear on this diagram? Way off to the top and right… I can’t think of anything that would plot further up than Snomed.

 

Bugs in the CDA Instance Editor

The CDA Instance Editor is starting to get used for CDA validation here in Australia. As part of that use, several issues have cropped up in the CDA validator:

  • It doesn’t allow type substitution of ST for ED
  • It tries to validate xsi:type on properties of data type against the type of the data type, not the properties
  • It didn’t recognise that CD with a CWE binding can have an originalText without a code attribute

These were a bit of a surprise – I thought it was pretty well tested. I’ll be releasing an update soon.

 

v2 to CDA Mapping: Data Types

A couple of weeks ago, I held a v2 to CDA mapping course (See here). Overall, the course was a success – at least, that’s the feedback I had from the attendees, who left with a much deeper understanding of the problem space for mapping from HL7 v2 to CDA, and the ways to approach solutions. Several of the participants asked me if I was going to run the course again, since only some of the people in their institution could attend. Well, it depends on interest – if you’re interested, let me know.

Because there’s no “solution” to the problem – it depends so much on the context – there’s not a lot of good information in public about how to do this. Keith Boone has a chapter (17) in his excellent book on CDA, but this is very general, and possibly a little US specific for an Australian context. I’m going to publish a few of my course notes here in the hope that it will be useful to other people working on v2 to CDA mappings.

Firstly, a general mapping guide from v2 data types to CDA datatypes:

v2 Data Type | CDA Data Type

OBX-5 types
ST – String | ST (String)
TX – Text data | ST (String)
FT – Formatted text | ED (Encapsulated Data) or Narrative
RP – reference pointer | TEL (Telecommunications Reference)
NM – Numeric | Usually PQ when combined with units field (i.e. OBX-6), or REAL
SN – structured numeric | IVL, RTO, PQ, or CO depending on contents
NR – Numeric Range | IVL<REAL> (interval of real)
DT – Date | TS (Timestamp)
TS – time stamp | TS (Timestamp)
TM – Time | PQ (Physical Quantity, unit measure of time)
CQ – composite quantity with units | PQ
DR – date/time range | IVL<TS> (interval of timestamp)
ED – encapsulated data | ED (encapsulated data)
EI – entity identifier | II + more
MO – money | MO (Money)
NA – numeric array | LIST<REAL>
CD – channel definition | Set of observations
TQ – timing quantity | GTS + many more things
CE – coded element | CD/CE (Concept)
CF – coded element with formatted values | CD/CE (Concept)
CNE – coded with no exceptions | CD/CE (Concept)
CWE – coded with exceptions | CD/CE (Concept)

non-OBX-5 types
ID – Coded values for HL7 tables | CS? CD? (Coded simple or Concept)
IS – Coded value for user-defined tables | CS? CD? (Coded simple or Concept)
SI – Sequence ID | INT (if mapped; unusual)
AD – address | AD (Address)
CX – extended composite ID with check digit | II + other things (see identification pattern)
CN – composite ID number and name | II + PN + more
DLN – driver’s license number | II + more
EI – entity identifier | II? CD.codeSystem? More?
HD – hierarchic designator | II.root? CD.codeSystem? More?
PL – person location | Split across several locations
RFR – reference range | IVL<PQ> + more
XAD – extended address | AD (Address)
XCN – extended composite ID number and name for persons | II + PN + more
XON – extended composite name and identification number for organizations | ON (Organizational Name)
XPN – extended person name | PN (Person Name)
XTN – extended telecommunication number | TEL (Telecommunications Reference)

Notes:

  • mappings are only candidates – the correct mapping depends on context and scope.
  • only useful data types are included in this table.
  • mappings are imprecise – version 2 and CDA data type scopes can differ significantly

It’s important, when mapping from v2 data types to CDA, not to simply map at the data type level. This table provides a starting point, but you need to map each component every time, since the use of these is so variable, and there’s such internal mismatch between types (see here about II, for instance).
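
As a rough illustration of treating the table as candidates rather than answers, here is a Python sketch covering a handful of the OBX-5 rows. The names CANDIDATES and candidate_for_obx5 are made up, and the only context rule encoded is the NM one from the table (PQ when OBX-6 carries units, otherwise REAL); a real mapping would need far more context than this.

```python
# A few rows of the table above as candidate lists; the first entry is only a starting point.
CANDIDATES = {
    "ST": ["ST"],
    "TX": ["ST"],
    "NM": ["PQ", "REAL"],
    "NR": ["IVL<REAL>"],
    "DT": ["TS"],
    "CE": ["CD", "CE"],
    "CWE": ["CD", "CE"],
}


def candidate_for_obx5(v2_type, has_units=False):
    """Pick a candidate CDA type for an OBX-5 value; NM depends on whether OBX-6 carries units."""
    if v2_type == "NM":
        return "PQ" if has_units else "REAL"
    options = CANDIDATES.get(v2_type)
    return options[0] if options else None  # None: no candidate recorded, decide by hand


print(candidate_for_obx5("NM", has_units=True))
print(candidate_for_obx5("NM"))
print(candidate_for_obx5("TQ"))
```

Returning None for unlisted types is deliberate: it forces a human decision instead of a silent default.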