#FHIR, RDF, and JSON-LD

FHIR doesn’t use JSON-LD. Some people are pretty critical of that:

It’s a pity [FHIR] hasn’t been made compatible. Enormous missed opportunity for interop & simplicity.

That was from David Metcalfe on Twitter. The outcome of the exchange that followed was that David came down to Melbourne from Sydney to spend a few hours with me discussing FHIR, RDF, and JSON-LD (I was pretty amazed at that – thanks David).

So I’ve spent a few weeks investigating this, and the upshot is that I don’t think FHIR should use JSON-LD.

Linked Data

It’s not that the FHIR team doesn’t believe in linked data – we do, passionately. From the beginning, we designed FHIR around the concept of linked data – the namespace we use is http://hl7.org/fhir and that resolves right to the spec. Wherever we can, we ensure that the names we use in that namespace are resolvable and meaningful on the hl7.org server (though I see that recent changes in the hosting arrangements have somehow broken some of these links). The FHIR spec, as a RESTful API, imposes a linked data framework on all implementations.

It’s just a framework, though – using the framework to do fully linked data requires a set of additional behaviours that we don’t make implementers follow. Not all FHIR implementers care about linked data – many don’t, and the more closely they are tied to institutional healthcare, the more important specific trading partner agreements become. One of the major attractions FHIR has in the healthcare space is that it can serve as a common format across the system, so supporting these kinds of implementers is critical to the FHIR project. Hence, we like linked data, and we encourage its use, but it’s not mandatory.

JSON-LD

This is where JSON-LD comes into the picture – the idea is that you mark up your JSON with some lightweight links, which connect the information in the JSON representation to its formal definitions, so that the data and its context can be easily understood outside the specific trading partner agreements.

We like that idea. It’s a core notion for what we’re doing in FHIR, so it sounds like that’s how we should do things. Unfortunately, for a variety of reasons, it appears that it doesn’t make sense for us to use JSON-LD.

RDF

Many of the reasons that JSON-LD is not a good fit for FHIR arise because of RDF, which sits in the background of JSON-LD. From the JSON-LD spec:

JSON-LD is designed to be usable directly as JSON, with no knowledge of RDF. It is also designed to be usable as RDF, if desired, for use with other Linked Data technologies like SPARQL.

FHIR has never had an RDF representation, and it’s a common feature request. There’s a group of experts looking at RDF for FHIR (technically, the ITS WGM RDF project), and so we’ve finally got around to defining RDF for FHIR. Note that this page is an editors’ draft for committee discussion – there are some substantial open issues. We’re keen, though, for people to test this, particularly the generated RDF definitions.

RDF for FHIR has 2 core parts:

  • An RDF based definition of the specification itself – the class definitions of the resources, the vocabulary definitions, and all the mappings and definitions associated with them
  • A method for representing instances of resources as RDF

Those two things are closely related – the instances are represented in terms of the class model defined in the base RDF, and the base RDF uses the instance representation in a variety of ways.

Working through the process of defining the RDF representation for FHIR has exposed a number of issues for an RDF representation of FHIR resources:

  • Dealing with missing data: a number of FHIR elements have a default value, or, instead, have an explicit meaning for a missing element (e.g. MedicationAdministration: if there is no “notGiven” flag, then the medication was given as stated). In the RDF world (well, the ontology world built on top of it) you can’t reason about missing data, since it’s missing. So an RDF representation for FHIR has to make the meaning explicit by requiring default values to be stated, and by providing positive assertions about some missing elements
  • Order does matter, and RDF doesn’t have a good solution for it. This is an open issue, but one that can’t be ducked
  • It’s much more efficient, in RDF, to change the way extensions are represented. In XML and JSON, which are hierarchies (and, in XML, an ordered one), having a manifest where mandatory extension metadata (url, type) is represented is painful and, for schema reasons, difficult, so this data is inlined into the extension representation. In RDF, however, which is triple-based with an inferred graph, it’s much more effective to separate these out into a manifest
  • For a variety of operational reasons, ‘concepts’ – references to other resources or knowledge in ontologies such as LOINC or SNOMED CT – are handled indirectly. For Coding, for instance, rather than simply having a URL that refers directly to the concept, we have system + code + version. If you want to reason about the concept that this represents, it has to be mapped to the concept directly. That level of indirection exists for good operational reasons, and we couldn’t take it out. However, the mapping process isn’t trivial
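
To illustrate the indirection (this is just an illustration, not the defined FHIR RDF mapping): in the JSON representation, a Coding for the SNOMED CT concept for myocardial infarction looks like this:

{
  "system" : "http://snomed.info/sct",
  "code" : "22298006"
}

whereas for ontology-based reasoning you want the concept’s own IRI – http://snomed.info/id/22298006 – and deriving that IRI from system + code (+ version) is the non-trivial mapping step referred to above.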

In the FHIR framework, RDF is another representation like XML and JSON. Clients can ask servers to return resources or sets of resources using RDF instead of JSON or XML. Servers or clients that convert between XML/JSON and RDF will have to handle these issues – and the core reference implementations that many clients and servers choose to use will support RDF natively (at least, that’s what the respective RI maintainers intend to do).

Why not to use JSON-LD

So, back to JSON-LD. The fundamental notion of JSON-LD is that you add context references to your JSON, and the context points to a conversion template that defines how to convert the JSON to RDF.

From a FHIR viewpoint, then, either the definition of the conversion process is sophisticated enough to handle the kinds of issues discussed above, or you have to compromise either the JSON or the RDF or both.

And the JSON to RDF conversion defined by the JSON-LD specification is pretty simple. In fact, we don’t even get to the issues discussed above before we run into a problem. The most basic problem has to do with names – JSON-LD assumes that everywhere a JSON property name is used, it has the same meaning. So, take this snippet of JSON:

{ 
  "person" : {
    "dob" : "1975-01-01",
    "name" : {
      "family" : "Smith",
      "given" : "Joe"
    }
  },
  "organization" : {
     "name" : "Acme"
  } 
}

Here, the JSON property ‘name’ is used in one or two different ways – it depends on what you mean by ‘meaning’. Both properties associate a human-usable label with a concept, one that humans use in conversation to identify an entity, though it’s ambiguous. That’s the same meaning in both cases. However, the semantic details of the label – meaning at a higher level – are quite different. Organizations don’t have given names or family names, and don’t change their names when they get married or have a gender change. And humans don’t get merged into other humans, or have their names changed for marketing reasons (well, mostly ;-) ).

JSON-LD assumes that anywhere a property ‘name’ appears, it has the same RDF definition. So the snippet above can’t be converted to JSON-LD by the simple addition of a JSON-LD @context. Instead, you would have to rename the name properties to ‘personName’ and ‘organizationName’ or similar. In FHIR, however, we’ve worked on the widely accepted practice that names are scoped by their type (that’s what types do). The specification defines around 2200 elements, with about 1500 names – so 700 or so of them use names that other elements also use. We’re not going to rename all these elements to pre-coordinate their type context into the property name. (Note that JSON-LD has discussed supporting names scoped by context – but this is an ‘outstanding’ request that seems unlikely to be adopted anytime soon).
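
To make that concrete, here’s a rough sketch of the renaming workaround – the @context maps each (now unique) property name to its own definition. The IRIs here are invented for illustration; they’re not real FHIR definitions:

{
  "@context" : {
    "personName" : "http://example.org/defs/Person.name",
    "organizationName" : "http://example.org/defs/Organization.name"
  },
  "person" : {
    "personName" : { "family" : "Smith", "given" : "Joe" }
  },
  "organization" : {
    "organizationName" : "Acme"
  }
}

A single ‘name’ property could only be mapped to one definition for the whole document, which is exactly the problem described above.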

Beyond that, the other issues are not addressed by JSON-LD, and are unlikely to be soon. Here’s what JSON-LD says about ordered arrays:

Since graphs do not describe ordering for links between nodes, arrays in JSON-LD do not provide an ordering of the contained elements by default. This is exactly the opposite from regular JSON arrays, which are ordered by default

and

List of lists in the form of list objects are not allowed in this version of JSON-LD. This decision was made due to the extreme amount of added complexity when processing lists of lists.

But the importance of ordering doesn’t go away just because the RDF graph definitions and/or syntax make it difficult. We can’t ignore it, and no one receiving healthcare would be happy with the outcomes if we managed to get the healthcare process to ignore it. The same applies to the issue with missing elements – there’s no facility to insert default values in JSON-LD, let alone to do so conditionally.

So we could either

  • Complicate the json format greatly to make the json-ld RDF useful
  • Accept the simple RDF produced by json-ld and just say that all the reasoning you would want to do isn’t actually necessary
    • (or some combination of those two)
  • Or accept that there’s a transform between the regular forms of FHIR (JSON and XML, which are very close) and the optimal RDF form, and concentrate on making implementations of that transform easy to use in practice

I think it’s inevitable that we’ll be going for the third option.

p.s. Should JSON-LD address these issues? I think JSON-LD has to address the ‘names scoped by types’ issue, but for the rest, I don’t know. The missing element problem is ubiquitous across interfaces – elements with default values are omitted for efficiency everywhere – but there is a lot of complexity in these things. Perhaps there could be an @conversion – a reference to a server that will convert the content to RDF – instead of a @context. That’s not so nice from a client’s perspective, but it avoids specifying a huge amount of complexity in the conversion process.

p.p.s there’s further analysis about this on the FHIR wiki.

#FHIR DSTU ballots this year

Last week, the FHIR Management Group (FMG – the committee that has operational authority over the development of the FHIR standard) made a significant decision with regard to the future of the FHIR specification.

A little background, first. For about a year, we’ve been announcing our intent to publish an updated DSTU – DSTU 2 – for FHIR in the middle of this year. This new DSTU has many substantial improvements across the entire specification, both as a result of implementation experience from the first DSTU, and in response to market and community demand for additional new functionality. Preparing for this publication consists of a mix of activities – outreach and ongoing involvement in the communities and projects implementing FHIR, a set of standards development protocols to follow (internal HL7 processes), and ongoing consultation with an ever-growing list of other standards development organizations. From a standards viewpoint, the key steps are twofold: a ‘Draft for comment’ ballot, and then a formal DSTU (Draft Standard for Trial Use).

  • Draft for comment: really, this is an opportunity to do a formal review of the many issues that arose across the project, and a chance to focus on consistency across the specification (we held this step in Dec/Jan)
  • DSTU: This is the formal ballot – what emerges after comment reconciliation will be the final DSTU 2 posted mid-year

In our preparation for the DSTU ballot, which is due out in a couple of weeks’ time, it became clear that some of the content was further along in maturity than other parts; some parts have had extensive real-world testing, and others haven’t – or worse, the real-world testing that has occurred has demonstrated that our designs are inadequate.

So for some parts of the content, it would be better to hold off the ballot and spend more time getting them absolutely right. This was especially true since we planned to publish only a single DSTU, then wait for another 18 months before starting the long haul towards a full normative standard. This meant that anything published in the DSTU would stand for at least two years – or, if it missed out, it would be at least two years before it made it into a stable version. For this content, there was a real reason to wait, to hold off publishing the standard.

On the other hand, most of the specification is solid and has been well tested – it’s much further along the maturity pathway. Further, there are a number of implementation communities impatient to see a new stable version around which they can focus their efforts – one that’s got the improvements from all the existing lessons learned, and broader functionality to meet their use cases. The two most prominent communities in this position are Argonaut and HSPC, both of which would be seriously impeded by a significant delay in publishing a new stable version – and neither of which uses the portions of the specification that are behind in maturity.

After discussion, what FMG decided to do is this:

  • Go ahead with the ballot as planned – this meets the interests of the community focused on PHR/Clinical record exchange
  • Hold a scope-limited update to the DSTU (planned to be called 2.1) later this year for those portions of the DSTU that are identified as being less mature

The scope-limited update to the DSTU will not change the API, the infrastructure resources, or the core resources such as Patient, Observation, etc. During ballot reconciliation we’ll be honing the exact scope of the DSTU update project. Right now, these are the likely candidates:

  • the workflow/process framework (Order, OrderRequest, and the *Request/*Order resources)
  • The financial management resources

For these, we’ll do further analysis and consultation – both during the DSTU process and after it – and then we’ll hold a connectathon (probably October in Atlanta) in order to test this.

Canadian FHIR Connectathon

FHIR® North

Canada’s FHIR Connectathon


Event Details

Date: April 29th

Time: 9:00AM – 6:00PM

Location: Mohawk College, 135 Fennel Ave W, Hamilton, ON L9C 1E9

Room: Collaboratory (2nd Floor – Library)

Registration Cost: $45.00 (Entry, lunch, coffee break, pizza dinner)

Registration Site: http://www.mohawkcollegeenterprise.ca/en/event_list.aspx?groupId=3

Event Description

A FHIR connectathon is an opportunity for developers to come together to test their applications to determine if they can successfully interoperate using the HL7 FHIR specification. Participants will have a chance to meet and ask questions of some of the world’s leading FHIR experts, a chance to see whether FHIR really lives up to the hype, and a chance to shape the specification.

Don’t miss out on this opportunity. Register now at http://www.mohawkcollegeenterprise.ca/en/event_list.aspx?groupId=3. For further details regarding this event, please review the attached PDF.

FHIR is attracting significant interest world-wide. Although the standard is still evolving, it’s being used in production in multiple countries. Numerous connectathons have been held in the U.S., Europe, South America and Australasia. It seems time to give Canadian developers a chance to take it out on the road.

FHIR North v1.0 (PDF)

Note: “HL7” and “FHIR” are registered trademarks of Health Level Seven International

Establishing Interoperability by Legislative Fiat

h/t to Roger Maduro for the notification about the Rep Burgess Bill:

The office of Rep. Michael C. Burgess, MD (R-Texas) released a draft of the interoperability bill that they have been working for the past several months on Friday. Rep. Burgess, one of the few physicians in Congress, has been working very hard with his staff to come up with legislation that can fix the current Health IT “lock-in” crisis.

Well, I’m not sure that it’s a crisis. Perhaps it’s one politically, but maybe legislation can help. With that in mind, the centerpiece of the legislation, as far as I can see, is these 3 clauses:

‘‘(a) INTEROPERABILITY.—In order for a qualified electronic health record to be considered interoperable, such record must satisfy the following criteria:

‘‘(1) OPEN ACCESS.—The record allows authorized users access to the entirety of a patient’s data from any and all qualified electronic health records without restriction.

‘‘(2) COMPLETE ACCESS TO HEALTH DATA.— The record allows authorized users access to the entirety of a patient’s data in one location, without the need for multiple interfaces (such as sign on systems).

‘‘(3) DOES NOT BLOCK ACCESS TO OTHER QUALIFIED ELECTRONIC HEALTH RECORDS.—The record does not prevent end users from interfacing with other qualified electronic health records.

Well, there are some serious issues with the wording here.

Firstly, with regard to #1:

  • What’s the scope of this? A natural reading is that ‘the record’ allows access to all patient data from any institution or anywhere else. I’m pretty sure that’s not what they mean to say, but what are they saying? Which ‘any and all’?
  • Presumably they do want to allow the authorizing user – the patient – to be able to restrict access to their record by other authorised users. But that’s not what it says
  • The proposed bill doesn’t clarify what the ‘patient record’ is, as opposed to the institution’s record about the patient. Perhaps other legislation qualifies that, but it’s a tricky issue. Where, for instance, does a hospital record a note that clinicians should be alert for parental abuse? In the child’s record, where the parent sees it?
  • Further to this, just what are health records? For example, are the internal process records from a diagnostic lab part of ‘any and all qualified health records’? Just how far does this go?

With regard to #2:

  • What’s an ‘interface’? As a technologist, this has so many possible meanings… so many ways that this could be interpreted.
  • I think it’s probably not a very good idea for legislation to decide on system architecture choices. In particular, this sentence is not going to mesh well with OAuth based schemes for matching patient control to institutional liability, and that’s going to be a big problem.
  • I’m also not particularly clear what ‘one location’ means. Hopefully this would not be interpreted to mean that the various servers must be co-located, but if it doesn’t, what does it mean exactly?

With regard to #3:

  • I can’t imagine how one system could block access to other qualified health records. Except by some policy exclusivity, I suppose, but I don’t know what that would be. Probably, if this was written more clearly, I’d be in agreement. But I don’t really know what it’s saying

There are some serious omissions from this as well:

  •  There’s nothing to say that the information must be understandable – a system could put up an end-point that returned an encrypted zip file of random assorted stuff and still meet the legislation
  • There’s no mention of standards or consistency at all
  • There’s no mention of any clinical criteria as goals or assessment criteria

The last is actually significant; one of the real obstacles to interoperability is the lack of agreement between clinicians (especially across disciplines) about clinical interoperability. There’s this belief that IT is some magic bullet that will create meaningful outcomes, but that won’t happen without clinical change.

As usual, legislation is a blunt instrument, and this bill as worded would do way more damage than benefit. So is there better wording? I don’t have any off the top of my head – anything we could try to say is embedded in today’s solutions, and would prevent tomorrow’s (better) solutions.

It would be good if the legislation at least mentioned standards, though. But we’re decades away from having agreed standards that cover even 10% of the scope of “any and all qualified electronic health records”.

Guest Post: Breaking a lance for #FHIR

A guest post about FHIR adoption from Andreas Billig, Fraunhofer FOKUS. I asked Andreas to write about his experience using FHIR to implement a terminology service.

Although it may not seem necessary to break a lance for FHIR, there might be some skeptics who need to be motivated to use FHIR for electronic health artifacts in the future. The coming success of FHIR is directly related to past efforts to establish a common modeling language and method for electronic health artifacts. One of the most prominent ambitions was HL7-v3 in the mid-2000s. So why try to introduce a new language?

Past issues & consolidated goals

HL7-v3 was built on excellent concepts and techniques for developing models to ensure well-defined interpretation of the instances exchanged between the various actors in electronic healthcare. It relies on model-driven approaches (MDA) well known in the software development community, and on a sound architecture enabling re-use and refinement of model elements.

Starting from RIM (Reference Information Model) elements as a general vocabulary for the health care domain, you can first define your DMIM (Domain Message Information Model) for specialized artifacts of a particular sub-domain. These models are not intended to be serializable and will be refined/constrained to so-called RMIMs (Refined Message Information Models). (Serializable) RMIMs have their direct equivalent in XML schemata, which in turn are used as grammars for electronic artifacts exchanged between healthcare actors. A prominent instance of the RIM-DMIM-RMIM-XML chain is CDA (Clinical Document Architecture).

So far so good. But what were the issues? From our point of view there were at least five points where the development of FHIR went very well and learned from past issues:

  • dissemination by documentation
  • handling modeling complexity
  • extensibility
  • domain model diversity
  • integration with the method of profiling

The first point concerns the excellent documentation of the standard, utilizing the hypertext paradigm. It can be seen as a domain-specific wiki concentrating on the essential elements, with a consistent structure following the KISS principle.

The modeling complexity of FHIR is low thanks to the profile-centric approach. Of course, HL7-v3 made very good efforts to handle complexity by introducing elements like class and mood codes, shrinking the number of explicit classes (by the way, this was the reverse application of the principle of attribute discrimination). FHIR does not introduce them explicitly, but because most of the coding attributes are discriminating per se, FHIR does not abandon this facility in general.

The extensibility approach of FHIR takes a prominent place. Explicit language constructs to represent extensions of information items allow for managing and tracking extensions with direct machine support. It is expected that some extensions will be gathered and flow back into standard resource profiles.

As said before, with HL7-v3 one can define arbitrary models to reach the goal of high domain model diversity by instantiating the RIM-DMIM-RMIM-XML chain, as was done in the CDA definition. From my point of view, the HL7 intention was that followers would define many custom models for their projects. For reasons unknown to me (maybe due to the understandable absence of modeling experts in projects), most users stuck with the CDA model, to avoid the above chain and to profit from the various resources developed around CDA. Unfortunately, this practice did not lead to sound model diversity. Instead, many users tried to press everything into CDA – so the diversity was formulated within CDA and not sufficiently covered by tailored modeling language elements. FHIR minimizes this risk with its profile-oriented approach, where long model chains cannot discourage users. Instead, FHIR comes initially equipped with a high diversity that can be refined or extended as needed for the project.

This point is directly related to the last one. People and organizations, e.g. IHE, customized the end of the chain – CDA/XML – by defining various profiles, with the problems sketched above. FHIR is consequently based on profiling from beginning to end. This uniform approach ensures low complexity and the absence of breaks in the language.

Ready for the web

For many years, easy-to-use web technologies have been increasingly utilized. The most prominent examples are REST and JSON. I remember a telco two years ago where I asked “Will FHIR be the successor of V3?”. After a small pause the moderator said something like “mmh, FHIR is merely intended for use on smartphones and tablets, but we will see …”. Now we see that it makes no sense to artificially split the modeling techniques depending on the distribution channel.

The lightweight, profile-centric modeling approach of FHIR is directly targeted at the main formats XML and JSON, and is therefore a perfect fit for the technological base of the web. Moreover, semantic web applications are coming of age, and FHIR will easily be integrated by using RDF bindings. Of course we do not want to break with MDA. Instead, many MDA results will flow into FHIR development in the future.

How we use FHIR

Our institute has applied FHIR in several projects. First of all, the project DEMIS (German Electronic Notification System for Infection Protection) – a project together with the Robert Koch Institute, the central federal institution responsible for disease control and prevention – utilized FHIR resources to represent and exchange information about infectious diseases and pathogens. Here we profited from the quick and easy definition of profiles for resources concerning infectious diseases. Moreover, the facility of handling FHIR profiles as first-class citizens enabled us to generate input forms to be filled in by healthcare actors.

Secondly, our terminology server CTS2-LE is able to import code systems and value sets represented as FHIR ValueSet resources. These artifacts are then mapped to the CTS2 standard. Here we benefit from their concise definition. As a side product, we were able to load all the HL7-v3 code systems and the FHIR code systems themselves via this interface.

It has to be noted that, of course, we will continue with HL7-v3 as long as some market segments are based on this standard. HL7-v3 was a groundbreaking project that influenced the FHIR development – or, one can say, made FHIR possible.

Thanks, Grahame, for the opportunity to break a lance for FHIR on your blog.

#FHIR Terminology Services Connectathon

This week I was in Washington DC for the inaugural FHIR terminology services connectathon. This was the first of its kind: a connectathon focused on the terminology services portion of the FHIR specification.

The following organizations were represented:

  • Apelon (on behalf of openHIE)
  • VSAC/NLM
  • IMO
  • Regenstrief
  • Lantana
  • Nictiz
  • CSIRO
  • Smart Platforms
  • HL7 (through vocabulary co-chairs from MDPartners and Hausam Consulting)

The focus of the connectathon was on the two simplest operations in the terminology services API:

  • Given a value set definition, generate an expansion that contains the actual codes in the value set
  • Test whether a value set contains a code/system pair

Value Set Operations

In order to use or implement the terminology services API, the first thing to do is to understand value set expansions. Logically, a value set has 3 aspects:

  • A set of metadata that identify the value set and describe its purpose and the rules for its use
  • A “content logical definition” – the rules about what codes from what code systems are included in the value set
  • The list of actual codes in the value set (the expansion)

The key to understanding this is that the content logical definition can be complicated, not fully specified, or both. The most common example is that the logical definition doesn’t fix the version of the code system, e.g. “take all procedures from SNOMED CT” – but which version of SNOMED CT? That’s delegated to the run-time environment.
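
For example, a content logical definition along the lines of “all procedures from SNOMED CT” might be written roughly like this (a trimmed sketch – the real resource needs its identifying metadata as well):

{
  "resourceType" : "ValueSet",
  "compose" : {
    "include" : [{
      "system" : "http://snomed.info/sct",
      "filter" : [{ "property" : "concept", "op" : "is-a", "value" : "71388002" }]
    }]
  }
}

Note that nothing here says which version of SNOMED CT to use – that’s exactly the gap the expansion process has to fill.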

This means that the process of converting from the “content logical definition” to the expansion is complicated, and is basically a job for terminology expert systems.

The FHIR terminology service API includes a very simple operation to fetch the expansion for any given value set:

GET [base]/ValueSet/[id]/$expand

This says that for the server’s value set [id], return the list of codes in the value set. It’s up to the server how it decides to generate the expansion – maybe it generates it each time based on the definition, or maybe it caches it internally and simply sends it.
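
To illustrate, the response is the value set back again with an expansion element listing the codes; a trimmed sketch (codes and display names here are illustrative) might look like:

{
  "resourceType" : "ValueSet",
  "expansion" : {
    "timestamp" : "2015-03-01T10:30:00Z",
    "contains" : [
      { "system" : "http://loinc.org", "code" : "2339-0", "display" : "Glucose [Mass/volume] in Blood" },
      { "system" : "http://loinc.org", "code" : "2345-7", "display" : "Glucose [Mass/volume] in Serum or Plasma" }
    ]
  }
}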

Value set validation is easier – the client simply gives the server a system and a code and asks whether it’s valid in a particular value set:

GET [base]/ValueSet/[id]/$validate?system=http://loinc.org&code=2344-2

The server returns a true/false – is the code valid, and in the specified value set? – along with a human-readable description of any issues detected.
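
The response comes back as a set of output parameters; something along these lines (a sketch – the exact parameter names and response shape were still being settled at the time of writing):

{
  "resourceType" : "Parameters",
  "parameter" : [
    { "name" : "result", "valueBoolean" : true },
    { "name" : "message", "valueString" : "Code 2344-2 found in the value set" }
  ]
}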

Note that for these operations, there are a number of syntactical variations to reflect the various conditions under which the operation is executed.

A key part of the first day of the connectathon was a mini-tutorial that was a detailed review of the value set resource, the expansion operation, and the way that the entire terminology service space works in the FHIR eco-system.

Connectathon Test Script

This connectathon introduced a new feature that we intend to carry forward into future connectathons: a test script (view the source). The test script contains:

  • An HTML summary of the tests that a server has to pass
  • A list of setup actions to take in order to create the conditions for the test script to execute
  • A series of test cases, each of which consists of:
    • A logical FHIR interaction to perform on the server
    • Rules about the HTTP response
    • Rules about the resource that is returned

Note: The test script is actually written as if it’s a FHIR resource, though we haven’t actually defined a resource for this yet.

For the terminology services connectathon, the test script

  • Defined a series of value sets that contained different kinds of content logical definition
  • Performed a series of different expansion and validation operations, checking that the server returned the right result

The terminology script could be used in several different ways – simply as a detailed technical specification of the kind of things being tested at the connectathon, or as an executable test sequence using the FHIR Sprinkler tool. Several parties used the script, but needed to vary its content to account for differences in server functionality, and also for the inner details of the various SNOMED CT and other code systems in use.

The test script wasn’t the only way to test – a number of participants used either their browser or the POSTman browser plug-in to test the functionality of the various servers represented at the connectathon.

Outcomes

Since this was the first connectathon, we didn’t have any particular expectations in terms of formal testing outcomes. So it was a pleasant surprise that by the second day, a number of participants were using the test script to automate testing their server for technical conformance to the specification.

In particular, the VSAC prototype FHIR server was passing all 55 tests in the test script – congratulations! (This doesn’t mean that VSAC is going to put up a FHIR server to access all the value sets as of tomorrow, but clearly that’s the end game. Of course, it will take considerable time for them to get to that point and depends on approvals, budgets etc).

That’s our desired end-point, though – that all the primary sources of value sets will make them available through a standard API, one that’s used by all the operational terminology services as well.  Really, this connectathon was initiated by the openHIE community, who are examining the FHIR terminology services API to see whether it’s suitable for adoption as their base terminology API for exactly this use. For the FHIR project, it’s our pleasure to work with openHIE, because there’s strong alignment between our two communities.

This connectathon also produced the normal outcomes that we’ve come to expect:

  • Foster continued growth of a community with strong interactions
  • Significantly increase the knowledge of the FHIR API in that community
  • Generate a series of suggestions for improvement in the FHIR specification itself

I’m really pleased with these outcomes – I’ve come to believe that the terminology space will be a significant early win for the FHIR community. With that in mind, the attendees felt that for future connectathons, it would be good to make them part of the overall FHIR connectathons held the weekend before the HL7 working group meeting. This would have several advantages, the most important of which is that we can start testing out practical ways to further the integration of terminology services into routine use of FHIR for clinical purposes. OpenHIE may also consider holding their own connectathon at some time.

At the moment, the only server that was tested at the connectathon that is publicly available is mine. It’s at http://fhir-dev.healthintersections.com.au/open, and it passes all the tests in the test script. Hopefully we’ll be able to add a couple more servers that will be available in an ongoing fashion.


p.s. I’m intending to make a follow-up post soon about how simple the terminology service is to use in practice, but we’re pretty busy preparing the upcoming FHIR DSTU ballot.

#FHIR for Laboratory Integration

As the FHIR project has progressed, many organizations are starting to face the difficult question: when should we think about using FHIR for our production interfaces?

Making this kind of evaluation depends on the technical merits of the various possible alternative standards, the existing ecosystem and how much is already invested in alternative approaches, and what kind of maturity rating is appropriate for the standard.

With regard to the last, see Dixie Baker’s JAMIA paper, “Evaluating and classifying the readiness of technology specifications for national standardization”, but note that what kind of maturity is best for a project depends on the nature of the project and its participants. National regulations need something different from smaller projects with a shorter timeline.

Here’s an example of a Russian assessment (by Nikolay from Health Samurai):

The Saint Petersburg government started a project whose goal is to create a unified integration bus for exchanging laboratory orders and results for state ambulatories, hospitals and laboratories.

…analysis…

Surely there are a lot of small technical questions, but we have to conclude that FHIR perfectly fits the needs & requirements of this project.

There’s no single answer as to whether FHIR is ready or not; while we’ve still got lots of the healthcare space to cover, some of what’s in FHIR is based on well-understood concepts. If FHIR meets a project’s requirements, then there’s no reason not to use it, and to take advantage of its advantages compared to other approaches.

I’m pleased that FHIR meets the needs of the St Petersburg project.

p.s. Check out Nikolay’s Russian version of FHIR too.

Question about storing/processing Coded values in CDA document

Question
If a CDA code is represented with both a code and a translation, which of the following should be imported as the stored code into the CDR:
  1. a normalised version of the Translation element (using the clinical terminology service)
  2. the Code element exactly as it is in the CDA document
The argument for option 1 is that since the Clinical Terminology Service is the authoritative source of code translations, we do not need to pay any attention to the ‘code’ field, even though the authoring system has performed the translation itself (which may or may not be correct). The argument for option 2 is that clinicians sign off on the code and translation fields provided in a document. Ignoring the code field could potentially modify the intended meaning of the data being provided.
Answer
First, to clarify several assumptions:
“Clinicians sign off on the code and translation fields provided in a document”
Clinicians actually sign off on the narrative, not the data. Exactly how a code is represented in the data – translations or whatever – is not as important as maintaining the narrative. Obviously there should be some relationship between the two in the document, but exactly what that is is not obvious. And exactly what needs to be done with the document afterwards is even less obvious.
The second assumption concerns which is the ‘original’ code, and which is the ‘translation’. There’s actually a number of options:
  • The user picked code A, and some terminology server performed a translation to code B
    • The user picked code A, and some terminology server in a middle ware engine performed a translation to code B
  • The user picked code X, and some terminology server performed translations to both code A and B
  • The user was in some workflow, and this was manually associated with codes A and B in the source system configuration
  • The user used some words, and a language processor determined codes A and B
    • The user used some words, and two different language processors determined codes A and B

OK, the last is getting a bit fanciful – I doubt there’s even one workable language processor out there, but there are definitely a bunch being evaluated. Anyway, the point is that the relationship between code A and code B isn’t automatically that one is translated from the other. The language in the data types specification is a little loose:

A CD represents any kind of concept usually by giving a code defined in a code system. A CD can contain the original text or phrase that served as the basis of the coding and one or more translations into different coding systems

It’s loose because it’s not exactly clear what the translations are of:

  • “a code defined in a code system”
  • “the original text”
  • the concept

The correct answer is the last – each code, and the text, are all representations of the concept. So the different codes may capture different nuances, and it may not be possible to prove that the translation between the two codes is valid.

Finally, either code A or code B might be the root, and the other the translation. The specification says 2 different things about which is root: the original one (if you know which it is), or the one that meets the conformance rule (e.g. if the IG says you have to use SNOMED CT, then you put that in the root, and put the other in the translation, irrespective of the relationship between them).

Actually, what people do depends on what their trading partners do. One major system that runs several important CDRs ignores the translations altogether…

Turning to the actual question: what should a CDR do?

I think that depends on who’s going to be consuming/processing the data. If the CDR is an analysis end point – e.g. data comes in, and analysis reports come out – and the use cases are closed, then it could be safe to mine the CD looking for the code your terminology server understands, and just store that as a reference.

But if the use cases aren’t closed, and it turns out that a particular analysis would be better performed against a different code system, then storing just the one understood reference would be rather costly. A great example is lab data that is coded with both LOINC and SNOMED CT – each of those serves different purposes.
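
For example, a lab result code carrying both might look something like this in the CDA document (the codes and display names are illustrative only):

<!-- illustrative codes and display names only -->
<code code="2345-7" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC"
      displayName="Glucose [Mass/volume] in Serum or Plasma">
  <originalText>
    <reference value="#result-1"/>
  </originalText>
  <translation code="36048009" codeSystem="2.16.840.1.113883.6.96"
      codeSystemName="SNOMED CT" displayName="Glucose measurement"/>
</code>

Drop the translation on import, and the SNOMED CT view of the data is gone if a later analysis turns out to need it.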

The same applies if other systems are expected to access the data to do their own analysis – they’ll be hamstrung without the full source codes from the original document.

So unless your CDR is a closed and sealed box – and I don’t believe such a thing exists at design time – it’s really rather a good idea to store the Code element exactly as it is in the CDA document (and if it references the narrative text for the originalText, make sure you store that too).

A JSON representation for HL7 v2?

Several weeks ago, I was in Amsterdam for the Furore FHIR DevDays. While there, Nikolay from Health Samurai showed off a neat JavaScript-based framework for sharing scripts that convert from HL7 v2 to FHIR.

Sharing these scripts, however, requires a standard JSON representation for HL7 v2 messages, and that turns out to have its challenges. Let’s start with what looks like a nice simple representation:

{
 "MSH" : ["|", null, "HC", "HC1456", "ATI", "ATI1001", "200209100000",
    null, null, [ "ACK", "ACK" ], "11037", "P", "2.4"],
 "MSA" : [ "AA", "345345" ]
}

This is a pretty natural way to represent a version 2 message in JSON, but it has a number of deficiencies. The first is that a message can contain more than one segment of the same type, and JSON property names must be unique (actually, JSON doesn’t explicitly say this, but Tim Bray’s clarification does). So the first thing we need to do is make the segments an array:

{
 "v2" : [
  [ "MSH", "|", null, "HC", "HC1456", "ATI", "ATI1001", "200209100000",
     null, null, [ "ACK", "ACK" ], "11037", "P", "2.4"],
  [ "MSA", "AA", "345345" ]
 ]
}

This format – where the segment code is item 0 in the array of values that represent the segment – has the useful property that field “1” in the HL7 definitions becomes item 1 in the array.

By the way, alert readers will note that the { "v2" : ... } part is pure syntax, and could potentially be dropped, but my experience is that many JSON parsers can only accept an object, not an array (arrays must be properties of objects), so we really should have an object wrapper. At the DevDays, we discussed pulling out some data from the MSH, and making it explicit:

{
 "event" : "ACK",
 "msg" : "ACK",
 "structure" : "ACK",
 "segments" : [
   ...
 ]
}

I’m not sure whether that’s justified or not. The information is in the MSH segment, so it’s straight duplication.

Problems

However, this nice, simple-to-grasp format turns out to be relatively unstable – the actual way that an item is represented depends on the values around it, and so scripts won’t be shareable across different implementations. As an example, take the representation of MSH-3, of type HD (ST from 2.1 to 2.2). In the example above, it’s represented as “HC” – just a simple string, corresponding to |HC|. If, however, the source message uses one of the other components of the HD data type, e.g. |^HC^L|, then the JSON representation would change to, say:

 { "Universal ID" : "HC", "Universal ID Type" : "L" }

So the first problem is that whether or not subsequent components appear changes the representation of the first component. Note that this is an ambiguity built into v2 itself, and it is handled in various different ways by the many existing HL7 v2 libraries. The second problem with this particular format is that the names given to the fields have varied across the different versions of HL7 v2, as they have never been regarded as significant. Universal ID is known as “universal ID” from v2.3 to 2.4 – other fields have much more variation than that. So it’s better to avoid names altogether, especially since implementers regularly just use additional components that are not yet defined:

 { "2" : "HC", "3" : "L" }

but if all we’re going to do is have index values, then let’s just use an array:

 [ null, "HC", "L" ]

Though this does have the problem that component 2 is element 1. We could fix that with this representation:

 [ "HD", null, "HC", "L" ]

where the first item in the array is its type; this would be variable across versions, and could be omitted (e.g. replaced with null) – I’m not sure whether that adds value or not. Below, I’m not going to add the type to offset the items in the array, but it’s still an option.

The general structure for a version 2 message (or batch) is:

  • A list of segments.
  • Each segment has a code, and a number of data elements
  • Each data element can occur more than once

then:

  • Each Data element has a type, which is either a simple text value, or one or more optional components
  • Each component has a type, which is either a simple text value, or one or more optional sub-components
  • Each subcomponent has a text value of some type

or:

  • Each Data element has one or more components
  • Each component has one or more subcomponents
  • Each subcomponent has a text value of some type

Aside: where’s the abstract message syntax? Well, we tried to introduce it into the wire format in v2.xml – this was problematic for several reasons (names vary, people don’t follow the structure, the structures are ambiguous in some commonly used versions, and most of all, injecting the names into the wire format was hard), and it didn’t actually give you much validation, which was the original intent, since people don’t always follow them. That’s why it’s called “abstract message syntax”. Here, we’re dealing with concrete message syntax.

The first is what the specification describes, but the wire format hides the difference between the various forms, and you can only tell them apart if you have access to the definitions. The problem is, often you don’t, since the formats are often extended informally or formally, and implementers make a mess of this across versions. And this practice is fostered by the way the HL7 committees change things. I’ve found, after much experimentation, that the best way to handle this is to hide the difference behind an API – then it doesn’t matter. But we don’t have an API to hide our JSON representation behind, and therefore we have to decide.

That gives us a poisoned chalice: we can opt for a more rigorous format that follows my second list – this makes for more complicated conversion scripts written against the wire format, but ones that are much more re-usable – or we can opt for a less rigorous format that’s easier to work with and follows the v2 definitions more naturally, but that is less robust and less re-usable.

Option #1: Rigor

In this option, there’s an array for every level, and following the second list:

  1. Array for segments
  2. Array for Data Elements
  3. Array for repeats
  4. Array for components
  5. Array for sub-components

And our example message looks like this:

{
 "v2" : [
  [ [[["MSH"]]], [[["|"]]], null, [[["HC"]]], [[["HC1456"]]], [[["ATI"]]], 
    [[["ATI1001"]]], [[["200209100000"]]], null, null, [[["ACK"], ["ACK"]]], 
    [[["11037"]]], [[["P"]]], [[["2.4"]]] ],
  [ [[["MSA"]]], [[["AA"]]], [[["345345"]]] ]
 ]
}

This doesn’t look nice, and writing accessors for data values means always accessing at the sub-component level, which would be a chore, but it would be very robust across implementations and versions. I’m not sure how to evaluate whether that’s worthwhile – mostly, but not always, it’s safe to ignore additional components that are added across versions, or in informal extensions.
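
To give a feel for what “always accessing at the sub-component level” means, here’s a rough JavaScript sketch of a generic accessor for this format (the function and parameter names are mine, purely for illustration):

// msg is the parsed JSON object from option #1
// seg is the position of the segment in the message (0 = MSH above), field is the HL7 field number;
// rep, comp and sub are optional and default to the first repeat/component/sub-component
function getValue(msg, seg, field, rep, comp, sub) {
  var f = msg.v2[seg] && msg.v2[seg][field];   // the data element (array of repeats)
  var r = f && f[rep || 0];                    // the repeat (array of components)
  var c = r && r[comp || 0];                   // the component (array of sub-components)
  return (c && c[sub || 0]) || null;           // the sub-component (a string)
}

// e.g. MSA-1 from the example above: getValue(msg, 1, 1) === "AA"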

Option #2: Simplicity

In this option, there’s a choice of string or array:

  1. Array for segments
  2. Array for Data Elements
  3. Array for repeats
  4. String or Array for components
  5. String or Array for sub-components

And our example message looks like this:

{
 "v2" : [
  [ "MSH", ["|"], null, ["HC"], ["HC1456"], ["ATI"], ["ATI1001"], ["200209100000"],
     null, null, [[ "ACK", "ACK" ]], ["11037"], ["P"], ["2.4"]],
  [ "MSA", ["AA"], ["345345"] ]
 ]
}

The annoying thing here is that we haven’t achieved the simplicity that we really wanted (what we had at the top) because of repeating fields. I can’t figure out a way to remove that layer without introducing an object (more complexity), or introducing ambiguity.

Summary

Which is better? That depends, and I don’t know how to choose. For the FHIR purpose, I think that the robust format is probably better, because it would allow for more sharing of conversion scripts. But for other users, the simpler format might be more appropriate.

p.s. Nikolay watched the discussion between James Agnew and myself on this with growing consternation, and decided to cater for multiple JSON formats. That’s probably a worse outcome, but I could understand his thinking.


Question: Using FHIR for systems integration

Question:

Is there information (i.e., FHIR standards, whitepapers etc.) discussing FHIR from a systems integration perspective? For example, have approaches been discussed on how to implement FHIR to consolidate and integrate information from multiple backend legacy (i.e., non-FHIR) systems and then forward the information bundles as FHIR REST services? Have any middleware approaches (e.g., ESBs, message buses, data tools) been discussed? The integration may also have “governance” ramifications, because the integration would want to prevent direct access to backend systems.

Answer:

Well, this is certainly a core use case for FHIR, and we’ve had various participants at many connectathons trying these kinds of scenarios out.

There’s some information published. In the specification itself, there’s Managing Resource Identity, and Push vs Pull. Here’s a list of blog links I could find:

There’s also some good information about this in the HL7 help desk (under “FHIR Architecture”). Note that this material is HL7 members only.