#FHIR Report from the Paris Working Meeting

I’m on the way home from HL7’s 2015 May Working Group Meeting. This meeting was held in Paris. Well, not quite Paris – at the Hyatt Regency at Charles De Gaulle Airport.

In Memoriam

A sad and quite unexpected event occurred at this meeting – Helmut Koenig passed away. Helmut Koenig was a friend who had attended HL7 and DICOM meetings for many years. Recently, he had contributed to the DICOM related resources, including ImagingStudy and ImagingObjectSelection resources.

Helmut actually passed away at the meeting itself, and we worked on resolving his ballot comments the next day.

Ballot Summary

The FHIR community continues to grow in leaps and bounds. That was reflected in the FHIR ballot: we had strong participation and many detailed comments about the specification itself. Once all the ballot comments had been processed and duplicates removed, and line items traded amongst the various FHIR related specifications, the core specification had 1137 line items for committees to handle. You can see them for yourself on HL7’s gForge.

This is a huge task and will be the main focus of the FHIR community for the next couple of months as we grind towards publication of the second DSTU. At the meeting itself, we disposed of around 100 line items; I thought this was excellent work since we focused on the hardest and most controversial ones.

Connectathon

We had about 70 participants for the connectathon. Implementers focused on the main streams of the connectathon: basic Patient handling, HL7 v2 to FHIR conversion, Terminology Services, and claiming. For me, the key outcomes of the connectathon were:

  • We got further feedback about the quality of the specification, with ideas for improvement
  • Many of the connectathon participants stayed on and contributed to ballot reconciliation through the week.

The connectathons are a key foundation of the FHIR Community – they keep us focused on making FHIR something that is practical and implementer focused.

We have many connectathons planned through the rest of this year (at least 6, and more are being considered). I’ll announce them here as the opportunity arises.

Collaborations

Another pillar of the FHIR Community is our collaborations with other health data exchange communities. In addition to our many existing collaborations, at this meeting the FHIR core team met with Continua, the oneM2M alliance, and the IHE test tools team. (We already have a strong collaboration with IHE generally, so this is just an extension of this in a specific area of focus).

With IHE, we plan to have a ‘conformance test tools’ stream at the Atlanta connectathon, which will test the proposed (though not yet approved) TestScript resource, which is a joint development effort between Mitre, Aegis, and the core team. We expect that the collaboration with Continua will lead to a joint connectathon testing draft FHIR based Continua specifications later this year. Working with oneM2M will involve architectural and infrastructural development, and this will take longer to come to fruition.

FHIR Infrastructure

At this meeting, the HL7 internal processes approved the creation of a “FHIR Infrastructure” Work group. This work group will be responsible for the core FHIR infrastructure – base documentation, the API, the data types, and a number of the infrastructure resources. The FHIR infrastructure group has a long list of collaborations with other HL7 work groups such as Implementation Technology, Conformance and Implementation, Structured Documents, Modelling and Methodology, and many more. This just regularises the existing processes in HL7; it doesn’t signal anything new in terms of development of FHIR.

FHIR Maturity model

One of the very evident features of the FHIR specification as it stands today is that the content in it has a range of levels of readiness for implementation. Implementers often ask about this – how ready is the content for use?

We have a range – Patient, for instance, has been widely tested, including several production implementations. While the content might still change further in response to implementer experience, we know that what’s there is suitable for production implementation. On the other hand, other resources are relatively newly defined, and haven’t been tested at all. This will continue to be true, as we introduce new functionality into the specification; some – a gradually increasing amount – will be ready for production implementation, while new things will take a number of cycles to mature.

In response to this, we are going to introduce a FHIR Maturity Model grading based on the well-known CMM index. All resources and profiles that are published as part of the FHIR specification will have an FMM grade to help implementers understand how mature the content is.

FHIR & Semantic Exchange

I still get comments from some parts of the HL7 community about FHIR and the fact that it is not properly based on real semantic exchange. I think this is largely a misunderstanding; it’s made for 2 main reasons:

  • The RIM mappings are largely in the background
  • We do not impose requirements to handle data properly

It’s true that we don’t force applications to handle data properly. I’d certainly like them to, but we can’t force them to, and one of the big lessons from V3 development was that we can’t, one way or another, achieve that. Implementers do generally want to improve their data handling, but they’re heavily constrained by real-world pressures, including the cost of development, legacy data, and the fact that the paying users (often) don’t care.

And it’s true that the RIM mappings have proven largely of theoretical value; we’ve only had one ballot comment about RIM mappings, and very few people have contributed to them.

What we do instead, is insist that the infrastructure is computable; of all HL7 specifications, only FHIR consistently has all the value sets defined and published. Anyone who’s done CCDA implementation will know how significant this is.

Still, we have a long way to go yet. A key part of our work in this area is the development of RDF representations for FHIR resources, and the underlying definitions, including the reference models, and we’ll be putting a lot of work into binding to terminologies such as LOINC, SNOMED CT and others.

There’s some confusion about this: we’re not defining RDF representations of resources because we think this is relevant to typical operational exchange of healthcare data; XML and JSON cover this area perfectly well. Where RDF representations will be useful is as the interface between operational healthcare data exchange and analysis and reasoning tools. Such tools will have applications in primary healthcare and secondary data usage.

#FHIR, RDF, and JSON-LD

FHIR doesn’t use JSON-LD. Some people are pretty critical of that:

It’s a pity hasn’t been made compatible. Enormous missed opportunity for interop & simplicity.

That was from David Metcalfe by Twitter. The outcome of our exchange after that was that David came down to Melbourne from Sydney to spend a few hours with me discussing FHIR, rdf, and json-ld (I was pretty amazed at that, thanks David).

So I’ve spent a few weeks investigating this, and the upshot is, I don’t think that FHIR should use json-ld.

Linked Data

It’s not that the FHIR team doesn’t believe in linked data – we do, passionately. From the beginning, we designed FHIR around the concept of linked data – the namespace we use is http://hl7.org/fhir and that resolves right to the spec. Wherever we can, we ensure that the names we use in that namespace are resolvable and meaningful on the hl7.org server (though I see that recent changes in the hosting arrangements have somehow broken some of these links). The FHIR spec, as a RESTful API, imposes a linked data framework on all implementations.

It’s just a framework though – using the framework to do fully linked data requires a set of additional behaviours that we don’t make implementers do. Not all FHIR implementers care about linked data – many don’t, and the more closely linked to institutional healthcare, the more important specific trading partner agreements become. One of the major attractions FHIR has in the healthcare space is that it can serve as a common format across the system, so supporting these kinds of implementers is critical to the FHIR project. Hence, we like linked data, we encourage its use, but it’s not mandatory.

JSON-LD

This is where json-ld comes into the picture – the idea is that you mark up your json with some lightweight links, which link the information in the json representation to its formal definitions so that the data and its context can be easily understood outside the specific trading partner agreements.

We like that idea. It’s a core notion for what we’re doing in FHIR, so it sounds like that’s how we should do things. Unfortunately, for a variety of reasons, it appears that it doesn’t make sense for us to use json-ld.

RDF

Many of the reasons that json-ld is not a good fit for FHIR arise because of RDF, which sits in the background of json-ld. From the JSON-LD spec:

JSON-LD is designed to be usable directly as JSON, with no knowledge of RDF. It is also designed to be usable as RDF, if desired, for use with other Linked Data technologies like SPARQL.

FHIR has never had an RDF representation, and it’s a common feature request. There’s a group of experts looking at RDF for FHIR (technically, the ITS WGM RDF project) and so we’ve finally got around to defining RDF for FHIR. Note that this page is an editor’s draft for committee discussion – there are some substantial open issues. We’re keen, though, for people to test this, particularly the generated RDF definitions.

RDF for FHIR has 2 core parts:

  • An RDF based definition of the specification itself – the class definitions of the resources, the vocabulary definitions, and all the mappings and definitions associated with them
  • A method for representing instances of resources as RDF

Those two things are closely related – the instances are represented in terms of the class model defined in the base RDF, and the base RDF uses the instance representation in a variety of ways.

Working through the process of defining the RDF representation for FHIR has exposed a number of issues for an RDF representation of FHIR resources:

  • Dealing with missing data: a number of FHIR elements have a default value, or, instead, have an explicit meaning for a missing element (e.g. MedicationAdministration: if there is no “notGiven” flag, then the medication was given as stated). In the RDF world (well, the ontology world built on top of it) you can’t reason about missing data, since it’s missing. So an RDF representation for FHIR has to make the meaning explicit by requiring default values to be explicit, and providing positive assertions about some missing elements
  • Order does matter, and RDF doesn’t have a good solution for it. This is an open issue, but one that can’t be ducked
  • It’s much more efficient, in RDF, to change the way extensions are represented; in XML and JSON, being hierarchies (and, in XML, an ordered one), having a manifest where mandatory extension metadata (url, type) is represented is painful, and, for schema reasons, difficult. So this data is inlined into the extension representation. In RDF, however, being triple based with an inferred graph, it’s much more effective to separate these into a manifest
  • For a variety of operational reasons, ‘concepts’ – references to other resources or knowledge in ontologies such as LOINC or SNOMED CT – are done indirectly. For Coding, for instance, rather than simply having a URL that refers directly to the concept, we have system + code + version. If you want to reason about the concept that represents, it has to be mapped to the concept directly. That level of indirection exists for good operational reasons, and we couldn’t take it out. However the mapping process isn’t trivial
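
Two of the issues above, explicit defaults and the indirection in Coding, can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the function names, the defaults table, and the IRI patterns are hypothetical and are not taken from the FHIR RDF draft. In particular, it ignores the version, which is one of the reasons a real mapping isn’t trivial.

```python
# Hypothetical table of elements whose absence has a defined meaning.
# The single entry mirrors the MedicationAdministration example above.
IMPLIED_DEFAULTS = {
    "MedicationAdministration": {"notGiven": False},
}

# Hypothetical mapping from a code system URL to a concept IRI pattern.
# Both patterns are illustrative assumptions, not agreed mappings.
CONCEPT_IRI_PATTERNS = {
    "http://snomed.info/sct": "http://snomed.info/id/{code}",
    "http://loinc.org": "http://example.org/loinc/{code}",
}


def with_explicit_defaults(resource):
    """Return a copy of the resource with implied defaults written out."""
    defaults = IMPLIED_DEFAULTS.get(resource.get("resourceType"), {})
    explicit = dict(defaults)
    explicit.update(resource)  # values actually present always win
    return explicit


def concept_iri(coding):
    """Map system + code to a direct concept IRI, or None if unknown."""
    pattern = CONCEPT_IRI_PATTERNS.get(coding.get("system"))
    if pattern is None or "code" not in coding:
        return None
    return pattern.format(code=coding["code"])


admin = with_explicit_defaults({"resourceType": "MedicationAdministration"})
print(admin["notGiven"])
print(concept_iri({"system": "http://snomed.info/sct", "code": "22298006"}))
```

A converter built this way would emit a positive “not given = false” statement for every administration that omits the flag, so a reasoner never has to interpret the absence itself.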

In the FHIR framework, RDF is another representation like XML and JSON. Clients can ask servers to return resources or sets of resources using RDF instead of JSON or XML. Servers or clients that convert between XML/JSON and RDF will have to handle these issues – and the core reference implementations that many clients and servers choose to use will support RDF natively (at least, that’s what the respective RI maintainers intend to do).

Why not to use JSON-LD

So, back to json-ld. The fundamental notion of json-ld is that you can add context references to your json, and then the context points to a conversion template that defines how to convert the json to RDF.

From a FHIR viewpoint, then, either the definition of the conversion process is sophisticated enough to handle the kinds of issues discussed above, or you have to compromise either the JSON or the RDF or both.

And the JSON –> RDF conversion defined by the JSON-LD specification is pretty simple. In fact, we don’t even get to the issues discussed above before we run into a problem. The most basic problem has to do with names – JSON-LD assumes that everywhere a JSON property name is used, it has the same meaning. So, take this snippet of JSON:

{ 
  "person" : {
    "dob" : "1975-01-01",
    "name" : {
      "family" : "Smith",
      "given" : "Joe"
    }
  },
  "organization" : {
     "name" : "Acme"
  } 
}

Here, the json property ‘name’ is used in 1 or 2 different ways. It depends on what you mean by ‘meaning’. Both properties associate a human usable label to a concept, one that humans use in conversation to identify an entity, though it’s ambiguous. That’s the same meaning in both cases. However the semantic details of the label – meaning at a higher level – are quite different. Organizations don’t have given names or family names, and they don’t change their names when they get married or have a gender change. And humans don’t get merged into other humans, or have their names changed for marketing reasons (well, mostly 😉 ).

JSON-LD assumes that anywhere that a property ‘name’ appears, it has the same RDF definition. So that snippet above can’t be converted to json-ld by a simple addition of a json-ld @context. Instead, you would have to rename the name properties to ‘personName’ and ‘organizationName’ or similar. In FHIR, however, we’ve worked on the widely accepted practice that names are scoped by their type (that’s what types do). The specification defines around 2200 elements, with about 1500 names – so 700 of them or so use names that other elements also use. We’re not going to rename all these elements to pre-coordinate their type context into the property name. (Note that JSON-LD discussed supporting having names scoped by context – but this is an ‘outstanding’ request that seems unlikely to get adopted anytime soon).
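
The collision can be seen with a deliberately simplified model of a flat context, written in Python. This is not the JSON-LD expansion algorithm; it only mimics the one-name-to-one-IRI lookup described above, and the `example.org` IRIs and the function name are made up for illustration.

```python
# A flat context: each property name maps to exactly one IRI.
# The IRIs are placeholders, not real FHIR or JSON-LD definitions.
FLAT_CONTEXT = {
    "person": "http://example.org/def#person",
    "organization": "http://example.org/def#organization",
    "dob": "http://example.org/def#dob",
    "name": "http://example.org/def#name",
    "family": "http://example.org/def#family",
    "given": "http://example.org/def#given",
}


def expand(node, context, path=()):
    """Yield (path, iri) for every property, resolving by name alone."""
    for key, value in node.items():
        here = path + (key,)
        yield here, context[key]
        if isinstance(value, dict):
            yield from expand(value, context, here)


doc = {
    "person": {
        "dob": "1975-01-01",
        "name": {"family": "Smith", "given": "Joe"},
    },
    "organization": {"name": "Acme"},
}

resolved = dict(expand(doc, FLAT_CONTEXT))
person_name = resolved[("person", "name")]
org_name = resolved[("organization", "name")]

# Both uses of 'name' resolve to the same IRI, whatever their parent type.
print(person_name == org_name)
```

The lookup never consults `path`, which is the point: with a flat table there is nowhere to say that ‘name’ under a person differs from ‘name’ under an organization, short of renaming the properties.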

Beyond that, the other issues are not addressed by json-ld, and unlikely to be soon. Here’s what JSON-LD says about ordered arrays:

Since graphs do not describe ordering for links between nodes, arrays in JSON-LD do not provide an ordering of the contained elements by default. This is exactly the opposite from regular JSON arrays, which are ordered by default

and

List of lists in the form of list objects are not allowed in this version of JSON-LD. This decision was made due to the extreme amount of added complexity when processing lists of lists.

But the importance of ordering objects doesn’t go away just because the RDF graph definitions and/or syntax make it difficult. We can’t ignore it, and no one getting healthcare would be happy with the outcomes if we managed to get healthcare processes to ignore it. The same applies to the issue with missing elements – there’s no facility to insert default values in json-ld, let alone to do so conditionally.
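
A minimal Python sketch of the ordering problem, assuming a graph is modelled as a plain set of (subject, predicate, object) triples. The subject and predicate names and the index-based workaround are hypothetical; they show one possible way to keep order, not the representation the FHIR RDF work will settle on.

```python
def to_triples(subject, predicate, values):
    """Naive conversion: one triple per array item, order not recorded."""
    return {(subject, predicate, v) for v in values}


def to_indexed_triples(subject, predicate, values):
    """Workaround: carry each item's position as part of the object."""
    return {(subject, predicate, (i, v)) for i, v in enumerate(values)}


def from_indexed_triples(triples):
    """Recover the original list by sorting on the stored position."""
    return [v for _, v in sorted(obj for _, _, obj in triples)]


given_a = ["Mary", "Anne"]
given_b = ["Anne", "Mary"]

# Without positions, two differently ordered arrays become the same graph.
same = to_triples("p1", "given", given_a) == to_triples("p1", "given", given_b)
print(same)

# With positions, the order survives a round trip.
print(from_indexed_triples(to_indexed_triples("p1", "given", given_a)))
```

The cost of the workaround is that every consumer has to know about the position convention, which is the kind of complexity a simple context-driven conversion has no way to express.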

So we could either

  • Complicate the json format greatly to make the json-ld RDF useful
  • Accept the simple RDF produced by json-ld and just say that all the reasoning you would want to do isn’t actually necessary
    • (or some combination of those two)
  • Or accept that there’s a transform between the regular forms of FHIR (JSON and XML which are very close) and the optimal RDF form, and concentrate on making implementations of that transform easy to use in practice

I think it’s inevitable that we’ll be going for the 3rd.

p.s. should json-ld address these issues? I think JSON-LD has to address the ‘names scoped by types’ issue, but for the rest, I don’t know. The missing element problem is ubiquitous across interfaces – elements with default values are omitted for efficiency everywhere – but there is a lot of complexity in these things. Perhaps there could be an @conversion which is a reference to a server that will convert the content to RDF instead of a @context. That’s not so nice from a client’s perspective, but it avoids specifying a huge amount of complexity in the conversion process.

p.p.s there’s further analysis about this on the FHIR wiki.