Category Archives: Interoperability

Interconversion between #FHIR and HL7 v2

Question

What advice do you give for introducing FHIR in new software, while continuing to maintain HL7v2 interoperability with client applications that do not speak FHIR?

For example, are FHIR resources the way to go as an internal representation of an application’s health care data?

If yes, is it practical to convert HL7 messages into FHIR resources (e.g. Patient, Practitioner, ProcedureRequest, ReferralRequest, Appointment…)? What open source software do you recommend for converting HL7 messages into FHIR resources (and vice-versa)?

Or is it better to use FHIR for external information exchange only (with outside FHIR clients)?

Answer

I’ve worked with several projects rebuilding their products around FHIR resources. Like all development methodologies, this has pros and cons. What you get out of the box is a highly interoperable system, and you get a lot of stuff for free. But when your requirements go beyond what’s in the specification, it starts to get hard – FHIR is an interoperability standard that focuses on the lowest common denominator: what everyone agrees with. Whether that’s a net benefit depends on how far beyond common agreement you’re going to go. (This, btw, is a demonstration of my 3rd law of interoperability).

It is practical to convert HL7 messages to FHIR resources and vice versa, yes. We’ve seen plenty of that going on. But there’s no canned solution, because to do the conversion, you have to do two things:

  • Figure out all the arcane business logic and information variants and code this into the conversion
  • Figure out how to integrate your conversion logic into your application framework

The upshot of this is that you have a programming problem, and most people solve it by taking open source libraries for v2 and FHIR in the language of their choice (most languages have one of those) and writing the business logic and application integration in that language. Hence, there’s no particular open source library to do the job other than the parsers etc. There are some commercial middleware engines that include FHIR as one of the formats that can be supported.
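To make the “programming problem” concrete, here’s a minimal sketch in Python of mapping a single v2 PID segment to a FHIR Patient. It uses only the standard library rather than a real v2 or FHIR library, ignores escape sequences and repetitions beyond the first, and assumes a FHIR release where Patient.name.family is a single string. The function name and the sex-to-gender table are my own illustrative choices; the table is exactly the kind of local business logic that has to be agreed per interface.

```python
def pid_to_patient(pid_segment):
    """Map a pipe-delimited HL7 v2 PID segment to a minimal FHIR Patient dict."""
    fields = pid_segment.split("|")
    if fields[0] != "PID":
        raise ValueError("not a PID segment")

    def field(n):
        # PID-n sits at index n because the segment name occupies index 0.
        return fields[n] if len(fields) > n else ""

    patient = {"resourceType": "Patient"}

    # PID-3: patient identifier; first repetition, first component only.
    ident = field(3).split("~")[0].split("^")[0]
    if ident:
        patient["identifier"] = [{"value": ident}]

    # PID-5: patient name as family^given; first repetition only.
    name_parts = field(5).split("~")[0].split("^")
    if name_parts[0]:
        name = {"family": name_parts[0]}
        if len(name_parts) > 1 and name_parts[1]:
            name["given"] = [name_parts[1]]
        patient["name"] = [name]

    # PID-7: date of birth, YYYYMMDD -> YYYY-MM-DD.
    dob = field(7)
    if len(dob) >= 8:
        patient["birthDate"] = f"{dob[0:4]}-{dob[4:6]}-{dob[6:8]}"

    # PID-8: administrative sex. This table is an assumed local mapping.
    gender_map = {"M": "male", "F": "female", "O": "other", "U": "unknown"}
    sex = field(8)
    if sex:
        patient["gender"] = gender_map.get(sex, "unknown")

    return patient
```

Even this toy shows where the effort goes: the parsing is trivial, and every comment marks a decision that a real interface would need to settle with its trading partners.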

In the FHIR spec, we’ve defined a mapping language that tries to abstract this – so you can separate the business logic from the application integration, and express the business logic in a platform-independent form that libraries on whatever platform can execute. That’s an idea that is gradually starting to gather some interest, but is still a long way from maturity.

With regard to using FHIR for external exchange only… what I usually say about this is that it makes sense to implement FHIR for new things first, and then to replace old things only when they become a problem. And most new stuff is on the periphery, where the architectural advantages to FHIR are really big. But internally, v2 will increasingly become a major service limitation in time, and will have to be replaced. The open question is how long that timeline is. We don’t know yet.

 

Lloyd McKenzie on Woody Beeler

Guest post: My close friend Lloyd wanted to share his thoughts on hearing the news about Woody.

My recollections of Woody are similar to Grahame’s.

I started my HL7 international journey in 2000.  In my case, it was in an attempt to understand how I could design my v2 profiles so they would be well aligned with v3.  I quickly learned the foolishness of that notion, but became enamored of the v3 effort.

HL7 was an extremely welcoming organization and Woody played a big part in that welcome.  I was a wet-behind-the-ears techie from Canada and he was an eminent physician, former organization chair and respected elder of the organization.  Yet he always treated me as an equal.  Over the years, we collaborated on tooling, data models, methodology, processes and the challenges of getting things done in an organization with many diverse viewpoints.  In addition to his sense of humour and willingness to get his hands dirty, I remember Woody for his passion.  He really cared about making interoperability work.  He was willing to listen to anyone who was trying to “solve the problem”, but he had little patience for those who he didn’t sense had similar motivations.

His openness to new ideas is perhaps best exemplified by his reaction to the advent of FHIR.  Woody was one of the founding fathers of v3 and certainly one of its most passionate advocates.  Over his time with HL7, he invested years of his life advocating, developing tools, providing support, educating, guiding the development of methodology and doing whatever else needed to be done.  Given his incredible investment in the v3 standard, it would not be surprising for him to be reluctant to embrace the new up-and-comer that was threatening to upset the applecart.  But he responded to the new development in typical Woody fashion.  He asked probing questions, he evaluated the intended outcomes and considered whether the proposed path was a feasible and efficient way to satisfy those outcomes.  Once he had satisfied himself with the answers, he embraced the new platform.  Woody took an active role in forming the FHIR governance structures and served as one of the first co-chairs of the FHIR governance board.  To Woody, it was the outcome that mattered, not his ego.

Woody embraced life.  He loved traveling with his wife Selby (and his kids or grandkids when he could).  He loved new challenges.  He loved his work, but he wasn’t afraid to play either.  He was an active participant in after-hours WGM poker games.

It was with reluctance that Woody stepped back from his HL7 activities after his diagnosis with cancer, but as he expressed it at the time, he had discovered that he only had time for two of three important things – fighting his illness, spending time with his family and doing the work he loved with HL7.  He chose the right two priorities.

While version 3 might not have had the success we thought it would when we were developing it, the community that evolved under HL7 v3 and the knowledge we gleaned in that effort has formed the essential foundation and platform that enabled the building of FHIR.  I am grateful to have had Woody in my life – as a mentor, a co-worker and a friend.  I am grateful too for everything he helped build.  Woody’s priority was to focus on really making a difference.  In that he has set the bar very high for the rest of us.

Thank you for everything you’ve done Woody.  We miss you.

Woody Beeler has passed away

Today, Woody Beeler passed away after battling cancer for a few years. Woody was a friend, an inspiration, and my mentor in health care standards, and I’m going to miss him.

I first met Woody in 2001 at my first HL7 meeting. It was Woody who drew me into the HL7 community, and who educated me about the impact that standards could have. Many people at HL7 have told me the same thing – it was Woody that inspired them to become part of the community.

When I remember Woody, I think of his humour, his passion for developing the best standards possible, and his commitment to building the community out of which standards arise. And I remember the way Woody was prepared to roll up his sleeves and get his hands dirty to get the job done. To the point of learning significant new technical skills long after retirement age had come and gone. Truly, a Jedi master at healthcare standards.

For many years, Woody was the v3 project lead for HL7. Woody wasn’t blind to the issues with v3, but it was the best option available to the community at the time – so he gave everything he had to bring v3 to completion.  And it was Woody who mentored me through the early stages of establishing the FHIR community.

It’s my goal that in the FHIR project we’ll keep Woody’s commitment to community and healthcare outcomes – and doing whatever is needed – alive.

Vale Woody

(pic h/t Rene Spronk, who maintains a history of HL7 v3 – see http://ringholm.com/docs/04500_en_History_of_the_HL7_RIM.htm)

New #FHIR Vital Signs Profile

Over on the official FHIR product blog, I just announced a new release. I wanted to expand on one of the features in the new version here:

A new profile to describe vital signs (note: this is proposed as mandatory to enable better data sharing)

One of the emerging themes in many countries is sharing data with patients. And one of the broad set of things called ‘health data’ that is easiest to share is what is loosely called ‘vital signs’. It’s simple data, it’s easy to share with the patients, and we’re starting to see monitoring devices available in mainstream consumer technology. But it’s more than just patients that care – vital signs data is widely shared through the healthcare provision system, and there’s lots of interesting decision support and surveillance that can usefully be done with them.

But if every different part of the healthcare system, or different jurisdictions, represent basic vital signs differently, there’ll be no way to easily share decision support and surveillance systems, nor will patients be able to share their healthcare data with common data management tools – which are cross-jurisdictional (e.g. things like healthkit/carekit).  With this in mind, we’ve defined a vital signs profile in the new draft of FHIR, and said that all vital signs must conform to it. It doesn’t say much:

  • The common vital signs must be marked with a common generic LOINC code, in addition to whatever other codes you want to give them
  • There must be a value or a data absent reason
  • There must be a subject for the observations
  • Systolic/Diastolic must be represented using components
  • The units must be a particular UCUM unit

This is as minimal a floor as we can get: defining a common way to recognize a vital sign measurement, and a common unit for them. None of this restricts what else can be done, so this is really very minimal.
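As a rough illustration of how small that floor is, here’s a sketch of a checker for the five rules above, working on an Observation held as a plain Python dict. The code and unit tables are an illustrative subset that I’ve assumed for the example (heart rate, body temperature, and a blood pressure panel with systolic/diastolic components); the published profile is the authority on which LOINC codes and UCUM units actually apply. The function names are hypothetical.

```python
LOINC = "http://loinc.org"
UCUM = "http://unitsofmeasure.org"

# Assumed, illustrative subset: LOINC code -> required UCUM unit.
SIMPLE_VITALS = {"8867-4": "/min", "8310-5": "Cel"}
BP_PANEL = "85354-9"
BP_COMPONENTS = {"8480-6": "mm[Hg]", "8462-4": "mm[Hg]"}


def loinc_codes(concept):
    """Return the set of LOINC codes present in a CodeableConcept dict."""
    codings = (concept or {}).get("coding", [])
    return {c.get("code") for c in codings if c.get("system") == LOINC}


def unit_problem(quantity, unit, label):
    """Report a problem unless the quantity carries the required UCUM unit."""
    if quantity.get("system") != UCUM or quantity.get("code") != unit:
        return [f"{label}: unit must be UCUM {unit}"]
    return []


def check_vital_sign(obs):
    """Return a list of problems; an empty list means the rules are met."""
    problems = []
    codes = loinc_codes(obs.get("code"))
    if not codes & (set(SIMPLE_VITALS) | {BP_PANEL}):
        problems.append("no common LOINC vital-sign code")
    if "subject" not in obs:
        problems.append("no subject")
    if BP_PANEL in codes:
        # Systolic and diastolic must be carried as components.
        components = obs.get("component", [])
        for code, unit in BP_COMPONENTS.items():
            match = [c for c in components if code in loinc_codes(c.get("code"))]
            if not match:
                problems.append(f"{code}: missing component")
            elif "valueQuantity" in match[0]:
                problems += unit_problem(match[0]["valueQuantity"], unit, code)
            elif "dataAbsentReason" not in match[0]:
                problems.append(f"{code}: needs a value or a dataAbsentReason")
    elif "valueQuantity" in obs:
        for code in codes & set(SIMPLE_VITALS):
            problems += unit_problem(obs["valueQuantity"], SIMPLE_VITALS[code], code)
    elif "dataAbsentReason" not in obs:
        problems.append("needs a value or a dataAbsentReason")
    return problems
```

Note that extra codings, extra components, and any other elements pass straight through: the checker only looks for the floor, which matches the intent that nothing else is restricted.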

For FHIR, this is a very gentle step towards being prescriptive about how healthcare data is represented. But even this looks likely to generate fierce debate within the implementer community, some of whom don’t see the data sharing need as particularly important or near in the future. I’m writing this post to draw everyone’s attention to this, to ensure we get a good wide debate about this idea.

Note: it’s a proposal, in a candidate standard. It has to get through ballot before it’s actually mandatory.

 

Patient Matching on #FHIR Event at HIMSS

In a couple of weeks I’m off to HIMSS in Las Vegas. I’m certainly going to be busy while I’m there (if you’re hoping to talk to me, it would be best to email me to set up a time). Before HIMSS, there are several satellite events:

  • Saturday: Health Informatics on FHIR: Opportunities in the New Age of Interoperability (IEEE)
  • Sunday: Patient Matching on FHIR event (HIMSS)
  • Monday: First joint meeting between HEART/UMA & FHIR/SMART teams – if you want to attend this meeting, let me know by email (there’s a couple of places still open)

About the Sunday meeting, quoting from the announcement:

Previous work on this idea was developed at a Patient Testing Matching Event in Cleveland, OH on August 14th, 2015 at the HIMSS Innovation Center.  The event covered a tutorial on FHIR along with sessions on patient matching.  A key takeaway from the event was that the healthcare community can advance interoperability by working on a standard Application Programming Interface (API) for master patient index software, commonly used to facilitate patient data matching.

In fulfillment of this vision, we are hosting this second Patient Matching on FHIR Workshop in conjunction with the HIMSS 16 Annual Conference in Las Vegas.  We invite:

  • Algorithm vendors
  • EMR vendors
  • Developers and standards experts
  • All interested parties

So, here’s passing on the invitation – see you there!

ps. I’ll pass on information about the IEEE event when I get a link.

Language Localization in #FHIR

Several people have asked about the degree of language localization in FHIR, and what language customizations are available.

Specification

The specification itself is published and balloted in English (US). All of our focus is on getting that one language correct.

There are a couple of projects to translate this to other languages: Russian  and  Japanese, but neither of these have gone very far, and to do it properly would be a massive task. We would consider tooling in the core build to make this easier (well, possible) but it’s not a focus for us.

One consequence of the way the specification works is that the element names (in either JSON or XML) are in English. We think that’s ok because they are only references into the specification; they’re never meant to be meaningful to anyone other than a user of the specification itself.

Codes & Elements

What is interesting to us is publishing alternate language phrases to describe the codes defined in FHIR code systems, and the elements defined in FHIR resources. These are text phrases that do appear to the end-user, so it’s meaningful to provide these.

A Code System defined in a value set has a set of designations for each code, and these designations have a language on them:

[Image: code system]
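In the absence of the picture, the designation structure can be sketched as a concept carrying a default display plus a list of designations, each with a language and a value. This is a simplified Python dict rather than the full resource, and the translated strings are illustrative samples I’ve supplied, not copied from the published v2 table translations. The helper name is hypothetical.

```python
# One concept from a code system, reduced to the parts that matter here.
concept = {
    "code": "F",
    "display": "Female",
    "designation": [
        {"language": "nl", "value": "Vrouw"},      # illustrative sample
        {"language": "de", "value": "Weiblich"},   # illustrative sample
    ],
}


def display_for(concept, language):
    """Pick the designation for a language, else fall back to the display."""
    for designation in concept.get("designation", []):
        if designation.get("language") == language:
            return designation["value"]
    return concept.get("display", concept["code"])
```

So a Dutch user interface would call display_for(concept, "nl"), and a language with no contributed designation quietly falls back to the English display.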

Not only have we defined this structure, we’ve managed to provide Dutch and German translations for the HL7 v2 tables. However, at present, these are all we have. We have the tooling to add more, it’s just a question of the HL7 Affiliates deciding to contribute the content.

For the elements (fields) defined as part of the resources, we’d happily add translations of these too, though there’s no native support for this in the StructureDefinition, so it would need an extension. However no one has contributed content like this yet.

It’s not necessary to do this as part of the specification (though doing it there provides the most visibility and distribution), but we haven’t defined any formal structure for a language pack. If there’s interest, this is something we could do in the future.

Useful Phrases

There’s also a source file that has a set of common phrases – particularly error phrases (which do get displayed to the user) – in multiple languages. This is available as part of the specification. Some of the public test servers use these messages when responding to requests.

Note that we’ve been asked to allow this to be authored through a web site that does translations – we’d be happy to support this, if we found one that had an acceptable license, could be used for free, and had an API so we could download the latest translations as part of the build (I haven’t seen any service that has those features).

Multi-language support in resources

There’s no explicit support for multi-language representation in resources other than for value set. Our experience is that while mixed language content occurs occasionally, it’s usually done informally, in the narrative, and rarely formally tracked. So far, we’ve been happy to leave that as an extension, though there is ongoing discussion about that.

The one place that we know of language being commonly tracked as part of a resource is in document references, with a set of form letters (e.g. procedure preparation notes) in multiple languages. For this reason, the DocumentReference resource has a language for the document it refers to (content.language).

Question: #FHIR and patient generated data

Question:

With the increase in device usage and general consumer-centric health sites (e.g. myfitnesspal, Healthvault, Sharecare) coupled with the adoption of FHIR, it seems like it is becoming more and more common for a consumer to be able to provide the ability to share their data with a health system. The question I have lies in the intersection of self-reported data and the clinical record.

How are health systems and vendors handling the exchange (really, ingest) of self-reported data?

An easy example is something along the lines of I report my height as 5’10” and my weight as 175 in MyFitnessPal and I now want to share all my diet and bio data with my provider.  What happens to the height and weight?  Does it get stored as a note?  As some other data point?  Obviously, with FHIR, the standard for transferring becomes easier; however, I’m curious what it looks like on the receiving end. A more complicated example might be codifying an intake form.  How would I take a data value like “do you smoke” and incorporate that into the EHR?  Does it get stored in the actual clinical record or again, as a note?  If not in the clinical system, how do I report (a la MU) on this data point?

Answer

FHIR enables this kind of exchange, but what’s actually happening in this regard? As you say, it’s more a policy / procedure question, so I have no idea (though the draft MU Stage 3 rule would give credit for this as data from so-called “non-clinical” sources). But what I can do is ask the experts – leads on both the vendor and provider side. So that’s what I did, and here are some of their comments.

From a major vendor integration lead:

For us, at least, the simplest answer is the always-satisfying “it’s complicated.”

Data generally falls into one of the following buckets:

  1. Data that requires no validation: data that is subjective. PHQ-2/9.
  2. Data that requires no validation (2): data that comes directly from devices/Healthkit/Google Fit.
  3. Data that requires minimal validation: data that is mostly subjective but that a clinician might want to validate that the patient understood the scope of the question – ADLs, pain score, family history, HPI, etc.
  4. Data that requires validation: typically, allergies/problems/meds/immunizations; that is, data that contributes to decision support and/or the physician-authored medical record.
  5. Data that is purely informational and that is not stored discretely.

Depending on what we are capturing, there are different confirmation paths.

Something like height and weight would likely file into bucket 5. Height and weight are (a) already captured as a part of typical OP flow and (b) crucially important to patient safety (weight-based dosing), so it’s unlikely that a physician would favor a patient-reported height/weight over a clinic-recorded value.

That said, a patient with CHF who reports a weight gain > 2lb overnight will likely trigger an alert, and the trend remains important. But the patient-reported value is unlikely to replace a clinic-recorded value.

John Halamka contributed this BIDMC Patient Data Recommendation, along with a presentation explaining it. Here’s a very brief extract:

Purpose: To define and provide a process to incorporate Patient Generated Health Data into clinical practice

Clinicians may use PGHD to guide treatment decisions, similarly to how they would use data collected and recorded in traditional clinical settings. Judgment should be exercised when electronic data from consumer health technologies are discordant with other data

Thanks John – this is exactly the kind of information that is good to share widely.

Question: Solutions for synchronization between multiple HL7-repositories?

Question:

In the area of using HL7 for patient record storage, there are use cases to involve various sources of patient information who are involved in the care for one patient. For these people, we need to be able to offer a synchronization between multiple HL7-repositories. Are there any implementations of a synchronization engine between HL7 repositories?

Answer:

There is no single product that provides a solution like this. Typically, a working solution like this involves a great deal of custom business logic, and such problems are usually solved using a mixture of interface engines, scripting, and bespoke code and services developed in some programming language of choice. See Why use an interface engine?

This is a common problem that has been solved more than once in a variety of ways with a myriad of products.

Here’s an overview of the challenge:

If by synchronization we mean just “replication” from A to B, then A needs to be able to send and B needs to receive messages or service calls. If by synchronization we mean two-way “symmetric” synchronization then you have to add logic to prevent “rattling” (where the same event gets triggered back and forth). An integration engine can provide the transformations between DB records and messages, but in general the concept codes and identifiers must still be reconciled between the systems.
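One common way to stop rattling is to carry the originating system with every change and never echo a change back to its origin. Here’s a minimal in-memory sketch of that idea; the Repository class and its method names are hypothetical, and a real deployment would do this in interface engine routing rules or message headers rather than in Python objects.

```python
class Repository:
    """Toy repository that replicates changes to its peers without echoing."""

    def __init__(self, name):
        self.name = name
        self.records = {}
        self.peers = []
        self.messages_sent = 0

    def connect(self, other):
        """Set up two-way (symmetric) synchronization with another repository."""
        self.peers.append(other)
        other.peers.append(self)

    def update(self, key, value, origin=None):
        """Apply a change; origin is the system where the change first happened."""
        origin = origin or self.name
        if self.records.get(key) == value:
            return  # nothing changed, so no event is raised
        self.records[key] = value
        self._on_change(key, value, origin)

    def _on_change(self, key, value, origin):
        """Fires on every stored change, much like a database trigger would."""
        for peer in self.peers:
            if peer.name == origin:
                continue  # anti-rattle rule: never send a change back to its origin
            self.messages_sent += 1
            peer.update(key, value, origin=origin)
```

With two connected repositories A and B, an update on A produces exactly one message to B and none back. The "nothing changed" check is a second line of defence for topologies with more than two participants.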

For codes, an “interlingua” like SNOMED, LOINC, etc. is helpful if one or both of the systems uses local codes. The participants may implement translations (lookups) to map to the other participant or to the interlingua (it acts as the mediating correlator). The interface engine can call services, or perform the needed lookups. “Semantic” mapping incorporates extra logic for mapping concepts that are divided into their aspects (like LOINC, body system, substance, property, units, etc.). Naturally if all participants actually support the interlingua natively the problem goes away. For identifiers, a correlating EMPI at each end can find-or-register patients based on matching rules. If a simplistic matching rule is sufficient and the receiving repository is just a database, then the integration engine alone could map the incoming demographic profile to a query against the patients table and look up the target patient – and add one if it’s new.
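The terminology half of this can be sketched as two lookups that meet in the middle: each participant maintains only its own mapping to the interlingua, and neither needs to know the other’s local codes. The local codes and table names below are invented for illustration; the LOINC code is used purely as a sample interlingua value.

```python
# Each side maintains only its own mapping to the shared terminology.
SYSTEM_A_TO_LOINC = {"GLU-V": "41652-9"}      # hypothetical local code in system A
LOINC_TO_SYSTEM_B = {"41652-9": "LAB0417"}    # hypothetical local code in system B


def translate_a_to_b(code_a):
    """Translate a system A code to system B via the interlingua.

    Returns (interlingua_code, code_b); either may be None when a lookup
    misses, so the caller can route the message for manual mapping.
    """
    shared = SYSTEM_A_TO_LOINC.get(code_a)
    if shared is None:
        return None, None
    return shared, LOINC_TO_SYSTEM_B.get(shared)
```

The design point is that adding a third participant means one new table to and from the interlingua, not a new table for every pair of systems.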

But if the target repository has numerous patients, with probabilistic matching rules (to maximize the rate of unattended matches, i.e. not bringing a human registrar into the loop to do merges), then the receiving system should implement a service of some kind (using HL7/OMG IXS standard, OMG PIDS (ref?), or FHIR), and the integration engine can translate the incoming demographic into a find-or-register call to that service. Such a project will of course require some analysis and configuration, but with most interface engines, there will be no need for conventional programming. Rather, you have (or make) trees that describe the message segments, tables, or service calls, and then you map (drag/drop) the corresponding elements from sources to targets.

An MDM or EMPI product worth its salt will implement a probabilistic matching engine and implement a web-callable interface (SOAP or REST) as described. If the participants are organizationally inside the same larger entity (a provider health system), then the larger organization may implement a mediating correlator just like the interlingua for terminology. The “correlating” EMPI assigns master identifiers in response to incoming feeds (carrying local ids) from source systems; then that EMPI can service “get corresponding ids” requests to support the scenario you describe. An even tighter integration results if one or both participants actually uses that “master” id domain as its patient identifiers.
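Here’s a small sketch of the correlating EMPI idea: find-or-register on an incoming feed, then answer “get corresponding ids”. To keep it short it uses a simplistic exact match on name and birth date as a stand-in for the probabilistic matching a real product would provide, and the class, method, and field names are all hypothetical.

```python
class CorrelatingEmpi:
    """Toy EMPI: assigns master ids and correlates local ids across sources."""

    def __init__(self):
        self._master_by_key = {}     # match key -> master id
        self._master_by_local = {}   # (source, local id) -> master id
        self._locals_by_master = {}  # master id -> {source: local id}
        self._next = 1

    @staticmethod
    def _match_key(demographics):
        # Simplistic deterministic rule; real engines score many attributes.
        return (
            demographics["family"].strip().lower(),
            demographics["given"].strip().lower(),
            demographics["birth_date"],
        )

    def register(self, source, local_id, demographics):
        """Find-or-register: return the master id for this local patient."""
        known = self._master_by_local.get((source, local_id))
        if known is not None:
            return known
        key = self._match_key(demographics)
        master_id = self._master_by_key.get(key)
        if master_id is None:
            master_id = f"M{self._next}"
            self._next += 1
            self._master_by_key[key] = master_id
        self._master_by_local[(source, local_id)] = master_id
        self._locals_by_master.setdefault(master_id, {})[source] = local_id
        return master_id

    def corresponding_ids(self, source, local_id):
        """Return {other_source: local_id} for the same patient elsewhere."""
        master_id = self._master_by_local.get((source, local_id))
        if master_id is None:
            return {}
        linked = self._locals_by_master[master_id]
        return {s: i for s, i in linked.items() if s != source}
```

An integration engine would call register for each inbound demographic feed, and corresponding_ids when it needs to address the same patient in another repository.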

Here’s some example projects along these lines:

  • dbMotion created a solution that would allow a clinical workstation to access information about a common patient from multiple independent EMRs. It accomplished this by placing an adapter on top of each EHR that exposed its data content in a common format (based upon the RIM), so that their workstation application was able to query and merge the patient data from all the EMRs into a single desktop view. The actual data in the source EHRs were never modified in any way. This was implemented in Israel and then replicated in the US one RHIO at a time. (Note: dbMotion has since been acquired by Allscripts)
  • California State Immunization created a solution that facilitated synchronization of patient immunization history across the nine different immunization registries operating within the state. The solution was based upon a family of HL7 v2 messages that enabled each registry to request patient detail from another and use the query result to update its own record. This solution was eventually replaced by converting all the registries to a common technical platform and then creating a central instance of the system that served all of the regional registries in common (so synchronization was no longer an issue now that there was a single database of record, which is much simpler to maintain).
  • LA County IDR is an architecture put in place in Los Angeles County to integrate data from the 19+ public health information systems, both as a means of creating a master database that could be used for synchronization and as a single source to feed data analytics. The Integrated Data Repository was built using a design that was first envisioned as part of the CDC PHIN project. The IDR is a component of the CDC’s National Electronic Disease Surveillance System (NEDSS) implemented in at least 16 state health departments.

The following people helped with this answer: Dave Shaver, Abdul Malik Shakir, Jon Farmer

Profiles and Exceptions to the Rules

One of the key constructs in FHIR is a “profile”. A profile is a statement of how FHIR resources are used for a particular solution – or, how they should be used. The FHIR resources are a general purpose construct, and you can do kind of general purpose things with them, such as store the data in a PHR, and do generally useful display of a clinical record etc.

But if you’re going to do something more specific, then you need to be specific about the contents. Perhaps, for instance, you’re going to write a decision support module that takes in ongoing glucose and HbA1c measurements, and keeps the patient informed about how well they are controlling their diabetes. In order for a patient or an institution to use that decision support module well, the author of the module is going to have to be clear about what are acceptable input measurements – and it’s very likely, unfortunately, that the answer is ‘not all of them’. Conversely, if the clinical record system is going to allow its users to hook up decision support modules like this, it’s going to have to be clear about what kind of glucose measurements it might feed to the decision support system.

If both the decision support system and the clinical records system produce profiles, a system administrator might even be able to get an automated comparison to see whether they’re compatible. At least, that’s where we’d like to end up.

For now, however, let’s just consider the rules themselves. A clinical record system might find itself in this situation:

  • We can provide a stream of glucose measurements to the decision support system
  • They’ll come from several sources – labs, point of care testing devices, inpatient monitoring systems, and wearables
  • There’s usually one or more intermediary systems between the actual glucose measurement, and the clinical record system (diagnostic systems, bedside care systems, home health systems – this is a rapidly changing space)
  • Each measurement will have one of a few LOINC codes (say, 39480-9: Glucose [Moles/volume] in Venous blood, 41652-9: Glucose [Mass/volume] in Venous blood,
    14743-9: Glucose [Moles/volume] in Capillary blood by Glucometer)
  • the units of measure will be mg/dL or mmol/L
  • there’ll be a numerical value, perhaps with a greater than or less than comparator (e.g. >45mmol/L)

So you can prepare a FHIR profile that says this one way or another. And then a decision support engine can have a feel for what kind of data it might get, and make sure it can handle it all appropriately.
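A real profile would be expressed as a StructureDefinition and checked by a FHIR validator, but the rules in the list above are simple enough to sketch directly as a check over an Observation held as a Python dict. This is only an approximation for illustration; the function and constant names are my own.

```python
ALLOWED_LOINC = {"39480-9", "41652-9", "14743-9"}
ALLOWED_UNITS = {"mg/dL", "mmol/L"}
ALLOWED_COMPARATORS = {"<", ">"}


def check_glucose(obs):
    """Return a list of ways the observation departs from the stated rules."""
    problems = []

    codings = obs.get("code", {}).get("coding", [])
    if not any(
        c.get("system") == "http://loinc.org" and c.get("code") in ALLOWED_LOINC
        for c in codings
    ):
        problems.append("code is not one of the expected LOINC glucose codes")

    quantity = obs.get("valueQuantity")
    if quantity is None:
        problems.append("no numerical value (valueQuantity)")
        return problems

    value = quantity.get("value")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        problems.append("value is not a number")
    if quantity.get("unit") not in ALLOWED_UNITS:
        problems.append("unit is not mg/dL or mmol/L")
    comparator = quantity.get("comparator")
    if comparator is not None and comparator not in ALLOWED_COMPARATORS:
        problems.append("comparator is not < or >")
    return problems
```

A decision support engine that trusts these rules can safely assume it only ever has to handle two units and a number, which is precisely the assumption that gets tested in what follows.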

So that’s all fine. But…

Eventually, the integration engineers that actually bring the data into the system discover – by looking at rejected messages (usually) – that 1 in a million inbound glucose measurements from the lab contain a text message instead of a numerical value. The message might be “Glucose value too high to determine”. Now what? From a clinical safety perspective, it’s almost certain that the integration engineers won’t replace “too high to determine” with a “>N” where N is some arbitrarily chosen number – there’s no number they can choose that isn’t wrong. And they won’t be able to get the source system to change their interface either – that would have other knock-on effects for other customers / partners of the source system. Nor can they drop the data from the clinical record – it’s the actual test result. So they’ll find a way to inject that value into the system.

Btw- aside – some of the things that go in this string value could go in Observation.dataAbsentReason, but they’re not coded, and it’s not possible to confidently decide which are missing reasons, and which are ‘text values’. So dataAbsentReason isn’t a solution to this case, though it’s always relevant.

Now the system contains data that doesn’t conform to the profile it claimed to use. What should happen?

  1. The system hides the data and doesn’t let the decision support system see it
  2. The system changes its profile to say that it might also send text instead of a number
  3. The system exposes the non-conformant data to the decision support system, but flags that it’s not valid according to its own declarations

None of these is palatable. I assume that #1 isn’t possible, at least, not as a blanket policy. There’s going to be some clinical safety reason why the value has to be passed on, just the same as the integration engineers passed it on in the first place, so that they’re not liable.

Option #2 is a good system/programmer choice – just tell me what you’re going to do, and don’t beat around the bush. And the system can do this – it can revise the statement ‘there’ll be a numerical value’ to something like ‘there’ll be a numerical value, or some text’. At least this is clear.

Only it creates a problem – now, the consumer of the data knows that they might get a number, or a string. But why might they get a string? What does it mean? Someone does know, somewhere, that the string option is used 1 in a million times, but there’s no way (currently, at least) to say this in the profile – it just says what’s possible, not what’s good, or ideal, or common. If you start considering the impact of data quality on every element – which you’re going to have to do – then you’re going to end up with a profile that’s technically correct but quite non-communicative about what the data might be, and one that gives implementers no guidance about what the data should be or what they should do. (And observationally, if you say that it can be a string, then, hey, that’s what the integration engineers will do too, because it’s quicker….)

That’s what leads to the question about option #3: maybe the best thing to do is to leave the profile saying what’s ideal, what’s intended, and let systems flag non-conforming resources with a tag, or wrong elements with an extension? Then the consumer of the information can always check, and ignore it if they want to.
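To show what option #3 might look like mechanically, here’s a sketch that adds a tag under meta.tag when a resource doesn’t meet the system’s own claims, plus the check a consumer would need to make. No such tag is defined by the specification: the tag system and code below are invented for the example, and claims_hold is a deliberately trivial stand-in for real profile validation.

```python
import copy

# Hypothetical tag; nothing like this is defined in the FHIR specification.
NONCONFORMANT_TAG = {
    "system": "http://example.org/fhir/tags",
    "code": "profile-nonconformant",
}


def claims_hold(obs):
    """Stand-in for profile validation: the claim is 'there is a number'."""
    return "valueQuantity" in obs


def flag_if_nonconformant(obs):
    """Return the resource unchanged, or a tagged copy if it breaks the claims."""
    if claims_hold(obs):
        return obs
    flagged = copy.deepcopy(obs)
    tags = flagged.setdefault("meta", {}).setdefault("tag", [])
    if NONCONFORMANT_TAG not in tags:
        tags.append(dict(NONCONFORMANT_TAG))
    return flagged


def is_flagged(obs):
    """What every consumer would have to remember to check."""
    return any(
        t.get("system") == NONCONFORMANT_TAG["system"]
        and t.get("code") == NONCONFORMANT_TAG["code"]
        for t in obs.get("meta", {}).get("tag", [])
    )
```

The producer side is a few lines; the weak point is is_flagged, because the approach only works if every consumer knows the tag exists and calls it.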

That is, if they know about the flag, and remember. Which means we’d need to define it globally, and the standard itself would have to tell people to check for data that isn’t consistent with its claims… and then we’d have to add overrides to say that some rules actually mean what they say, as opposed to not actually meaning that…. it all sounds really messy to me.

Perhaps, the right way to handle this is to have ideal and actual profiles? That would mean an extension to the Conformance resource so you could specify both – but already the interplay between system and use case profiles is not well understood.

I think this area needs further research.

p.s. There’s more than some passing similarity between this case and the game of ‘hot potato’ I used to play as a kid: ‘who’s going to have to do something about this bad data’.

#FHIR, RDF, and JSON-LD

FHIR doesn’t use JSON-LD. Some people are pretty critical of that:

It’s a pity hasn’t been made compatible. Enormous missed opportunity for interop & simplicity.

That was from David Metcalfe by Twitter. The outcome of our exchange after that was that David came down to Melbourne from Sydney to spend a few hours with me discussing FHIR, rdf, and json-ld (I was pretty amazed at that, thanks David).

So I’ve spent a few weeks investigating this, and the upshot is, I don’t think that FHIR should use json-ld.

Linked Data

It’s not that the FHIR team doesn’t believe in linked data – we do, passionately. From the beginning, we designed FHIR around the concept of linked data – the namespace we use is http://hl7.org/fhir and that resolves right to the spec. Wherever we can, we ensure that the names we use in that namespace are resolvable and meaningful on the hl7.org server (though I see that recent changes in the hosting arrangements have somehow broken some of these links). The FHIR spec, as a RESTful API, imposes a linked data framework on all implementations.

It’s just a framework though – using the framework to do fully linked data requires a set of additional behaviours that we don’t make implementers do. Not all FHIR implementers care about linked data – many don’t, and the more closely linked to institutional healthcare, the more important specific trading partner agreements become. One of the major attractions FHIR has in the healthcare space is that it can serve as a common format across the system, so supporting these kinds of implementers is critical to the FHIR project. Hence, we like linked data, we encourage its use, but it’s not mandatory.

JSON-LD

This is where json-ld comes into the picture – the idea is that you mark up your json with some lightweight links, which link the information in the json representation to its formal definitions so that the data and its context can be easily understood outside the specific trading partner agreements.

We like that idea. It’s a core notion for what we’re doing in FHIR, so it sounds like that’s how we should do things. Unfortunately, for a variety of reasons, it appears that it doesn’t make sense for us to use json-ld.

RDF

Many of the reasons that json-ld is not a good fit for FHIR arise because of RDF, which sits in the background of json-ld. From the JSON-LD spec:

JSON-LD is designed to be usable directly as JSON, with no knowledge of RDF. It is also designed to be usable as RDF, if desired, for use with other Linked Data technologies like SPARQL.

FHIR has never had an RDF representation, and it’s a common feature request. There’s a group of experts looking at RDF for FHIR (technically, the ITS WGM RDF project) and so we’ve finally got around to defining RDF for FHIR. Note that this page is an editors’ draft for committee discussion – there are some substantial open issues. We’re keen, though, for people to test this, particularly the generated RDF definitions.

RDF for FHIR has 2 core parts:

  • An RDF based definition of the specification itself – the class definitions of the resources, the vocabulary definitions, and all the mappings and definitions associated with them
  • A method for representing instances of resources as RDF

Those two things are closely related – the instances are represented in terms of the class model defined in the base RDF, and the base RDF uses the instance representation in a variety of ways.

Working through the process of defining the RDF representation for FHIR has exposed a number of issues for an RDF representation of FHIR resources:

  • Dealing with missing data: a number of FHIR elements have a default value, or, instead, have an explicit meaning for a missing element (e.g. MedicationAdministration: if there is no “notGiven” flag, then the medication was given as stated). In the RDF world (well, the ontology world built on top of it) you can’t reason about missing data, since it’s missing. So an RDF representation for FHIR has to make the meaning explicit by requiring default values to be explicit, and providing positive assertions about some missing elements
  • Order does matter, and RDF doesn’t have a good solution for it. This is an open issue, but one that can’t be ducked
  • It’s much more efficient, in RDF, to change the way extensions are represented; in XML and JSON, being hierarchies (and, in XML, an ordered one), having a manifest where mandatory extension metadata (url, type) is represented is painful, and, for schema reasons, difficult. So this data is inlined into the extension representation. In RDF, however, being triple based with an inferred graph, it’s much more effective to separate these into a manifest
  • For a variety of operational reasons, ‘concepts’ – references to other resources or knowledge in ontologies such as LOINC or SNOMED CT – are done indirectly. For Coding, for instance, rather than simply having a URL that refers directly to the concept, we have system + code + version. If you want to reason about the concept that represents, it has to be mapped to the concept directly. That level of indirection exists for good operational reasons, and we couldn’t take it out. However the mapping process isn’t trivial
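Two of the issues above (missing data and indirect concepts) are easy to picture in code. The following is a rough sketch only, assuming resources are plain dictionaries. The default table and the URI pattern table are illustrative assumptions, not anything defined by the RDF draft, and the mapping deliberately ignores the Coding version, which is one of the reasons the real mapping isn’t trivial.

```python
# Assumed default table, using the "notGiven" example from the list above.
IMPLIED_DEFAULTS = {
    "MedicationAdministration": {"notGiven": False},
}

# Assumed URI patterns per code system. A real mapping needs rules for
# each terminology and has to decide what to do with Coding.version.
CONCEPT_URI_PATTERNS = {
    "http://snomed.info/sct": "http://snomed.info/id/{code}",
}


def make_defaults_explicit(resource: dict) -> dict:
    """Copy the resource, filling in values that are implied when absent."""
    explicit = dict(resource)
    defaults = IMPLIED_DEFAULTS.get(resource.get("resourceType"), {})
    for name, value in defaults.items():
        explicit.setdefault(name, value)
    return explicit


def coding_to_concept_uri(coding: dict):
    """Map system + code to a single concept URI, or None if unknown."""
    pattern = CONCEPT_URI_PATTERNS.get(coding.get("system"))
    if pattern is None or "code" not in coding:
        return None
    return pattern.format(code=coding["code"])


administration = make_defaults_explicit(
    {"resourceType": "MedicationAdministration"}
)
concept = coding_to_concept_uri(
    {"system": "http://snomed.info/sct", "code": "22298006"}
)
```

Any Coding from a system that isn’t in the table comes back as None, so a converter built this way still has to decide what to emit for concepts it can’t map.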

In the FHIR framework, RDF is another representation like XML and JSON. Clients can ask servers to return resources or sets of resources using RDF instead of JSON or XML. Servers or clients that convert between XML/JSON and RDF will have to handle these issues – and the core reference implementations that many clients and servers choose to use will support RDF natively (at least, that’s what the respective RI maintainers intend to do).

Why not to use JSON-LD

So, back to json-ld. The fundamental notion of json-ld is that you can add context references to your json, and then the context points to a conversion template that defines how to convert the json to RDF.

From a FHIR viewpoint, then, either the definition of the conversion process is sophisticated enough to handle the kinds of issues discussed above, or you have to compromise either the JSON or the RDF or both.

And the JSON –> RDF conversion defined by the JSON-LD specification is pretty simple. In fact, we don’t even get to the issues discussed above before we run into a problem. The most basic problem has to do with names – JSON-LD assumes that everywhere a JSON property name is used, it has the same meaning. So, take this snippet of JSON:

{ 
  "person" : {
    "dob" : "1975-01-01",
    "name" : {
      "family" : "Smith",
      "given" : "Joe"
    }
  },
  "organization" : {
    "name" : "Acme"
  } 
}

Here, the json property ‘name’ is used in 1 or 2 different ways. It depends on what you mean by ‘meaning’. Both properties associate a human usable label to a concept, one that humans use in conversation to identify an entity, though it’s ambiguous. That’s the same meaning in both cases. However the semantic details of the label – meaning at a higher level – are quite different. Organizations don’t have given names or family names, and don’t change their names when they get married or have a gender change. And humans don’t get merged into other humans, or have their names changed for marketing reasons (well, mostly 😉 ).

JSON-LD assumes that anywhere that a property ‘name’ appears, it has the same RDF definition. So that snippet above can’t be converted to json-ld by a simple addition of a json-ld @context. Instead, you would have to rename the name properties to ‘personName’ and ‘organizationName’ or similar. In FHIR, however, we’ve worked on the widely accepted practice that names are scoped by their type (that’s what types do). The specification defines around 2200 elements, with about 1500 names – so 700 of them or so use names that other elements also use. We’re not going to rename all these elements to pre-coordinate their type context into the property name. (Note that JSON-LD discussed supporting having names scoped by context – but this is an ‘outstanding’ request that seems unlikely to get adopted anytime soon).

Beyond that, the other issues are not addressed by json-ld, and unlikely to be soon. Here’s what JSON-LD says about ordered arrays:

Since graphs do not describe ordering for links between nodes, arrays in JSON-LD do not provide an ordering of the contained elements by default. This is exactly the opposite from regular JSON arrays, which are ordered by default

and

List of lists in the form of list objects are not allowed in this version of JSON-LD. This decision was made due to the extreme amount of added complexity when processing lists of lists.

But the importance of ordering objects doesn’t go away just because the RDF graph definitions and/or syntax makes it difficult. We can’t ignore it, and no one getting healthcare would be happy with the outcomes if we managed to get healthcare processes to ignore it. The same applies to the issue with missing elements – there’s no facility to insert default values in json-ld, let alone to do so conditionally.
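Why ordering is awkward in a graph can be shown with plain sets of triples. This is only a sketch: the index and value predicates are hypothetical, and it is not a proposal for how the FHIR RDF draft should resolve its open ordering issue. It just shows that order survives only if something in the graph states it.

```python
def as_unordered_triples(subject, predicate, values):
    """Repeating elements as direct links: the set forgets their order."""
    return {(subject, predicate, value) for value in values}


def as_indexed_triples(subject, predicate, values):
    """Give each element its own node carrying an explicit position."""
    triples = set()
    for position, value in enumerate(values):
        node = f"{subject}/{predicate}/{position}"
        triples.add((subject, predicate, node))
        triples.add((node, "index", position))  # hypothetical predicate
        triples.add((node, "value", value))     # hypothetical predicate
    return triples


first = ["Joe", "Michael"]
second = ["Michael", "Joe"]

same_when_unordered = (
    as_unordered_triples("_:p", "given", first)
    == as_unordered_triples("_:p", "given", second)
)
same_when_indexed = (
    as_indexed_triples("_:p", "given", first)
    == as_indexed_triples("_:p", "given", second)
)
```

The first comparison is true and the second is false: the two lists are indistinguishable as direct links, and only the explicit position keeps them apart, at the cost of three triples per element.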

So we could either

  • Complicate the json format greatly to make the json-ld RDF useful
  • Accept the simple RDF produced by json-ld and just say that all the reasoning you would want to do isn’t actually necessary
    • (or some combination of those two)
  • Or accept that there’s a transform between the regular forms of FHIR (JSON and XML which are very close) and the optimal RDF form, and concentrate on making implementations of that transform easy to use in practice

I think it’s inevitable that we’ll be going for the 3rd.

p.s. should json-ld address these issues? I think JSON-LD has to address the ‘names scoped by types’ issue, but for the rest, I don’t know. The missing element problem is ubiquitous across interfaces – elements with default values are omitted for efficiency everywhere – but there is a lot of complexity in these things. Perhaps there could be an @conversion which is a reference to a server that will convert the content to RDF instead of a @context. That’s not so nice from a client’s perspective, but it avoids specifying a huge amount of complexity in the conversion process.

p.p.s there’s further analysis about this on the FHIR wiki.