Monthly Archives: June 2015

Question: Solutions for synchronization between multiple HL7-repositories?


When HL7 is used for patient record storage, some use cases involve several sources of patient information, all of them involved in the care of one patient. For these participants, we need to be able to offer synchronization between multiple HL7 repositories. Are there any implementations of a synchronization engine between HL7 repositories?


There is no single product that provides a solution like this. Typically, a working solution involves a great deal of custom business logic, and such solutions are usually built from a mixture of interface engines, scripting, and bespoke code and services developed in some programming language of choice. See Why use an interface engine?

This is a common problem that has been solved more than once in a variety of ways with a myriad of products.

Here’s an overview of the challenge:

If by synchronization we mean just “replication” from A to B, then A needs to be able to send, and B needs to be able to receive, messages or service calls. If by synchronization we mean two-way “symmetric” synchronization, then you have to add logic to prevent “rattling” (where the same event gets triggered back and forth). An integration engine can provide the transformations between DB records and messages, but in general the concept codes and identifiers must still be reconciled between the systems.
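As a rough illustration of the anti-rattling logic, here is a minimal Python sketch. It assumes every change event carries the name of the repository where the change was first made, and that a repository only forwards changes it originated. The names `ChangeEvent`, `Repository` and `forward` are invented for this example and do not come from any interface engine.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChangeEvent:
    record_id: str  # identifier already reconciled between the systems
    payload: str    # the new record content
    origin: str     # repository where the change was first made


@dataclass
class Repository:
    name: str
    records: dict = field(default_factory=dict)
    outbox: list = field(default_factory=list)

    def write(self, record_id, payload, origin=None):
        """Store a record; like a database trigger, every write raises an event."""
        origin = origin or self.name
        self.records[record_id] = payload
        self.outbox.append(ChangeEvent(record_id, payload, origin))


def forward(source, target):
    """Send source's pending events to target, suppressing echoes."""
    pending, source.outbox = source.outbox, []
    sent = 0
    for event in pending:
        if event.origin != source.name:
            # This change arrived via replication; sending it back would rattle.
            continue
        target.write(event.record_id, event.payload, origin=event.origin)
        sent += 1
    return sent
```

With two repositories `a` and `b`, a local `a.write("p1", "v1")` followed by `forward(a, b)` copies the record once, and the subsequent `forward(b, a)` sends nothing back. Real systems often need more than this (version numbers, conflict handling), so treat it as the simplest case only.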

For codes, an “interlingua” like SNOMED, LOINC, etc. is helpful if one or both of the systems uses local codes. The participants may implement translations (lookups) to map to the other participant or to the interlingua (it acts as the mediating correlator). The interface engine can call services, or perform the needed lookups. “Semantic” mapping incorporates extra logic for mapping concepts that are divided into their aspects (like LOINC: body system, substance, property, units, etc.). Naturally, if all participants actually support the interlingua natively, the problem goes away.

For identifiers, a correlating EMPI at each end can find-or-register patients based on matching rules. If a simplistic matching rule is sufficient and the receiving repository is just a database, then the integration engine alone could map the incoming demographic profile to a query against the patients table and look up the target patient – and add one if it’s new.
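The simplistic find-or-register case might look something like the following Python sketch, which uses an in-memory SQLite database. The table layout and the matching rule (exact, case-insensitive match on family name, given name and birth date) are assumptions made for illustration; a real repository would have its own schema and rules.

```python
import sqlite3


def open_repository():
    """Create a throwaway patients table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patients ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " family TEXT NOT NULL, given TEXT NOT NULL, birth_date TEXT NOT NULL)"
    )
    return conn


def find_or_register(conn, family, given, birth_date):
    """Return (patient_id, created) using an exact, case-insensitive match."""
    key = (family.strip().lower(), given.strip().lower(), birth_date)
    row = conn.execute(
        "SELECT id FROM patients"
        " WHERE lower(family) = ? AND lower(given) = ? AND birth_date = ?",
        key,
    ).fetchone()
    if row is not None:
        return row[0], False  # existing patient found
    cur = conn.execute(
        "INSERT INTO patients (family, given, birth_date) VALUES (?, ?, ?)",
        (family.strip(), given.strip(), birth_date),
    )
    conn.commit()
    return cur.lastrowid, True  # new patient registered
```

A rule this naive will both miss true matches (typos, name changes) and risk false ones, which is exactly why larger repositories move to probabilistic matching.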

But if the target repository has numerous patients, with probabilistic matching rules (to maximize the rate of unattended matches, i.e. not bringing a human registrar into the loop to do merges), then the receiving system should implement a service of some kind (using HL7/OMG IXS standard, OMG PIDS (ref?), or FHIR), and the integration engine can translate the incoming demographic into a find-or-register call to that service. Such a project will of course require some analysis and configuration, but with most interface engines, there will be no need for conventional programming. Rather, you have (or make) trees that describe the message segments, tables, or service calls, and then you map (drag/drop) the corresponding elements from sources to targets.

An MDM or EMPI product worth its salt will implement a probabilistic matching engine and a web-callable interface (SOAP or REST) as described. If the participants are organizationally inside the same larger entity (a provider health system), then the larger organization may implement a mediating correlator, just like the interlingua for terminology. The “correlating” EMPI assigns master identifiers in response to incoming feeds (carrying local ids) from source systems. Then that EMPI can service “get corresponding ids” requests to support the scenario you describe. An even tighter integration results if one or both participants actually uses that “master” id domain as its patient identifiers.
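To make the “get corresponding ids” idea concrete, here is a small Python sketch of the bookkeeping side of a correlating EMPI. It deliberately leaves out the matching engine: the caller is assumed to have already decided which master id (if any) an incoming local id belongs to. The class and method names are invented for this example, and it assumes at most one local id per domain for each master id.

```python
import itertools


class CorrelatingEmpi:
    """Cross-reference of local ids to master ids (matching logic not shown)."""

    def __init__(self):
        self._next = itertools.count(1)
        self._master_by_local = {}   # (domain, local_id) -> master_id
        self._locals_by_master = {}  # master_id -> {domain: local_id}

    def register(self, domain, local_id, master_id=None):
        """Link a local id to a master id, minting a new one if none is given."""
        key = (domain, local_id)
        if key in self._master_by_local:
            return self._master_by_local[key]
        if master_id is None:
            master_id = f"M{next(self._next)}"
        self._master_by_local[key] = master_id
        self._locals_by_master.setdefault(master_id, {})[domain] = local_id
        return master_id

    def corresponding_ids(self, domain, local_id):
        """Return the ids the same patient has in the other domains."""
        master = self._master_by_local.get((domain, local_id))
        if master is None:
            return {}
        linked = self._locals_by_master[master]
        return {d: i for d, i in linked.items() if d != domain}
```

For example, after `m = empi.register("hospA", "123")` and `empi.register("labB", "X9", master_id=m)`, a call to `empi.corresponding_ids("hospA", "123")` returns the lab's id for that patient.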

Here are some example projects along these lines:

  • dbMotion created a solution that would allow a clinical workstation to access information about a common patient from multiple independent EMRs. It accomplished this by placing an adapter on top of each EHR that exposed its data content in a common format (based upon the RIM), so that their workstation application was able to query and merge the patient data from all the EMRs into a single desktop view. The actual data in the source EHRs were never modified in any way. This was implemented in Israel and then replicated in the US one RHIO at a time. (Note: dbMotion has since been acquired by Allscripts)
  • California State Immunization created a solution that facilitated synchronization of patient immunization history across the nine different immunization registries operating within the state. The solution was based upon a family of HL7 v2 messages that enabled each registry to request patient detail from another and use the query result to update its own record. This solution was eventually replaced by converting all the registries to a common technical platform and then creating a central instance of the system that served all of the regional registries in common (so synchronization was no longer an issue now that there was a single database of record, which is much simpler to maintain).
  • LA County IDR is an architecture put in place in Los Angeles County to integrate data from the 19+ public health information systems, both to create a master database that could be used for synchronization and to provide a single source to feed data analytics. The Integrated Data Repository was built using a design that was first envisioned as part of the CDC PHIN project. The IDR is a component of the CDC’s National Electronic Disease Surveillance System (NEDSS) implemented in at least 16 state health departments.

The following people helped with this answer: Dave Shaver, Abdul Malik Shakir, Jon Farmer

Profiles and Exceptions to the Rules

One of the key constructs in FHIR is a “profile”. A profile is a statement of how FHIR resources are used for a particular solution – or, how they should be used. The FHIR resources are a general purpose construct, and you can do general purpose things with them, such as storing the data in a PHR, or producing a generally useful display of a clinical record.

But if you’re going to do something more specific, then you need to be specific about the contents. Perhaps, for instance, you’re going to write a decision support module that takes in ongoing glucose and HbA1c measurements, and keeps the patient informed about how well they are controlling their diabetes. In order for a patient or an institution to use that decision support module well, the author of the module is going to have to be clear about what are acceptable input measurements – and it’s very likely, unfortunately, that the answer is ‘not all of them’. Conversely, if the clinical record system is going to allow its users to hook up decision support modules like this, it’s going to have to be clear about what kind of glucose measurements it might feed to the decision support system.

If both the decision support system and the clinical records system produce profiles, a system administrator might even be able to get an automated comparison to see whether they’re compatible. At least, that’s where we’d like to end up.
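To give a feel for what such an automated comparison could do, here is a toy Python sketch. It is not how FHIR profiles are actually represented: each side's rules are boiled down to three sets (codes, units, value types), and the check simply reports anything the producer might send that the consumer has not said it accepts. `ValueRules` and `incompatibilities` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueRules:
    """A drastically simplified stand-in for one side's profile."""
    codes: frozenset
    units: frozenset
    value_types: frozenset  # e.g. {"Quantity"} or {"Quantity", "string"}


def incompatibilities(produced, accepted):
    """List what the producer may send that the consumer does not accept."""
    problems = []
    for label in ("codes", "units", "value_types"):
        extra = getattr(produced, label) - getattr(accepted, label)
        if extra:
            problems.append(f"{label}: {sorted(extra)}")
    return problems
```

An empty result means everything the producer declares falls within what the consumer accepts. A real comparison would also have to deal with cardinalities, nested elements, terminology bindings and so on.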

For now, however, let’s just consider the rules themselves. A clinical record system might find itself in this situation:

  • We can provide a stream of glucose measurements to the decision support system
  • They’ll come from several sources – labs, point of care testing devices, inpatient monitoring systems, and wearables
  • There’s usually one or more intermediary systems between the actual glucose measurement, and the clinical record system (diagnostic systems, bedside care systems, home health systems – this is a rapidly changing space)
  • Each measurement will have one of a few LOINC codes (say, 39480-9: Glucose [Moles/volume] in Venous blood, 41652-9: Glucose [Mass/volume] in Venous blood,
    14743-9: Glucose [Moles/volume] in Capillary blood by Glucometer)
  • the units of measure will be mg/dL or mmol/L
  • there’ll be a numerical value, perhaps with a greater than or less than comparator (e.g. >45mmol/L)

So you can prepare a FHIR profile that says this one way or another. And then a decision support engine can have a feel for what kind of data it might get, and make sure it can handle it all appropriately.
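As a rough sketch of what those rules mean in practice, here is a Python function that checks an Observation (as a plain JSON-style dict) against them. This is hand-written checking, not a FHIR profile or a real validator. The LOINC codes and units are the ones listed above, the element names (`code.coding`, `valueQuantity`) follow the FHIR Observation resource, and the function name is made up for the example.

```python
LOINC = "http://loinc.org"
ALLOWED_CODES = {"39480-9", "41652-9", "14743-9"}
ALLOWED_UNITS = {"mg/dL", "mmol/L"}
ALLOWED_COMPARATORS = {"<", ">"}


def profile_violations(observation):
    """Return a list of ways the observation breaks the glucose rules."""
    problems = []

    codings = observation.get("code", {}).get("coding", [])
    if not any(
        c.get("system") == LOINC and c.get("code") in ALLOWED_CODES
        for c in codings
    ):
        problems.append("code is not one of the expected LOINC glucose codes")

    quantity = observation.get("valueQuantity")
    if quantity is None:
        problems.append("no numerical value (valueQuantity) present")
        return problems

    value = quantity.get("value")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        problems.append("valueQuantity.value is not a number")
    if quantity.get("unit") not in ALLOWED_UNITS:
        problems.append("unit is not mg/dL or mmol/L")
    comparator = quantity.get("comparator")
    if comparator is not None and comparator not in ALLOWED_COMPARATORS:
        problems.append("comparator is not < or >")
    return problems
```

A decision support engine could run a check like this on every inbound measurement and decide what to do with anything that returns a non-empty list.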

So that’s all fine. But…

Eventually, the integration engineers that actually bring the data into the system discover – by looking at rejected messages (usually) – that 1 in a million inbound glucose measurements from the lab contain a text message instead of a numerical value. The message might be “Glucose value too high to determine”. Now what? From a clinical safety perspective, it’s almost certain that the integration engineers won’t replace “too high to determine” with a “>N” where N is some arbitrarily chosen number – there’s no number they can choose that isn’t wrong. And they won’t be able to get the source system to change its interface either – that would have other knock-on effects for other customers / partners of the source system. Nor can they drop the data from the clinical record – it’s the actual test result. So they’ll find a way to inject that value into the system.

Btw, as an aside: some of the things that go in this string value could go in Observation.dataAbsentReason, but they’re not coded, and it’s not possible to confidently decide which are missing reasons, and which are ‘text values’. So dataAbsentReason isn’t a solution to this case, though it’s always relevant.

Now the system contains data that doesn’t conform to the profile it claimed to use. What should happen?

  1. The system hides the data and doesn’t let the decision support system see it
  2. The system changes its profile to say that it might also send text instead of a number
  3. The system exposes the non-conformant data to the decision support system, but flags that it’s not valid according to its own declarations

None of these is palatable. I assume that #1 isn’t possible, at least, not as a blanket policy. There’s going to be some clinical safety reason why the value has to be passed on, just the same as the integration engineers passed it on in the first place, so that they’re not liable.

Option #2 is a good system/programmer choice – just tell me what you’re going to do, and don’t beat around the bush. And the system can do this – it can revise the statement ‘there’ll be a numerical value’ to something like ‘there’ll be a numerical value, or some text’. At least this is clear.

Only it creates a problem – now, the consumer of the data knows that they might get a number, or a string. But why might they get a string? What does it mean? Someone does know, somewhere, that the string option is used 1 in a million times, but there’s no way (currently, at least) to say this in the profile – it just says what’s possible, not what’s good, or ideal, or common. If you start considering the impact of data quality on every element – which you’re going to have to do – then you’re going to end up with a profile that’s technically correct but quite non-communicative about what the data might be, and that gives no guidance as to what it should be, so implementers don’t know what they should do. (And observationally, if you say that it can be a string, then, hey, that’s what the integration engineers will do too, because it’s quicker….)

That’s what leads to the question about option #3: maybe the best thing to do is to leave the profile saying what’s ideal, what’s intended, and let systems flag non-conforming resources with a tag, or wrong elements with an extension? Then the consumer of the information can always check, and ignore it if they want to.
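Mechanically, the tagging part is easy enough. Here is a Python sketch that adds a tag to `meta.tag` (a real FHIR element, a list of codings) on a resource held as a plain dict. The tag's system URI and code are entirely made up for this example; no such tag is defined by the standard, which is rather the point of the discussion.

```python
import copy

# Hypothetical tag: neither this system URI nor this code is defined by FHIR.
NON_CONFORMANT_TAG = {
    "system": "http://example.org/fhir/tags",
    "code": "not-conformant-to-declared-profile",
}


def flag_non_conformant(resource):
    """Return a copy of the resource carrying the tag in meta.tag."""
    flagged = copy.deepcopy(resource)
    tags = flagged.setdefault("meta", {}).setdefault("tag", [])
    if NON_CONFORMANT_TAG not in tags:
        tags.append(dict(NON_CONFORMANT_TAG))
    return flagged


def is_flagged(resource):
    """What a consumer would have to remember to call on every resource."""
    return any(
        t.get("system") == NON_CONFORMANT_TAG["system"]
        and t.get("code") == NON_CONFORMANT_TAG["code"]
        for t in resource.get("meta", {}).get("tag", [])
    )
```

The code is trivial; the hard part is getting every consumer to agree on the tag and to actually call something like `is_flagged` before trusting the data.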

That is, if they know about the flag, and remember. Which means we’d need to define it globally, and the standard itself would have to tell people to check for data that isn’t consistent with its claims… and then we’d have to add overrides to say that some rules actually mean what they say, as opposed to not actually meaning that…. it all sounds really messy to me.

Perhaps, the right way to handle this is to have ideal and actual profiles? That would mean an extension to the Conformance resource so you could specify both – but already the interplay between system and use case profiles is not well understood.

I think this area needs further research.

p.s. There’s more than some passing similarity between this case and the game of ‘hot potato’ I used to play as a kid: ‘who’s going to have to do something about this bad data’.