Category Archives: Security

#FHIR Connectathon 7, Chicago Sept 13-14

We will be holding the next FHIR connectathon in Chicago on Sept 13/14 associated with the HL7 Plenary Meeting. Once again, anyone interested in implementing FHIR is welcome to attend.

This connectathon, we have 4 tracks (we’re expanding!):

  1. Our standard introductory Patient Track. This is generally the one that’s appropriate for first time attendees
  2. A new track focusing on the conformance resources (Profile and ValueSet), testing interoperability between the various tools and users that are starting to see these used in anger now
  3. Our standard experimental track, focusing on the cutting edge. This time, we’re doing something quite different: we’re going to be testing a basic SMART on FHIR EHR/plug-in scenario
  4. A joint track with DICOM exploring interoperability between the DICOM RESTful services and the FHIR API

What’s really exciting about these is that we’re starting to move into real-world interoperability contexts now, focusing more on how FHIR is used in practice than on exploring areas of the standard.

There are full details about the connectathon, with registration details, on the HL7 wiki.

I look forward to seeing you there!

#FHIR + Open ID Connect

I’ve upgraded my FHIR server to support OpenID Connect tokens as part of its OAuth-based login. This is part of implementing IHE’s IUA profile, though I’m not yet sure whether I’m going to finish that off – I’m still discussing the way that works with the IHE authors.

What this means is that as part of the server login process, my server provides a signed OpenID Connect token with user identification included in it. There’s now a standard OpenID Connect discovery document available here. Josh Mandel and I would like to make it standard for any FHIR server that uses OAuth that the URL

[fhir-base]/.well-known/openid-configuration

either returns an OIDC discovery document describing the OAuth part of the server, or redirects to the appropriate location (OIDC puts it on the root of the server, so that would be the best place for it).
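By way of illustration, here’s a minimal Python sketch of what a client does with that location (the base URL is a placeholder – substitute the server you’re talking to):

import json
from urllib.request import urlopen

fhir_base = "https://fhir.example.org/fhir"   # placeholder - your FHIR server's base URL

# fetch the OIDC discovery document from the agreed location
conf = json.load(urlopen(fhir_base + "/.well-known/openid-configuration"))

# standard OIDC discovery metadata - everything a client needs to find the OAuth endpoints
print(conf["issuer"])
print(conf["authorization_endpoint"])
print(conf["token_endpoint"])
print(conf["jwks_uri"])   # where the keys for verifying the signed id_token live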

I added this to help understand how hard it is to support the tokens, since they’re part of the SMART on FHIR framework. And the answer’s kind of split – the JSON/identification bits are relatively straightforward (so far). But the crypto bits – they’re screamingly hard. Because my server is written in Delphi, my choice of crypto libraries is somewhat limited – especially since my code is open source – and in the end I had to go with OpenSSL. It’s astounding how hard this crypto stuff is. At least half the problem is OpenSSL itself (see “OpenSSL is written by monkeys” – my experience matched that). But it’s not all OpenSSL – basic crypto is harder than it should be: a myriad of closely related counter-intuitive terms, barely differentiated but incompatible file formats, conflicting or wrong advice, etc. And then it either works, or it doesn’t, and mostly all you get is “didn’t work” as an error (e.g. the signature didn’t verify). But I got nearly there in the end (I still have to work through the hard bits of this) (and thanks to Josh for support).
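For comparison, here’s a rough sketch of the same moving parts in Python using the PyJWT library – this is purely an illustration of what validating an id_token involves, not the Delphi code; the issuer and client_id values are whatever your own OAuth registration gives you:

# pip install pyjwt[crypto] -- illustration only, not the Delphi server code
import jwt
from jwt import PyJWKClient

def verify_id_token(id_token: str, jwks_uri: str, issuer: str, client_id: str) -> dict:
    # jwks_uri comes from the openid-configuration document discussed above
    signing_key = PyJWKClient(jwks_uri).get_signing_key_from_jwt(id_token)  # picks the key by 'kid'
    # decode() checks the RS256 signature plus the issuer/audience/expiry claims
    return jwt.decode(
        id_token,
        signing_key.key,
        algorithms=["RS256"],
        audience=client_id,
        issuer=issuer,
    )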

Anyway, the upshot of all this is the first (as far as I know) open source Delphi implementation of JSON Web Tokens and their associated stack of things (JWK, JWS). This might be useful for other Delphi programmers, so:

  • JWT.pas – the core implementation of JSON Web Tokens
  • JSON.pas – the json library it uses
  • libeay32.pas – extensions to the standard Indy OpenSSL headers (the next version of Delphi will include these in the standard header file; these are extensions to the XE5 version)

And there are some tests too. Also, there’s a bunch of other library stuff on GitHub there that those units depend on.

Double Layer OAuth

At the last connectathon, we had two servers offering OAuth based logins. They took two totally different approaches:

  • My server used OAuth to delegate authentication to public/commercial identity manager services such as Google or Facebook (and HL7 too). Once authorised, users had open access to the FHIR API
  • Josh Mandel’s server used OAuth to let the user choose what kind of operations the user could do on the FHIR API, but users were managed by the server

The way my server works is to ask the OAuth provider (Google or Facebook) for access to the user’s identity. In order to give me that permission, the provider has to identify the user (and I have to trust them to do that correctly). The problem with this approach is that while I identify the user, I’m not offering the user any choice about what access their user agent (e.g. web site or mobile application) will have to the resources my server protects – which is their medical record (well, their prototype medical record, anyway).

Josh’s server does offer that choice – it asks you what parts of your medical record you want to make visible to your user agent, and whether you want to let it change any parts of it. So that’s much better (aside: there are a number of reasons why you might want to limit access, but there are also some real risks to that as well – well-known problems I’m not taking up here). The problem with the approach Josh’s server takes is that you cannot delegate identity management, and managing user identity is hard and expensive.

So, can you have both? Well, it turns out that you can, but it’s pretty hard to visualise how that works. Josh explains how the process works here:

Josh made that video specifically to explain to me how the process works (thanks). So I’ve gone ahead and implemented this based on Josh’s explanation. You can check out how it works here: https://fhir.healthintersections.com.au/closed. In this example, the web server acts as a client to its own underlying RESTful API. Initially, you have to log in:

[screenshot: oauth_login]

Once you choose to log in using OAuth, you get this page:

[screenshot: oauth_id_page]


You can identify yourself using 3 different external providers, or you can log in using a username/password (contact me directly to get an account). Or, if you know me via Skype, you can authenticate to me out-of-band using the token. Once you’ve identified yourself, you get asked what permissions to give your user agent:

[screenshot: oauth-choice]

For now, this is just a placeholder while we decide (argue) about what kinds of permissions would be appropriate. The only right that is currently meaningful is whether to share your user details (internal id – which might be a Facebook or Google internal id – user name, and email address). If you share them, the user agent can use them; if you don’t, the user agent won’t be able to recognise you as the same user again.

I wrote this to demonstrate the process, since it’s not easy to visualise. Developers who want to use this can consult my API documentation (it’s as close to Google’s as I can get; there’s no JWT token, but I will be implementing IHE IUA later). For developers who want to try this out locally, there’s a predefined client you can use:

client_id = LocalTest
client_secret = SecretForLocalTest
name = Local Test Client
redirect = http://localhost:8000/app/index.html
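To show how that client gets used, here’s a rough Python sketch of the standard authorization-code exchange – the endpoint locations come from the server’s openid-configuration document described above (assuming it’s published under the closed endpoint), and the scope value is illustrative:

import json
from urllib.parse import urlencode
from urllib.request import urlopen, Request

FHIR_BASE = "https://fhir.healthintersections.com.au/closed"
REDIRECT = "http://localhost:8000/app/index.html"

# 1. discover the endpoints rather than hard-coding them
#    (assumes the discovery document is published under this base, as described above)
conf = json.load(urlopen(FHIR_BASE + "/.well-known/openid-configuration"))

# 2. send the user's browser to the authorization endpoint to log in and choose permissions
params = {
    "response_type": "code",
    "client_id": "LocalTest",
    "redirect_uri": REDIRECT,
    "scope": "openid profile",   # illustrative - use whatever scopes the server offers
    "state": "some-random-value",
}
print("browse to:", conf["authorization_endpoint"] + "?" + urlencode(params))

# 3. the redirect back to REDIRECT carries ?code=...; swap it for an access token
def exchange_code(code: str) -> dict:
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT,
        "client_id": "LocalTest",
        "client_secret": "SecretForLocalTest",
    }).encode()
    return json.load(urlopen(Request(conf["token_endpoint"], data=body)))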

Contact me if the redirect doesn’t suit, and I can add it to the list, or set up a different client for you.

Btw, in case anyone wants to see the source – this is all open source.


New CDA Stylesheet

From Keith’s blog:

A few weeks ago, Josh Mandel identified problems with the sample CDA Stylesheet that is released with C-CDA Release 1.1 and several other editions of the CDA Standard and various implementation guides. There are some 30 variants of the stylesheet in various HL7 standards and guides.  A patched version of the file has been created and is now available from HL7’s GForge site here.

Note for Australian users: if you use the NEHTA authored stylesheet, this release doesn’t affect you.

SQL Injection attacks against HL7 interfaces

One of the questions that several people have asked me in the discussions triggered by the report of CDA vulnerabilities is whether there is any concern about SQL injection attacks in CDA usage, or other HL7 exchange protocols.

The answer is both yes and no.

Healthcare Applications are subject to SQL injection attacks.

An SQL injection attack is always possible anywhere that a programmer takes input from a user, and constructs an SQL statement by appending it into a string, like this:

connection.sql = "select * from a-table where key = '" + value_user_entered + "'";

There are any number of equivalents in programming languages, all variations on this theme. If the value the user entered includes the character ', then this will cause an error. It may also cause additional SQL to run, as memorably captured in this XKCD cartoon:

Exploits of a Mum

There are several ways to prevent SQL injection attacks:

  • check (sanitise) the inputs so that SQL can’t be embedded in the string
  • use parameterised SQL statements (see the sketch below)
  • escape the SQL parameters (connection.sql = "select * from a-table where key = '" + sql_escape(value_user_entered) + "'";)
  • ensure that the underlying db layer will only execute a single SQL command

Doing all of these is best, and the last is the least reliable (because it’s the easiest). Because none of these actions is as convenient as the vulnerable approach, SQL injection attacks continue to be depressingly common. And that’s on web sites.
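Here’s a minimal sketch of the difference between the vulnerable concatenation and a parameterised statement (the second option above), using Python with sqlite3 as a stand-in driver:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table a_table (key text, value text)")
conn.execute("insert into a_table values ('real-key', 'private stuff')")

value_user_entered = "x' or '1'='1"   # a classic probe string

# vulnerable: the user's input becomes part of the SQL text itself
unsafe = "select * from a_table where key = '" + value_user_entered + "'"
print(conn.execute(unsafe).fetchall())      # predicate is now always true -> leaks every row

# parameterised: the driver passes the value separately, so the quote is just data
safe = "select * from a_table where key = ?"
print(conn.execute(safe, (value_user_entered,)).fetchall())   # no match, nothing leaks

(As it happens, sqlite3’s execute() also refuses to run more than one statement per call – the last option on the list.)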

My experience is that healthcare interfaces tend to be written very differently from websites that are going to run in public. They are not written expecting hostile users – and by and large, they don’t encounter them, either. And they are always written to a budget, based on estimates that don’t include any allowance for security concerns. That’s right – the only time I have seen security feature in interface costings is for the PCEHR.

So I assume that the default state is that healthcare interfaces are potentially subject to SQL injection attacks due to programmer efficiency (anyone who calls it ‘laziness’ is someone who’s never had to pay for programming to be done).

Healthcare Applications are not so subject to SQL injection attacks.

However, in practice, it turns out that it’s not quite such a concern as I just made it sound. That’s for several reasons.

I was specifically asked about CDA documents and SQL injection. Well, CDA contents are rarely processed into databases. However, the XDS headers that are used when CDA documents are exchanged are often processed into databases, and these are usually extracted from the CDA contents. So any XDS implementation is a concern. The PCEHR is an XDS implementation, though programmers never needed to wonder whether it sanitized its inputs. A side lesson from the PCEHR story is that name fields are not generally subject to SQL injection attacks, since it’s not too long before a name with an apostrophe trips them up if they are (and that’s true on more than just healthcare interfaces).

Really, the context where SQL Injection attacks are most likely is in the context of an HL7 v2 interface.
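To make that concrete, here’s a small sketch (the segment layout is illustrative) of why a v2 feed built by string concatenation falls over as soon as a real name arrives:

# an illustrative ADT fragment - PID-5 carries the patient name
pid = "PID|1||123456^^^HOSP^MR||O'Brien^Siobhan||19800101|F"
family_name = pid.split("|")[5].split("^")[0]    # -> "O'Brien"

# the concatenated style from above produces broken (or exploitable) SQL:
sql = "select * from patients where family_name = '" + family_name + "'"
print(sql)   # select * from patients where family_name = 'O'Brien'  <- broken quoting

# a parameterised statement (as sketched earlier) treats the apostrophe as plain data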

Some healthcare applications sanitize their inputs. But validation has its challenges.

However the real reason that healthcare applications are not so subject to SQL injection attacks is operational. An SQL injection attack is iterative. I think this is one of the best demonstrations of how to do one:

This is an ASP.NET error, and other frameworks have similar paradigms, but the important thing is that the error message is disclosing information about the internal implementation, namely that there is no column called “x”. Why is this important? It’s fundamentally important because once you establish that an app is leaking SQL exceptions…

The key thing here is that a general pre-requisite for enabling an SQL injection attack is leaking interesting information in errors, which the attacker can then exploit iteratively. Well, we’re safe there, because interfaces rarely return coherent errors 😉

Actually, that’s not really true. It’s easy, if you can craft inputs to the interface, and capture outputs, to iterate, even if you don’t get good information. Mostly, typical users don’t get such access; they can enter exploit strings into an application, but the error messages they might generate downstream will go into some log somewhere.

That means that the interface error logs need protecting, as does access to the interfaces (in fact, this is the most common security approach for internal institutional interfaces – restricting the IP addresses from which they can be connected to).

Generally, if you have access to the system logs and the interfaces, you’re very likely to have access to the databases anyway. Hence, SQL injection attacks aren’t such a problem.

But really, that’s pretty cold comfort, especially given that many attacks are made by insiders with time and existing access on their side. I’m sure that it’s only a matter of time till there’s a real-world exploit from SQL injection.

Security cases for EHR systems

Well, security is the flavor of the week. And one thing we can say for sure is that many application authors and many healthcare users do not care about security on the grounds that malicious behaviour is not expected behaviour. One example that sticks in my mind is one of the major teaching hospitals in Australia that constantly had a few keys on the keyboard wear out early: the doctors had discovered that the password “963.” was valid, and could be entered by running your finger down the numeric keypad, so they all used this password.

So here’s some examples of real malicious behaviour:

Retrospectively Altering Patient Notes

From Doctors in the Dock:

A Queensland-based GP who retrospectively changed his notes to falsely claim that a patient refused to go to hospital – and then submitted those notes as evidence in an inquest – has been found to have committed professional misconduct. He was suspended from practising medicine for 12 months (six months, plus another six if he breaches conditions within three years), and banned from conducting most surgical procedures except under supervision for at least a year.

This is very similar to a case here in Melbourne more than a decade ago, where a doctor missed a pathology report that indicated cancer was recurring in a patient in remission. By the time the patient returned for another check-up, it was too late, and she died a few weeks later. The doctor paid an IT student to go into the database and alter the report to support his attempt to place blame on the pathology service. Unfortunately for him, the student missed the audit trail, and the lab was able to show that it wasn’t their fault. (I couldn’t find any web references to it, but it was big news at the time.)

Both these involve highly motivated insider attacks. As does the next one, which is my own personal story.

Altering Diagnostic Reports

Late one evening, in maybe about 2001, I had a phone call from the owner of the company I used to work for, and he told me that the clinical records application for which I was lead programmer had distributed some laboratory reports with the wrong patient’s data on them. This, of course, would be a significant disaster, and we had considerable defense in depth against it occurring. But since it had, the customer – a major teaching hospital laboratory service – had a crisis of confidence in the system, and I was to drop everything and make my way immediately to their site, where I would be shown the erroneous reports.

Indeed, I was shown a set of reports – photocopies of originals, which had been collected out of a research trial record set – where what was printed on the paper did not match what was currently in the system. For nearly 48 hours, I investigated – prowled the history records, the system logs, the databases, the various different code bases, did joint code reviews, anything. I hardly slept… and I came up with nothing: I had no idea how this could be. We had other people in the hospital doing spot reviews – nothing. Eventually I made my way to the professor running the trial from which the records came, and summarised my findings: nothing. Fail. As I was leaving his office, he asked me casually: had I considered fraud? Well, no…

Once I was awake again, I sat and looked afresh at the reports, and I noticed that the wrong data on the set of reports I had was all internally shuffled – there was no data from other reports. So then I asked for information about the trial: it was a multi-center prospective interventional study of an anti-hyperlipidaemic drug. And it wasn’t double-blind, because the side-effects were too obvious to ignore. Once I had the patient list, and compared real and reported results, I had a very clear pattern: the fake results showed that patients on the drug had a decrease in cholesterol readings, and patients on placebo didn’t. The real results in the lab system showed no significant change in either direction for the cohort, but significant changes up and down for individual patients. For the other centers in the trial, there was no significant effect of the drug at the cohort level.

So the evidence was clear: someone who knew which patients were on drug or placebo had been altering the results to meet their own expectations of a successful treatment (btw, this medication is a successful one, in routine clinical use now; so maybe it was the cohort, the protocol, or the dosage, but it wasn’t effective in this study). How were the reports being altered? Well, the process was that they’d print the reports off the system, and then fax them to the central site coordinating the trial for collation and storage. Obviously, then, the altering happened between printing and faxing, and once I got to this point, I was able to show faint crease marks on the faxes, and even pixel offsets, where the various reports had literally been cut and pasted together.

I was enormously relieved for myself and my company, since that meant we were off the hook, but I felt sorry for the other researchers involved in the trial, since the professor in charge cancelled the whole trial (inevitable, given that the protocol had been violated by an over-eager but dishonest trial nurse).

What lessons are there to be learnt from all this? I know this is a tiny set of cases (3), but:

  • Many/most attacks on a medical record system will come from insiders, who may be very highly motivated indeed, and will typically have legitimate access to the system
  • In all cases, the audit trail was critical. Products need a good, solid, reliable, and well-armoured audit log. Digital signatures where the vendor holds the private key are a good idea
  • Digital signatures on content and other IT controls can easily be subverted by manual workarounds. Until paper goes away, this will continue to be a challenge

If any readers have other cases of actual real-world attacks on medical record systems, please contribute them in the comments, or send them to me directly. I’m happy to publish them anonymously.

Further Analysis of CDA vulnerabilities

This is a follow up to my previous post about the CDA associated vulnerabilities, based on what’s been learnt and what questions have been asked.

Vulnerability #1: Unsanitized nonXMLBody/text/reference/@value can execute JavaScript

PCEHR status: current approved software doesn’t do this. Current schematrons don’t allow this. CDA documents that systems receive from the PCEHR will not include nonXMLBody.

General status: CDA documents that systems get from anywhere else might. In fact, you might get HTML to display from multiple sources, and how sure are you that the source and the channel can’t be hacked? This doesn’t have to involve CDA either – AS 4700.1 includes a way to put XHTML in a v2 segment, and there’s any number of other trading arrangements I’ve seen that exchange HTML.

So what can you do?

  • Scan incoming HTML to prevent active content (using schematrons, or a modified schema) – see the sketch after this list
  • don’t view incoming HTML in the clinical system – use the user’s standard sandbox (i.e. the browser)
  • change the protocol so that raw HTML isn’t exchanged directly
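As a sketch of the first option (Python standard library only, illustrative rather than exhaustive), here’s a checker that flags the active content at issue – script elements, on* event handlers, and javascript: URLs:

from html.parser import HTMLParser

BLOCKED_TAGS = {"script", "object", "embed", "iframe", "form"}

class ActiveContentChecker(HTMLParser):
    """Collects anything in an HTML fragment that could execute when displayed."""
    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag.lower() in BLOCKED_TAGS:
            self.findings.append(f"active element <{tag}>")
        for name, value in attrs:
            if name.lower().startswith("on"):                    # onmouseover, onclick, ...
                self.findings.append(f"event handler {name} on <{tag}>")
            elif name.lower() in ("href", "src") and value and \
                    value.strip().lower().startswith("javascript:"):
                self.findings.append(f"javascript: URL on <{tag}>")

def check_narrative(fragment: str) -> list:
    checker = ActiveContentChecker()
    checker.feed(fragment)
    return checker.findings   # an empty list means nothing obviously active was found

print(check_narrative('<table onmouseover="alert(1)"><tr><td>BP 120/80</td></tr></table>'))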

Yeah, I know that this advice is wildly impractical. The privilege of the system architect is to balance between what has to be done, and what can’t be done 😉

FHIR note: FHIR exchanges raw HTML like this. We said – for exactly this reason – that no active content is allowed. We’re going to release tightened-up schema and schematron, and the public test servers are tightening up on this.

Vulnerability #2: Unsanitized table/@onmouseover can execute JavaScript

PCEHR status: these documents cannot be uploaded to the PCEHR, are not contained in the PCEHR, and the usual PCEHR stylesheet is unaffected anyway.

General status: CDA documents that you get from anywhere else might include these attributes. If the system isn’t using the PCEHR stylesheet, then it might be susceptible. Note: this also need not involve CDA. Any time you get an XML format that will be transformed to HTML for display, there might be ways to craft the input XML to produce active content – though I don’t know of any other Australian standard that works this way in the healthcare space.

So what can you do? Much the same as for vulnerability #1: scan incoming content for event-handler attributes (the checker sketched above already flags onmouseover and friends), use the standard (patched) stylesheet, and don’t render the content with the clinical system’s own privileges.

Vulnerability #3: External references, including images

PCEHR status: There’s no approved usage of external references in linkHtml or referenceMultimedia, but use in hand-written narrative can’t be ruled out. Displaying systems must ensure that this is safe. There will be further policy decisions with regard to the traceability that external references provide.

General status: any content that you get from other systems may include images or references to content held on external servers, whether CDA, HTML, HL7 v2, or even PDF. If you are careless with the way you present the view, significant information might leak to the external server, up to and including the user’s password, a system-level authorization token, or the user’s identity. And no matter how careful you are, you cannot prevent some information leaking to the server – the user’s network address, and the URL of the image/reference. A malicious system could use this to track the use of the document it authored – though on the other hand, this may also be appropriate behaviour.

So what can you do?

  • Never put authentication information in the URL that is used to initiate display of clinical information (e.g. internally, as an application library)
  • Never let the display library make the network call automatically – make the request in the application directly, and control the information that goes onto the wire explicitly (see the sketch after this list)
  • If the user clicks an external link, make it come up in their standard browser (using ShellExec on Windows, or equivalent), so whatever happens doesn’t happen where the code has access to the clinical systems
  • The user should be warned about the difference between known safe and unknown content – but be careful not to nag them (as the legal people will want, every time; soon the users will click the warning by reflex, and won’t even know what it says)
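As a sketch of the second point – simplified, since real CDA narrative also needs the urn:hl7-org:v3 namespace handled, and the attribute names are just the common ones – here’s one way to stop a display library from fetching anything on its own:

import xml.etree.ElementTree as ET

URL_ATTRIBUTES = ("src", "href", "value")   # e.g. img/@src, linkHtml/@href, reference/@value

def neutralise_external_references(narrative_xml: str) -> str:
    """Rewrite external URLs so nothing is fetched automatically when the view is rendered."""
    root = ET.fromstring(narrative_xml)
    for element in root.iter():
        for attr in URL_ATTRIBUTES:
            value = element.get(attr, "")
            if value.lower().startswith(("http://", "https://")):
                element.set(attr, "about:blank")                   # or point at a controlled proxy
                element.set("data-original-reference", value)      # kept for explicit, user-driven access
    return ET.tostring(root, encoding="unicode")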

Final note: this is being marketed as a CDA exploit. But it’s really an exploit related to the ubiquity of HTML and HTML controls, and it’s going to become more and more common…

Update – General mitigating / risk minimization approaches

John Moehrke points out that there’s a series of general foundations that all applications should be using, which mitigate the likelihood of problems and/or the damage that can be caused.

Authentication

Always know which party (or parties) your code is communicating with, establish the identity well, and ensure communications with them are secure. If the party is a user using the application directly, then securing communications to them isn’t hard – the focus is on login. But my experience is that systems overlook authenticating the other systems that they communicate with, even if they encrypt the communications – which makes the encryption largely wasted (see “man in the middle”). Authenticating your trading partners properly makes it much harder to introduce malicious content (and is the foundation on which the PCEHR responses above rest). Note, though, the steady drum of media complaints about the administrative consequences of properly authenticating the systems the PCEHR interacts with – properly authenticating systems is an administrative burden, which is why it’s often not done.
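For system-to-system connections, establishing identity well usually means mutual TLS – both ends present certificates. A minimal Python sketch of the server side (the certificate file names are placeholders):

import socket
import ssl

# require connecting systems to present a certificate we already trust
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain(certfile="server.crt", keyfile="server.key")   # our own identity
context.load_verify_locations(cafile="trusted-partners.pem")           # CAs for trading partners
context.verify_mode = ssl.CERT_REQUIRED      # no client certificate, no connection

with socket.create_server(("0.0.0.0", 8443)) as listener:
    with context.wrap_socket(listener, server_side=True) as tls_listener:
        conn, addr = tls_listener.accept()   # handshake fails for unknown clients
        print("connected system:", conn.getpeercert().get("subject"))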

Authorization

A fundamental part of application design is properly managed authorization, applied throughout the application. For instance, don’t assume that you can enforce all proper access control by managing which widgets are visible or enabled in the UI; eventually additional paths to execute functionality will need to be provided, in order to support some kind of workflow integration/management. Making the privileges explicit in the operational code is much safer, and means that rogue code running in the UI context doesn’t have unrestricted access to the system (though a hard break like the one between client and server is required to really make that work).
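As a sketch of what making the privileges explicit in operational code can look like (the permission name and user object are invented for illustration):

from functools import wraps

class AuthorizationError(Exception):
    pass

def requires_permission(permission: str):
    """Enforce the privilege at the operation itself, not just by hiding UI widgets."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            if permission not in getattr(user, "permissions", set()):
                raise AuthorizationError(f"{user} lacks '{permission}'")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator

@requires_permission("document.amend")      # hypothetical permission name
def amend_clinical_note(user, note_id: str, new_text: str):
    ...   # the operation runs only if the check above passed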

Audit logging

Log everything, with good metadata. Then, when there is belief that the system has been penetrated, you can actually know whether it has or not. Make sure that the security on the system logs is particularly strong (there’s no point keeping them if it’s easy for malicious code to delete them). If nothing else, this will help trace an attacker, and stop them from making the same attack again – without the logs, no one can figure out what they did.
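A sketch of the kind of structured, tamper-evident entry this implies – the keyed hash is one simple way to make silent edits detectable (key management is hand-waved here; the key belongs somewhere outside the application database):

import hashlib
import hmac
import json
import time

AUDIT_KEY = b"replace-with-a-key-held-outside-the-application"   # placeholder

def write_audit_entry(path: str, user: str, action: str, resource: str, outcome: str) -> None:
    entry = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "user": user,
        "action": action,        # e.g. "read", "amend", "print"
        "resource": resource,    # e.g. "Patient/123/LabReport/456"
        "outcome": outcome,      # e.g. "ok", "denied"
    }
    # a keyed hash over the canonical form makes silent edits to the log detectable
    canonical = json.dumps(entry, sort_keys=True).encode()
    entry["mac"] = hmac.new(AUDIT_KEY, canonical, hashlib.sha256).hexdigest()
    with open(path, "a") as log:
        log.write(json.dumps(entry) + "\n")

write_audit_entry("audit.log", "dr.smith", "read", "Patient/123/LabReport/456", "ok")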

Note: none of this is healthcare specific. It’s all just standard application design, but my experience is that a lot of systems in healthcare aren’t particularly armored against assault because it just doesn’t happen much. But it’s still a real concern.

Interoperability and Safety: Testing your healthcare integration

John Moehrke has a new post up about the importance of testing:

Testing is critical, both early and often. Please learn from others’ failures. The Apple “goto fail” provides a chance to learn a testing governance lesson. It is just one in a long line of failures that one can learn the following governance lesson from. Learning these lessons is more than just knowing the right thing to do, but also putting it into practice.

Starting with the Apple goto fail: I was personally astounded that this bug existed for so long. John notes that this is an open-source library, though I think of it as “published source”, not “open source”. And btw, NSA, thanks for letting Apple know about the bug when you found out about it – I’d hate to think that you preferred for us all to be insecure…

Anyway, the key thing for me is: why isn’t this tested? Surely such a key library, on which so much depends, is tested every which way until it’s guaranteed to be correct? Well, no – and it’s not the only security library that has problems, even very similar ones. Though it’s probably properly tested by Apple now – or soon, anyway (in fact, I figure that’s probably how they found the issue).

The interesting thing about this is how hard this bug is to test for – because it’s actually a bug in the interoperability code. Testing this code in a set of functional tests isn’t exactly hard; it just needs a custom-written test server against which to test all the myriad variations that must be covered. That is, it’s not hard, it’s just a huge amount of work – and it’s work that programmers hate doing, because it’s repetitive variations with little useful functionality, and it’s terribly hard to keep the details straight.

Well, we can poke fun at Apple all we like, but the reality is that interfaces that integrate between different products are rarely tested in any meaningful sense. The only ongoing healthcare interoperability testing I know about that regularly happens in Australia is that the larger laboratory companies maintain suites of GP products so that they can check that their electronic reports appear correctly in them. Beyond that, I’ve not seen any testing at all (well, of course, we sometimes do acceptance testing when the interface is first installed).

Interface engines typically include features to allow the integration code – the transforms that run in the interface engine – to be tested, and these things often are tested. But I’m not aware of them providing framework support for running tests that test the integration between two products that exchange information over the interface engine. Testing this kind of integration is actually really hard, because effective test cases are real-world test cases – they test real-world processes, and they need business-level support.

And just like programmers have discovered, maintaining test cases is a lot of work, a lot of overhead. It’s the same for organizations doing business-level test cases. What I do see almost everywhere is that production systems contain test records (Donald Duck is a favourite name for this in Australia) that allow people to create test cases in the production system; most staff automatically recognise the standard test patients and ignore references to them in worklists, etc. Interestingly, the PCEHR has no such arrangement, and end-users find that a very difficult aspect of the PCEHR – how do they find out how things work? Sure, they can read the doco, but that doesn’t contain the detail they need. In practice, many of the users use their own patient record for experimentation, and I wonder how many of the 15,000 real clinical documents the system contains are actually test records against the author’s own patient record.

HL7 v2 contains a field in which messages can be flagged as “test” messages. It’s not intended for use with the test records I discussed in the previous paragraph, but to indicate that the message itself is intended for test purposes. The field is MSH-11, and in v2.7 it has the following values:

  • D – Debugging
  • P – Production
  • T – Training
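Checking the flag is trivial – a minimal sketch (the sample message is illustrative):

def processing_id(message: str) -> str:
    """Return MSH-11 (D=Debugging, P=Production, T=Training) from a raw v2 message."""
    msh_fields = message.split("\r")[0].split("|")          # MSH is the first segment;
    return msh_fields[10] if len(msh_fields) > 10 else ""   # MSH-1 is '|' itself, so MSH-11 is index 10

sample = "MSH|^~\\&|LAB|HOSP|GP|CLINIC|20140913080000||ORU^R01|00001|T|2.4\rPID|1||123456"
if processing_id(sample) != "P":
    print("not a production message - keep it away from the live patient record")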

I’ve never seen this field used – instead, the test system (in the few cases where it exists) thinks it’s the production system; it’s just at a different address, or maybe on an entirely physically separate network. So there’s no equivalent for this in CDA/FHIR.

So, real testing is hard. We’ll continue to get exposed by this, though sometimes it’s simply cheaper to pick up the pieces than to prevent accidents – the degree to which that’s true depends on the level of human fallback the systems have, but that’s gradually reducing, and it’s really hard to quantify. I don’t think we’re “safe” in this area at all.

What do you think? I’d welcome comments that describe actual testing regimes, or war stories about failures that happened due to lack of testing… I have more than a few myself.