One of the more difficult issues facing the NEHTA CTI team is what to choose for a technical CDA packaging strategy. We need a coherent strategy because CDA documents have assorted attachments. These attachments can include things like:
- Images
- Alternative document format representations (such as pdf / rtf)
- Digital Signatures
The CDA specification itself says (section 3) that when a CDA document is transferred from one place to another, all components of a CDA document must be able to be included in a single exchange package, including if the transfer is across a firewall, and that there is no need to change any of the references in the CDA document. In practice, in NEHTA specifications, this leads to the following rules:
- All ED.references to simple images or data must be relative URLs
- The reference name must be a GUID followed by a standard file extension where one exists (e.g. jpg, gif, png)
- The location of the document must be the same location as the attachments
- When a document is moved from one location to another, the attachments must be moved with it
- If an image is too big to be moved around with the document, it cannot be part of the attested content, and other referencing methods will have to be used instead
- The digital signature(s) is part of the document package
The second rule – GUIDs – is for a different reason, and is discussed below. Example for #2:
<value xsi:type="ED" mediaType="image/jpeg"> <reference value="0E7AD252-9C55-4499-8F11-75B2F4F4E584.jpg"/> </value>
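A minimal Python sketch of rule #2, for illustration only. The helper names `make_reference_name` and `is_valid_reference` are hypothetical and not part of any NEHTA specification, and the upper-case GUID form is simply copied from the example above.

```python
import re
import uuid

# Rule 2: a GUID (8-4-4-4-12 hex digits), optionally followed by a file extension.
# A bare name like this is also a relative URL, which satisfies rule 1.
REFERENCE_NAME = re.compile(
    r"[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}(\.[A-Za-z0-9]+)?"
)


def make_reference_name(extension: str = "") -> str:
    """Return a new GUID-based part name, with an extension if one is given."""
    name = str(uuid.uuid4()).upper()
    return f"{name}.{extension}" if extension else name


def is_valid_reference(value: str) -> bool:
    """True if value is a bare GUID file name (no scheme, path, or whitespace)."""
    return REFERENCE_NAME.fullmatch(value) is not None
```

A check like this would reject absolute URLs and names with stray whitespace before the document is packaged.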
These rules are fine – and relatively uncontroversial once the full breadth of the NEHTA specifications is understood. (For techie readers, what is above is the abstract CDA package syntax.) What is controversial is the choice of technology to use for the package form (for techies, the concrete syntax).
Broadly speaking, we have the following choices:
Package
1. Mime packages
2. Zip archives
Metadata
A. Implicit in part names (very lightweight)
B. Custom light metadata
C. The full IHE XDS metadata
1-A is what the v2 standard describes. 2-C is IHE XDM. 1+2-B / 1+2-C is what the DIRECT project is using (along with S/MIME).
We’ve been churning internally in NEHTA on what to choose – it’s not an obvious choice. And requirements don’t really seem to help. This is my list of requirements:
- Easy / ubiquitous implementations
- Acceptable to implementers & standards committees
- Can determine from outside the package
- Can be represented as base64 in an xml document for SMD
- Can easily be inserted into an HL7 v2 message
- Can contain multiple xml documents without having to remediate unique ids or escape xml tags
- Doesn’t load up the structure with needless features that cost to implement and aren’t relevant to implementers
- Not overly inefficient (speed over storage requirements)
For me, the single most important of these requirements is that whatever we choose has to be palatable as a solution when it comes forward to the IT-14 standards committees for approval. But these committees are committees, and we get multiple different opinions from influential members, as we do from implementers.
Hence this straw poll. Below, after the analysis, in the comments, please vote your preference. Along with your vote, please include your role. Are you a programmer? An analyst? A standards person? Something else?
Thanks
Mime vs Zip
Mime is the classic packaging format, used most of all in email (for attachments). It’s also used in some other internet protocols, including html packages by IE, and (sometimes) by the SOAP protocol for attachments. It’s also used in the DIRECT project, which is widely regarded as the bee’s knees at the moment.
Mime has problems though:
- It’s common to encounter broken mime packages (any regular email user will have seen them)
- Mime is very flexible – too flexible for this task, which only uses a subset of its functionality
- The important development platforms (DotNet, Java, Delphi, 4G, foxpro – list based on my knowledge of what is being used in Australia for clinical systems) don’t have libraries that support mime.
Zip has a different set of pros and cons. It’s almost ubiquitous for transfer of groups of files between people (though we’re all getting good at renaming file extensions). There are standard zip libraries on many development platforms, including the common languages listed above.
IHE XDM uses .zip format, as does the docx file format. IHE chose it because of problems with consistent support for mime packages, though I’ve also had problems with zip file consistency across the libraries as well.
Zip has problems:
- Zip is very flexible – too flexible for this task, which only uses a subset of its functionality
- Zip doesn’t have any real name-value headers for simple metadata (if that matters, see below)
- The only relevant standard using Zip is XDM – see metadata discussion below
- Zip imposes a fixed compression/decompression cost on the software, even when compression isn’t justified (i.e. it’s not delegated to the hardware) (and I bet some readers thought I was going to call zip compression an advantage)
So you can see the question with zip is about metadata – but actually, the problem with Mime is kind of the same. And the choice of mime vs zip is sort of dependent on the metadata choice (even though it isn’t strictly technically linked).
Metadata
There are two kinds of metadata that might be in the package. One set of metadata which is pretty much needed is which of the parts is the CDA Document, and which is the digital signature. Those two parts independently point to all the other parts and give them meaning, and so they need to be clearly identified somehow. One way to do this is to just fix the part names (ie. Cda.xml and digsig.xml or something like that). Alternatively you can just add a light XML manifest file that represents these things more explicitly (and it turns out that there is a custom built zip profile with digsig and content support called OPC, used in office documents, which is an ISO Spec, but wasn’t a popular choice).
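As a rough illustration of the fixed-part-name option, here is a Python sketch using the standard `zipfile` module. The names `cda.xml` and `digsig.xml` follow the "something like that" suggestion above and are assumptions, as are the helper functions `build_package` and `read_package`.

```python
import io
import zipfile
from typing import Dict, Tuple

CDA_PART = "cda.xml"            # assumed fixed name for the CDA document
SIGNATURE_PART = "digsig.xml"   # assumed fixed name for the digital signature


def build_package(cda: bytes, signature: bytes, attachments: Dict[str, bytes]) -> bytes:
    """Write the document, signature and attachments into one in-memory zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(CDA_PART, cda)
        package.writestr(SIGNATURE_PART, signature)
        for name, data in attachments.items():
            package.writestr(name, data)
    return buffer.getvalue()


def read_package(package_bytes: bytes) -> Tuple[bytes, bytes, Dict[str, bytes]]:
    """Locate the two well-known parts by name; everything else is an attachment."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        names = package.namelist()
        if CDA_PART not in names or SIGNATURE_PART not in names:
            raise ValueError("package is missing the CDA document or signature")
        attachments = {
            name: package.read(name)
            for name in names
            if name not in (CDA_PART, SIGNATURE_PART)
        }
        return package.read(CDA_PART), package.read(SIGNATURE_PART), attachments
```

With fixed names the reader needs no manifest at all, which is the appeal; the cost is that a package built this way can only hold one document.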
The other kind of metadata is to reproduce the stuff from inside the CDA document and put it in the metadata – patient, author, document type etc. This is what IHE does in the XDS metadata. The problem with this is the metadata can get quite extensive, and quite onerous. Some of the obligatory metadata may not even be contained in the document (such as practiceTypeCode).
This creates problems with synchronisation – is it wrong for the metadata to disagree with the document? That might be legitimate because the patient’s details have changed between writing the document (and freezing it with a digital signature) and actually submitting the document. But it might also be a mistake – the fields differ in error. What do you do about that?
IHE adds a further problem to this because the metadata is controlled by an affinity domain – which may differ for each XDS repository. Someone has to define the affinity domain. If we aren’t careful, we might end up with multiple different affinity domains specifying different metadata in Australia – which would mean different metadata packages for different repositories – definitely a situation we want to avoid.
Finally, the XDS metadata format is frankly horrifying. CDA is a bit yuck, but that’s nothing compared to the metadata – it’s nearly the worst xml I’ve seen (nothing will catch up with XMI though in the worst XML stakes!).
XDM – Zip + XDS metadata – assumes an affinity domain (the IHE spec is confusing – it claims it doesn’t need one, but refers to XDS metadata that does need one). The DIRECT project defines an affinity domain based on the HITSP code sets fixed in C64 (which is US specific). XDM also has the interesting property of allowing multiple different documents in the package and/or the repository (obviously) – as long as there are no name clashes, which is why one of the rules above is that all names must be GUIDs, so that packages may include multiple documents and their attachments if required.
Making the base package use XDM is great for an XDS-based central repository provider and gateways to that – the metadata they want is already populated (especially if the metadata includes things not in the document). It’s quite onerous for all the perimeter end user systems, who populate it whether the document is going to a repository or not.
So, we can:
- Just use parts with fixed names. (doesn’t scale into XDS – though we’re not sure whether this is an issue yet)
- Define a simple XML form for a set of metadata that may or may not be the XDS metadata
- Just simply bite the bullet and use the XDS metadata, and agree on an affinity domain for all Australia (at least the parts that affect the metadata)
Confused? That’s why we’ve been churning.
We need to pick an approach very soon, and whatever we pick has to be acceptable to the relevant standards committees when it comes forward. ETP used OPC-Zip, but this was badly received, and it is probably not the best of the options described above.
What is the best choice? Vote in the comments. I will approve all non-abusive comments, anonymous or otherwise, but I’d prefer non-anonymous comments. Please indicate in the comments what type of interest you have (implementer, analyst, standards, etc). The results of this straw poll won’t be binding, but will genuinely make a difference. p.s. I understand that many of the interested parties here aren’t allowed to comment on blogs, so I’ll also accept contributions by email at grahame@healthintersections.com.au

A non-Australian comment, FWIW: might as well accept that the XDS metadata is nowadays the de facto standard for document metadata. I looked at the latest English NHS Interoperability Toolkit (ITK) yesterday [ITK release 2 was released yesterday], and it uses XDS-metadata as-is in conjunction with its CDA specifications. Mind you, it doesn’t mandate the use of XDS itself.
The XDS metadata (as-is) contains some ugly kludges (like v2 data types), so I’d personally go for an alternative XML structure which can be transformed into XDS.
ZIP v MIME is an implementation choice best left to technical implementers; the choice of [the level of richness of] metadata should however not be left to implementers, for that’s essentially a question of workflow and architecture, e.g. what one aims to do (right now, and in 10 years) with those documents. I’d go for a rich set of metadata; one can always relax the rules for using a rich set.
Just to clarify, Direct with XDM is an S/MIME package containing a ZIP package. From the end-user perspective, what they see is either no package at all, or the ZIP package for Direct.
IHE is working on a slightly less strict set of metadata for Direct right now.
There’s more knowledge about how to access ZIP in the general IT world than there is about MIME, even though the latter is much simpler. Pick two junior engineers. Ask one to correctly build a package of files into ZIP format, and the other to do it in MIME. Check both for correctness. My bet is that if you do this enough times, you’ll find that the stronger platform support for ZIP file creation/access will make it easier for them to use that format, whereas there is usually no immediate platform support that is readily understood for MIME packaging.
If you like rich metadata, and you buy my argument about ZIP over MIME, it would seem that XDM is your best choice.
On the question of Mime vs Zip, I note your comment that the DotNet platform does not have libraries that support Mime. Actually this is not the case. The System.Net.Mime namespace is provided in DotNet 3.5 and above, maybe even in earlier versions.
Bernie Simson
Software Developer
#Bernie thanks for the correction.
#Keith Whoops I’ll read Direct again. As for the junior engineers, if they worked for me they’d be required to go off and find libraries, and it wouldn’t make any difference.
#Rene thanks. This question *is* for technical implementors. I generally find that the argument to pick a simpler metadata is actually more complex in the end – a good and a bad format is worse than just having the bad format.
#Other (notes from email comments):
* Zip does not necessarily impose compression costs (can use no compression)
* UUID names should not have extensions
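To illustrate the first email note, here is a small Python sketch comparing a stored (uncompressed) zip entry with a deflated one. The part name `part.bin`, the helper `zip_size`, and the sample payload are made up for the comparison.

```python
import io
import zipfile


def zip_size(data: bytes, compression: int) -> int:
    """Size in bytes of a one-entry zip archive written with the given method."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression) as archive:
        archive.writestr("part.bin", data)
    return len(buffer.getvalue())


# Highly repetitive sample payload, so deflate has something to work with.
payload = b"<ClinicalDocument/>" * 1000

stored = zip_size(payload, zipfile.ZIP_STORED)      # no compression work at all
deflated = zip_size(payload, zipfile.ZIP_DEFLATED)  # standard deflate
print(f"payload={len(payload)} stored={stored} deflated={deflated}")
```

The stored archive is slightly larger than the payload (zip headers only), so a package profile could allow or require the stored method for parts that are already compressed, such as JPEG images.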
An implementer from NZ here, with some practical experience of CDA packaging.
I’d say that the first decision to make is which type of transport message to use – if it’s HL7 2.x, then my vote is to go along with the specification and use MIME. It uses a very simple subset of MIME – i.e. a text header and Base64-encoded contents – so there should be very little chance of tools deviating from those 2 standards.
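A sketch of that simple MIME subset in Python, using only the standard `email` package: each part has a text header and a Base64-encoded body. The use of `multipart/related` and of the `Content-Location` header to carry part names is an assumption made for illustration, as are the helper names.

```python
from email import encoders, message_from_bytes
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from typing import Dict, Tuple


def make_part(name: str, media_type: str, data: bytes) -> MIMEBase:
    """One MIME part: headers plus a Base64-encoded body."""
    maintype, subtype = media_type.split("/", 1)
    part = MIMEBase(maintype, subtype)
    part.set_payload(data)
    encoders.encode_base64(part)  # sets Content-Transfer-Encoding: base64
    part.add_header("Content-Location", name)  # assumed way of naming parts
    return part


def build_mime_package(parts: Dict[str, Tuple[str, bytes]]) -> bytes:
    """parts maps a part name to (media type, raw bytes)."""
    package = MIMEMultipart("related")
    for name, (media_type, data) in parts.items():
        package.attach(make_part(name, media_type, data))
    return package.as_bytes()


def read_mime_package(package_bytes: bytes) -> Dict[str, bytes]:
    """Return the decoded body of each part, keyed by its Content-Location."""
    message = message_from_bytes(package_bytes)
    return {
        part["Content-Location"]: part.get_payload(decode=True)
        for part in message.get_payload()
    }
```

Because the output is plain ASCII text, it can be carried in a text field without further encoding, provided the transport's own delimiter and escaping rules are respected.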
Another thing that helps is having a common piece of software, such as an API or Toolkit available to all parties, that creates and consumes the message! IMHO, this is something NEHTA will have to do if it has any chance of getting vendors to use its standards.
It’s also worth bearing in mind that many Message Service Providers both compress and encrypt messages, so reducing the message size may not be a factor when deciding to use ZIP.
From a developer’s perspective, the learning curve for using MIME is tiny when compared with that for getting to grips with CDA itself! BTW – I am principally a .NET developer and, to the best of my knowledge, the System.Net.Mime namespace in the .NET Framework can only be used for adding attachments to Email Messages – not for creating stand-alone MIME packages usable in other messaging contexts. However, there are plenty of 3rd party libraries, some free, that will do this.
Java has had mime support for quite some time within the javamail library.
I like to think of myself as a programmer but must confess to involvement in standards work as well.
Vote: ZIP, hands down
It’s hard to argue with it when it’s successfully used for MS Office and for Java; clearly up to the task with lots of mature code available (while JavaMail is at version JavaMail 1.4.4 as of Jan 2011). And, as has been mentioned, you don’t have to compress everything.
Not sure what makes something a “relevant standard” in this discussion?
However, I’m not qualified to offer a healthcare-related opinion on the metadata question. All I have is the observation that for “simple name-value metadata” the META-INF/MANIFEST.MF of the Jar file format could be easily adapted/adopted.
As far as I’m aware, the only strong motivation for using MIME is in situations where you want to be able to process the incoming stream incrementally. Perhaps also of interest is the use of MIME types for the contents, but one could easily include such a piece of metadata in a Zip file (OPC does this), rather than relying on file extensions, but even Apple backed away from file types in metadata and shifted to file extensions.
BTW XMI has long long since been “fixed”
This is my personal opinion, not expressed on behalf of my employer of the moment.
My vote is for ZIP and XDS metadata.
Assuming the PC EHR will use relevant IHE profiles, participants will have to deal with XDS metadata and, since I expect there to be a single XDS Registry (implying a national affinity domain), I don’t see an issue with pre-defining domain specifics for the XDS metadata.
A ZIP package can contain a hierarchy of folders and files, which is not the case with MIME packaging. I see an advantage in allowing this to represent package relationships: name the “folder”/“directory” using a UUID, representing a “package”, and allow the content to be named as might be required, potentially with the signature in a {UUID}.sig file and a {UUID}.manifest file “at the same level” as the “folder”. The “folder” with all payload components would be zipped, the manifest would be produced, and both would be signed if a signature were required; then the lot would be presented as a MIME package with three parts: manifest, zipped payload, signature.
I am making this comment from the perspective of how I would like to do this if I were implementing the technology for it.
I am a health informatician involved in standards development who has spent most time in the pathology sector.
I think V2 messaging is going to be with us in Australia for some time and has plenty of value to add yet. Based on our experience so far it takes a long time to standardise! As a general strategy I believe it is better to reduce variation before moving to a new and possibly better choice. Consequently my vote is for mime and XDS.
3 Michaels!
#Michael Legg: Mime + XDS Metadata, thanks.
#Michael Czapski: I’m not sure that XDS is a safe assumption at this time. And we sure wouldn’t want folders, I think. Why would we want them?
#Michael Lawley: There’s lots of options for a custom metafile. I’m not sure what metadata is needed outside the package.
As a technical person I think I would vote ZIP.
My only concerns for ZIP are nailing down in a specification exactly what the ‘version’ of the ZIP format is being used (do you reference the PKWARE spec?, an ISO spec?).
I think it is a bit of a jungle out there with regards some of the newer non-standardised features, and I can easily imagine a junior engineer accidentally switching on some ‘extra’ security in their local ZIP library (hey the more security in health the better right?), and managing to use some bizarre vendor ZIP extension.
Also, I wonder if the 4 GB limit for 32 bit ZIP format might be a limitation that is more likely to be a problem in health care (MRI’s etc). If so, not sure how 64 bit zip file format plays into any specs (the wikipedia page tells me that 64 bit zip format has been supported for ages so it may not be a problem)
With regards metadata, not sure I know enough about the area. If we have a very very specific use case (attaching images and dig signatures to CDA), then to me a lightweight call them ‘cda.xml’, ‘digsig.xml’ works for me. Or a manifest. But this is obviously not very extensible to other use cases.
I worry about replicating metadata out of a CDA header – purely from the perspective that we now have two sources of ‘truth’.
The mhtml format has been specified for CDA in the international documentation so I vote for that. It also has a rfc that describes it see http://tools.ietf.org/html/rfc2557
Hmm.. MHTML is not a bad suggestion! I could live with that. Can I have multiple votes?
Actually, I would be all for MIME but don’t really like the thought of base 64 encoding a binary CDA attachment to generate a package file that might then be base 64 encoded into a transport packet (V2 etc). Seems like a fair bit of waste there – that can’t be completely fixed by transport level compression or anything..
What is the support for binary Content-Transfer-Encoding like in various MIME libraries etc?
Actually, I meant base 64 encoding for transport package like xml for SMD.. I think V2 can actually transport the MIME as is??
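To put a number on the double-encoding waste, here is a quick Python calculation. Base64 turns every 3 input bytes into 4 output characters, so two passes cost roughly 16/9 (about 1.78 times) the original size; the 300,000-byte payload is an arbitrary example and `base64_length` is a helper written for this sketch.

```python
import base64


def base64_length(n: int) -> int:
    """Length of unwrapped Base64 output for n input bytes (with padding)."""
    return 4 * ((n + 2) // 3)


raw = bytes(300_000)             # stand-in for a binary attachment
once = base64.b64encode(raw)     # encoded inside the package
twice = base64.b64encode(once)   # package encoded again for the transport

print(len(raw), len(once), len(twice))
print(f"overall expansion: {len(twice) / len(raw):.2f}x")
```

Line wrapping in real MIME output adds a little more on top of this, so the figure is a lower bound.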
#Andrews!
#Patterson: You can have multiple votes but I think they just cancel each other out. When I pack content into a mime package, I don’t base64 the binary stuff, so no double base64ing – not sure how interoperable that is though, haven’t really tested it. But it’s not actually problematical to double base64, just a bit slow. I think the 4GB limit is quite satisfactory for NEHTA usage. I don’t know what technical limit is going to apply but we can’t allow morons to upload documents bigger than 4GB to the pcEHR.
#McIntyre: thanks. Though the CDA spec says that the MIME recommendation is just that – a recommendation
Offline contribution from one of the vendors that’s first off the rank to implement whatever is decided:
We’d like to see an OBX line with a relevant observation identifier (per the MIME examples – OBX|1|ED|18842-5^Discharge Summarization Note^LN|) and then a base64′d opc-zip file in the TX value (ideally in a single OBX rather than splitting it over repeated lines).
Note that XCA allows the use of the rich metadata without imposing the XDS Registry/Repository on data-holding organizations. This has been used to treat very-large hospital organizations as a ‘community’ while not forcing them to re-invent their EHR. I don’t think this helps with your prime questions, but it does address some concerns.
The XDS Metadata is not perfectly CDA, because not all of healthcare starts from a CDA document. It is great if you do, but not everything is CDA based. Especially historically. Other cases are when the prime document is an imaging document (XDS-I) thus the prime document is a DICOM SR object.
Much of the Metadata is very valuable for searching, but even more important to support Privacy Policies. Cases where patient wants to hide or specifically authorize data by date/time ranges, authoring organizations, type of clinical data, etc.
To add to your MIME/ZIP discussion… the latest supplement out of IHE deals with some gaps in document encryption. It has two solutions to different gaps. One gap is encrypting XDM on media, the other is encrypting a standalone document independent of the transport. In both cases CMS was picked, the underlying security layer in S/MIME. Thus for XDM, one might first find a CMS wrapper around a ZIP including metadata and files. Needed to go this way because of the aforementioned ZIP incompatibilities in advanced features beyond the ISO-8859
cut-paste failed me. what was posted as “ISO-8859″ should have been the base PKWARE appnote… ZIP is defined by the PKWARE ‘appnote’… sorry.
I am an architect/developer involved in some Wave sites. I am also involved in standards committees.
Just the other day I got notification of another Java MIME package: http://james.apache.org/mime4j/
First I would like to note some of the use cases behind this (as far as I know). For the Wave sites it is necessary for CDA documents to be transported both point to point and to a repository (SEHR). This transport requires new capabilities in desktop vendor software. Since some desktop vendors will be adding SMD support to their application, it was suggested that SMD could be used for both the transports above. It was further recommended that the packaging of the CDA be identical for these use cases. So what we are talking about here is finding a simple way to get CDA delivered both point to point and to repositories. The development time frames for all of this are compressed.
With this in mind I would go for the simplest solution that achieves these goals. If there is a requirement to transport the documents via XDR/XDS then a gateway can be created to transform to the required format.
As a developer I think there is more experience with ZIP than MIME and so would choose this. I would also go for the simplest metadata, that is none.
From the point of view of the discussion, SMIME=MHTML, and it is the HL7 Recommendation, works well with V2 and is designed to handle url references. It’s also widely supported.
I was introduced to this recommendation in the CDA tutorials. I can’t see a reason to go against it for Australia, and particularly when we plan to transport CDA (if it happens) in V2
MHTML appears to satisfy my simplicity criteria. It seems particularly easy to use when serving up CDA documents with attachments and signatures.
MIME for packaging: it works. XDS metadata is ok (any other options? roll your own national solution?)
Now for some assumptions/queries…
a) packaging is not required if service category does not require multipart content?
b) metadata is only required for document repository interactions?
c) signatures only for service categories that have requirements?
So service category requirements define need for packaging, metadata and/or signatures?
#Brett, ta. Packaging might not be required if there’s no multipart content – but who knows the answer to that in advance? Metadata might be required for other interactions. Middleware, for instance. Signatures – it seems to me that sometimes we might say that signatures are always required, but when could we say, “no signatures allowed” – so I think that it’s best to always have the basic packaging structure.
I get what you mean – so packaging is always part of the stack… I actually quite like that idea; makes it really clear what one is going to deal with all of the time; and expect an optional/mandatory content signature would always be relevant in any use case.
ps. I am loving the forum; this from my perspective is a most useful interactive discussion to understand some alternative perspectives on a standards issue. Can we do standards 2.0 now?
Thanks Brett. What would standards 2 look like?
All.
Voting running with a preference for ZIP + XDS metadata = XDM. We’re wondering whether to say that the html index stuff is not required or not – that seems like a reasonable deviation to me.
Off-topic: I would like to think ‘Standards 2.0′ would be a crowdsourcing activity around issues that might end up with some ‘official’ poll around suggested solutions. I think it would lend itself to getting input from the community of implementer parties who can’t get to committee meetings at your average SDO. This might well be held in conjunction with the NEHTA work program as an efficient and broad public consultation approach. I know you are much more likely to get my attention 30 mins a day rather than one day every couple of months or a week 3 times a year.
In other words – let’s keep on having this sort of discussion.
As non-Australian comment. From technical point of view, the obvious choice for the packaging is simply and definitely MIME.
My first point is that MIME is simple and widely used and available.
My second point is that in modern web service implementations, the W3C standard MTOM (http://www.w3.org/TR/soap12-mtom/) also uses MIME to support binary attachments. In fact this is the only W3C standard that will ensure web service interoperability between Java technology and Microsoft .NET technology, and MTOM implementation is straightforward, nearly out of the box with JAX-WS in Java technology (see example http://www.mkyong.com/webservices/jax-ws/jax-ws-attachment-with-mtom/).
So have another thought before you really decide to go for ZIP.
thanks for the comment. It’s true that MTOM uses MIME – but MTOM is a transport packaging solution. Using MTOM the package includes the SOAP details. This turns out to be quite problematic.