How do I represent end of lines in text content in HL7 v2?

This is a fairly common question because the HL7 v2 standard doesn’t actually describe how to do this explicitly. That’s because there are several options. But first, some background:

An HL7 message consists of a series of “segments”, separated by the ASCII character 13 or 0x0D (sometimes in practice you also see 0x0A, or more commonly 0x0D0A, though these are not conformant). The segments are broken up into fields, components etc. by special syntax characters (usually |, ^, & etc.), and for each of these, there is an escape sequence for representing the special character if it’s actually part of the proper contents of the field. (Note, btw, that some implementations don’t escape these characters in some types such as IS, because of a lack of clarity in old versions of the v2 specification, but this is wrong.)
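As a rough illustration of the background above, here is a minimal Python sketch of splitting a message into segments and escaping the special characters in a field value. The function names are hypothetical, and the sketch assumes the usual default delimiters (| ^ & ~ with \ as the escape character); a real message declares its own delimiters in MSH, so treat this as an illustration rather than a parser.

```python
import re

def split_segments(message: str) -> list[str]:
    """Split an HL7 v2 message into segments.

    The conformant separator is carriage return (0x0D). This sketch also
    tolerates 0x0A and 0x0D0A, since they turn up in practice.
    """
    return [seg for seg in re.split(r"\r\n|\r|\n", message) if seg]

# Escape sequences for the default delimiters (assumed, not read from MSH).
ESCAPES = {
    "\\": "\\E\\",  # escape character
    "|": "\\F\\",   # field separator
    "^": "\\S\\",   # component separator
    "&": "\\T\\",   # subcomponent separator
    "~": "\\R\\",   # repetition separator
}

def escape_delimiters(value: str) -> str:
    """Escape special characters that are part of the field's actual content."""
    # Working character by character avoids re-escaping the backslashes
    # that the escape sequences themselves introduce.
    return "".join(ESCAPES.get(ch, ch) for ch in value)
```

For example, escape_delimiters("Smith & Sons") gives Smith \T\ Sons.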

However there’s no escape character for #13 – why not?

The first part of the answer is that line breaks aren’t usually part of a proper field value, except when the content is narrative text (i.e. word processor content). So there’s no expectation that the line break characters will appear in normal fields. Word processor text is represented by fields with two particular data types: TX and FT.

TX is plain text. Generally, TX fields repeat, and the expectation is that you would have a repeat (usually separated using the ~ character) between each line (see, for instance, section 2.A.78 of v2.6 chapter 2). However, not all TX fields repeat. In these cases, the expectation really is that there won’t be multiple lines, but that’s just an expectation, not a rule. The way to represent a line break, if you have to, is to use the \X..\ sequence.
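The repeat-per-line convention for a repeating TX field can be sketched in Python roughly as follows. The function names are hypothetical, ~ is assumed to be the repetition separator, and each line is assumed to have already had any special characters escaped.

```python
import re

def lines_to_tx_repeats(text: str) -> str:
    """Turn multi-line text into one TX repeat per line, joined with ~."""
    # Accept any of the common line endings in the source text.
    return "~".join(re.split(r"\r\n|\r|\n", text))

def tx_repeats_to_lines(field: str) -> str:
    """Turn a repeating TX field back into text, one line per repeat."""
    return "\n".join(field.split("~"))
```

Whether a receiver actually renders each repeat on a new line is something to confirm with that system.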

The \X..\ sequence introduces a series of bytes defined by their hexadecimal notation, such as \X0D0A\. The standard carefully defines these as “bytes” not characters, and says that interpretation is up to local agreement. But it’s pretty much universal practice in my experience that a short sequence with some combination of the values 0D and 0A is interpreted as a line break. And, in fact, this sequence can be used in any ST data type as well, including e.g. names, but it’s pretty likely that most receiving systems will reject names like that, or otherwise behave badly.
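A minimal Python sketch of that common practice is shown here. The function names are hypothetical, and the choice to always emit \X0D0A\ when encoding is an assumption for illustration; since interpretation is up to local agreement, check what the receiving system expects.

```python
import re

def encode_line_breaks(text: str) -> str:
    """Replace any line ending with the hex escape \\X0D0A\\ (assumed choice)."""
    # A function is used as the replacement so that the backslashes are
    # inserted literally rather than treated as regex escapes.
    return re.sub(r"\r\n|\r|\n", lambda m: "\\X0D0A\\", text)

# Matches \X..\ sequences made up only of the bytes 0D and 0A.
_LINE_BREAK_ESCAPE = re.compile(r"\\X((?:0D|0A)+)\\", re.IGNORECASE)

def decode_line_breaks(value: str) -> str:
    """Interpret \\X..\\ sequences of 0D/0A bytes as line breaks."""
    def to_newlines(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(1)).decode("ascii")
        # Normalise CR LF and bare CR to a single newline.
        return raw.replace("\r\n", "\n").replace("\r", "\n")
    return _LINE_BREAK_ESCAPE.sub(to_newlines, value)
```

Other \X..\ sequences are deliberately left untouched, because their meaning depends on the character set in use.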

FT is rich text, and behaves slightly differently. In FT, there’s a special escape sequence \.br\ which means “Begin new output line”. This isn’t actually the same thing as a line break – it’s really a terminal instruction from the grand old days of yore (v2 really is old), but in my experience it’s universally mapped to a line break now. I’ve used “line-break” with its ambiguous meaning deliberately, btw. If you want to be sure which particular meaning is intended, a short line break or a paragraph break, you’ll have to discuss that with each particular application.
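Mapping \.br\ to and from a line break, as described above, might look like this in Python. The function names are hypothetical, and the sketch handles only \.br\; FT defines other formatting commands that are ignored here.

```python
import re

FT_BREAK = "\\.br\\"  # the literal escape sequence \.br\

def text_to_ft(text: str) -> str:
    """Replace each line ending in plain text with the FT command \\.br\\."""
    # A function is used as the replacement so the backslashes stay literal.
    return re.sub(r"\r\n|\r|\n", lambda m: FT_BREAK, text)

def ft_to_text(value: str) -> str:
    """Map each \\.br\\ in an FT value to a newline."""
    return value.replace(FT_BREAK, "\n")
```

Whether the receiver shows the result as a short line break or a paragraph break is still up to that application.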

(This is from a question first asked on an HL7 mailing list.)

6 Comments

  1. Rene Spronk says:

    \.br\ is effectively technology agnostic [“line break”], and as such it is to be preferred above escaping ASCII 13 in some way.

    Instead of going for an explicit line break, some systems (as was noted on the HL7 list) use ~ (field repetition), others go for repeating segments. However, as far as I’m aware it doesn’t state anywhere in the standard that a new repetition of a field or segment in any way implies a line break. So this kind of approach would require a site-specific agreement.

    \.br\ is the most consistent approach – it’s certainly one that is described in the standard itself.

  2. I’ve seen pretty consistent use of ~ to terminate a line at the right line width, usually somewhere between 65 and 80 characters, and a double ~~ to mark the “end of a paragraph”. This is especially the case in transcription.

    It just goes to show that the “right” way, and what people usually do aren’t necessarily the same. In this case, simplicity wins over “correctness”.

    There are also systems that send the content in RTF or in PDF where this isn’t an issue, or even better these days: CDA.

    • Nyerguds says:

      ‘~’ is usually used as the repeating field character in HL7 v2.0 though. You can’t just assume your string isn’t part of a repeating field already.

  3. Grahame Grieve says:

    Rene – \.br\ is definitely preferred, but isn’t defined for TX data type. I would think that support for it in TX in receiving systems would be pretty variable. There is specific comment for TX about repeats being new lines.

    Keith: There are indeed systems that send RTF, PDF or HTML, but this is usually for representing the message or sections of it, not just fields.

  4. I’m thinking about MDM messages where the need for line break handling occurs pretty often, as does use of RTF/PDF and CDA.

  5. Anthony Julian says:

    I would welcome a proposal for a fix, but IMHO the standard is clear:
    1. In a field, component, or sub-component of type TX, FT, or CF the line break MAY be indicated by the escape sequence \X0D\ or \X0D0A\
    2. (in addition) In a field, component, or sub-component of type FT the line break MAY be transmitted by using the Formatted text escape sequence \.br\

    RTF already has a line break \par
