Design by Constraint – not as useful as people think (#4)

This post is the last part of the Design by Constraint series (first post)

In the last 3 posts, I’ve described “Design By Constraint”, and pointed out that one inevitable outcome of design by constraint is that there will be transforms everywhere.

And I said:

The inevitable outcome of Design By Constraint is implementation chaos

Before I go on to explain why I think that, I just want to recap on some of the uses of Design by Constraint:

HL7 v3

  • Reference Model is The RIM + Data Types + Structural Vocab (Note that the RIM has some additional patterns that are orthogonal to Design by Constraint, particularly the Act based grammar, and the structural vocabulary layer. I’ll take them up in other posts)
  • Static models as constraints. The static models are constraint specifications masquerading as constrained models – but they are syntactically equivalent to ADL. Static models may derive from other static models and in effect stack up
  • The actual transform to a constrained class model is executed by the schema generator, which produces schemas that are in effect constrained class models (a rough sketch of what that output looks like follows this list)
  • In addition, there is the RIM serialisation, which is the normative XML form for the reference model
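
To make “constrained class models” concrete, here is a minimal sketch (Python, with invented names – nothing here is an actual HL7 artifact) of a generic RIM-style class next to the kind of specific class a schema generator in effect produces once a static model has fixed and narrowed its attributes:

    # Hypothetical sketch only: a generic reference-model class, and the
    # specific class that falls out once the constraints are applied.
    from dataclasses import dataclass
    from typing import Any, Optional

    @dataclass
    class Act:                          # general reference-model class
        class_code: str                 # e.g. "OBS", "SBADM", ...
        mood_code: str                  # e.g. "EVN", "RQO", ...
        code: Optional[str] = None      # what kind of act this is
        value: Any = None               # could be anything at all

    @dataclass
    class BloodPressureObservation:     # the "constrained class model"
        code: str                       # limited to a small value set
        systolic_mm_hg: int             # value narrowed to physical quantities
        diastolic_mm_hg: int
        # class_code and mood_code are gone: the constraints fixed them to
        # "OBS"/"EVN", so the transform bakes them into the class rather than
        # carrying them in every instance.

The point is that the second class is not designed independently – it is derived from the first by constraint, and something (here, the schema generator) has to own the transform between the two.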

openEHR

  • The Reference model is “the reference model”. The reference model is both more concrete and more abstract than the RIM – it explicitly represents a logical EHR, and then it has open clusters/elements for actual data that carry no semantics at all
  • There is a canonical XML representation for instances of data that conform to the reference model
  • ADL is used to describe constraints on the reference model in “archetypes”. The archetypes can derive from each other and stack up (a toy sketch of this pattern follows the list).
  • There is also the template layer that uses ADL to describe constraints that are applied as transforms to produce constrained class models that are represented as schema. I’m sure they could also be represented as UML PIMs, just like HL7 could choose to do as well
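
As an illustration, here is a toy sketch of this pattern (Python, with made-up names – this is not the real openEHR reference model or ADL): generic clusters/elements that carry no semantics of their own, plus an archetype-like set of constraints that says what a particular use of them must look like:

    from dataclasses import dataclass, field
    from typing import List, Union

    @dataclass
    class Element:                      # reference model: no clinical semantics
        name: str
        value: Union[int, float, str]

    @dataclass
    class Cluster:
        name: str
        items: List[Element] = field(default_factory=list)

    # "Archetype": constraints expressed against the generic classes,
    # rather than a new class with its own attributes.
    BLOOD_PRESSURE = {
        "cluster_name": "blood_pressure",
        "elements": {"systolic": (int, 0, 300), "diastolic": (int, 0, 200)},
    }

    def conforms(cluster: Cluster, archetype: dict) -> bool:
        """Is this reference-model instance in the set the archetype allows?"""
        if cluster.name != archetype["cluster_name"]:
            return False
        by_name = {e.name: e for e in cluster.items}
        for name, (typ, lo, hi) in archetype["elements"].items():
            e = by_name.get(name)
            if e is None or not isinstance(e.value, typ) or not lo <= e.value <= hi:
                return False
        return True

    bp = Cluster("blood_pressure", [Element("systolic", 120), Element("diastolic", 80)])
    assert conforms(bp, BLOOD_PRESSURE)

All the clinical meaning lives in the constraints; the reference model instance is just nested name/value structure, which is exactly why transforms to and from use-case-specific forms become necessary.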

CDA

  • The reference model is the CDA schema. No actual UML diagram is widely distributed, though one could be defined (and I think I’ve seen one, though there are some distinct challenges in the data types)
  • The constraints are published as implementation guides with English language statements and Schematron constraints.
  • The wire format is the reference model – the single CDA format
  • There is intense interest in using “greenCDA” – these are in effect constrained class models after applying a transform based on constraints (roughly sketched below)
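
Roughly, the greenCDA idea looks like this (a hypothetical sketch in Python – the element names and codes are placeholders, not real CDA): applications work with a simple, use-case-specific form, and transforms move it to and from the single general wire format:

    def to_general_form(green: dict) -> dict:
        """Expand the simple 'green' form into the general wire form."""
        return {
            "observation": {
                "classCode": "OBS",                  # fixed by the constraints
                "moodCode": "EVN",
                "code": {"code": green["code"], "codeSystem": "placeholder-oid"},
                "value": {"value": green["result"], "unit": green["unit"]},
            }
        }

    def to_green_form(general: dict) -> dict:
        """Collapse the general wire form back to the simple one."""
        obs = general["observation"]
        return {"code": obs["code"]["code"],
                "result": obs["value"]["value"],
                "unit": obs["value"]["unit"]}

    green = {"code": "1234-5", "result": 5.4, "unit": "mmol/L"}
    assert to_green_form(to_general_form(green)) == green    # round trip

The attraction is obvious – implementers only ever touch the green form – but every such pair of transforms has to be written, tested and maintained, which is where the externalised costs discussed below come from.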

BRIDG

  • In the beginning BRIDG was meant to be a PIM, and constraints on the BRIDG were not expected to happen
  • Now BRIDG is starting to be seen more and more as a conceptual model, which is constrained for particular use (at least, that’s what I see in private communications with CDISC/NCI people)
  • As soon as someone says, let’s make these actual uses formal constraints on the BRIDG model, then the BRIDG eco-system will fully conform to “Design by Constraint”

That’s enough for now. These systems are all similar in concept. But they differ wildly in:

  • the description and presentation of the overall approach
  • the details of how things are actually done
  • the choices of technologies for the different pieces
  • the focus and balance of the communities that adopt them

But in spite of this, in spite of the fact that these variations mean the commonalities are not recognised, they are all variations on the one theme. And they all suffer from the problem of engineering the solution: whatever you do, you have to transform from general to specific, and back.

Implementation Disasters

The inevitable outcome of wide scale adoption of this technique is chaos. Different implementors want to live at different points on the general <-> specific curve, and there’s a variety of options to attempt to externalise costs from one implementor to another.

There are various approaches to handling this. You can be like HL7: put your head in the sand, claim that it all works (in other words, externalise the costs), and then be genuinely confused about why your brilliant idea isn’t actually solving all the problems in the world.

And this happens precisely because if you buy into the whole notion – learn the ways of the master model, and build a specific software stack to deal with the products – then it actually works pretty well. Please note this: If you embrace the model, the outcomes are solid. And there are many people doing so (HL7 JavaSIG/RIMBAA particularly). (or, to express it differently, if you invest where the costs have been externalised to, then you’ll eventually benefit from the savings that accrue where they were taken from)

But when you treat the design by constraint framework as an interoperability specification to be taken up by projects and/or standards that are going to be implemented across a multitude of applications that just want to use the standards – then they’re just going to feel the pain of the externalised costs, and they’ll never really derive the benefits of the outcome. And the politics will eventually overwhelm you.

People go on and on about the structured vocab, and acts, and various ontological features of the RIM. But I reckon that 90+% of all pushback to v3 implementations that I’ve seen is related to the costs of design by constraint, and what it does to the XML (i.e. instance engineering problems). HL7 externalised those costs very effectively. And that’s an own goal. (And note that the discussion around UEL and PCAST vs CDA walks into things like XML, OIDs in XML, and so forth – all things that arise from the way the CDA community does design by constraint).

openEHR walks around this by being explicit about the costs, and not being adopted as a standard. You expect to be writing specific software to make it work, or you only deal with the constrained models that come out the end. Note, again, that this is about externalised costs. Either you pay the cost – learn the reference model, invest in tools and software – or openEHR internalises the costs by handling all the transforms privately (only I think that this approach will be a problem in the long term – implementors are stuck on the wrong side of the power of the reference model).

But if a large project or a national standard turned around and picked openEHR instead of v3 – well, it’s not going to be any different. Design By Constraint will lead to chaos.

OMG, come save us?

I think that this problem really needs OMG.  As I pointed out earlier, this is really a case of design by contract – all we want is sophisticated contracts, and OMG has only provided really crude tools to do this.

Actually, what we are trying to do can be explained differently. UML defines Class diagrams and Instance diagrams. Class diagrams define a set of possible instances by defining their classes and the possible value domains they can have. Instance diagrams define a particular instance of data in terms of the class diagram. We want something in between – a diagram that describes a possible set of instances without taking ownership of their types. Easier said than done, though.
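
For what it’s worth, here is a rough sketch (Python, invented names) of the shape of that in-between thing: not a new class, and not an instance, but a description of which instances of an existing class are allowed – roughly what ADL does textually:

    from dataclasses import dataclass
    from typing import Any, Callable, Dict

    @dataclass
    class Act:                          # an existing reference-model class
        class_code: str
        mood_code: str
        value: Any = None

    # The "in-between" artefact: per-attribute predicates over Act.
    # It owns no types of its own; it just picks out a subset of instances.
    ConstraintModel = Dict[str, Callable[[Any], bool]]

    OBSERVATION_EVENT: ConstraintModel = {
        "class_code": lambda v: v == "OBS",
        "mood_code":  lambda v: v == "EVN",
        "value":      lambda v: isinstance(v, (int, float)),
    }

    def allowed(instance: Act, constraints: ConstraintModel) -> bool:
        """True if this instance is in the set the constraint model describes."""
        return all(ok(getattr(instance, attr)) for attr, ok in constraints.items())

    assert allowed(Act("OBS", "EVN", 42), OBSERVATION_EVENT)
    assert not allowed(Act("SBADM", "EVN", 42), OBSERVATION_EVENT)

Archetypes and static models are, in effect, stacked bundles of predicates like this; the hard part is giving them a first-class, tool-supported place in the modelling framework.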

When I presented this content (the whole series of 4 blog posts) to OMG – and I have done so, to some of the UML maintainers at an OMG meeting – their eyes glazed over, and they looked at me like I was an idiot. That’s crazy, they said: you’ll never get engineering continuity and coherence doing this.

Yeah, I agree. But semantic continuity and coherence – how are you going to get that? (That’s where we started; it’s what we’re trying to achieve.) Well, I came away with the conclusion that engineers aren’t overly concerned about it. Sure, they get the notion of more sophisticated leverageable contracts in design by contract (and they had a good look at ADL too, with interest). But that’s only half my story.

I don’t know: how much is semantic consistency worth? That’s a subject I’ll take up in a later post.

In the meantime, I wish that the downstream price of Design by Constraint was better understood by the people doing it, and by the large scale projects that adopt it. It’s not that it’s a wrong thing to do – but you have to know where the costs have gone.

 

13 Comments

  1. Grahame says:

    A lot of people have helped me crystallise this thinking over the last few years on this particular subject. I’d particularly like to recognise and thank: Charlie McCay, Thomas Beale, Ken Rubin, Gunther Schadow, Lloyd Mackenzie, John Koisch, Abdul-Malik Shakir, Dipak Kalra, Richard Soley, Andy Bond, Jobst Langrebe, Robert Lario, Ken Lunn, and Dennis Giorkas

  2. Great posts, and I hope lots of people chip in with comments. Here are some quickies:
    Agree with you that Design By Constraint (DBC) leads to transforms everywhere. But IMO anything else will lead to transforms everywhere, because the issues that DBC failed to solve won’t go away. No one model of healthcare data will ever rule the world. Different people will always want to be at different places in the specific-to-general spectrum. We should have learnt that by now!
    So transforms are always needed. My response is to have the tools to make transforms easy to develop and reliable; auto-generated from declarative mappings, lots of self-test by round trips, etc, so you can really trust the transforms.
    DBC does not work because what you get looks just as big and ugly as the general model you started from; a one-size-fits-all approach to semantics, which no engineer likes when he/she looks at the results close-up.
    Green CDA ideas are popular because they allow you to start with the simple model you want to use, with no DBC superstructure, then transform it up to the general interoperable form. But for that to work, the transforms have got to be reliable, easy to make, easy to maintain.
    OMG aren’t going to rescue us; OMG is a forum for people to rescue themselves, if they know how.
    No mention of SAIF to the rescue? Still waiting to see any content in SAIF?

  3. Grahame says:

    Robert, yes. Have to have transforms everywhere. So maybe it’s no longer true that a single hub model is better than peer connections? Worth thinking about. I don’t know about SAIF. I don’t think it addresses stuff at this level.

  4. Thomas Beale says:

    You said: “The inevitable outcome of wide scale adoption of this technique is chaos.” But that is not our experience in openEHR at all. There are now operational systems out there doing these very transforms, and working just as intended. It wasn’t easy to get the transforms right originally – it’s a difficult problem to solve. But once solved, it has huge value. I think chaos is only inevitable with a bad underlying architecture.

    In the openEHR world, we don’t just generate ‘use case specific models’, but also use-case specific APIs, i.e. partial classes, directly usable by developers to create data. This changes everything. Now a developer can see how to set a ‘PaO2’ and know that the smart generated software interface will do the right thing, create the right tagged data, etc. So they just get on with building an app. In the future, fully generated MVC GUI components will be produced by similar means.

    So I don’t agree that ‘design by constraint will lead to chaos’. Done properly, it actually creates clarity. If a national programme picked up openEHR (as some have done), the positive consequences will become clear. However – ‘design by constraint’ is just one useful tool. A lot else is needed for a successful national e-health programme. Getting away from the gigantism of national ‘bus’ or ‘spine’ projects would be a start. They need a good semantic modelling environment and toolkit, but they need to create small things for industry to use initially. The ‘big bus’ will come into being over the ensuing 5 years.

    I don’t think that the OMG properly supporting DbC will help that much; that will just help with the creation of better quality reference models. Unless we want to try building archetypes in some new more powerful UML, and doing model-model transforms in UML-land between ‘general’ and ‘specific’ models. Again in theory possible, but I have not seen any tool vaguely capable of this. I have not yet seen a UML tool that can even properly generate an XSD from an object model (although maybe that is unfair, since XSD is such an abortion …).

    I think semantic consistency is crucial, without it you can’t safely query data. And with tera-bytes of health data piling up around the world, that just means wasted value. It means we can’t trust anything computers might do with the data. So we might as well go back to paper in that case. Or else design a framework that unites information, domain constraints and querying in a coherent way.

    • Andreas Schultz says:

      Thomas, did you ever have a look at Enterprise Architect from Sparx? This tool is also used by other standardization organizations, like the one at UN/CEFACT. Out of it you can produce XSDs that are at least a very good starting point.

  5. Peter Hendler says:

    This was a real eye opening series of posts. I’ll go back and read it over a number of times.

    If (this is for the HL7 RIM only) one were to use the RIM ITS, would you avoid these transforms? If you invested in your system knowing the whole RIM (I assume this is the complexity, and your investment in the complexity is to learn the whole RIM), and if you used the RIM ITS, then have you already dealt with the complexity, avoided the transforms, and would you have a working system?

  6. Grahame says:

    #Tom

    I think I made it clear that chaos is caused by mismatched expectations. Architecture may have something to do with expectations.

    > we don’t just generate ‘use case specific models’, but
    > also use-case specific API

    I don’t see how those things differ in any practical way. It’s just different syntax. Yes, they can feed data in, knowing it transforms ok. But that’s a small part of the picture. And if that’s all the users are doing, it’s not really design by constraint. You might have done that, but that’s opaque and optional from their point of view. You might as well just write transforms from one unrelated model to another by hand (from their pov). And so you lock them out of DbC – as I say, that walks around the chaos but you also forego the flexibility, and I think that will become a problem later. I think you are just confirming what I wrote about why DbC doesn’t cause problems for openEHR anyway. The equivalent question in HL7 is the one around how far to leverage greenCDA. It’s just that in HL7, people get to decide for themselves, instead of architectural leadership deciding. That’s got good and bad about it.

    I have seen OMG model-model tools. They’ve presented to HL7. They had an ok stack and tooling, but their thinking was entirely grammar free. I didn’t understand that at all. They hadn’t encountered transform use cases where fields mapped differently depending on their relative location in an instance? Nuts.

    #Peter

    The answer is probably both yes and no. The initial impetus for the RIM ITS was the logical hole it represents in these posts. If you deal with the RIM, why bother with the XML ITS? But you can only say that you don’t need the XML ITS if you’re going to have trading partners that don’t need it (i.e. don’t need the specific models). Good luck with that.

  7. Lloyd McKenzie says:

    Actually BRIDG was designed as a CIM, but we’re starting to realize it shares more characteristics with a PIM (more on the side of logical design than business-expert readable requirements capture).

    I tend to agree with Robert’s line of thinking. If you move past custom point-to-point interfaces (and there’s lots of people who happily cling onto that world with all fingers and toes), then you’re going to need transforms – even if it’s just to avoid having migrations occur as “big bangs” across hundreds of interacting systems. In the v2 world, interoperability is centred around the “interface engine”, which is essentially just a tool for applying transforms. The idea that that requirement would go out the window just because we’ve moved to XML is a bit over-optimistic.

    I don’t agree that transformations have to mean chaos. Non-standard transforms can mean chaos. Architectures and standards specifications that don’t set an expectation for the use of transforms can mean chaos. And that may be what we’ve managed to do so far. But I don’t see why it needs to be inevitable.

    One point you didn’t make is that v2 itself was constructed as “design by constraint”. It did provide an out (z-segments) for local customization, but the rest of the specification was based around the idea of taking the standardized messages and segments and fields and tightening them down to more local requirements. Version 2.5 formalized this with the introduction of conformance profiles. The challenge with v2 is that the base specification was often poorly defined, making it unclear exactly what was being constrained. That, combined with the complexity of the “reference model” and an implementer attitude of “just shove it in and let the interface engine worry about it”, made it a poor candidate to start with.

    What I’m taking away from this post is the following:
    1. There are costs to achieving semantic interoperability via DbC.
    2. At the moment, those costs are often experienced by stakeholders who see little of the benefit provided by that semantic interoperability. (I.e. a physician office system vendor doesn’t really care that much about semantic interoperability. Big groups that want to aggregate and analyse and share lots of data, like NCI, CDC, Infoway, NHS, etc., care about it in a big way.)
    3. Transforms can be a mechanism to help migrate the cost from those currently paying it to those benefiting from it, but if we don’t do it carefully, we’ll end up with even more of a mess.
    4. Therefore, we need to both encourage and carefully manage the efforts already underway in this transformation space.

  8. Grahame says:

    Lloyd: I think you missed #5: DbC delivers a clumsy outcome for semantic consistency in practice for the big vendors as well

  9. Victor Chai says:

    I do not fully agree with the statement that ‘design by constraint will lead to chaos’; rather, chaos is the consequence of bad architectural design. It is like an implementer who chooses Java as the programming language to realize the benefits of OO, but in the end sees a very chaotic, fragile system due to improper design: can he put the blame on the programming language he has chosen?

    We do need to keep in mind one of the OO principles – encapsulation. When it is applied to architecture design, it means hiding the complexity of the architecture and exposing a simple interface to external systems. Having said that, we do not necessarily need to expose all these constrained models in the wire format; we only need a unified canonical wire format. Whatever constraints are specified in the constrained model shall be enforced by the application internally rather than by the exposed wire format – in fact, that is precisely what CDA is doing, and the main reason for its popularity. The wire format can enforce some level of constraint, but very, very minimal in reality, and the downside of inheriting the constraints in the wire format is much, much higher than its negligible benefits.

    If other inter-connected systems require a wire format other than the mandated canonical wire format, we can always apply a transformation.

    Another point, with regard to whether OMG can save us: you are right that, be it HL7 v3 or openEHR, they are all in fact following typical UML design. The underlying reference model is the class model, whereas the various constrained models in v3 and the archetypes in openEHR are in fact object models. An object model is an instance of the class model for a particular use case. The application will be built using the class model; the constraints specified in the object model will only be enforced by the application itself. There is no formalism in UML to automatically translate the constraints in the underlying class diagram, nor is there technology available so far to translate these constraints to the platform-specific language when performing code generation. So this in fact reinforces the earlier point that we do not necessarily need to expose the constrained model detail in the wire format.

    So in summary, DbC definitely helps to visualize the information constraints for particular use cases; that will be useful and necessary for implementing business logic in the application, which is built on the class model. But we need to limit the constrained model to certain layers of the architecture, given the current technological limitations.

  10. Thomas Beale says:

    I have actually proposed to Richard Soley in the past the idea of the OMG adopting ADL. It was designed to be added on top of UML, and doesn’t assume anything else. One day 😉

  11. Grahame says:

    #Victor: My point is that the conflict is inherent in the approach. Architectural solutions may “solve” these things by hiding the outcomes – but that also hides the supposed benefits. So it’s a flaw in the inherent concept, as far as I am concerned. And ok, it doesn’t have to lead to chaos, but it will by default without being positively managed and curtailed.

    #Thomas: yes. Though being the OMG, they’d need a visual representation wouldn’t they? The UML engineer types were a bit cooler on the idea, but not resistant.

  12. Michael Lawley says:

    @Tom I think @Grahame’s right here – if all you’re consuming are the use-case-specific models/APIs then you’ve effectively hidden the design-by-constraint and foregone flexibility. However, you are possibly (probably?) in a much better position if at some later point you are looking to add flexibility into your system through some form of refactoring.
