2009-10-09

Re: Why is UML so hard?

In response to Why is UML so hard?

In the late '90s I was working as a research associate at the University of York, looking at CASE (computer aided systems engineering) tools and notations, when UML arrived in the industry as a merger of several OO notations developed in industry from experience in the '70s and '80s.

The academic world had already learnt from the cognitive science and HCI people that graphical notations don't help understanding above a certain level of detail, and that the strongest aids to understanding are spatial rather than graph based. Ignoring this, most UML notations are graph based and attach no meaning to spatial information (you get swimlanes in some diagrams, but in most diagrams the layout has no significance).

In addition, as a species our brains have had a million years of practice constructing graphs from narratives - it's how social groups evolve - so creating graphs of relationships between objects from linear descriptions isn't that hard for us.

One thing which is hard for us to understand is concurrency. Concurrency notations went through a similar evolution: Petri nets were found to become too complex to read as diagrams very quickly, so by the late 1990s most concurrency notations were algebraic instead. You can't make money selling tools for algebraic notations for software - for many applications, source code is the most useful algebraic notation (notations such as Z or the pi-calculus are useful in special cases, but for most systems the cost of producing a model in those notations isn't worth the benefit - you only do it if you want to analyse the system with those tools).

The trend in academic CASE research at the time was away from single-view models - these were considered unwieldy in practice - towards a 'viewpoints oriented' approach to modelling, where you had a web of different hyperlinked models of a system, each tailored to a different purpose. It's obviously harder to make a tool that tolerates creative inconsistency than one based on a central authoritative model. UML sort of adopted this in its distinction between Domain, Platform Independent and Implementation Models, but within each model there is often only a single viewpoint. How different viewpoints and the don't-repeat-yourself principle apply to software engineering is an open problem. I've never had any success with PIMs (platform independent models); I've found it more useful to create a profile that maps to the domain and to generate code from that directly. But that requires hand-tooling the code generation to the domain, which COTS UML tools obviously don't supply.

Sometimes a graph-based notation is the right thing - module dependency diagrams are one CASE notation that can tell you something important about a system's complexity and design quality at a glance. A group at York did such an analysis for the Rolls-Royce Trent FADECs in the '90s; some UML tools now available will do it for you as part of their reverse engineering, while others force you to create the dependencies yourself. These UML package diagrams get used in presentations, so people hide dependencies that make the diagram look cluttered and hard to read - but the clutter is the most useful thing the diagram is showing.
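To make "the clutter is the information" concrete, here is a minimal sketch of the kind of numbers a dependency analysis can pull out of a package graph, assuming the dependencies have already been extracted (by hand or by a reverse engineering tool) into a dict mapping each module to the modules it uses. Fan-in, fan-out and dependency cycles are common measures chosen here for illustration; they are not necessarily what the York analysis computed, and the module names are hypothetical.

```python
def all_modules(deps):
    """Every module named in the map, whether as a key or as a dependency."""
    modules = set(deps)
    for targets in deps.values():
        modules.update(targets)
    return sorted(modules)


def fan_in_out(deps):
    """For each module, count how many modules depend on it (fan-in)
    and how many modules it depends on (fan-out)."""
    modules = all_modules(deps)
    fan_in = {m: 0 for m in modules}
    for targets in deps.values():
        for t in targets:
            fan_in[t] += 1
    fan_out = {m: len(deps.get(m, ())) for m in modules}
    return fan_in, fan_out


def dependency_cycles(deps):
    """Return groups of mutually dependent modules: strongly connected
    components with more than one member, or a module depending on itself.
    Uses Tarjan's algorithm; it recurses, so suits modest module counts."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    cycles = []

    def visit(v):
        # len(index) is evaluated before v is added, giving a fresh counter
        index[v] = lowlink[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in deps.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a strongly connected component; pop it off
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            if len(component) > 1 or v in deps.get(v, ()):
                cycles.append(sorted(component))

    for v in all_modules(deps):
        if v not in index:
            visit(v)
    return cycles


if __name__ == "__main__":
    # Hypothetical module map, for illustration only.
    deps = {
        "ui": {"orders", "customers"},
        "orders": {"customers", "billing"},
        "billing": {"orders"},
        "customers": set(),
    }
    print(fan_in_out(deps))
    print(dependency_cycles(deps))  # [['billing', 'orders']]
```

A cycle such as `billing` <-> `orders` is exactly the sort of line that gets deleted from a presentation diagram to make it look tidier, and exactly the one worth looking at.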

Quite a lot of the problems I've seen with applying UML seem to come from forgetting that it's a model. I take the Pragmatic view - a good model is one where you get useful results from applying that model; attempting to capture the 'truth' of a system isn't useful in itself.

Duffymo says 'We'd have bike shed arguments for hours about what "customer" meant.' If you're creating a customer class, you should already have

  • a customer requirement which has been described using

  • a user story or a persona taking part in a use case

  • which involves analysis use cases which have been decomposed into areas of responsibility

  • which are then allotted to packages

  • which contain classes, each of which has a single responsibility


Sometimes this works well, but it is brittle and requires management - there's always a temptation to jump straight to design because you know what a customer is. It's a bit easier to do if you have a concrete system in mind, and a hard boundary on the responsibilities.

But the biggest step is to stop arguing about what "customer" means and start arguing about how an object with the semantics represented by the "customer" class in the model satisfies the responsibilities of the "accounts payable (account more than one month in arrears)" use case (or takes part in that user story, or fills out its class responsibility cards, or whatever driver for the model you're using to inject the requirements). Software has to do something, so what matters is what "customer" does in the context of the software it's part of.

Maybe the most important thing about UML is that, deep down, it doesn't mean anything - there are no formal semantics; it's a notation with syntax only, and it's up to users to apply semantics from their domain. Add to that the fact that almost all the actual designing is the work done between the diagrams, which is lost in UML (the better tools let you create associations showing allotments of functionality to packages from the use cases, but there's no notation to record why and how such allotments were made), and you get a tool that's not easy to use for its stated purpose - recording the design of a system.
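UML has no slot for the reasoning behind an allotment, but nothing stops you keeping that reasoning alongside the model. The sketch below is a hypothetical illustration, not a UML or tool feature: `Allotment` and `check_allotments` are invented names, and the checks (each responsibility allotted exactly once, each allotment carrying a stated reason) are assumptions about what you might want to enforce. The use case name follows the accounts payable example above; the responsibilities and packages are made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Allotment:
    """One design decision: a responsibility drawn from a use case,
    allotted to a package, together with the reason for that choice."""
    use_case: str
    responsibility: str
    package: str
    rationale: str  # the "why" that the diagrams themselves cannot record


def check_allotments(responsibilities, allotments):
    """Return three sorted lists: responsibilities that were never allotted,
    responsibilities allotted more than once, and allotments whose
    rationale is blank."""
    counts = Counter(a.responsibility for a in allotments)
    unallotted = sorted(r for r in responsibilities if counts[r] == 0)
    duplicated = sorted(r for r, n in counts.items() if n > 1)
    unexplained = sorted(a.responsibility for a in allotments
                         if not a.rationale.strip())
    return unallotted, duplicated, unexplained


if __name__ == "__main__":
    # Hypothetical responsibilities and packages, for illustration only.
    uc = "Accounts payable (account more than one month in arrears)"
    responsibilities = ["find accounts in arrears",
                        "calculate amount owed",
                        "notify customer"]
    allotments = [
        Allotment(uc, "find accounts in arrears", "billing",
                  "billing already owns the account ledger"),
        Allotment(uc, "notify customer", "customers",
                  "contact details live with the customer"),
        Allotment(uc, "notify customer", "notifications", ""),
    ]
    print(check_allotments(responsibilities, allotments))
    # (['calculate amount owed'], ['notify customer'], ['notify customer'])
```

The point is less the checks than the `rationale` field: it is the part of the design work that otherwise lives only in someone's head.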

So the goal when applying UML to a problem is for the UML model to be useful, rather than to capture meaning. Arguments about how to represent things in the model using UML's semantics and notations should be grounded in the effect those representations have on the model's utility.

The first thing you should do when applying a UML tool to a new system design is to sketch how the inputs and outputs of the modelling process will interact with the design process as a whole.

A rule of thumb is to record only the information which has utility in the next phase of the software engineering process. Anything else will probably change as the system evolves, so it is just a rough note, and I've never seen a UML tool that was as easy to use for rough notes as a pencil and paper.

In terms of using the parts of UML which represent the system, you can try minimising cost by reducing the level of detail and by reverse engineering. You can maximise benefit by generating code or performing analyses before committing to designs. If you're not using a tool that lets you either generate high-quality code or run speculative analyses, it's unlikely to be worth creating a detailed model. If all you want from your model is documentation, consider a tool such as Doxygen instead - any documentation of the system-as-built that isn't generated from the code is unlikely to be accurate.

Another approach is to use UML to record design commitments - which usually means designing down to package or façade level, rather than to implementation classes. I've used that for small projects (6 person-months), and it seemed to be about the right level for them. Larger systems should be broken into modules, and if you need more detail than the façade of an external module you're using, then there's probably something wrong.

The time I've had most success with UML, it was used more as a configuration tool than as a design tool - we had an existing system which was to be customised, we agreed a simple profile of UML that had a mapping to the problem domain, and we had a code generation/reverse engineering tool specific to the system.
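A minimal sketch of what a system-specific generator driven by a small domain profile might look like, assuming a hypothetical profile with just two stereotypes: `entity` (generated as a Python dataclass) and `setting` (generated as a module-level constant). The model here is a plain list rather than a UML tool's export format, and `ModelElement`, `GENERATORS` and `generate` are invented names; a real generator would read the model from the UML tool and emit code for the system being customised. Reverse engineering is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class ModelElement:
    """A model element carrying a stereotype from the (hypothetical) profile."""
    name: str
    stereotype: str
    attributes: dict = field(default_factory=dict)  # attribute name -> type or value


def generate_entity(element):
    # An <<entity>> becomes a dataclass with one typed field per attribute.
    lines = ["@dataclass", f"class {element.name}:"]
    if not element.attributes:
        lines.append("    pass")
    for attr, type_name in element.attributes.items():
        lines.append(f"    {attr}: {type_name}")
    return "\n".join(lines)


def generate_setting(element):
    # A <<setting>> becomes a module-level constant holding its 'value'.
    return f"{element.name} = {element.attributes['value']}"


# The profile: the only stereotypes the generator understands.
GENERATORS = {"entity": generate_entity, "setting": generate_setting}


def generate(model):
    """Turn a list of ModelElements into Python source text."""
    parts = []
    for element in model:
        generator = GENERATORS.get(element.stereotype)
        if generator is None:
            raise ValueError(
                f"stereotype {element.stereotype!r} is not in the profile")
        parts.append(generator(element))
    return "from dataclasses import dataclass\n\n\n" + "\n\n".join(parts) + "\n"


if __name__ == "__main__":
    # Hypothetical model of a customisation, for illustration only.
    model = [
        ModelElement("Customer", "entity", {"name": "str", "balance": "float"}),
        ModelElement("OVERDUE_DAYS", "setting", {"value": "30"}),
    ]
    print(generate(model))
```

Rejecting stereotypes outside the profile keeps the model within what the generator can handle, which is what makes a deliberately small, domain-specific profile workable.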


3 Comments:

Michael Duffy said...

Pete, brilliant stuff. You're right - it's more than a comment. Your usual insight and depth is greatly appreciated.

Your point about "it's not UML's fault" is true. The problems that I cite would have happened regardless of which tool or notation we used to attack the problem.

Friday, 9 October 2009 at 23:11:00 BST  
Unknown said...

Thanks a lot for these explanations! I'd be very interested in references to the studies on spatial cognition that you mentioned!

I'd just like to add that, in my experience, it's part of the problem that people would like a graphical representation that accurately describes the software. That is, a visualization that is not just an explanatory aid but that allows comparison of systems. For comparison, the mapping from the visualization to the actual software must be well defined, and that's probably where it gets hard.

Tuesday, 13 October 2009 at 23:22:00 BST  
Pete Kirkham said...

One starting point with a good feel for research around the time UML was starting is the 1995 Hypertext special edition of CACM, which has a range of articles, several of which relate to the use of spatial relationships to represent information; also, "Why looking isn't always seeing: readership skills and graphical programming", from the same year, discusses notations specifically. At HISE we also did some work supporting a variant of Toulmin-style arguments, where the different components of the rhetoric had a fixed relation to each other - Tim Kelly's PhD covers its use in the tools we produced in detail.

Friday, 16 October 2009 at 02:09:00 BST  
