2009-10-09

Re: Why is UML so hard?

In response to Why is UML so hard?

In the late '90s I was working as a research associate at the University of York looking at CASE ( computer aided systems engineering ) tools and notations when UML started to happen to the industry as a merger of some OO notations developed in industry from experience in the '70s and '80s.

The academic world had already learnt from the cognitive science and HCI people that graphical notations don't help understanding above a certain level of detail, and that the strongest aids to understanding are spatial rather than graph based. Ignoring this, most UML notations are graph based and attach no meaning to spatial information ( you get swimlanes in some diagrams, but in most diagrams the layout has no significance ).

In addition, as a species our brains have had a million years of practice constructing graphs from narratives - it's how social groups evolve - and so creating graphs of relationships between objects from linear descriptions isn't that hard for us.

One thing which is hard for us to understand is concurrency. That field went through a similar evolution - Petri nets rapidly become too complex to read as diagrams - so by the late 1990s most concurrency notations were algebraic instead. You can't make money selling tools for algebraic notations for software - for many applications, source code is the most useful algebraic notation ( notations such as Z or pi-calculus are useful in special cases, but for most systems the cost of producing a model in those notations isn't worth the benefit - you only do it if you want to analyse the system with those tools ).

The trend in academic CASE research at the time was away from single view models - these were considered unwieldy in practice - towards a 'viewpoints oriented' approach to modelling, where you had a web of different hyperlinked models of a system, each of which was tailored to a different purpose. It's obviously harder to make a tool that tolerates creative inconsistency than one based on a central authoritative model. UML sort of adopted this in its distinction of Domain, Platform Independent and Implementation Models, but within each model there is often only a single viewpoint. It's an open problem how different viewpoints and the don't-repeat-yourself principle apply to software engineering. I've never had any success with PIMs, finding it more useful to create a profile that maps to the domain, and generating from that directly. But that requires hand-tooling of the code generation to the domain, which COTS UML tools obviously don't supply.

Sometimes a graph based notation is the right thing - module dependency diagrams are one CASE notation that can tell you something important about a system's complexity and design quality at a glance. A group at York did such an analysis for the Rolls Royce Trent FADECs in the '90s; some UML tools now available will do it for you as part of their reverse engineering, others force you to create the dependencies yourself. These UML package diagrams get used in presentations, so people will hide dependencies which make the diagram look cluttered and hard to read - but the clutter is the most useful thing the diagram is showing.

Quite a lot of the problems I've seen with applying UML seem to come from forgetting that it's a model. I take the Pragmatic view - a good model is one where you get useful results from applying that model; attempting to capture the 'truth' of a system isn't useful in itself.

Duffymo says 'We'd have bike shed arguments for hours about what "customer" meant.' If you're creating a customer class, you should already have:

  • a customer requirement which has been described using

  • a user story or a persona taking part in a use case

  • which involves analysis use cases which have been decomposed into areas of responsibility

  • which are then allotted to packages

  • which contain classes, each of which has a single responsibility


Sometimes this works well, but it is brittle - there's always a temptation to jump straight to design because you know what a customer is - and it requires management. It's a bit easier to do if you have a concrete system in mind, and a hard boundary on the responsibilities.

But the biggest step is to stop arguing about what "customer" means and start arguing about how an object which has the semantics represented by the "customer" class in the model satisfies the responsibilities of the "accounts payable ( account greater than one month arrears )" use case ( or takes part in that user story, or fills out its class responsibility cards, or whatever driver you're using to inject the requirements into the model ). Software has to do something, so what matters is what "customer" does in the context of the software it's part of.

Maybe the most important thing about UML is that deep down, it doesn't mean anything - there are no formal semantics; it's a notation with syntax only, and it's up to users to apply semantics from their domain. Add to that that almost all the actual designing is the work done between the diagrams, which is lost in UML ( the better tools let you create associations showing allotments of functionality to packages from the use cases, but there's no notation to record why and how such allotments were made ), and you get a tool that's not easy to use for its stated purpose - recording the design of a system.

So the goal when applying UML to a problem is for the UML model to be useful, rather than to capture meaning. Arguments about how to represent things in the model, using the semantics and notations of UML, should be grounded in the effect such representations have on the utility of the model.

The first thing you should do when applying a UML tool to a new system design is to sketch how the inputs and outputs of the process of creating a model of the system will interact with the design process as a whole.

A rule of thumb is to only record the information which has utility in the next phase of the software engineering process. Anything else will probably change as the system evolves, so is just a rough note, and I've never seen a UML tool which was as easy to use for rough notes as pencil and paper.

In terms of using the parts of UML which represent the system, you can try minimising cost by reducing the level of detail and by reverse engineering. You can maximise benefit by generating code or performing analyses before committing to designs. If you're not using a tool that allows you to either generate high quality code or run speculative analyses, it's unlikely to be worth creating a detailed model. If all you want from your model is documentation, think about using a tool such as Doxygen instead - any documentation of the system-as-built which isn't generated from the code is unlikely to be accurate.

Another approach is to use UML to record design commitments - which usually means designing down to package or façade level, rather than to implementation classes. I've used that for small projects ( 6 person months ), and it seemed to be about the right level for that. Larger systems should be broken into modules, and if you need more detail than the façade of an external module you're using then there's probably something wrong.

The time I've had most success with UML, it was being used more as a configuration tool than as a design tool - we had an existing system which was to be customised, we agreed a simple profile of UML that had a mapping to the problem domain, and we had a code generation/reverse engineering tool specific to the system.


2009-10-05

Random Lists

As kin has support for pattern matching and sequences at its heart, it's natural to formulate a random number generator as a generator of a sequence of random numbers.

For example, this is the same linear congruential shift register as used in java.util.Random:


type lcsr ( seed : uint64 ) : seq [ uint32 ]
    def head : uint32 => uint32 ( self.seed >> 16 )
    def tail : lcsr => lcsr ( ( self.seed * 0x0005_deec_e66d + 0x0000_0000_000b ) & 0xffff_ffff_ffff )
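For comparison, here is a rough Java sketch of the same 48-bit step written as an immutable head/tail pair - this is just an illustration of what the kin type above computes, not code from kin's toolchain, and the class and method names are mine:

// Sketch only: the 48-bit linear congruential step above as an immutable
// Java value, so that head and tail stay pure. Names are illustrative.
final class Lcsr {
    private final long seed;                       // 48 bits of generator state

    Lcsr(long seed) { this.seed = seed & 0xFFFFFFFFFFFFL; }

    int head() { return (int) (seed >>> 16); }     // top 32 bits of the state

    Lcsr tail() {                                  // next state; constructor masks back to 48 bits
        return new Lcsr(seed * 0x5DEECE66DL + 0xBL);
    }
}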


You can wrap the lcsr sequence in an actor if you want the same behaviour as a random function with hidden state:

type @actor random_source ( sequence : seq[uint32] )
    def next_uint32 ( self ) : uint32
        let r = self.sequence.head
        self become ( sequence = self.sequence.tail )
        => r

    def next_real64 ( self ) : real64
        => real64 ( natural ( self.next_uint32() ) ) / real64 ( natural ( uint32.max ) )

But you couldn't optimise matching on the value reported by a random function like you can with a pure sequence:

let rnd_seq = lcsr ( 0x12345 )

match rnd_seq
    0x00000001 :: 0xadfc5fc8 :: 0x22222222 :: 0x33333333 :: _ =>
        out <- "match one" <- endl
    0x00000001 :: 0xadfc5fc8 :: 0x9d1a5de6 :: 0xf6b685cf :: _ =>
        out <- "match two" <- endl
    _ =>
        out <- "no match" <- endl

The cons'd patterns are compiled to combinations of calls to seq.head and seq.tail. The first two match cases get combined, as the first couple of terms in their patterns are the same; if seq.head or seq.tail were not pure functions, that sharing would be unsound - evaluating the first pattern would change the hidden state, so the second match case would see different values and fail.
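As an illustration ( a hand-written Java-style sketch of what a compiler could emit, not kin's actual output ), the shared prefix of the two patterns only needs evaluating once precisely because head and tail are pure:

// Illustrative only: sharing the evaluation of the common prefix of the two
// cons patterns. s is the Lcsr sketch from earlier.
static String matchExample(Lcsr s) {
    int h0 = s.head();  Lcsr t1 = s.tail();   // first element and rest
    int h1 = t1.head(); Lcsr t2 = t1.tail();  // second element and rest
    if (h0 == 0x00000001 && h1 == 0xadfc5fc8) {
        int h2 = t2.head();
        int h3 = t2.tail().head();
        if (h2 == 0x22222222 && h3 == 0x33333333) return "match one";
        if (h2 == 0x9d1a5de6 && h3 == 0xf6b685cf) return "match two";
    }
    return "no match";
}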


Tim Bray describes using this technique as 'a sideways dance' in Haskell. I've always found Haskell a bit too 'stiff' - I like to 'feel the bits between my toes' too much for it - but compared to the amount of messing around Java has to do with atomic longs ( JDK7 ) or synchronized methods ( GNU Classpath ) in the implementation of the random source, and then the further wrapping as an Iterable<Integer> if you want a sequence of random numbers, it doesn't seem too bad.

In the Java case, either the runtime has to deduce that the interlocks are not required and remove them, or it has to process them; if the code is used in more than one place you may have to have more than one machine code compiled form of the method, or fall back to conservative locking always. For a pure sequence, no interlocks are required. Wrapping a sequence in an actor ( or a lazy list in a monad in Haskell ) gives you hidden variable behaviour if you want it, without polluting the code of the sequence generator with synchronisation primitives.
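For reference, the JDK's java.util.Random.next() has roughly this shape ( paraphrased here into a standalone class with names of my own, not a verbatim copy ): because the 48-bit seed is hidden mutable state in an AtomicLong, every call pays for a compare-and-set loop even when only one thread ever touches the generator.

// Roughly the shape of java.util.Random.next(bits), paraphrased from the JDK:
// the mutable 48-bit seed lives in an AtomicLong, so every call spins on CAS.
import java.util.concurrent.atomic.AtomicLong;

class LockedLcg {
    private final AtomicLong seed = new AtomicLong(0x12345L);

    int next(int bits) {
        long oldSeed, nextSeed;
        do {
            oldSeed = seed.get();
            nextSeed = (oldSeed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
        } while (!seed.compareAndSet(oldSeed, nextSeed));
        return (int) (nextSeed >>> (48 - bits));
    }
}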
