2009-04-16

Comma Quibbling

Link: http://blogs.msdn.com/ericlippert/archive/2009/04/15/comma-quibbling.aspx

An example in kin. This sidesteps the difficulty of the C# problem - since sequences in kin are linear, rather than the mutating iterator/enumerator pattern used in Java/C#, you can trivially pattern match several elements at once (being able to pattern match sequences and backtrack on failure is one of the main reasons for writing kin in the first place).

module examples::eric_lippert_20090415
# http://blogs.msdn.com/ericlippert/archive/2009/04/15/comma-quibbling.aspx
def comma_append ( out, s )
    match s
        h :: [] =>
            out <- h
        h :: m :: [] =>
            out <- h <- " and " <- m
        h :: t =>
            out <- h <- ", "
            comma_append ( out, t )
        [] =>
            pass

def write_with_commas ( out, s )
    out <- '{'
    comma_append ( out, s )
    out <- '}'
    => out

def to_comma_string ( s ) => write_with_commas( string_writer(), s ).str()

def main ( in, out, args )
    write_with_commas ( out, args )
    out <- '\n'
    out <- to_comma_string( args ) <- '\n'
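For comparison, here's a rough sketch of the same approach in Python (3.10 or later), with list patterns standing in for kin's sequence patterns; it's only an illustration, not part of the kin tool chain.

def comma_append(out, s):
    # Mirror the kin match above: empty, single, pair, and head/tail cases.
    match s:
        case []:
            pass
        case [h]:
            out.append(h)
        case [h, m]:
            out.append(h + " and " + m)
        case [h, *t]:
            out.append(h + ", ")
            comma_append(out, t)

def to_comma_string(s):
    out = ["{"]
    comma_append(out, list(s))
    out.append("}")
    return "".join(out)

print(to_comma_string(["ABC", "DEF", "G", "H"]))   # {ABC, DEF, G and H}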


2009-04-09

A new way to look at networking.

Link: http://video.google.com/videoplay?docid=-6972678839686672840

I find this video incredibly interesting. It's talking about the phase transition that occurred between telecoms networks and packet switched networks, and questioning whether the same can happen between packets and data.

In a routed telecom network, your number 1234 meant move the first switch to position 1, the second switch to position 2, and so on - it described a path between two endpoints. In TCP your packet has an address, and each node in the mesh routes it to the node it thinks is closer to the endpoint with the target address.
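A toy sketch of the difference in Python (node names and tables invented purely for illustration):

# Telecom-style: the number is the path - each digit selects the outgoing line
# at the current switch.
def route_by_path(switches, digits):
    node = switches["origin"]
    for d in digits:
        node = node[d]
    return node

# Packet-style: the address names the destination - each node consults its own
# table and forwards to whichever neighbour it thinks is closer.
def route_by_address(tables, start, dest):
    node, hops = start, [start]
    while node != dest:
        node = tables[node][dest]
        hops.append(node)
    return hops

switches = {"origin": {"1": {"2": "endpoint-12"}}}
print(route_by_path(switches, "12"))            # endpoint-12

tables = {"A": {"C": "B"}, "B": {"C": "C"}}
print(route_by_address(tables, "A", "C"))       # ['A', 'B', 'C']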

Currently I have two or three home machines (netbook, desktop and file server) where data lives locally, plus data 'in the cloud' on gmail.com and assembla.com, or on my site here at tincancamera.com. Though as far as I'm concerned, that data isn't in the cloud - it's on whatever server those addresses resolve to. If instead I google for 'pete kirkham', about half the hits are me. If I put a UUID on a public site, then once it's indexed I could find that document by putting the ID into a web search.

In the semantic web, there's been a bit of evolution from triples without provenance to quads - a triple plus an id, or some other means of associating the triple with the URI of the origin which asserts it. It's considered good form to have the URI for an object resolvable to some representation of that object, so there's a mix of identifier and address. The URI for kin is http://purl.oclc.org/net/kin, which uses HTTP redirection to point to the current kin site. This depends on OCLC's generous provision of the persistent uniform resource locator service, and maps an address to another address rather than quite mapping an identifier to an address.
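A minimal Python sketch of the triple-to-quad step (the statement and origin URI below are made up for illustration):

from collections import namedtuple

# A quad is a triple plus the origin which asserted it.
Quad = namedtuple("Quad", "subject predicate obj origin")

store = [
    Quad("http://purl.oclc.org/net/kin", "rdf:type", "doap:Project",
         "http://example.org/origin"),      # hypothetical origin URI
]

def asserted_by(store, origin):
    # Provenance query: which statements did this origin make?
    return [q for q in store if q.origin == origin]

print(asserted_by(store, "http://example.org/origin"))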

There isn't an obvious analogue, for data, of the TCP nodes routing a request to a node which might be closer to having the data, though in some ways a caching scheme with strong names might meet many of the requirements. It sort of brings to mind the pub-sub systems I've built in the past, but with a stronger form of authentication. Pub-sub replication isn't caching, though, as in a distributed system there isn't a master copy which the cached version is a copy of.
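A rough sketch of a strong-name cache - the name is just a hash of the content, so a copy fetched from anywhere can be verified against its own name (plain Python, purely illustrative):

import hashlib

class StrongNameCache:
    def __init__(self):
        self._blobs = {}

    def put(self, data):
        # The strong name is derived from the content, not from where it lives.
        name = hashlib.sha256(data).hexdigest()
        self._blobs[name] = data
        return name

    def get(self, name):
        data = self._blobs.get(name)
        # A copy obtained from any peer can be checked against the name itself.
        if data is not None and hashlib.sha256(data).hexdigest() != name:
            return None
        return data

cache = StrongNameCache()
name = cache.put(b"some document")
print(name[:16], cache.get(name) == b"some document")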

There's also a bit of discussion of broadcast versus end-to-end messages; I've got to work out how to do zero-config messaging sometime, which again sort of makes sense in such systems - first reach your neighbours, then ask them for either the data, or the address of a node which might have the data. But there still isn't an obvious mapping of the data space without the sort of hierarchy that TCP addresses have (although pub-sub topics are often hierarchic, that hierarchy doesn't reflect the network topology even to the limited extent that TCP addresses do). It also has some relation to p2p networks such as BitTorrent, and by the time you're talking about accessing my mail account in the cloud, to webs of trust. A paper on unique object references in support of large messages in distributed systems just popped up on LtU, which I'll add to my reading list this weekend.
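A toy version of the neighbour query - a node answers with the data, with the address of a node it believes holds the data, or passes the query on with a hop limit (all of the names here are invented):

class Node:
    def __init__(self, blobs=None, hints=None, neighbours=None):
        self.blobs = blobs or {}            # strong name -> data held locally
        self.hints = hints or {}            # strong name -> address of a holder
        self.neighbours = neighbours or []

def query(node, name, ttl=3):
    if name in node.blobs:
        return ("data", node.blobs[name])
    if name in node.hints:
        return ("address", node.hints[name])
    if ttl > 0:
        for neighbour in node.neighbours:
            result = query(neighbour, name, ttl - 1)
            if result is not None:
                return result
    return None

c = Node(blobs={"abc123": b"payload"})
b = Node(neighbours=[c])
a = Node(neighbours=[b])
print(query(a, "abc123"))                   # ('data', b'payload')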

I've also started reading Enterprise Integration Patterns. There's a bit of dissonance between the concept of a channel in these patterns (and, I guess, in the systems they are implemented in) and the concept of a channel in pi-calculus. In EIP, a channel is a buffered pipe for messages, and is a static entity created as part of the system configuration; in pi-calculus a channel is not buffered, and the normal mechanism for a request/response pair is to create a new channel, send that channel to the server along with the request, and receive the response on it.

The pi-calculus model is the communication pattern I'm intending for inter-process communication in kin (though I'm not quite sure whether giving explicit support to other parts of pi-calculus is something I want). I'm not sure that having to mark request messages with an ID (as in XMPP IQ) rather than having cheap channels is a good programming model, though there are reasons why it's a good idea not to pretend that an unreliable connection is a pi-calculus channel. I'll see what I think after I've tried coding the channels. Certainly, returning a channel from a query on a strong name - a channel which lets the requester communicate with an endpoint holding the named data - could be a model worth pursuing.

REST is a bit different in that the response channel is assumed by the protocol, but the URLs returned by the application are channels for further messages to be sent to, and of course can point anywhere, allowing the server for the next request to be chosen as the one best suited to handle it (eg, a list of the devices in a home SCADA system could list the URIs of the embedded servers on the sensor net rather than requiring the client to know the topology prior to making the initial request).
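A sketch of the cheap-channel style of request/response, with Python queues standing in for channels (pi-calculus channels are unbuffered, so this is only an approximation, and none of it is kin code):

import queue
import threading

requests = queue.Queue()                    # the long-lived channel the server reads

def server():
    while True:
        payload, reply_channel = requests.get()
        # Respond on the channel we were handed, rather than matching an ID.
        reply_channel.put(payload.upper())

threading.Thread(target=server, daemon=True).start()

def call(payload):
    reply_channel = queue.Queue(maxsize=1)  # a fresh, cheap, per-request channel
    requests.put((payload, reply_channel))
    return reply_channel.get()

print(call("ping"))                         # PING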


2009-04-01

On UML

Over the last few weeks I've been working on kin, and hanging around stackoverflow. For one question on the usefulness of UML, I ended up writing a rather long answer, so as I haven't blogged in a bit I thought I'd post it here too.


There's a difference between modelling and models.

Initially in the design process, the value in producing a model is that you have to get to a concrete enough representation of the system that it can be written down. The actual models can be, and probably should be, temporary artefacts such as whiteboards, paper sketches or post-it notes and string.

In businesses where there is a requirement to record the design process for auditing, these artefacts need to be captured. You can encode these sketchy models in a UML tool, but you rarely get a great deal of value from it over just scanning the sketches. Here we see UML tools used as fussy documentation repositories. They don't have much added value for that use.

I've also seen UML tools used to convert freehand sketches to graphics for presentations. This is rarely a good idea, for two reasons -

1. most model-based UML tools don't produce high quality diagrams. They often don't anti-alias correctly, and have appalling 'autorouting' implementations.
2. understandable presentations don't have complicated diagrams; they show an abstraction. The abstraction mechanism in UML is packages, but every UML tool also has an option to hide the internals of classes. Getting into the habit of presenting UML models with the details missing hides complexity rather than managing it. It means that a simple diagram of four classes with 400 members gets through code review, but one based on a better division of responsibilities will look more complicated.

During the elaboration of large systems (more than a handful of developers), it's common to break the system into sub-systems, and map these sub-systems to packages (functionally) and components (structurally). These mappings are again fairly broad-brush, but they are more formal than the initial sketch. You can put them into a tool, and then you will have a structure in the tool which you can later populate. A good tool will also warn you of circular dependencies, and (if you have recorded mappings from use cases to requirements to the packages to which the requirements are assigned) then you also have useful dependency graphs and can generate Gantt charts showing what you need for a feature and when you can expect that feature to ship. (AFAIK the state of the art is dependency modelling and adding time attributes, but I haven't seen anything which goes as far as Gantt.)
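As a sketch of the sort of dependency check such a tool performs, a depth-first search over package dependencies looking for a cycle (package names invented, plain Python rather than any particular tool's API):

def find_cycle(deps):
    # deps maps a package to the packages it depends on.
    visiting, done = set(), set()

    def visit(pkg, path):
        if pkg in done:
            return None
        if pkg in visiting:
            return path[path.index(pkg):] + [pkg]
        visiting.add(pkg)
        for dep in deps.get(pkg, []):
            cycle = visit(dep, path + [pkg])
            if cycle:
                return cycle
        visiting.discard(pkg)
        done.add(pkg)
        return None

    for pkg in deps:
        cycle = visit(pkg, [])
        if cycle:
            return cycle
    return None

deps = {"ui": ["domain"], "domain": ["persistence"], "persistence": ["domain"]}
print(find_cycle(deps))                     # ['domain', 'persistence', 'domain']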

So if you are in a project which has to record requirements capture and assignment, you can do that in a UML tool, and you may get some extra benefit on top in terms of being able to check the dependencies and extract information to plan work breakdown schedules.

Most of that doesn't help in small, agile shops which don't care about CMMI or ISO-9001 compliance.

(There are also some COTS tools which provide executable UML and BPML models. These claim to provide a rapid means to de-risk a design. I haven't used them myself so won't go into details.)

At the design stage, you can model software down to classes, methods and the procedural aspects of methods with sequence diagrams, state models and action languages. I've tended not to, and prefer to think in code rather than in the model at that stage. That's partly because the code generators in the tools I've used have either been poor, or too inflexible for creating high quality implementations.

OTOH I have written simulation frameworks which take SysML models of components and systems and simulate their behavior based on such techniques. In that case there is a gain, as such a model of a system doesn't assume an execution model, whereas the code generation tools assume a fixed execution model.

For a model to be useful, I've found it important to be able to decouple the domain model from execution semantics. You can't represent the relation f = m * a in action semantics; you can only represent the evaluation followed by the assignment f := m * a, so to get a general-purpose model that has three bidirectional ports f, m and a you'd have to write three actions: f := m * a, m := f / a and a := f / m. So in a model where a single constraint over a 7-ary relation would suffice, if your tool requires you to express it in action semantics you have to rewrite the relation 7 times. I haven't seen a COTS UML tool which can process constraint network models well enough to give a sevenfold gain over coding it yourself, but that sort of reuse can be had with a bespoke engine processing a standard UML model. If you have a rapidly changing domain model and build your own interpreter/compiler against the meta-model for that domain, then you can have a big win. I believe some BPML tools work in a similar way to this, but I haven't used them, as that isn't a domain I've worked in.
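A sketch of the difference, with one relation object standing in for the separate action rewrites (plain Python, nothing to do with any particular UML tool):

def f_equals_m_times_a(f=None, m=None, a=None):
    # One constraint, solved for whichever port is left unbound.
    if f is None:
        return {"f": m * a, "m": m, "a": a}
    if m is None:
        return {"f": f, "m": f / a, "a": a}
    if a is None:
        return {"f": f, "m": m, "a": f / m}
    assert abs(f - m * a) < 1e-9, "over-determined and inconsistent"
    return {"f": f, "m": m, "a": a}

print(f_equals_m_times_a(m=2.0, a=3.0))     # solves for f
print(f_equals_m_times_a(f=6.0, a=3.0))     # the same relation solves for m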

Where the model is decoupled from the execution language, this process is called model driven development, and Matlab is the most common example; if you're generating software from a model which matches the execution semantics of the target language, it's called model driven architecture. In MDA you have both a domain and an implementation model; in MDD you have a domain model and a specialised transformation to map the domain to multiple executable implementations. I'm an MDD fan, and MDA seems to have little gain - you're restricting yourself to whatever subset of the implementation language your tool supports and your model can represent, you can't tune to your environment, and graphical models are often much harder to understand than linear ones - we've had a million years of evolution constructing complex relationships between individuals from linear narratives (who was Pooh's youngest friend's mother?), whereas constructing an execution flow from several disjoint graphs is something we've only had to do in the last century or so.

I've also created domain specific profiles of UML, and used it as a component description language. It's very good for that, and by processing the model you can create custom configuration and installation scripts for a complicated system. That's most useful where you have a system or systems comprising stock components with some parametrisation.
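A toy sketch of the idea - stamping an installation script out of a parameterised component description (the fields and the deploy command are invented for illustration, not from any real profile):

from dataclasses import dataclass

@dataclass
class Component:
    name: str
    host: str
    port: int

def install_script(components):
    lines = ["#!/bin/sh"]
    for c in components:
        # One line of (hypothetical) deployment per component instance.
        lines.append("deploy --name %s --host %s --port %d" % (c.name, c.host, c.port))
    return "\n".join(lines)

print(install_script([Component("sensor-gateway", "10.0.0.5", 8080),
                      Component("historian", "10.0.0.6", 8081)]))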

When working in environments which require UML documentation of the implementation of a software product, I tend to reverse engineer it from the code rather than the other way round.

When there's some compression of information to be had by using a machine-processable detailed model, and the cost of setting up code generation of sufficient quality is amortized across multiple uses of the model or by reuse across multiple models, then I use UML modelling tools. If I can spend a week setting up a tool which stamps out parameterised components like a cookie-cutter in a day each, when it takes 3 days to do one by hand, and I have ten such components in my systems, then I'll spend that week tooling up.

Apply the rules 'Once and Once Only' and 'You Aren't Gonna Need It' to the tools as much as to the rest of your programming.

So the short answer is yes, I've found modelling useful, but models are less so. Unless you're creating families of similar systems, you don't gain sufficient benefit from detailed models to amortize the cost of creating them.
