2008-01-28

Plays well with others?

> I know large systems can be built statically...I've also done it dynamically in more than one dynamic language.
That hasn't quite been my experience - every large system I've worked on has ended up getting layered into a static core with dynamic or DSL scripting. Greenspun's tenth rule.

On the other hand, even the medium-sized applications I've been involved in which were implemented only in a dynamic language ran into difficulty with performance or modularity, and so either evolved into mixed language systems or were reimplemented.

Often I also use dynamic languages or environments to prototype algorithms, and sometimes then have to port to static languages for performance reasons. Writing better code generators is rarely cost effective: the dynamic languages are sufficiently good at the general cases, so if you need to optimise, you're optimising for a specific requirement. The same happens (albeit more rarely) when you need to add inline assembler code to C or Fortran to go below their abstraction level (CAS, bitwise rotation, integer overflow etc).

Also in large systems you often interface with extant libraries, which may be implemented in diverse languages. You're not going to port half a million lines of a numerical library into Smalltalk so you can call it.

So you end up with interfaces between the user code and the library, and having to specify in a machine processable manner the types at that interface, as you're now below the abstraction barrier of the dynamic language.

Other dynamic languages require definitions for such interfaces, though foreign ones may not be visible in the language's introspection facilities - for example, Mozilla's JavaScript uses IDL for the definition, but the type and arity of functions isn't visible via introspection.
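To make the point about machine-processable interface types concrete, here is a minimal sketch in Python. It is an assumption-laden illustration, not how any particular binding layer works: `ForeignSig` and `bind` are hypothetical names, and `math.hypot` merely stands in for a routine from a compiled numerical library. The declaration is checked at the boundary and, unlike the Mozilla case above, is kept visible to introspection.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ForeignSig:
    """Declared shape of one library entry point (hypothetical helper)."""
    name: str
    arg_types: Tuple[type, ...]
    return_type: type


def bind(sig: ForeignSig, impl: Callable) -> Callable:
    """Wrap impl so that every call is checked against the declaration."""
    def call(*args):
        # Arity check: the dynamic side cannot discover this by itself.
        if len(args) != len(sig.arg_types):
            raise TypeError(
                f"{sig.name}: expected {len(sig.arg_types)} args, got {len(args)}")
        # Type check for each argument crossing the boundary.
        for i, (arg, expected) in enumerate(zip(args, sig.arg_types)):
            if not isinstance(arg, expected):
                raise TypeError(
                    f"{sig.name}: arg {i} must be {expected.__name__}")
        result = impl(*args)
        if not isinstance(result, sig.return_type):
            raise TypeError(
                f"{sig.name}: result must be {sig.return_type.__name__}")
        return result

    call.signature = sig  # keep the declaration available to introspection
    return call


# Assumption: math.hypot plays the role of the foreign library routine.
HYPOT = ForeignSig("hypot", (float, float), float)
hypot = bind(HYPOT, math.hypot)
```

Calling `hypot(3.0, 4.0)` goes through the checks, `hypot(3.0)` is rejected for arity, and `hypot.signature` answers the "type and arity" question a browser would ask. The cost I complain about is that someone has to write and maintain `HYPOT` by hand.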

This introduces cost and brittleness to the system; and in the types of systems I've most experience with it also makes it harder to move a prototype algorithm between the dynamic and library codebases than it feels it ought to be - the dynamic runtime usually has the facilities for doing most of the code generation work already, but there's no means of accessing the results and tweaking them. I am looking for mechanisms which make that sort of task easier, and which don't require custom interpreters for DSLs, which are a pain to get right or maintain.

>"What happens in Smalltalk projects where you grow larger? Where you want to divide a system up into components, and some of those components don't exist yet?"
> You have a need; you have an idea to meet the need; you try it; you grow it. I guess I don't understand the question.

In my experience, that only happens up to a certain size. My question was about large systems - systems where you have to subcontract components, either to another individual in your team, to a second local team or to a separate organisation. Bottom up approaches don't scale out to that level; top down approaches introduce brittleness and (code production) inefficiency. The Smalltalk browser appears to be strongly a bottom up tool; I'm curious as to whether it allows that approach to scale out better.

> Semi-formal notations, tests and other examples, documentation, good tools -- these all contribute to success. But if you are looking for some kind of a "static declaration" as the saving mechanism for large systems -- it may work for you, but I doubt you can prove they are necessary and sufficient. I have counter-examples.

I do consider some form of specification of the domain and codomain of a function to be required, and the stronger the specification which can be automatically tested, the better. Verifying natural language specifications by hand is simply not viable for large software projects. What's provided by the type systems of most languages is not sufficient even for that - you can't say the domain of a square root function is the set of non-negative reals. Specifying pre- and post-conditions helps, but for anything involving concurrency you need 'for all' rather than the 'there exists' specifications given by unit tests. But most of the system doesn't require formal specification, and good design minimises shared state.
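As a rough sketch of what I mean by pre- and post-conditions on the square root example, assuming Python: the `contract` decorator and `real_sqrt` below are hypothetical names of my own, and the tolerances are arbitrary. The precondition states the domain, which a plain `float` type cannot express, and the postcondition states what the codomain value must satisfy.

```python
import math
from functools import wraps


def contract(pre, post):
    """Attach a precondition on the argument and a postcondition on the result."""
    def decorate(fn):
        @wraps(fn)
        def checked(x):
            if not pre(x):
                raise ValueError(
                    f"{fn.__name__}: {x!r} is outside the declared domain")
            y = fn(x)
            if not post(x, y):
                raise AssertionError(
                    f"{fn.__name__}: result {y!r} violates the postcondition")
            return y
        return checked
    return decorate


@contract(
    # Domain: floats that are >= 0 (NaN fails the comparison, so it is rejected).
    pre=lambda x: isinstance(x, float) and x >= 0.0,
    # Codomain: a non-negative value whose square is close to the input.
    post=lambda x, y: y >= 0.0 and math.isclose(y * y, x,
                                                rel_tol=1e-9, abs_tol=1e-12),
)
def real_sqrt(x):
    return math.sqrt(x)
```

These checks run on every call that actually happens, which is stronger than a handful of unit test inputs, but it is still not a 'for all' proof: inputs nobody has passed yet remain unchecked.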

So I'm looking for something with the gains of dynamic languages without barriers to performance, and good modularisation. As such, I'm more interested in gradual typing than Smalltalk. But if it does have anything to offer then I'll borrow from it.

> "Can such interfaces be browsed in the same way if the code can't be called?"
> Your question betrays your bias. "There must be some such mechanism as the one I am used to," he thinks.
Yes, I have a bias based on my experience and problem domains.

One project I was involved in was refreshing the library used for airworthiness stress analysis, with a view to improving the performance (it was costing around half a million pounds in CPU time to run the analysis for each design revision, so in the company at that time, making a 10% improvement in the run time would mean someone didn't have to get made redundant). The system consisted of a set of Fortran libraries, some OS scripts, and a domain specific scripting language which encoded the models and analysis algorithms. A set of flight states was mapped over the model, and the outputs reduced with a filter to indicate any resonances. The core of the system was mostly unchanged since the late 1970s, only the models used being updated with successive revisions to the aircraft.

There was a large amount of model and library code, so I wrote a quick and dirty call browser to explore data flows in the parts of the system I had access to. Since the code was significantly larger than the memory capacity of the computer I was running the analyses on, let alone expanding that to the AST and call graph, the interactive image and execution based querying as presented in the Smalltalk browser post would not have been suitable. The project ended up going nowhere, as translating the implementation of the non-standard dynamic language from the mainframe to PC would have taken more resources than the team had in the time available.

Had the scripting been in a standardised dynamic language, the maintenance cost would have been much lower, and the chances are the code could have been moved from the costly mainframe to run overnight on PCs instead.

So is there any way such tools can extend to a system of a thousand executables, rather than one? A system which doesn't have everything visible? The types are somewhat irrelevant to this, but most programs people are developing are not whole; the question is one of scaling approaches which assume you have a single image which is everything you care about, and that all changes to the system are monotonic.

If you have a system with multiple teams working on it, if you are developing a library which may be used in a number of ways, or if you have a system which is so big you can't load it without paging, then I'm interested in what happens, since anything you deduce from calls to code you can't inspect is not monotonic - some other use of the function may be a correct use and widen its domain. Partly it's an issue of extensional rather than intensional analysis - for some systems you need to prove that the core is correct rather than show it is not incorrect in parts. But an awful lot of the code I work with doesn't need that level of confidence, and as much as possible I write that in dynamic languages. Sometimes I make that judgement wrongly, and so I'm looking for mechanisms which allow 'fixing' of parts of the system more easily, as well as allowing assertions to be made for compatibility between parts of a heterogeneous system.
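A toy sketch of the non-monotonic point, assuming Python. Everything here is hypothetical: `observed_domains`, the `scale` function name, and the call site lists are invented for illustration. The domain is deduced extensionally, from the calls you happen to be able to see, so a call site in code you could not inspect forces the earlier conclusion to be retracted.

```python
from collections import defaultdict


def observed_domains(call_sites):
    """Map each function name to the set of argument type names seen so far."""
    domains = defaultdict(set)
    for func, arg in call_sites:
        domains[func].add(type(arg).__name__)
    return dict(domains)


# Call sites visible in the image you are browsing (hypothetical data).
visible = [("scale", 2), ("scale", 7)]
# A call site in another team's executable that you could not load.
hidden = [("scale", 1.5)]

before = observed_domains(visible)          # suggests scale only takes int
after = observed_domains(visible + hidden)  # the deduced domain has widened
```

From `before` a browser would conclude that `scale` takes integers; `after` shows that the conclusion was only ever valid for the part of the system that was loaded.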

> Most large systems developed with dynamic languages have not used the kind of mechanism you seek.
I've seen such mechanisms in JavaScript, Lisp, Erlang and Python. Maybe Smalltalk just doesn't play well with others, and is only suitable for monolithic applications.
