You can't buy icecream from Amazon.



XTech 2006 - Thursday

Link: http://xtech06.usefulinc.com/schedule#d2006-05-18

Same story. Trying to sleep off cold. Thursday was mostly in the browser technology stream, starting with the second seminar.

Daniel Glazman really believes in Etna (talk | project home). It's always good to hear someone who's both passionate and technically competent talk.

Etna is an open source XML editor, wysiwyg (using CSS). It is being built to allow non-techies to write XML reports, so its non-functional requirements include being able to be used by people with no wish to learn XML.

It uses its own RelaxNG engine, with extensions to provide initial content, and mappings for key events (such as ending a list when the user presses [↵] twice).

There are further RNG extensions for external doctypes, processing instructions, label and description of content, localisation properties, and behavioural semantics. With behavioural semantics, Etna allows you to specify tabular, list, outline or block style editing controls - although the CSS could make any XML look like any of these, the idioms the user uses to navigate and edit such content vary, and Etna supports these idioms.

All operations in Etna yield valid XML - gone are low level operations such as renaming a div tag to a paragraph tag - instead it uses pattern matching to transform between content models, preserving the text content and a maximum amount of structure. This approach was somewhat contentious with some of the audience, since it isn't obvious that it will allow all the operations an old SGML hack would want to do in emacs, but it is the right thing to do in a wysiwyg editor, and I think it's the right thing for a DSL editor too. (* TME: Though for a DSL, I would want the editor to use a different presentation HTML or XUL tree than the XML representation of the AST, and to be able to modify the type labels of the AST based on the text content, which is beyond where Etna is at the moment. *)

Mark Birbeck gave a talk about building rich, encapsulated widgets using XBL, XForms and SVG. As before, no mention of breaking the SVG presentation apart into layers, nor how to cope with removing content from the document tree and losing the bindings, but still interesting to see other things you can do with what's available.

Thomas Meinike gave a talk about SVG under Firefox 1.5; nothing new, but he's collected some reference material.

Ryan King gave a talk on the design of microformats (I'll ignore his USA parochial title). Basically - don't repeat yourself + pave the cow-paths. Reuse existing HTML and annotate with semantics, allowing convergence and openness and a very low entry point.

When designing a new microformat, he asked these questions:
  • Is there an HTML element that will work?

  • Is there an HTML compound that will work?

  • Is there an existing microformat that will mostly work?

  • Are there any existing standards which are both established and interoperable?

If the answer's yes to any of them, then you don't create a new microformat.

When creating microformats, take a large sample of the existing documents on the web that contain information of the type you are trying to add semantic annotations to. Design your format so that it's in keeping with the consensus of those documents.

After that, I took myself away from the browser crowd and went to hear Benoît Marchal on UML modelling for XML. His main points were

  • Integrate XML schema into the rest of the design using the same UML toolset

  • Non-UML tools favour hierarchic schemata; UML allows more horizontal relations so can get a better data model

  • Use the UML tool as a database with a graph-based front end, not just a drawing tool.

  • Use XSLT on XMI to generate the schema. Use an XSLT pipeline, with an intermediate form to simplify the XMI and standardise it across tools.

(* TME: Although I think most current UML tools are poor from a usability standpoint, and the lack of denotational semantics in UML2 and laxity in the XMI2 standard restrict model portability, I sympathise with most of what Benoît was saying. I also tend to go further and use XSLT for code generation instead of any vendor specific MDA implementation, again for portability and independence. Some of what he said about schema generation is specific to W3C XSD; since I work on largely greenfield development and live in an RNG world, those points aren't as pressing for me. *)

Next up on the browser track was Dean Jackson of the W3C, who is working with the industry to update web standards, such as XMLHttpRequest (why one acronym is all caps and the other is camel cased, I'll never know).

Then came XForms: an alternative to Ajax? You can do some things declaratively using XForms rather than procedurally using JavaScript. The same synchrony and complex rendering issues apply as to XBL.

That was the end of the day at 17:30, and after staying a while at drinks, I went to my room for a nap. Got thinking about yesterday's schema validation talk and today's Etna and other browser stuff, ordered pizza and coffee from room service and started hacking this, getting to bed around 5am.




Comprehending Engineers

Link: http://www.mcgeesmusings.net/2006/05/25/comprehending-engineers/

Part 1: To the optimist, the glass is half full. To the pessimist, the glass is half empty. To the engineer, the glass is twice as big as it needs to be.

This had me laughing out loud. My Dad's a retired structural engineering lecturer and consultant; one of my uncles was chief civil engineer on the new Severn crossing, along with several other well respected bridges. I'm a systems engineering toolsmith - my degrees were from electronics and mechanical schools, but I spend all my time making computer tools. So I've had the 'targets' one several times from my Dad. All too true.




XTech 2006 - Wednesday

Wednesday morning I had a cold and was late getting up, so missed Paul Graham's keynote on startups.

SQL, XQuery, and SPARQL: What's Wrong With This Picture? covered the lack of difference between the power of SPARQL vs recursive SQL. This means either that you can use existing optimised SQL databases to store and process RDF triples, converting SPARQL to SQL3 as required, or use RDF at the perimeter and SQL internally.

(* TME: Since SPARQL update seems to be a long time coming, and the restrictions imposed by triples make even simple context problems more complex, I'm not sure if there is ever going to be any merit in RDF. I overheard someone from Sesame saying they are moving to quads, because of the many use cases that triples don't meet (I think the tipping point would be hepts, but that's more anthropomorphic than graph-theoretical). Similarly others have suggested named graphs, eg TriX, as a solution to the context problem in RDF. Currently I haven't seen anything that would indicate a REST based, plain ol' XML interface onto an SQL database wouldn't be better for anything with context, uncertainty or transitive relations. The better RDF applications all seem only to deal with information that is equally trusted and true. *)

Michael Kay spoke on Using XSLT and XQuery for life-size applications. Speaking of using fixed schemas to validate human data - for example an address field whose format forces you to enter a house number, so the operator has to massage the data to fit - 'Integrity rules mean you only accept data if it's false.'

He tends to observe the documents which are collected in the business process, rather than trying to get the experts to create an abstract model. For example, an employee's record is a collection of completed forms. So if you want the current employee state, query the history and make a summary taking the most recent values.
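As a rough illustration of that "query the history and summarise" idea, here is a minimal sketch. The `Form` class, its field names and the ISO date strings are my own assumptions for the example, not anything shown in the talk, which was about XSLT and XQuery rather than Java.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EmployeeHistory {
    /** One completed form (hypothetical shape): the date it was filed and the fields it filled in. */
    static final class Form {
        final String date;                 // ISO date, e.g. "2006-05-17"
        final Map<String, String> fields;

        Form(String date, Map<String, String> fields) {
            this.date = date;
            this.fields = fields;
        }
    }

    /** Summarise the history: for each field, keep the most recently filed value. */
    static Map<String, String> currentState(List<Form> history) {
        List<Form> byDate = new ArrayList<>(history);
        // ISO dates sort chronologically when compared as plain strings.
        byDate.sort(Comparator.comparing((Form f) -> f.date));
        Map<String, String> state = new LinkedHashMap<>();
        for (Form form : byDate) {
            state.putAll(form.fields);     // later forms overwrite earlier values
        }
        return state;
    }

    public static void main(String[] args) {
        List<Form> history = List.of(
            new Form("2006-03-01", Map.of("address", "2 New Road")),
            new Form("2005-01-10", Map.of("name", "A. Smith", "address", "1 Old Street")));
        System.out.println(currentState(history));
    }
}
```

The record itself is never updated in place; the "current employee" is just a view computed from the forms, so nothing the business collected is thrown away.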

Applications are built by composing pipelines which transform these documents. Using automatically generated schemata to describe the intermediate stages in these pipelines gives regression tests.

(* I like Michael's anthropological approach, and imagine that it would build applications that augment the current process, rather than trying to impose an idealised process - which is what some managers attempt to do. *)

Next up, Jonas Sicking of Mozilla talked about XBL2, a revision of Mozilla's XML binding language.

Some of the improvements are tidier means of binding XML (the original XBL is somewhat polluted from being an RDF binding language), support for better separation of implementation and presentation, richer binding mechanisms and plugin security improvements.

(* All of these seem good, but they don't address the three show-stopping problems I've had with XBL, which have meant I've had to stop using it everywhere I'd like to:

  • The binding of methods and properties is asynchronous, so you can't easily create an element with an XBL binding, then call Javascript methods on it.

  • The method and property bindings only exist when the element is in the document tree. This complicates Undo and Redo, as removing or re-adding an element changes its interface.

  • Trying to use XBL with SVG, sometimes the same data is presented in multiple layers, so there isn't a single point to insert content.

Everywhere I've hit the third point, I've had to move from XBL; most times I've hit the second, XBL got too painful and I've moved away. I'm still thinking about moving away from XBL for some of the first case, but probably will have by the time I'm selling software based on XUL, as having intermittent glitches caused by races is not a good user experience. *)

Ralph Meijer (who was also at BarCamp) allegedly spoke on Publish-subscribe using Jabber. Or maybe he spoke on publish-subscribe using XMPP, since there are more applications than Jabber. This was interesting in as much as seeing how much take up there was so far - not much, though some other people have noticed that it has potential for distributed computing, for human interaction, for location based systems, and for hybrid networks. It's been mooted that in the internet of things sensor nets will dominate over comms nets; but already we communicate by publishing our availability and playlists - your iPod is another sensor measuring your mood; a keyboard/mouse activity monitor on your pc is sensing your presence; pushing smarter sensors to the periphery means it's all the same pub-sub tuple space.

Vladimir Vukićević spoke about canvas and SVG; more a case of putting a face to the implementer, since I've been playing with canvas for nearly a year, and SVG since last millennium. But all good, and there will also be OpenGL bindings for canvas in the future.

Henry S. Thompson, whom I've often enjoyed the wisdom of on xml-dev, gave a seminar on Efficient implementation of content models with numerical occurrence constraints. Transforming schemata to finite state machines, then adding counters which are incremented or reset on transitions, with maximum and minimum value guards. These allow you to transform large numerical constraints into simple state machines without having a very large number of states. Of course, this simply doesn't happen if you don't use a state machine for schema validation, which got me thinking and I ended up writing another little packrat parser. I'm sure you could create a packrat schema validator that wouldn't suffer the state explosion problem.
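To make the counter idea concrete, here is a minimal sketch of my own, not the algorithm from the seminar. It assumes a toy content model of `item{min,max}` followed by one `end` element; the class and method names are made up for illustration. The point is that the machine has two states whatever the bounds are, where unrolling the occurrence range would need roughly `max` states.

```java
import java.util.List;

class CountedModel {
    /**
     * Validates the hypothetical content model: item{min,max} followed by one end element.
     * Assumes item and end are different element names.
     */
    static boolean matches(List<String> children, String item, int min, int max, String end) {
        int state = 0;   // 0 = reading items, 1 = end element seen
        int count = 0;   // counter attached to the self-loop on state 0
        for (String name : children) {
            if (state == 0 && name.equals(item)) {
                if (count >= max) return false;   // maximum guard on the self-loop
                count++;                           // increment on the transition
            } else if (state == 0 && name.equals(end)) {
                if (count < min) return false;    // minimum guard on leaving the loop
                state = 1;
            } else {
                return false;                      // no transition for this element
            }
        }
        return state == 1;                         // must have reached the final state
    }

    public static void main(String[] args) {
        System.out.println(matches(List.of("a", "a", "b"), "a", 2, 4, "b"));
        System.out.println(matches(List.of("a", "b"), "a", 2, 4, "b"));
    }
}
```

Changing the bounds to, say, 2 and 10000 changes only the guard values, not the number of states.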

I hung around at the Mozilla reception for a bit, but was tired and cold-y, so went to bed early.

Thus ended the second day.




How fast is Java?

Link: http://blogs.sun.com/roller/page/dagastine?entry=java_is_faster_than_c

This is in response to David Dagastine's use of the SciMark numeric benchmark, where he finds that there's very little difference in speed between Sun's JVM and native C compiled with Visual C++ 6. The benchmark uses static methods to manipulate arrays of primitive data types, primarily doubles.

This agrees with my experience for numerical code. When I was lead software engineer in the technical computing group of the world's largest aerospace company, I moved several numerical projects from Fortran or C++ to Java. At the time, one of the major reasons was bugs in VC's optimisation code which meant the results were wrong, and discrepancies between release and debug builds that made C++ bugs harder to track, whereas Java had performance within 10% and much stronger guarantees of its results.

But MS got round to fixing the bugs, and in VS 2005 appear to have made some major gains in performance.

On my laptop (WinXP SP2, AMD Athlon 3400+), comparing Java 1.6.0-beta with VS 2005 express gives 388.67 and 631.12 mflops respectively, which is a much bigger difference than I observed between Java 1.4 and VS6. The SciMark code is portable C, so it doesn't use C++ intrinsics, which can give an order of magnitude speed up for certain code (though since it's working in double precision you probably wouldn't get that much improvement, at the cost of quite a lot less readability, as you can only process 2 doubles at a time with SSE2, but can process 4 floats, and so keep whole vectors in single registers).

So I'd disagree that Java 1.6 is faster than C/C++ in this case on two counts - firstly it's significantly slower than the C code it was benchmarked against, and secondly SciMark is not using current C++ techniques for optimising numeric code, which any serious numerical library would.

The SciMark code is optimised to a similar degree on both C and Java platforms, but the nature of the Java language means that there are further optimisations which are impossible (due to not having access to intrinsics or inline assembler), and many more that go against Java's intent of being a safe object oriented programming language (such as passing around char arrays rather than String objects).

I've observed before that, without any kind of meta-programming facility, writing optimised numerical Java is very painful (though I haven't had cause to try again - even with better function inlining, you have to copy all the parameters you need, and construct an object for any result which is more complicated than a single type, so it doesn't impact on the complexity much). For small projects, you can do it by hand; for others you have to either use a code generator from a higher level model, or just put up with it running slow. It would be nice if having to hard code optimisations disappeared from Java code as Sun improves the JVM's capabilities. For example, copying all state out of an object into local variables, then performing a tight loop, then copying back gives a measurable performance boost. The same applies for nested loops on multidimensional arrays - you hoist the reference to the row. Currently you have to do such tedious code by hand, though Fortress seems to favour such optimisations in its transactions, so may be a better way ahead for numeric code.
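For anyone who hasn't had to write this by hand, the pattern looks roughly like the sketch below. The `Grid` class and its fields are invented for the example; whether the hoisted version is actually faster depends on the JVM, and a current JIT may well do some of it for you.

```java
class RowHoisting {
    /** Hypothetical object holding a 2D array, a scale factor and a running total. */
    static final class Grid {
        double[][] cells;
        double scale;
        double total;

        Grid(double[][] cells, double scale) {
            this.cells = cells;
            this.scale = scale;
        }

        /** Straightforward version: every access goes through fields and two array lookups. */
        void accumulateNaive() {
            for (int i = 0; i < cells.length; i++) {
                for (int j = 0; j < cells[i].length; j++) {
                    total += cells[i][j] * scale;
                }
            }
        }

        /** Hand-optimised version: copy state out, loop on locals, copy the result back. */
        void accumulateHoisted() {
            final double[][] localCells = cells;      // copy object state into locals
            final double localScale = scale;
            double localTotal = total;
            for (int i = 0; i < localCells.length; i++) {
                final double[] row = localCells[i];   // hoist the row reference
                for (int j = 0; j < row.length; j++) {
                    localTotal += row[j] * localScale;
                }
            }
            total = localTotal;                       // copy back
        }
    }

    public static void main(String[] args) {
        Grid g = new Grid(new double[][] {{1, 2}, {3, 4}}, 0.5);
        g.accumulateHoisted();
        System.out.println(g.total);
    }
}
```

Both methods do the same arithmetic in the same order; the second just says out loud what you'd like the compiler to have worked out.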

For something a little closer to Java's enterprise use, say date formatting functions, the difference between Java6 and C++ in small benchmarks is closer to a factor of 7 slower, for example see this thread.

However, what I learnt in that thread isn't that if you port C to Java it runs seven times slower (usually it's not as bad as that, and as the SciMark above shows - a good programmer can write Fortran in any language and it'll run quickly), but rather that even good Java programmers won't think of solving the problem in the faster way, but use Java idioms instead.

The idiomatic Java way of solving that problem was twenty times slower than the C code. If you're used to thinking that StringBuilder is faster than StringBuffer is faster than concatenating String, you won't write the sort of code that works fast, however good the VM gets. The granularity at which safety is assured in Java - immutable objects - means you can't optimise across method calls. You need to wrap up a string as an immutable object to protect it from modification, rather than proving it's const via the type system.

Java is designed to be a safe, object oriented language. Fast code either operates above this level - for example, using a DSL to specify that you want to multiply a matrix row by a factor, which generates inline code that works on the primitive types - or below this level, as C can and as the generated code would. Having to work only with objects with a statically typed interface means you don't have the flexibility to get a high level view of the logic, or to feel the bits between your toes if you want to.

The granularity problem in Java also applies to array bounds and other checks - much of what can be known at compile time is lost by the Java type system. The attempt to fix this with generics didn't get far in terms of capability - you can only specify the generic type as far as the current Java type system allows, and the information is erased at runtime, so the JVM can't selectively remove redundant checks (other than checkcast). Traditionally, this hit you in loops of a known number of iterations against an array of known length – the array access instruction would still check the bounds. That one may have been mitigated, but only by means of specifically getting the JVM to look for that particular pattern. Traits allow you to provide a general mechanism. For example, a method which can only take a square matrix only needs to check its arguments if they are not known to be square at compile time, which depends where the method's being called from, so requires a backwards possible-caller analysis, and different code to be generated based on the results. Having an NxM matrix with a trait for squareness, also square matrix subclasses with the trait fixed, and knowing this trait doesn't change after the object is created would allow the check to be done both at compile and run time as appropriate. I'm not aware of any language yet that has that level of sophistication of traits inference. (I also don't like the compiler's lack of escape analysis for generics - if you've only ever put a Foo into a list, then you shouldn't have to tell it that it's a list of Foo. But that's for another time, as I'm not doing much Java programming in the real world now, and getting on to how traits can improve dynamic language performance is another post.)
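The nearest you can get in plain Java is to encode the squareness trait by hand, as in this sketch (all the names are mine, invented for illustration). The overload taking `SquareMatrix` needs no check because the type already proves it; the overload taking a general `Matrix` falls back to a runtime check on a flag fixed at construction. What's missing is any inference: the compiler won't notice that a `Matrix` built as 3x3 could have used the unchecked path.

```java
class SquareTrait {
    static class Matrix {
        final int rows, cols;
        final double[] data;      // row-major storage
        final boolean square;     // the "trait": fixed at construction, never changes

        Matrix(int rows, int cols) {
            this.rows = rows;
            this.cols = cols;
            this.data = new double[rows * cols];
            this.square = (rows == cols);
        }

        double get(int i, int j) { return data[i * cols + j]; }
        void set(int i, int j, double v) { data[i * cols + j] = v; }
    }

    /** Subclass with the trait fixed by the type, so callers can rely on it statically. */
    static final class SquareMatrix extends Matrix {
        SquareMatrix(int n) { super(n, n); }
    }

    /** Chosen at compile time when the argument's static type is SquareMatrix: no check. */
    static double trace(SquareMatrix m) {
        return sumDiagonal(m);
    }

    /** Chosen for any other Matrix: squareness is only known at run time, so check it. */
    static double trace(Matrix m) {
        if (!m.square) throw new IllegalArgumentException("matrix is not square");
        return sumDiagonal(m);
    }

    private static double sumDiagonal(Matrix m) {
        double sum = 0.0;
        for (int i = 0; i < m.rows; i++) {
            sum += m.get(i, i);
        }
        return sum;
    }

    public static void main(String[] args) {
        SquareMatrix s = new SquareMatrix(2);
        s.set(0, 0, 1.0);
        s.set(1, 1, 2.0);
        System.out.println(trace(s));
    }
}
```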

So I don't think the JVM will ever be the fastest thing. Java's too much in the middle ground:

  • The language lacks the meta-programming facilities that hide the tedium of hand optimisations

  • The language lacks access to low level instructions that allow optimisations

  • There is no concept of a cross-object inference to prove unsafe operations cannot happen, rather it relies on information hiding to prevent them, which forces conservative restrictions and redundancy

  • Java idioms use object level safety and synchronised blocks, rather than traits and transactions. This prevents many possible optimisations based on direct access, thread-safe caching, and often requires redundant checks on information already known.

There were good reasons for these decisions in the evolution of Java, but the effects of them won't ever go away, and as long as it's the Java virtual machine, the JVM will have some compromise on its performance.

But it never was intended to compete with Fortran, and is fast enough for many uses.




XTech2006 - Tuesday

As usual, I don't get round to anything, but here are the notes I made when attending XTech 2006 last month.

The first day was an Ajax developer's day; I was there as I do quite a lot of UI work, and this year I've shifted from Java front ends to using XulRunner; see earlier for some technical reasons (apart from having written a couple of million lines of Java over the last ten years, which is quite enough for the moment).

The Ajax developer's day had presentations from several framework vendors. They are targeting web applications, whereas professionally I'm employed to produce a framework for a suite of applications that sit on closed networks - we go as far as selling laptops with the software installed, since the cost of a laptop is not significant compared to the software license - so it's rather a different model to Web2.0. Some were interesting as examples of frameworks, but many Ajax frameworks simply exist to abstract away browser differences. Since we control our deployment platform, that's not an issue, but some of the showcased frameworks did quite a lot more.

The Yahoo! User Interface Library has some pretty effects, and compound controls for business type applications (calendars and so on). Personally, I prefer having a sensible date field like the one in DabbleDB, which accepts any unambiguous format and doesn't require clicking on, but with an option for a calendar as a sidebar.

Dojo and the future of the open web sort of want to standardize the api of the abstraction layer.

I looked at the OpenLaszlo platform a year and a half ago, and it seemed then and now to not quite be ready. It's a VM and interpreter for an XML language for constructing UIs. Currently, the interpreter is based on the Flash runtime. As a concept, it's interesting, but the execution was very slow on the corporate desktops of my previous employer, and the actual XML language loses many of the advantages of using a traditional XML pipeline, such as separation of content, execution logic and presentation. The guy giving the demo apologized for the designer of the box model having been an MFC programmer who didn't know XML, which should be enough for anyone to be put off the product.

Hijax: Progressive Enhancement with Ajax was a solid presentation on enhancing a web application with Ajax, maintaining accessibility in a continuous integration style. Read the full paper if you're building anything on the web.

Developing Enterprise Applications with Ajax and XUL was closer to home - an intranet tool for a single client company - but the actual applications could be made by Dentrassi*. Soooo much grey, bad field alignment, no context, very Windows 95 VB.

Web 2.0 with Adobe Flex and Ajax seemed to involve installing a rather large quantity of Adobe's software on your server, so sort of lost my interest. Filtered off my radar for the next couple of years.

As a bit of a language freak, I was looking forward to Combining E4X and AJAX, but Kurt Cagle's flight was too much delayed for him to give his talk. Instead, someone made an impromptu demo of some web-based IM clients, which were competently done but not particularly involving. But good on them for stepping in at the last moment.

AjaX with a Capital X! was an introduction to the BackBase framework. This is a framework using XML to define UI and data bindings. It was interesting, but was not quite what it says on the tin - putting method calls into XML syntax does not make procedural code magically declarative. Which is a pity, as the example introduced as 'you have to stop using declarative bindings when you need conditionals' could have been written in one line of XSLT2 (without conditionals, which are orthogonal to declarative/imperative), rather than the several lines that the not-quite-declarative binding used. But they at least know that they should be getting closer to pure functional code for anything they can; it's just a question of getting the right idioms out there into practice.

Unfortunately, having caught a bit of a cold, I went to bed early and didn't see anything of Amsterdam that evening.

More XTech days later.


*H2G2: The servants of the Vogons, excellent chefs, but who make tasteless food as they hate their masters.