Tin Can Camera: 2006-11

kin - an experiment in high performance scripting

Kin is the name for a dynamic language I've been toying with for some time. There's a SourceForge project, http://kin.sourceforge.net, which dates from when I was targeting the JVM. I may continue with that, or just put chunks on tincancamera.

The main aim of kin was to write engineering applications more easily. This means, in the tradition of smile (not sure of the name), the z/os scripting language used at BAE for tying together its Fortran stress routings, and more recent efforts such as SciPy (or arguably Swig as a meta-case), kin was intended to allow scripting of numeric libraries.

I'm getting interested again in kin as a base for experimenting with efficient interpreting of JavaScript (as I can't be bothered writing a language right now).

This has a few effects on the virtual machine-

The target machines are what would be considered server or workstation class at BAE. Kin is to target processors with 1 GB RAM, 128bit SIMD architecture and 64 byte cache length. This is the spec. of my little laptop, the lowest machine that not a dog.
Efficient FFI (foreign function interface). Swig may help for the boilerplate, but in general foreign_foo(foreign_bar()) should cost no more than it does in kin as in C.
Traits and mixin based inheritance. A mixin transforms a class into a sub-class. Traits extract a set of locations that let the code generator efficiently access the slots of an object - if a function calls var a:int = foo.x and foo.bar(a + 4) then foo has traits {a:fix_int, bar:((integer)):*}, so the offsets to these slots in the first encountered foo value may be cached and the calls inlined, and any further objects mapped to these offsets. Mixins are functions which add traits to a type, and allow either class or prototype based inheritance. If each object slot table is a form of dynamically extended hash and mixins are arranged to be monotonic over the index of the slots in the hash, then usually the offsets of the slots referenced by the traits inferred by the use of an object in an invocation should be stable. Or that's one of the things kin will be attempting to investigate. And if they're stable, they can be hoisted out of loops too.
Escape analysis. For both performance and concurrency reasons.
To support type inference and relational programming, a simple prolog core. Which also will be available at the programming level so kin can use it for declarative coding.
Something on concurrency. I need to grok Erlang properly to have an opinion on message passing vs shared memory, but if using shared memory, then at least supporting transactions simply so an object is only released to other threads in a consistent state. If escape analysis shows the state isn't shared, the transaction should be a no-op; otherwise there may need to be a copy and lock of the object's state - which again would use the analysis of what traits of an object a function exploits.
SIMD array comprehensions, big_num, etc. The Java kin had a fast big_num for large numbers; the lower-level SSE operations should make it of the order of four times faster. The problem is getting the operations up to a level where I can write a full efficiency division algorithm without getting lost, which I never managed in Java.

So that's the sort of thing kin's a test bed for. The relation side used to be more stressed (it was functional + prolog + numerics, built on a small lisp compiler core), hence the name.

TME

SSE String

(to rhyme with silly string)

Since I'm not seeing the fine lass this weekend, I've spent some time twiddling my bits - playing with applying SSE2 to string algorithms.

SSE2 operations shift the costs of certain operations drastically, turning O(N*M) to O(N) for M <= 16, and making register comparisons far faster than loads. I've played with algorithms for searching for a single character in a string and for searching for a substring. The platform is WinXP, AMD 3400 mobile. Applying SSE to the naive algorithm for finding a character - loop and compare each - runs around 5.3 times faster; using _mm_cmpeq_epi8 to compare 16 bytes at a time.

Applying SSE to the naive algorithm for finding a substring - find the first character in the substring, then test the remainder - gives a 2 to 5 times speed up (best case is where the first character never occurs in the string) using SSE to match both the first character (compare a block of input with the first character 16 times in parallel), then to compare with the remainder (compare the current input to the pattern 16 bytes at a time).

Using Boyer-Moore on the same input gives results usually between the two. Attempting to use SSE for the inner part of Boyer-Moore doesn't help - extracting the index of the first mismatch from the 16 bit pattern indicating which characters in the pattern match the input costs more than the non-SSE version.

Boyer-Moore-Horspool is faster than Boyer-Moore, and faster than naive+SSE for longer strings, but its simplifications don't seem to be enough to admit any obvious parallelisations. The speed advantage over Boyer-Moore is probably that the algorithms are implemented as functions, and Boyer-Moore has to allocated and free an array on each call. So the performance would be different for a pattern matcher object.

Knuth–Morris–Pratt seems to under perform on these tests, again probably due to the dynamic memory.

I need to think about SSEifying Boyer-Moore-Horspool and Knuth–Morris–Pratt, and create another test harness that allows for pattern objects so the initialization of the tables in the more complicated algorithms doesn't count against them. But in a single function use, it appears that naive is best when the first character is rare, and Boyer-Moore-Horspool where the string is longer. Rareness also helps the algorithms with dynamic memory, as each is called until all the input is scanned, and the rarer the pattern the fewer the times the function is called.

Before I do that, I'll get the core object system in kin working (see next post) - I'm just about happy to go on with that now.

TME

Labels: kin, parallel, programming

Current reading

Following the announcement of Mozilla Tamarin - the adaptation of Adobe's ActionScript virtual machine to give JIT compilation of JavaScript in Gecko applications, and the starting up of the SmallTalk.rb project to get Ruby running on a better VM, I've been doing a bit of reading:

StrongTalk - fast, typed smalltalk
- interestingly, unlike ActionScript, the typing is orthogonal to the speed optimisations
- everything is implemented as mixins, functions which add features to classes

Mirrors - a mechanism for separating reflection and conventional object APIs

JS Typechecking via JS-Sourcerer or a more formal approach which I don't know half the notation of
- JS-Sourcerer doesn't have a model for the interfaces to XUL, and doesn't accept IDL to define it (though it should be possible to create an IDL to its proprietary type definition language)
- there's also a more complete modal type model; I need to investigate more of the theory here, and here.
- at the least you can infer types at any execution point and then assume that changes to object's or prototypes relaxes assumptions
- I need to find whether most type inference systems for dynamic languages are Prolog-like, and whether a Rete-like system is more suitable for dynamic languages
- assuming the cost of the inference actually is worth it in execution terms; it may be better to do some pattern matching then allow constraint violations to push execution back to the interpreter. So choosing an inference engine which is suited to the pattern matching may be more important.
- Type Feedback vs. Concrete Type Inference - maybe TI in the bytecode generator, then TF in the VM?

And, in other news, there's the re-branding of the Semantic Web as Web3.0. Personally, I've been using Web3 for community driven 3D on the web. Let's see if RDF hacking get as popular as World of Warcraft machinima or Virtual Life bartering.

TME

Labels: javascript, mozilla, programming, virtual-machine, web3

Life Among The Mammals: The Elves and the Shoemaker

Link: http://lamammals.blogspot.com/2006/11/elves-and-shoemaker.html

Life Among The Mammals: The Elves and the Shoemaker

Amen

Labels: god, personal

Tin Can Camera

2006-11-20

kin - an experiment in high performance scripting

2006-11-19

SSE String

2006-11-14

Current reading

2006-11-07

Life Among The Mammals: The Elves and the Shoemaker

About Me

Previous Posts

Blog Roll

Archives