Wednesday, September 24, 2008

The Omnidebugger

We intend to use a number of different programming languages in the STEPS project. Even right now, if we only look under the hood (in what we call the "engine room"), we're already using three: Pepsi, Coke, and an OMeta-like language for parsing. Coke plays a special role among these languages, since the semantics of the other languages are defined via translation into Coke.

One thing programmers—especially Smalltalkers—really care about is debugging. Coke has recently started to get some debugging capabilities, and we can expect that these will soon be on par with (hopefully even better than) Smalltalk's. This is definitely a good thing, but we can't stop there; the other languages also need support for debugging, and making this work in under 20K LOC is not a trivial task.

Consider a JavaScript implementation on top of Coke, for instance. Our translation might map a single JavaScript statement to a group of three or four Coke expressions that must be evaluated in sequence. So if we use the Coke debugger on the code generated for a JavaScript program, its notion of “single-stepping” won’t make sense at the JavaScript source level. Similarly, inspecting the temporary variables on the stack won’t work unless JavaScript’s temps are represented directly as Coke temps, which may not be the case. Things are even worse for languages whose semantics are significantly different from Coke’s. For example, a debugger for Prolog should support all kinds of features (e.g., unification) that don’t really make sense in the Coke debugger.

The “conventional” way around these problems would be to implement a separate debugger for each language, which clearly isn’t good enough for STEPS. But what if we went with a kind of pluggable debugger architecture that allows each “Language X”-to-Coke translator to associate inspecting and debugging functionality, along with all kinds of useful meta-data, with the Coke parse trees it generates? (The Coke compiler would of course have to maintain these associations when it converts the parse trees to code.) This would enable a single debugger implementation, including its GUI, to customize itself to the language that is being debugged. It would also enable programmers to debug the same piece of code at different levels of abstraction (e.g., at the JavaScript level, hiding all “scaffolding” or at the Coke level, in gory detail), which would be extremely valuable to language implementers.

Scheme programmers often implement little DSLs using macros, so I hope that we can get some inspiration from Dave Herman's debugging library for PLT Scheme, which seems very interesting.

Thursday, September 11, 2008

Worlds: Controlling the Scope of Side Effects

Here is a (very informal, not conference-style) paper about worlds, which is something that I've been working on lately. I'm pretty excited about this stuff... Please let me know if you have any comments, suggestions, etc.

Tap, tap, tap — is this thing on?

Update: Chapter 4 of my dissertation is an improved version of this paper. It contains more examples, a formal semantics for property/field lookup in the presence of worlds, and a proper Related Work section.

Saturday, September 6, 2008

Modified V8 Shell w/ translateCode

This week Dan Ingalls came down to visit us at VPRI, and we got to chat about V8 for a little bit. We're both really excited about it. "It's like they just gave you a new horse", he said. :)

So tonight I was looking at the source code of the V8 shell, and it turned out to be pretty easy to modify it to support my translateCode idea. In case you don't know what I'm talking about, there really isn't much to it: translateCode is just a user-defined function that's called (implicitly) by the shell/workspace. It takes as an argument the code that the user typed in, and returns the code that should run in its place. It's extremely useful for playing with source-to-source translators. Here's a link to my modified shell.cc, if you're interested.

The following transcript shows this modified shell in action. ometa-rhino.js is just a file that loads the OMeta/JS implementation and defines translateCode just like I do in the browser, i.e., so that both OMeta and JavaScript are accepted.
  ./v8 ometa-rhino.js --shell
V8 version 0.3.0
> ometa M { ones = (1 -> 2)* }
[object Object]
> M.matchAll([1, 1, 1, 1], "ones")
[2, 2, 2, 2]
Now if only these guys at Google would hurry up and get Chrome to work on my Mac...

Saturday, August 30, 2008

asserts and retracts with Automatic Rollback

Lately I've been working on a paper about worlds—more on this soon—and while I was looking around for related work I came across something really interesting.

Consider the different ways of representing state in Prolog. First, there's the functional way, in which the state of the entire program is passed around from predicate to predicate in the form of additional arguments. Although burdensome, this approach interacts well with backtracking: when a failure occurs, changes to the program's state are automatically rolled back.

Another way to represent state in Prolog is to use the built-in database manipulation predicates assert and retract. This imperative approach has the advantage of not requiring predicates to take additional arguments, but it unfortunately does not interact well with backtracking. This is because assert and retract are destructive operations, i.e., they survive backtracking. So when a failure occurs, the programmer must make sure that each assert is retracted and vice versa, which is not easy to get right.

If only the programmer had access to special variants of assert and retract that automatically undo their side effects when backtracked over, the imperative approach would enable some really interesting programming styles that you could never get away with in a "real" imperative language. Interestingly, Tim Menzies shows us how to implement these variants in just a few lines of standard Prolog. (I had never come across this trick before, but Tim's nonchalance makes me think that it's well known in the Prolog community.)

Here is a variant of the assert predicate that automatically retracts itself upon backtracking:
  assert2(X) :- assert(X).
assert2(X) :- retract(X), fail.
The first clause above makes assert2 behave like a regular assert if the query in which it is used succeeds. But if the query fails, the program will eventually backtrack to the point where assert2 was used and try the second clause, which retracts the fact asserted by the first clause.

Similarly, we might define retract2 as follows:
  retract2(X) :- retract(X).
retract2(X) :- assert(X), fail.
But consider what happens if retract2's argument is not already in Prolog's database. The first clause (which uses retract) will fail, so the program will try the second clause, which will assert a fact that was never retracted in the first place! We can fix this problem by refactoring retract2 into two predicates, as shown below:
  retract2(X)       :- X, reallyRetract2(X).
reallyRetract2(X) :- retract(X).
reallyRetract2(X) :- assert(X), fail.
And that's that.