Take Back Control. GUIs and other event driven programming without inversion of control.

GUI programming and inversion of control

Writing graphical user interfaces can be a tricky business. Typically we set up a set of event handlers. Each handler waits for its event and based on the current state moves the state to some other state. This arrangement is commonly called "inversion of control".

This is fine for simple GUIs, but once the system gets past a certain level of complexity it is rife with opportunities for error.  What if an event happens in a state that we didn't anticipate. What if we forget to disable events? Once we add in asynchronous IO such as HTTP requests, new complexities arise.

One solution is to carefully draw out a state machine --perhaps using state charts-- and use the state pattern to ensure that states are entered and and exited cleanly and that transitions only happen when we are in the correct state.

However state machines are at least one level of abstraction down from the usual structured programming design notations such as sequential composition (f followed by g), if-then-else, while-do. It's a lot like programming with only conditional goto statements.  Furthermore algorithmic abstraction (subroutines) are not available with state machines, unless you want to get in to pushing states on a stack. What's worse, state machines are a level up from the code you write. You might have a lovely state machine drawn on a white board, but once you've translated it to code all you have is an unstructured pile of code. Even if you keep that code well organized, for someone else to really understand that code, they'll need to translate it back to a nice picture. The solution is to keep the code and a diagram of your state machine in sync, but we all know that code and its documentation tend to drift apart. Furthermore, state machines do not correspond well to use-cases, which makes it harder to verify that your design meets all the use cases and only the use cases.

So, if you start from use cases, there are two design steps to take: translate from use cases to a state machine; translate from the state machine to event driven code.  Each step changes the paradigm and each is a potential source of error. Furthermore, if you are maintaining code, the first thing to do is to understand what it does. To reconstruct the use-case model from event driven code requires reverse engineering the same steps: translate from event driven code to a state diagram; translate the diagram back to use cases.

So my aim was to allow a more direct translation from use cases to code: a way to avoid inversion of control.

Reverting control

When my students first design GUI programs, I have them create sequence diagrams for their software. Often the diagrams come back looking something like this -- translated back into code:
    function main() {
        displayWindow() ;
        while( true ) {
            waitForUserToClick() ;
            reactToClick() ; }
This makes a lot of sense if you consider that, in all their previous courses, they've been writing console applications where input comes only when it is asked for.  It also makes sense if you consider that a GUI is --in essence-- a set of sequences of interactions and the student already know a language for expressing sets of sequences of actions, namely the ordinary programming constructs of sequential composition, choice, iteration, and function calls.

My contention is that these student, who might seem naive to those of us used to writing event driven GUIs, are on to something worth looking into.  Is there a way to realize a GUI without inversion of control?

The goal is to be able to describe the user experience in a simple and executable notation. Here is an example of such a notation
    displayWindow >
        await( click( b ) && reactToClick )
Here b is an object representing a button, displayWindow and reactToClick are functions representing changes to some model and/or view, and loop, await, click, > and && are functions and methods from the library I'll be describing in the next section.

The tbc library

The remainder of this post describes an experimental library named tbc (for "Take Back Control"), written in Haxe to explore these ideas. The language I use below is Haxe, but everything could be translated into just about any language that has lambda expressions and closures with lexical scoping, for example JavaScript, Scala and many many others.

The library is structured as several modules, so far these are tbc.TBC, tbc.TBCTime, and tbc.TBCHTML. Unless otherwise mentioned all names are exported from either the TBC module, or are static functions of the TBC.TBC class. It can be found at https://github.com/theodore-norvell/tbc .

I've been thinking in terms of web development, which is why there is a  TBCHTML, but one could also have a TBCSwing or some such. One of the nice things about Haxe is that it can be translated into lots of useful languages such as JavaScript, Java, C++, and so on. The core TBC module is target independent -- can be used regardless of the target language; TBCTime works with some targets but not all; TBCHTML is specific to JavaScript.


Central to the library is a generic data type, Process<A>, which is the type of all processes that compute values of type A. For example an object of type Process<Int> is an object that can compute an integer value.  Executing a process may cause side effects to happen; side effects include changes to state, output, input, and delays; delays include waiting for input and waiting for time to pass.

We can execute a process using its .go method. This method causes the process to execute, but there is nothing in the semantics of .go that says that it will finish immediately, it might finish after some time, and, in particular, after some interaction with the user.  The argument of .go is a void function that takes an A. For example if p is an object of type Process<Int> then
    p.go( function( x : Int ) { trace(x) ; } ) ;
will cause p to start to execute and, when it has finished, the Int it produces is sent to the trace routine. The argument
    function( x : Int ) { trace(x) ; }
is an anonymous function or lambda expression.  The trace function is a standard Haxe function for outputting a message to a log.

One particularly useful process is unit(x), which is a process that computes value x immediately with no delay and does nothing else. So
    unit(42).go( function( x : Int ) { trace(x) ; } ) ;
is a complicated way to output 42 to the log.

As an abbreviation skip() stands for unit(null). skip() has type Process<Triv> and Triv is a type that has null as its only value. So skip() is a process that has no side effects and computes an informationless value.

If f is a function of type Void->A (i.e., a function with 0 arguments that returns a value of type A), then exec(f) is a Process<A> that executes by calling f, and the result of the process is the result of f. The exec function is useful for inserting side effects, such as changing or reading global state, into the computation.

Another kind of side effect is to spend time doing nothing. In the TBCTime module, there is process function pause so that pause(n) is a Process<Triv> that completes after about n milliseconds.

We can combine processes in a number of ways.  p.sc(q) is a process that, when executed, executes process p, throws away its result and then executes process q; the value computed by p.sc(q) is the value computed by the execution of q.  As an example
    pause(1000).sc( exec( hello ) )
is a process that executes function hello after a delay of 1 second.

You can think of sc as standing for either "sequential composition" or for "semicolon". Sequential composition is associative
    p.sc(q.sc(r))  =  p.sc(q).sc(r)
And skip() and unit(x) (regardless  of x) are left units of sc
    p  =  skip().sc(p)  =  unit(x).sc(p)
Furthermore skip() is a  right unit
    p  =  p.sc( skip() )
provided p is of type Process<Triv> .

As a convenience the > operator is overloaded to mean sequential composition. So
    p > q > r
    p.sc( q ).sc( r ) 
The laws are
    (p > q) > r  =  p > (q > r)
    p  =  skip() > p  =  unit(x) > p
    p  =  p > skip()
provided (for the last law only) that p is of type Process<Triv> .

Besides sequential composition, we can imitate other sequential programming constructs. A process ite( p, q, r), where p is of type Process<Bool> and q and r are both of type Process<A> (for any type A) is a Process<A> that mirrors if-then-else statements. And loop(p) is a Process<Triv> that is equivalent to an infinitely repeated sequential composition of p with itself. And so on.

A useful generalization of sequential composition is bind. The problem with sequential composition is that it throws away the result of the left process. Bind solves this problem. If  p is a Process<A> and f is a function from type A to type Process<B>,  then p.bind(f) is a Process<B>. To execute p.bind(f), first p is executed, then its result is fed to f, finally the resulting process is executed, delivering the result. bind is the familiar bind (or flatmap) operation from monads.  As an abbreviation p.bind(f)can be written by p >= f. Bind and unit obey the usual monad laws
    (p >= g) >= h  =  p >= function(y) { return g(y) >= h ; }
    p >= unit  =  p
    unit(x) >= f  =  f(x)
provided that y is not free in either g or h. The first law is not used much in practice exactly because often y is free in g or h. People tend to use bind as a combination of sequential composition and local variable declaration. So you might see code like this
    p >= function( x ) { return
    ...x... >= function(  y ) { return
    unit(...x...y...) ; } ; }
This is so common in Haskell that there is a special notation for it
    do x <- p
       y <- ...x...
       return (...x...y...)

If we were to define the operation f*g to mean function(x) { return f(x).bind(g); } (where x is not free in g), then the three laws concerning bind can be written in a very familiar way
    (f*g)*h  =  f*(g*h)
    f*unit  =  f
    unit*f  =  f 
In practice >= it is easier to use than *, and so we define >= but not *.

Much has been written about monads elsewhere, which means I don't need to write much here.  I'll just note that p.sc(q) is defined in terms of bind to be
    p.bind(function(x) { return q ; } )  
And so
    (p > q) >= f  =  p > (q >= f)
    (p >= f) > q  =  p >= function (y) { return f(y) > q; }
provided that y is not free in f or q. Furthermore loop(p) is defined to be
    p.bind( function( a : A ) { return loop(p) ; } ) 

In Haxe, >= and > are at the same level of precedence and associate to the left. For example
    p > q >= f > r >= g
    (((p > q) >= f) > r) >= g
which means

[Note for Haskell programmers: Just remember that Haskell's >> is TBC's > and Haskell's >>= is TBC's >=; i.e. drop a > from the Haskell notation to get the TBC notation.  There is probably some way to use Haxe's macro system to get something like Haskell's do notation, but I don't know Haxe's macro system (yet).]


If p is a Process<A> and q is a Process<B> object, par(p,q) is a Process<Pair<A,B>> that, when executed, executes both p and q, possibly interleaving their various side-effects. Whenever both are delayed, either one may be the next to proceed. Note that everything actually runs on one thread.

For example
            exec(outX) > pause(2000) > exec(outX)
            exec(outY) > pause(2000) > exec(outY)
is a process that executes functions outX and outY twice each. The first outX and outY are immediate, while then latter two are executed about 2 seconds later. In theory, any of the interleavings
    outX outY outX outY
    outX outY outY outX
    outY outX outX outY
    outY outX outY outX
are possible.

Fork-and-join parallelism is also possible, although I haven't implemented it yet.

Guards and Guarded Commands

An object of type Guard<E> represents some category of input events that might happen in the future. For example, if b is a button element on a web page, then click(b) is a guard representing clicks on the button. click is in the TBCHTML module. The type parameter E is the type of objects representing information about the events, for example the coordinates of a mouse click. The type click(b), if b is an HTML button element, is Guard<Event>, where Event is the type of DOM events in HTML.

If g is a Guard<E> and function f is of type E -> Process<A>, then g.guarding(f) is a GuardedProcess<A>. A guarded process might be thought of as a process that starts with an event.

A method await turns a set of guarded processes into a process. So if m0 = g0.guarding(f0) and m1  = g1.guarding(f1) are guarded processes of the same type, then await(m0,m1) is a process. The process await(m0,m1) is executed as follows: First time passes until one of the events associated with m0 or m1 happens. Say an event that g0 represents happens, then execution continues with the execution of f0(e0), where e0 is the information associated with the event.

When the process await(m0,m1) begins, the event categories associated with m0 and m1 are enabled. In the implementation, this may mean installing event handlers and changing the appearance of GUI widgets. When one of the events fires, the event categories are disabled before execution proceeds with the corresponding guarded process. If both m0 and m1 are triggered by the same event, only one of them proceeds; which one is up to the implementation.

Here is an example of guarding
            await( click( b0 ).guarding( out0 ) ) >
                click( b1a ).guarding( out1A )
                click( b1b ).guarding( out1B )
            ) >
           await( click( b2 ).guarding( out2 ) )
Here the b0, b1a, b2a, and b2 are button elements on an HTML page and out0, out1A, out1B, and out2 are functions returning processes that make changes to the page.  The interaction with the user proceeds as follows.
  • b0 becomes enabled.
  • When b0 is clicked, it becomes disabled and buttons b1a and b1b become enabled.
  • When one of these two buttons is clicked, both become disabled and b2 becomes enabled.
  • When b2 is clicked, it becomes disabled and the process continues from the start.
The use case scenario corresponding to this process is
  1. The user clicks on b0
  2. The system makes change out0
  3. The user clicks either b1a or b1b
  4. If the user clicked on b1a
    • The system makes change out1a
  5. Else the user clicked on b1b
    • The system makes change out1b
  6. The user clicks on b2
  7. The system responds with change out2
  8. Back to 1.
Of course a real use case would explain what out0 etc are, but none of that matters to us. The point is the close correspondence between the use case and the code.

The >> operator is overloaded to mean guarding, so the last example can be written as
        await( click( b0 ) >> out0 ) >
            click( b1a ) >> out1A
            click( b1b ) >> out1B 
        ) >
        await( click( b2 ) >> out2 )

In some case we might want to throw away the event information. This is easily done using
    g >> function( e : E) { return p ; }
which is a GuardedProcess<A> if g is a Guard<E> and p is a Process<A>.  For example we might want to simply wait for a click on button b. This Process<Triv> will do that
    await( click(b) >> function( e : Event ) return skip() ; } )
As a convenience, guards have a .andThen method to handle this situation less verbosely. We can write
    await( click(b).andThen( skip() ) )
or, using operator overloading,
    await( click(b) && skip() )
Since && has lower precedence than > and >= we can write
    g && p > q >= f
which will be parsed as
    g && ((p > q) >=f) 

The .sc and .bind methods can also be applied to guarded processes on their left. Thus
    g >> m >= f
which is parsed as
    (g >> m) >= f
means what you would expect: when g is fired, the event is sent to m and then the result, a, of executing m(e) is sent to f and the result of executing f(a) is the result of the whole thing. The following associativity law.
    (g >> m) >= f  =  g >> function(e) { return m(e) >= f ; }
holds provided e is not free in m or f. Likewise
    (g >> m) > p  =  g >> function(e) { return m(e) > p ; }
holds provided e is not free in m or p. Furthermore
    (g && p) > q  =  g && (p > q)
    (g && p) >= m  =  g && (p >= m)

[Note on the notation. The && operator was chosen most because of its low precedence.  Furthermore | could combine guards, while || could combine guarded processes. So it is nice that && has lower precedence than | and higher precedence than ||.  (But note that >> has higher precedence than |.)]

[Another notational note: It is tempting to use > for .andThen and >= for .guarding.  Advantage: fewer operators to remember. Disadvantage: this notation obscures the types, making code less readable. For example, if you read a && p, it is clear that a is a guard and the result is a guarded process. But, with the alternate notation, if you read a > p, the type of a could be guard or process or guarded process and the type of the result can only be determined after determining the type of a. Also, with the alternative notation, skip() is not always a right unit of >, nor is unit always a right unit of >=.]

Guards can also be used for time events. The timeout guard fires if it's been enabled for more than a certain amount of time. For example, if the user fails to respond in a timely manner, we might want to move on. Process nag(null), defined like this
    function nag(triv : Triv) : Process<Triv>{ return
                click( b ) && exec(thanKTheUser)
                timeout( 1000 ) && exec(nagTheUser) >= nag
        ) ; }     ,
nags the user every second until the button is clicked.

In fact pause( t ) is defined by
    pause( t )  =  await( timeout( t ) && skip() )

Summary of the core operations

Suppose that
  • p is of type Process<A>
  • q is of type Process<B>
  • f is of type A -> Process<B> 
  • h is of type A -> C
  • g is of type Guard<E>
  • m is of type E -> Process<A>
  • gp, gp0, gp1, ..., gpn are of type GuardedProcess<A>
Meaning Syntax Type Operator syntax
Do p, then do q. p.sc( q ) Process<B> p > q
Do p, then do f(x), where x is the result of p. p.bind( f ) Process<B> p >= f
Make a guarded process that does p when fired. g.andThen( p ) GuardedProcess<A> g && p
Make a guarded process that does m(e) when fired, where e represents information about the event. g.guarding( m ) GuardedProcess<A> g >> m
Make a guarded process similar to gp, but that does that does q after gp has executed. gp.sc( q ) GuardedProcess<B> gp > q
Make a guarded process similar to gp, but that does that does f(a) after gp has executed, where a is the result of executing gp. gp.bind( f ) GuardedProcess<B> gp >= f
Fire one of these guarded processes. await( gp0, gp1, ..., gpn ) Process<A>
Do p again and again loop( p ) Process<Triv>
Do both p and q. par( p, q ) Process<Pair<A,B>>
Start process p. p.go( h ) Void

Related ideas

When I first looked into this a few years ago, I found Martin Odersky and Philipp Haller's 2006 paper, "Event-based programming without inversion of control", which introduced Scala's Actors library. The paper mentions user interfaces as an area for future work.  Recently I came back to this question, and the above documents some preliminary experimentation.

In addition to Odersky and Haller's actors, some of the inspirations for TBC include process algebras, such as Tony Hoare's CSP and Robin Milner's CCS; (extended) context-free grammars; parsing combinators.; monads as used in Haskell and more recently in other languages such as JavaScript; JavaScript's promises; and the command pattern as documented by Erich Gamma and company.

Further work

I'm happy with the basic abstractions of process, guard, and guarded process. However, the above is just an early exploration. There is much more to consider including
  • Internal events for communication between parallel processes. Processes can already communicate via shared global state. However this does not allow synchronization between processes. For that and to avoid using shared state, synchronous messages, asynchronous messages and rendezvous are the classic approaches.
  • Communication with external entities other than the user. In the HTML context, this includes XML HTTP requests.
  • Failures and recovery.
    • One approach is that exception catching can be emulated by means of a second version of bind. While bind says what to do next in case of success, onFail says what to do in case of a failure. We give processes a second type parameter so that Process<A,B> is a process that succeeds with a value of type A or fails with a value of type B. If p is of type Process<A,B> then and f is of type A -> Process<C,B>, then p.bind( f ) is of type Process<C,B>. Failure can happen either within p or within the result of f.  Symmetrically,  if g is a function of type B -> Process<A,D> then p.onFail( g ) is of type Process<A,D>. Here if p fails, there is a chance of recovery, should the result of g succeed. This is just like try{ p }catch( e ){ g(e) }. Note the complete symmetry between succeeding and failing.
    • Those familiar with JavaScript's Promise type will note that there is little original here.
    • Those familiar with monads will note that Process<A,B> is monad in two ways. I.e. we can fix either A or B to get the functor part of a monad.
    • The problem is that I'm not a big fan of try-catch, so before going down this road, I'd like to think about alternative ways to deal with failure and recovery.


A simpler RMI tutorial for Java

Oracle corp has a tutorial on RMI which is quite involved. There are 4 different kinds of processes working together so that two of them can talk: A server, a client, an rmiregistry process, and a web server to serve class files.  There are also security policies for both the server and the client.

My purpose here is to strip RMI down to the simplest possible set up. In particular

  • No security policy is used.
  • All code is in copies a single jar file.
  • There is no dynamic code loading.
  • No separate rmiregistry process is needed.
This brings the number of kinds of processes down to 2, reduces the number of command-line options needed to zero, and reduces the number of security policy files needed to zero. It also simplifies distribution.

There are no doubt some advantages to using an external registry process and dynamic code loading. However for many applications they just aren't needed.

The server code

My example code is an Othello game. The main server class is called RemoteGameModel. This class records the state of the game; I used the observer pattern to alert observers of any change to the state of the game. The observers are typically running in other JVM instances, possibly on other machines.

The hookup of the observers to the RemoteGameModel is initiated by the observers; they will look up the RemoteGameModel using a registry. Thus after creating the RemoteGameModel in the server we need to register it in a registry running at a known location.  This is done with a few lines of code in a main method in class ServerMain:

First we create the observable model. The RemoteGameModel classs extends UnicastRemoteObject, which means that as part of being constructed, the object is exported. Being exported is important since it means that when the object is passed to a remote object, a stub will be made for it and passed instead.

Second, we need a registry. Rather than using an rmiregistry process, it is easier to create a registry in the same JVM as the server object resides. This means the registry and the server must be on the same host. The port number used should be unique to your application.

Third, the RemoteGameModel is registered with the registry.

The next two lines report the URI that can be used to refer to the RemoteGameModel object.

There are a few things that can go wrong, so it's important to wrap the above code in try-catch.

The client code

On the client side, we need to obtain a reference to the RemoteGameModel which is on another machine. In RMI we ask the registry and the registry sends back a proxy (stub) object representing the server. The stub implements the RemoteGameModelInterface, just as does the actual RemoteGameModel.

The following code is in the main method of class ClientMain:
First set a string to be the URI for the object.

Then the static method Naming.lookup returns the stub.

Once we have a reference to this object we can use it to send messages to the server object. In my code I use it to build an observer in the client. I pass the reference to the proxy into the constructor of the observer object. The last thing the observer does is to send addObserver(this) to the proxy. As the observer object's class extends UnicastRemoteObject, this sends a proxy for the observer over to the servers machine, where it is added to the server's list of observers.

Deploying and running the code

The code can now be tested simply by starting a JVM process running ServerMain.main and a number of processes running ClientMain.main.

To make deployment easier, it is a good idea to put all files in a .jar file. I called mine othello-2.jar. Then the server can be started from the command line with the command
java -cp othello-2.jar othello.ServerMain
And the clients can be run from the command line with the command
java -cp othello-2.jar othello.ClientMain

Further information and source code

Further information about the techniques used in the Othello project can be found at  http://www.engr.mun.ca/~theo/Courses/sd/notes.html under "Animation with Threads" and "RMI and remote, concurrent observers". The source code at http://www.engr.mun.ca/~theo/Courses/sd/links.html shows the project at various stages of development:

  • Othello-0: Thread based animation.
  • Othello-1: Adds concurrent observation and multiple view stragegies.
  • Othello-2: The project is converted to use RMI.
  • Othello-3: Adds input and game state to make a playable game.