Asynchronous is not "Fire and Forget": Joiners not Quitters
In all the recent enthusiasm for threaded and asynchronous programming, I’ve noticed people missing something really important: if you fire off an asynchronous task, you will eventually need to wait for it to complete — and you’ll probably need to be able to cancel it as well.
At the basic level, this is why Ruby’s Thread has instance methods such as #join and #kill (although #kill, specifically, is such a blunt instrument that you should never use it in production code). For the sake of language and library neutrality, I’ll call these two sorts of operations join and cancel. These are very important. Let’s begin by looking at join.
When one task joins another, it blocks (or defers its execution, if you prefer) until the other task completes. After the join completes, the joining task can be assured that the other task has really finished, that all of its side-effects have been accomplished, and that its result is available.
Essentially every thread API provides some sort of join operation, whether
we’re talking about
waitpid() for POSIX processes,
pthread_join for POSIX threads,
java.lang.Thread.join for Java threads,
Thread#join in Ruby,
threading.Thread.join in Python, and so on.
Many of these methods also provide a way to get a status or result
value from the thread. In Ruby,
Thread#value does this.
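As a small illustration of both ideas, here is a sketch in Ruby. The worker's body and the names `log`, `worker`, `side_effects` and `result` are made up for the example; only `Thread.new`, `Thread#join` and `Thread#value` come from Ruby itself.

```ruby
# A worker thread whose side effect and result we both care about.
log = []

worker = Thread.new do
  sleep 0.05            # stand-in for slow work
  log << :made_things   # the side effect
  42                    # the block's final value becomes the thread's result
end

worker.join             # block until the worker has finished
side_effects = log.dup  # safe to inspect now: the worker is done
result = worker.value   # joins if necessary, then returns the block's value

puts "side effects: #{side_effects.inspect}, result: #{result}"
```

Reading `log` before the join would be a race: it might or might not contain `:made_things` yet.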
In some cases the notion of joining is less explicit, but still present.
For example, to join another Erlang process you would take advantage of the
trap_exit flag and wait to receive a message indicating that
the process in question has exited. In the context of a node.js program,
you will usually set up a callback function to be invoked when the
asynchronous operation completes. This callback frequently ends up serving
as the continuation of the logical task that set it up. It’s all the
same basic pattern.
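To make the callback-as-continuation idea concrete, here is a rough Ruby analogue (Ruby rather than node, to stay in one language). `read_config_async` is a hypothetical helper invented for this sketch, not a real library call.

```ruby
# Hypothetical helper in the style of a callback-taking API: it does its
# work in the background and calls the block once the work has completed.
def read_config_async(&callback)
  Thread.new do
    sleep 0.05                          # stand-in for slow I/O
    callback.call({ "name" => "demo" })
  end
end

events = Queue.new                      # thread-safe event log
events << :requested

# The block is the continuation: everything that logically comes "after"
# the read lives here, so it cannot run before the read has finished.
background = read_config_async do |config|
  events << [:loaded, config["name"]]
end

events << :carried_on                   # the caller does not wait

background.join                         # only so the script does not exit early
```

The join is implicit in where the code lives: whatever depends on the result goes inside the callback.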
Why is this important? Let’s look at what happens when API designers drop the ball.
cake is a Lie
My first example has to do with CoffeeScript’s cake. cake is similar to tools like Ruby’s rake and make, all of which let developers define a set of named tasks (typically refreshing automatically generated files) which can easily be invoked from the command line. In the case of rake and make, tasks which depend on other tasks are handled by explicitly indicating the dependency in a declarative way:
    # make
    foo:
    	@echo Make ALL THE THINGS

    bar: foo
    	@echo Use ALL THE THINGS
    # rake
    desc "Make all the things"
    task :foo do
      puts "Make ALL THE THINGS"
    end

    desc "Use all the things"
    task :bar => [:foo] do
      puts "Use ALL THE THINGS"
    end
In the case of cake, this has to be handled a little more imperatively:
    # cake
    task "foo", "Make all the things", ->
      console.log("Make ALL THE THINGS")

    task "bar", "Use all the things", ->
      invoke "foo"
      console.log("Use ALL THE THINGS")
This is probably fine in itself, since cake doesn’t aspire to support the sort of file dependency handling that rake and make do. However, while logging to the console is synchronous, and node provides some synchronous APIs for working with the filesystem, what happens if the "foo" task ends up needing to do something asynchronous? Most node APIs are asynchronous by design, after all. If makeAllTheThings were an asynchronous operation which took a callback, you’d expect to be able to do something like this:
    task "foo", "Make all the things", (options, cb) ->
      makeAllTheThings(cb)

    task "bar", "Use all the things", ->
      invoke "foo", ->
        useAllTheThings()
But nope. Chuck Testa. invoke doesn’t even accept a callback argument. The best you could do without breaking out of the cake DSL would be something like this:
    task "foo", "Make all the things", ->
      makeAllTheThings(->)

    task "bar", "Use all the things", ->
      invoke "foo"
      useAllTheThings()
But (even if it happens to work on your machine most of the time) this doesn’t guarantee that all of the things will have been made before you try to start using them. What joining does for you is establish a constraint: none of the following things should begin before this task is done.
There’s an important insight to be had here: Strictly sequential code is typically overconstrained, unnecessarily waiting for one thing to complete before moving on to the next, even if the next doesn’t really care about its result. This doesn’t cause bugs, but it does result in missed opportunities for performance. Asynchronous code is underconstrained by default, letting you take full advantage of opportunities for parallelism (in IO if not in computation), but you have to explicitly add some sequencing constraints back in if you want things to work reliably where order matters.
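Here is one way to picture "adding the constraints back in", sketched with Ruby threads. `fetch` and the names `:a`, `:b` and `:c` are hypothetical stand-ins for slow, independent operations.

```ruby
# Hypothetical stand-in for a slow I/O operation; records when it completes.
def fetch(name, order)
  sleep 0.05
  order << name
  "#{name}-data"
end

order = Queue.new   # thread-safe record of completion order

# a and b do not depend on each other, so start both without waiting.
a = Thread.new { fetch(:a, order) }
b = Thread.new { fetch(:b, order) }

# c needs both results. This is the one place where order matters, so this
# is where the joins go (Thread#value joins, then returns the result).
combined = [a.value, b.value].join("+")

c = Thread.new do
  fetch(:c, order)
  "used #{combined}"
end
summary = c.value
```

Run strictly in sequence, a would also have to finish before b could start, a constraint nothing here needs. Without the two `value` calls, c could start before its inputs exist.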
In the end, with cake’s existing API, you have to do something like this:
    foo = (options, cb) ->
      makeAllTheThings(cb)

    bar = (options) ->
      foo options, ->
        useAllTheThings()

    task "foo", "Make all the things", foo
    task "bar", "Use all the things", bar
In order to safely compose asynchronous tasks, you have to step outside the boundaries of the task machinery altogether; you can’t compose them the same way synchronous tasks can be. API fail.
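For contrast, here is a sketch of what a joinable invocation could look like in a setting that does allow it. It uses Ruby threads, not node, and the `task` and `invoke` defined here are a hypothetical mini task runner written for this example; they are not cake's or rake's API.

```ruby
# Hypothetical mini task runner (not cake's or rake's API).
TASKS = {}

def task(name, &body)
  TASKS[name] = body
end

# invoke starts the task in the background and returns its Thread,
# which gives the caller something to join.
def invoke(name)
  Thread.new { TASKS.fetch(name).call }
end

made = []

task("foo") do
  sleep 0.05                # stand-in for asynchronous work
  made << :all_the_things
end

task("bar") do
  invoke("foo").join        # the sequencing constraint: foo must finish first
  made.dup                  # what bar can see once foo is done
end

seen_by_bar = invoke("bar").value
```

Because `invoke` hands back something joinable, tasks compose inside the task machinery instead of around it.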
(To be fair to Jeremy, this is more of a node fail than a cake fail; there isn’t a completely nice way to make this work. Ideally what you would want is for invoke to automatically join any asynchronous things spawned by a task, the same way a top-level task invocation does, but there’s no node API which would permit implementing that. invoke is pretty much a promise that can never really be kept; it was almost omitted from the cake API altogether, and the above non-DSL approach is probably the way to go.)
In general, if you fire off an asynchronous task and never join it anywhere (note that node/cake themselves wait for tasks before exiting), then you’re doing something wrong. If you care about the result of a task’s computation, then someone eventually needs to wait for the task to complete in order to be sure that it’s finished computing everything it’s supposed to. (If you don’t care about the task’s result, then why are you sparking it in the first place?)
This is especially important with unit tests. Any test involving an asynchronous implementation needs to have a way to join every important asynchronous operation the code under test sparks off before it can check assertions. Otherwise, you can easily end up writing tests which happen to pass on your development machine (most of the time), but intermittently fail in CI after they’ve already landed, perhaps when someone else’s legitimate changes happen to arbitrarily influence the timing.
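In test code that can look something like the following sketch. `Cache` and `warm_async` are hypothetical code under test; the point is only that the asynchronous operation returns something the test can join before it asserts.

```ruby
# Hypothetical code under test: warms a cache in the background and
# returns the Thread so that callers (including tests) can join it.
class Cache
  attr_reader :entries

  def initialize
    @entries = {}
  end

  def warm_async(keys)
    Thread.new do
      keys.each do |key|
        sleep 0.01                        # stand-in for a slow lookup
        @entries[key] = key.to_s.upcase
      end
    end
  end
end

cache = Cache.new
warming = cache.warm_async([:foo, :bar])

# Without this join, the assertion below would race against the worker
# and could pass or fail depending on timing.
warming.join

raise "cache not fully warmed" unless cache.entries == { foo: "FOO", bar: "BAR" }
```

If `warm_async` returned nothing, the test would be left sleeping for "long enough" and hoping, which is where the intermittent CI failures come from.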
These kinds of issues can be much worse when the fail is buried several API layers deep. In fact, this is a common architectural problem in asynchronous libraries in general. (Twisted Python seems to be a particularly bad offender. For example, the in-built name resolver will fire off requests and simply leave them out there, sometimes causing trial’s “reactor clean” check to fail in tests. Even if you’re just resolving “localhost”.)
Next time: Cancel THIS!