2019-10-02

Spooky Exit At A Distance

I am personally opposed to async, futures, promises; whatever you call it. It is almost never appropriate for application or library development, yet widely proposed as a good solution to problems. It also has an almost amusingly terrible history of integration and transition into ecosystems. I plan to explain my complaints properly in a future post.

But, we still use it. Let's look at a specific example, in node, which I call "Spooky Exit At A Distance".

Here, we have possibly the simplest async node application, with the "logging prelude" we're going to be using:

async function main() {
  return 5;
}

main()
  .then((r) => console.log('returned:', r))
  .catch((e) => console.error('erroh!', e))
  .finally(() => console.log('application complete!'));

This prints the return value (5), followed by application complete!.

(This "prelude" is here because you can't use await at the top level in node, which is mighty inconvenient here, but I'm sure they have their reasons.)

Let's add some "real" work to our example:

async function main() {
  const made = await new Promise((resolve, reject) => {
    // ... do some work ...
    resolve(2);
  });
  return made + 3;
}

This prints the same thing as the previous example, in a less direct way. await causes us to hand off control from main to the Promise, and, once the Promise has been resolved, we "unblock" and resume running main.
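To make that hand-off concrete, here is a small sketch that records the order in which things run. The order array and its labels are illustrative additions, not part of the original example. It suggests that the function passed to new Promise runs straight away, while the rest of main only resumes after the caller has got its promise back.

```javascript
// Records the order in which each piece of code runs.
const order = [];

async function main() {
  order.push('main: start');
  const made = await new Promise((resolve) => {
    // The executor runs synchronously, inside the call to new Promise.
    order.push('executor: running');
    resolve(2);
  });
  // Only reached after the caller below has continued.
  order.push('main: resumed');
  return made + 3;
}

const result = main();
order.push('caller: got a promise back');

result.then((r) => {
  order.push(`then: ${r}`);
  console.log(order.join('\n'));
});
```

The 'caller' line is recorded before 'main: resumed', even though resolve was called earlier still: await always defers the continuation, even when the promise is already resolved.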

But... what happens if there's a bug in the "do some work" part, and we never call resolve?

async function main() {
  const made = await new Promise((resolve, reject) => {
    // (there's like four different bugs here)
    switch (Math.random(2)) {
      case 0:
        resolve(2);
        break;
      case 1:
        resolve(3);
        break;
    }
  });
  return made + 3;
}

% node a.js
%

...the app just vanishes. Our then(), catch(), and finally() are not run. The rest of main isn't run either. The exit status is SUCCESS.

As far as node is concerned, there is no code to run, and no IO is outstanding, so it's done. Bye!
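One way to at least notice this, sketched under assumptions: listen for the process 'exit' event and check whether the main promise ever settled. guardAgainstSilentExit is a hypothetical helper name, not a node API, and the proc parameter exists only so the guard can be exercised against a stand-in object. The sketch relies on process.exitCode still being honoured when it is set from inside an 'exit' listener.

```javascript
// Hypothetical helper: complain, and fail the process, if we are exiting
// while the given promise is still pending.
function guardAgainstSilentExit(promise, proc = process) {
  let settled = false;
  const markSettled = () => { settled = true; };
  // Track both outcomes; the caller still handles the original promise.
  promise.then(markSettled, markSettled);

  proc.once('exit', () => {
    if (!settled) {
      console.error('exiting while the main promise is still pending');
      proc.exitCode = 1; // turn the "SUCCESS" into a failure
    }
  });
  return promise;
}

async function main() {
  return 5;
}

guardAgainstSilentExit(main())
  .then((r) => console.log('returned:', r))
  .catch((e) => console.error('erroh!', e))
  .finally(() => console.log('application complete!'));
```

This does not fix the underlying bug; it only swaps a silent, successful exit for a noisy, failing one.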

Note that this can happen anywhere in your entire application. Deep within some library, on handling input, or only under certain load conditions.

Nobody would write code like that, you'd think. Unfortunately, much of the ecosystem forces you to write code like this; it's pretty much the only remaining reason to write explicit promises. For example, dealing with subprocesses:

await new Promise((resolve, reject) => {
  child.once('exit', () => resolve());
  child.once('error', (err) => reject(err));
});

What happens if neither of these events fires? Your app is gone.
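A possible mitigation, as a sketch rather than a recommendation: race the wait against a timer. A pending setTimeout counts as outstanding work, so node stays alive until it fires, and the wait then rejects instead of the app vanishing. withTimeout, its label argument and the millisecond values are all hypothetical choices for illustration.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms`.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    // While this timer is pending, the event loop has work outstanding.
    timer = setTimeout(
      () => reject(new Error(`${label} did not settle within ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards so a
  // fast result does not keep the process alive for the full timeout.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// A promise that never settles, standing in for the missing events.
const neverFires = new Promise(() => {});

withTimeout(neverFires, 50, 'child exit')
  .then(() => console.log('child exited'))
  .catch((e) => console.error('erroh!', e.message));
```

The cost is having to pick a timeout for every wait, which is its own source of bugs under load.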

I hit this all the time. unzipper occasionally took down a service at work, probably because of a similar IO issue.

I hit the subprocess issue while using the library in the simplest way I can imagine: reading the output of a command, then waiting for it to exit. Popular wrapper libraries have pretty much the same code.

The solution?

After consulting with a serious expert, we decided that the events are probably missed (sometimes, under load) if no listener is registered when they fire. You might expect this; I didn't. You can resolve this by moving the promise creation above other code, and awaiting it later. This relies on the (surprising to me!) execution order of Promise constructor arguments: the function passed to new Promise runs immediately, so the listeners are registered when the promise is created, not when it is awaited.
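A minimal sketch of that reordering, assuming node's built-in child_process module. runAndCapture and readAll are hypothetical helper names, and the command in the usage lines is just node printing a word. The exit promise is created before any reading starts, and only awaited afterwards.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: collect everything a stream emits into one string.
async function readAll(stream) {
  stream.setEncoding('utf8');
  let text = '';
  for await (const chunk of stream) {
    text += chunk;
  }
  return text;
}

// Hypothetical helper: run a command, capture stdout, wait for exit.
async function runAndCapture(command, args) {
  const child = spawn(command, args, { stdio: ['ignore', 'pipe', 'inherit'] });

  // Create the promise first. Its executor runs right now, so both
  // listeners are attached before anything below gets a chance to yield.
  const exited = new Promise((resolve, reject) => {
    child.once('exit', (code, signal) => resolve({ code, signal }));
    child.once('error', (err) => reject(err));
  });

  // Await the exit later, together with the read, so that a rejection
  // from 'error' always has a handler attached.
  const [status, output] = await Promise.all([exited, readAll(child.stdout)]);
  return { ...status, output };
}

runAndCapture(process.execPath, ['-e', 'console.log("hello")'])
  .then((r) => console.log('exit code:', r.code, 'output:', r.output.trim()))
  .catch((e) => console.error('erroh!', e));
```

Promise.all is used instead of two separate awaits so that an early 'error' rejection is not left unhandled while the read is still in progress.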


You can also have great fun looking at execution order in your test case.

A row (in this picture; normally a column) is a job, which progresses from 1enter to 8awaited.

This recording shows all of the workers completing the read in a row (6c), then interleaving of the function completing (7x, 8a), with new workers starting (1e, etc.). Note how some of the jobs 7x (exit) before they 6c (complete reading), which is probably our bug.



Read more of Faux' blog