Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Strategy for dealing with Stack Overflows #696

Closed
aearly opened this issue Jan 11, 2015 · 13 comments
Closed

Strategy for dealing with Stack Overflows #696

aearly opened this issue Jan 11, 2015 · 13 comments

Comments

@aearly
Copy link
Collaborator

aearly commented Jan 11, 2015

Stack overflow issues have come up frequently with async. Most of the time, they come from using synchronous iteration functions with large inputs, e.g. async.eachSeries(Array(1000000), function () { callback(); }, done). (#66, #75, #110, #117, #133, #296, #413, #510, #573, #588, #590, #604, #622) Some of those issues aren't caused by large arrays exactly, but most of the time they boil down to using a sync iteration function.

A common way to fix the problem is to use an async.nextTick() internally to re-set the call-stack for the next step of iteration. This masks the problem, and slows performance. In node and later IE's, you can use setImmediate or process.nexTick which is very fast, but in other browsers it is implemented as setTimeout(fn, 0), which actually ends up being about setTimeout(fn, 4). It adds unnecessary delay to your processing, if your iteration functions are actually asynchronous. This setImmediate-type guarding was added to waterfall and it appears to cause a pretty big performance hit: #40, #152, #513, #621

I feel like we need to come up with a comprehensive and consistent strategy for dealing with these competing issues.

Option 1: Don't guard against stack overflows, leave it to the user

I propose removing all async.nextTicks and setImmediates from async library functions, to remove the performance hit. This would be a pretty simple change. We can still offer the async.nextTick and async.setImmediate functions to help people fix their sync functions, when they do need to use them in a waterfall or something similar.

We would need to document what we actually mean by synchronous functions , based on the confusion in #629. A RangeError is a pretty good indicator of using a sync function.

This would be a breaking change, a 1.0-type feature. We could also provide a wrapper function that will make sync functions work, to ease migration. stackSafe from #516 is an interesting idea.

Option 2: Guard against stack overflows, but use a better setImmediate

There is a pretty comprehensive polyfill for setImmediate that handles a variety of browsers and JS execution environments: YuzuJS/setImmediate. It uses the native setImmediate in node and later IE's, postMessage tricks in modern browsers, and script.onreadystatechange tricks in older IE's. It gives sub-millisecond deferrals for 99%+ of JS execution environments.

If we were to use this for the expressed purpose of preventing stack overflows, we should use it everywhere. There should be tests for every async library function with huge arrays and sync functions. This would be a bit of work.

It would require a build-step (e.g. browserify --standalone) for the standalone distribution of async. Async would now have an npm dependency, The other option is to just copy the setImmediate code in, which is compatible with the MIT license I believe.

Option 3: Have a "dev-mode" that can detect sync iterators

Based on an environment variable or async._devMode flag, we could conditionally add some guards to iteration functions to see if they callback on the same event loop tick. Stack overflows would still happen, but you would get warnings that would make the source of the error easier to find. This would be tricky to implement, but it is a compromise between Options 1 and 2.

Implementing it without causing a performance hit will be difficult, and would probably require a build that can strip dead code. I'm reminded of how envify and Uglify can strip out if (process.env.NODE_ENV !== "production") { ... } blocks when browserifying, but I'm not sure if having both async.js and async.dev.js is desirable.

This is also a breaking, 1.0-type feature.

Option 4: Do nothing

Just leave things as-is. Some functions guard against stack overflows, others don't. We have a mis-match of philosophies. waterfall is slower than is has to be, eachSeries is fast, but will RangeError with a large array and a sync iteration function. We will continue to receive bug reports about stack overflows and poor performance.


So -- thoughts anyone? (@caolan @beaugunderson ) I am in favor of Option 1, but I am also biased because I also have experience with diagnosing the errors caused by sync iterators. 😃

@caolan
Copy link
Owner

caolan commented Jan 12, 2015

Option 1 for sure. Some guards got added at one point but I thought they'd all been removed by now.

@aearly
Copy link
Collaborator Author

aearly commented Jan 14, 2015

Ok, I am working on this. It's not a simple as removing the setImmediates -- some methods relying on it to make things easier.

@SethHamilton
Copy link

I use nexTick, then setImmediate later on with all my "next" callbacks when using async because I ran into issues a very long time ago. Does Async now have these built in? I'm assuming there is a performance hit if the library does it and I do it as well. If they are leaving, then I have to change nothing, but if they are staying (if they exist) then I should probably remove mine.

As for my suggestion it would making it a switch. I've always hated having to call my callbacks wrapped up, it makes for ugly code. If it were a switch to turn on hidden setImmediate calls that would be slick.

@aearly
Copy link
Collaborator Author

aearly commented Feb 22, 2015

I've added an ensureAsync function wrapper to my acomb helper library that will use a setImmediate call with your callback if (and only if) it needs to:

https://github.com/aearly/acomb#ensureAsync

@matbee-eth
Copy link

Something like this, http://dbaron.org/log/20100309-faster-timeouts, would be a good strategy for browsers.

@aearly
Copy link
Collaborator Author

aearly commented Apr 18, 2015

I want to incorporate some postMessage tricks like that, but getting a cross browser/cross platform implementation that is fast in the most places is quite complicated: https://github.com/YuzuJS/setImmediate/blob/master/setImmediate.js

Also, even the fastest implementation (process.nextTick) is still slower than a simple function call, so it's not something we'd want to do by default everywhere.

@matbee-eth
Copy link

Implementing a postMessage/nextTick function based on a variable MAX_STACK_SIZE option would be nice.

@charlierudolph
Copy link
Contributor

I'd suggest not guarding against stack overflows by default.

There could be an opt-in option that can either be turned on per method call or with an overall switch that wraps every async function in ensureAsync.

async.each(arr, iterator[, options][, callback])

if options.safe (or something else) is true, then wrap iterator in ensureAsync.

@megawac
Copy link
Collaborator

megawac commented Aug 17, 2015

Agreed, it'd be nice to be able to detect the SO and provide a console warn to use a sync version (window.onerror&process.on('uncaughtException') hack :P)

@aearly
Copy link
Collaborator Author

aearly commented Mar 9, 2016

With each, auto, and waterfall now without a deferral, I think this issue is safe to close. The only functions that have deferrals are ensureAsync (its core functionality), memoize, forever, and queue (which need it to work properly).

@aearly aearly closed this as completed Mar 9, 2016
@sebamarynissen
Copy link

I ran into this issue as well recently. I am trying to parse a certain file format using node.js streams. As a consequence, I'm unable to parse the entire file at once, because a node.js stream will only give me buffers of which I don't know the size beforehand. As a result, I also don't know if sufficient bytes are present to do the parsing of a specific entry in the file. This is why I used the async module which uses callbacks that get called once sufficient bytes become available. If sufficient bytes are already available in the current buffer, then the callback is fired synchronously.

This works fine and provides a neat way of abstracting away whether I have to wait for another buffer chunk to be received or not. However, I if receive very large buffer chunks at once from the stream; it is possible that certain loops that I created using the async module are completely synchronous. Consider a trivial example where I have the entire file to be parsed as 1 buffer chunk, in that case all my async-stuff will be synchronous under the hood because all bytes are available - which is what I want because I would like to use my parser for synchronously parsing a file as well. As a result, I sometimes receive stack overflow errors.

I managed however to solve the issue by implementing my own "whilst" method. The philosophy is that I detect whether a callback is fired synchronously or asynchronously. If the callback is fired synchronously, I use a while loop which doesn't increase the call stack. If the callback is fired asynchronously, I use a recursive function call as normal. Stack overflow errors aren't to be expected here because the stack is cleared between two process ticks. This looks as follows:

const whilst = function(test, it, cb) {
  
  let ctx = this, main;
  let strategy = main = function() {
    return test.call(ctx);
  };
  let quit = function() {return false;};
  let results = [];
  const go = function() {

    // Loop synchronously as much as we can in a while loop instead of 
    // going deeper down the call stack.
    while (strategy()) {
      let sync = true;
      let pre = false;
      it.call(ctx, function(err, ...rest) {

        // This is where the magic happens: in case we're still 
        // synchronous, the "sync" variable will _still_ be set to 
        // true. Don't do anything now, the loop will continue out of 
        // itself by calling the test again. However, if the callback 
        // was fired asynchronously, the "sync" variable will already 
        // be false. In that case we need to re-enter the
        // while(strategy()) loop with the test function being called.
        results = rest;
        if (sync) {
          pre = true;
        }
        else {
          strategy = main;
          go();
        }

      });

      // In case we're async, "pre" will stil be false. This means that 
      // the loop needs to be quit, which can be done by changing 
      // strategies to quit.
      if (!pre) {
        strategy = quit;
      }
      sync = false;

    }

    // If we've reached this point, but the strategy was not set to 
    // "quit", this means that the synchronous test simply returned false. 
    // Call the final callback to exit the loop.
    if (strategy === main) {
      cb(null, ...results);
    }

  };
  go();

};

I can imagine that this concept would be possible to apply to other methods vulnerable to stack overflows as well. As such I think that in theory it is possible to adapt the entire async library to be able to handle synchronous callbacks without stack overflows. Whether this is desirable in a library whose purpose is for working with "true" asynchronous functions is an open question.

@aearly
Copy link
Collaborator Author

aearly commented Aug 16, 2018

What you described is very similar to ensureAsync -- a wrapper for functions that always ensures the callback is called on the next tick.

@sebamarynissen
Copy link

sebamarynissen commented Aug 16, 2018

@aearly, perhaps my explanation was not entirely clear, but what I tried to do was the exact opposite of ensureAsync: if all callbacks are called synchronously, then the entire whilst loop executes synchronously. This means that

let i = 0;
let order = [];
whilst(function() {
  return i < 1e6;
}, function(cb) {
  i++;
  cb(null);
}, function(err) {
  order.push(1);
});
order.push(2);

// Right now, order === [1, 2];, because the callbacks are called synchronously.

This is already the default behavior for the whilst function in async, but my approach above avoids stackoverflow errors for synchronous while loops with more iterations than the JS call stack can handle.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

7 participants