readline: add support for async iteration #23916
Should we benchmark this comparatively with the |
Good work!
};
this.on('line', lineListener);
this.on('close', closeListener);
this[kLineObjectStream] = readable;
Would it make sense to expose this as a separate method? Converting to a stream might be an issue for multiple people.
I consider this an implementation detail of the @@asyncIterator method. A major reason why the performance of this method isn't on par with the 'line' event, as you have noted in #23916 (comment), is the double buffering necessitated by the intermediate stream, so I'd rather not expose the stream at the moment.
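To make the "intermediate stream" point concrete, here is a minimal sketch of the general technique: 'line' events are pushed into an object-mode Readable, and that Readable is what gets iterated. This is an illustration and not the exact code from this PR. The helper name linesToStream is hypothetical, and the Readable.from() input is only a stand-in for a file stream. Each line is buffered once by readline and once more by the Readable, which is the double buffering mentioned above.

```javascript
'use strict';
const { Readable } = require('stream');
const readline = require('readline');

// Hypothetical helper (not part of the readline API): wraps an Interface
// in an object-mode Readable that yields one line per chunk.
function linesToStream(rl) {
  const readable = new Readable({
    objectMode: true,
    // Called when the consumer wants more data: let the input flow again.
    read() {
      rl.resume();
    },
  });

  const lineListener = (line) => {
    // push() returns false once the internal buffer is full,
    // so pause the interface until read() is called again.
    if (!readable.push(line)) {
      rl.pause();
    }
  };
  const closeListener = () => {
    // Signal end-of-stream to the consumer.
    readable.push(null);
  };

  rl.on('line', lineListener);
  rl.on('close', closeListener);
  return readable;
}

// Usage: iterate the intermediate stream instead of listening for events.
(async function main() {
  const rl = readline.createInterface({
    input: Readable.from(['first\nsecond\n', 'third\n']),
    crlfDelay: Infinity,
  });
  for await (const line of linesToStream(rl)) {
    console.log(line);
  }
})();
```

Because every line passes through the Readable's own queue before reaching the loop body, this approach adds per-line overhead compared with a plain 'line' listener.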
I've compiled the branch and tested it with a 1 GB file (22 514 395 lines). Scripts and results:
'use strict';
const fs = require('fs');
const readline = require('readline');
let counter = 0;
let dummy;
console.time('event');
const rl = readline.createInterface({
input: fs.createReadStream('big-file.txt', 'utf8'),
crlfDelay: Infinity,
});
rl.on('line', (line) => {
counter++;
dummy = line;
}).on('close', () => {
console.timeEnd('event');
console.log(`Lines: ${counter}, last line length: ${dummy.length}`);
});
'use strict';
const fs = require('fs');
const readline = require('readline');
(async function main() {
let counter = 0;
let dummy;
console.time('asyncIterator');
const rl = readline.createInterface({
input: fs.createReadStream('big-file.txt', 'utf8'),
crlfDelay: Infinity,
});
for await (const line of rl) {
counter++;
dummy = line;
}
console.timeEnd('asyncIterator');
console.log(`Lines: ${counter}, last line length: ${dummy.length}`);
})();
So the event implementation is currently 3 times as fast as the async iterator implementation. Maybe we should warn about this.
Can you check reading the same file with |
@mcollina Do you mean comparing async iteration over unsplit chunks vs. async iteration over split lines? If so, I get a factor of ~4x. Scripts and results:
const fs = require('fs');
(async function main() {
let counter = 0;
let dummy;
console.time('asyncIteratorChunks');
for await (const chunk of fs.createReadStream('big-file.txt', 'utf8')) {
counter++;
dummy = chunk;
}
console.timeEnd('asyncIteratorChunks');
console.log(`Chunks: ${counter}, last chunk length: ${dummy.length}`);
})();
'use strict';
const fs = require('fs');
const readline = require('readline');
(async function main() {
let counter = 0;
let dummy;
console.time('asyncIteratorLines');
const rl = readline.createInterface({
input: fs.createReadStream('big-file.txt', 'utf8'),
crlfDelay: Infinity,
});
for await (const line of rl) {
counter++;
dummy = line;
}
console.timeEnd('asyncIteratorLines');
console.log(`Lines: ${counter}, last line length: ${dummy.length}`);
})();
@mcollina Or do you mean comparing async iteration over unsplit chunks vs. the event implementation for unsplit chunks? If so, I get a 1:1 factor, i.e. the same speed. Scripts and results:
'use strict';
const fs = require('fs');
let counter = 0;
let dummy;
console.time('eventChunks');
const readable = fs.createReadStream('big-file.txt', 'utf8');
readable.on('data', (chunk) => {
counter++;
dummy = chunk;
}).on('close', () => {
console.timeEnd('eventChunks');
console.log(`Chunks: ${counter}, last chunk length: ${dummy.length}`);
});
'use strict';
const fs = require('fs');
(async function main() {
let counter = 0;
let dummy;
console.time('asyncIteratorChunks');
for await (const chunk of fs.createReadStream('big-file.txt', 'utf8')) {
counter++;
dummy = chunk;
}
console.timeEnd('asyncIteratorChunks');
console.log(`Chunks: ${counter}, last chunk length: ${dummy.length}`);
})();
#23901 has landed, so it seems those commits can be excluded to simplify reviews.
And beware of #23929; we may have conflicts.
@vsemozhetbyt thanks for those benchmarks, those are quite interesting. Specifically the fact that using stream iteration is now essentially on par with Related to |
Co-authored-by: Ivan Filenko <ivan.filenko@protonmail.com>
Fixes: nodejs#18603
Refs: nodejs#18904
Documentation changes
@vsemozhetbyt @mcollina I've updated this PR to address the documentation comments. Please take a look.
LGTM
Docs LGTM. Thank you!
Maybe cc @nodejs/streams?
@devsnek maybe? Anyway, this could land because it's older than a week. I'd recommend waiting 2 days and then landing if no one objects.
Co-authored-by: Ivan Filenko <ivan.filenko@protonmail.com>
Fixes: nodejs#18603
Refs: nodejs#18904
PR-URL: nodejs#23916
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gus Caplan <me@gus.host>
Notable Changes:

* console,util:
  * `console` functions now handle symbols as defined in the spec. nodejs#23708
  * The inspection `depth` default is now back at 2. nodejs#24326
* dgram,net:
  * Added ipv6Only option for `net` and `dgram`. nodejs#23798
* http:
  * Choosing between the http parsers is now possible per runtime flag. nodejs#24739
* readline:
  * The `readline` module now supports async iterators. nodejs#23916
* repl:
  * The multiline history feature is removed. nodejs#24804
* tls:
  * Added min/max protocol version options. nodejs#24405
  * The X.509 public key info now includes the RSA bit size and the elliptic curve. nodejs#24358
* url:
  * `pathToFileURL()` now supports LF, CR and TAB. nodejs#23720
* Windows:
  * Tools are not installed using Boxstarter anymore. nodejs#24677
  * The install-tools scripts are now included in the dist. nodejs#24233
* Added new collaborator:
  * [antsmartian](https://github.com/antsmartian) - Anto Aravinth. nodejs#24655

PR-URL: nodejs#24854
PR-URL: nodejs#26472
Refs: nodejs#23916
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
const TOTAL_LINES = 18;

(async () => {
  const readable = new Readable({ read() {} });
Looking at this now (to solve another bug), I think this backpressure behaviour is confusing, since other consumers can listen to 'line' on the stream and it's surprising that we pause() it for them.
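The situation described here, two consumers sharing one interface, can be sketched as follows. This is a hedged illustration only: consumeBothWays is a hypothetical helper, the input is a small in-memory stream, and whether or when the iterator actually pauses the interface depends on the Node.js version and on how much data is buffered. The 'pause' counter is included only so the effect can be observed; note that closing the interface also emits 'pause'.

```javascript
'use strict';
const { Readable } = require('stream');
const readline = require('readline');

// Hypothetical helper: consumes one interface through a 'line' listener
// and through async iteration at the same time.
async function consumeBothWays(input) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  const viaEvent = [];
  const viaIterator = [];
  let pauses = 0;

  // An independent consumer that knows nothing about the iterator.
  rl.on('line', (line) => viaEvent.push(line));
  // Count every time the interface is paused, whoever triggered it.
  rl.on('pause', () => { pauses++; });

  // Start iterating right away, before any data has been emitted.
  for await (const line of rl) {
    viaIterator.push(line);
  }
  return { viaEvent, viaIterator, pauses };
}

// Usage with a small in-memory input (an assumption for the demo).
consumeBothWays(Readable.from(['one\ntwo\n', 'three\n']))
  .then(({ viaEvent, viaIterator, pauses }) => {
    console.log({ viaEvent, viaIterator, pauses });
  });
```

With a large input and a slow loop body, any pause() issued on behalf of the iterator also delays delivery to the independent 'line' listener, which is the surprising part.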
Rewritten version of #18904, using more existing streams mechanisms.
Depends on #23901 for some of the edge case tests (relevant commits included within this PR).
Co-authored-by: Ivan Filenko ivan.filenko@protonmail.com
Fixes: #18603
Refs: #18904
/cc @mcollina @devsnek @prog1dev
Checklist
make -j4 test (UNIX), or vcbuild test (Windows) passes