Retaining chunks while reading from fs stream appears to leak memory rapidly #21967
It might be caused by the fact that every n-th allocated pooled buffer chunk is retained in this scenario, effectively retaining the whole underlying buffers while actually needing only a small chunk of each.
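A minimal sketch (my own illustration, not code from the issue) of why retaining a pooled chunk pins far more memory than the chunk itself: a small Buffer served from Node's internal pool is a view onto the full pool-sized ArrayBuffer.

```js
'use strict';
// A 16-byte Buffer served from the internal pool shares its backing
// ArrayBuffer with the whole pool.
const chunk = Buffer.allocUnsafe(16);
console.log(chunk.length);              // 16
console.log(chunk.buffer.byteLength);   // 8192 (Buffer.poolSize by default)
// While `chunk` is reachable, the entire 8 KiB backing allocation stays alive,
// even though only 16 bytes of it are used.
```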
@ChALkeR Yes, looking at Buffer pooling would be my first guess, too. I’m not sure how they would be retained indefinitely, though, because you do reset `buf`.
@addaleax This shouldn't be caused by js-side Buffer pooling, as the second file has chunks larger than the pool size. Also, I was yet unable to remove the dependency on […]. Updated testcase, which removes the second file:

```js
'use strict';
const fs = require('fs');
fs.writeFileSync('test-small.bin', Buffer.alloc(100));
const maxLength = 4 * 1024 * 1024; // 4 MiB
let built = 0, buf = [], length = 0;
function tick() {
built++;
if (built % 1000 === 0) {
console.log(`RSS [${built}]: ${process.memoryUsage().rss / 1024 / 1024} MiB`);
}
const stream = fs.createReadStream('./test-small.bin');
stream.on('data', function (data) {
//data = Buffer.from(data); // WARNING: uncommenting this line fixes things somehow
buf.push(data)
length += data.length
if (length >= maxLength) {
buf = []
length = 0
}
});
stream.once('end', tock);
}
function tock() {
Buffer.alloc(65536);
setImmediate(tick);
}
tick();
```
@ChALkeR I think it’s not about […]. In particular, this line seems not ideal: node/lib/internal/fs/streams.js, line 186 at f5a2167.
This adjusts the pool offset by what we planned to read, which could be the whole pool size, not what we actually read.
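To illustrate the effect (a simplified model of my own, not the actual fs internals): because the offset advances by the planned read size even when far fewer bytes were actually read, the small slice that is returned keeps the whole pool allocation alive.

```js
'use strict';
// Simplified model of the bookkeeping described above (not the real code):
// the pool offset advances by `toRead`, not by `bytesRead`.
const pool = { buffer: Buffer.allocUnsafe(64 * 1024), used: 0 };

function readChunk(toRead, bytesRead) {
  const start = pool.used;
  pool.used += toRead;                                  // planned size, e.g. 64 KiB
  return pool.buffer.slice(start, start + bytesRead);   // actual data, e.g. 100 bytes
}

const chunk = readChunk(64 * 1024, 100);  // a 100-byte file "consumes" the whole pool
console.log(chunk.length);                // 100
console.log(chunk.buffer.byteLength);     // 65536 — all retained via `chunk`
```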
I think it’s a good sign that this seems to “solve” the problem:

```diff
diff --git a/lib/internal/fs/streams.js b/lib/internal/fs/streams.js
index f527b1de4b84..f2cda91c6873 100644
--- a/lib/internal/fs/streams.js
+++ b/lib/internal/fs/streams.js
@@ -171,6 +171,8 @@ ReadStream.prototype._read = function(n) {
this.emit('error', er);
} else {
let b = null;
+ if (start + toRead === thisPool.used)
+ thisPool.used += bytesRead - toRead;
if (bytesRead > 0) {
this.bytesRead += bytesRead;
b = thisPool.slice(start, start + bytesRead);
```

It only works if different streams don’t concurrently use the same pool, though, so I’m not sure it’s something we’d want to go with.
More details about the use case: I needed to concatenate a large number of files into a small number of files, with compression. Something like this schematic code:

```
cat a*.txt | lz4 > a.txt.lz4
cat b*.txt | lz4 > b.txt.lz4
```

In Gzemnid, I created several lz4 encoder streams (using node-lz4), piped those to output files, and upon encountering a file to read, piped it into one of those lz4 streams (depending on the file). Now, the thing is that node-lz4 uses block compression and retains everything thrown into it until a sufficient input size is reached (4 MiB by default), and that data effectively comes from the fs read stream chunks.
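A rough sketch of that pipeline (my reconstruction, under the assumptions that node-lz4 exposes `createEncoderStream()` and that the file names are placeholders): several read streams are piped, one after another, into a single long-lived encoder stream.

```js
'use strict';
const fs = require('fs');
const lz4 = require('lz4'); // node-lz4; assumed to expose createEncoderStream()

// One long-lived encoder per output file; it buffers input internally
// until a compression block (4 MiB by default) is filled.
const encoder = lz4.createEncoderStream();
encoder.pipe(fs.createWriteStream('a.txt.lz4'));

// Concatenate the inputs into the encoder one after another.
const inputs = ['a1.txt', 'a2.txt', 'a3.txt']; // placeholder names
(function next(i) {
  if (i === inputs.length) return encoder.end();
  const src = fs.createReadStream(inputs[i]);
  src.pipe(encoder, { end: false }); // keep the encoder open between files
  src.once('end', () => next(i + 1));
})(0);
```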
@addaleax What is also interesting is how exactly […] affects this. A reduced testcase without fs:

```js
'use strict';
const maxLength = 4 * 1024 * 1024; // 4 MiB
let built = 0, buf = [], length = 0;
function tick() {
built++;
if (built % 1000 === 0) {
console.log(`RSS [${built}]: ${process.memoryUsage().rss / 1024 / 1024} MiB`);
}
let data = Buffer.allocUnsafe(65536).slice(0, 128).fill(0);
//data = Buffer.from(data); // uncommenting this line fixes things
buf.push(data)
length += data.length
if (length >= maxLength) {
buf = []
length = 0
}
Buffer.alloc(65536); // commenting this line also significantly reduces memory usage
setImmediate(tick);
}
tick();
```

High memory usage is expected though. Probably something related to memory management.
@addaleax I guess that this is what’s going on (re: how […]):
Testcase for that:

```js
'use strict';
const bs = 1024 * 1024; // 1 MiB
const retained = [];
let i = 0, flag = false;
function tick() {
i++;
if (i % 1000 === 0) {
console.log(`RSS [${i}]: ${process.memoryUsage().rss / 1024 / 1024} MiB`);
}
const buf = Buffer.allocUnsafe(bs);
retained.push(buf);
if (i === 20000) {
console.log('Clearing retained and enabling alloc');
retained.length = 0;
flag = true;
}
if (flag) Buffer.alloc(bs); // Even Buffer.alloc(bs - 10) seems to be fine here
if (i < 40000) setImmediate(tick);
}
tick();
```
The above can be reproduced with plain malloc()/calloc() as well:

```cpp
// Compile with -O0
#include <cstdlib>   // malloc, calloc, free
#include <iostream>
#include <unistd.h>
using namespace std;
int main() {
const int bs = 1024 * 1024; // 1 MiB
const int count = 1000;
void * x[count];
for (int i = 0; i < count; i++) {
free(calloc(bs * 2, 1)); // Commenting out this line reduces memory usage
x[i] = malloc(bs);
}
cout << "Allocated" << endl;
cout << "Sleeping..." << endl;
sleep(20);
cout << "Freeing" << endl;
for (int i = 0; i < count; i++) {
free(x[i]);
}
cout << "Hello" << endl;
return 0;
}
```
@addaleax Btw, the testcases from #21967 (comment), #21967 (comment) and #21967 (comment) (but not the initial one) have significantly less memory usage when run with jemalloc.
Why don't we use jemalloc, btw?
@ChALkeR The question has come up a few times, but I don’t know that we’ve ever done a full investigation into the benefits/downsides. I don’t think the examples here are something that we should base a decision on; that the OS may lazily populate memory pages is not really something for Node to care about, imo.
This hack should be reverted once nodejs/node#21967 gets included into a Node.js v8.x release. Fixes: #18
Linux yoga 4.17.2-1-ARCH #1 SMP PREEMPT Sat Jun 16 11:08:59 UTC 2018 x86_64 GNU/Linux
While trying to resolve Gzemnid memory problems at nodejs/Gzemnid#18, I eventually reduced those to the following testcase. It seems to me that it's not node-lz4's fault, but that something is wrong on the Node.js side.
Testcase:
Adding

```js
data = Buffer.from(data);
```

fixes things somehow. Problems start when the exact same chunks from the stream are retained for some time while some larger file reading goes on. gc-ing manually does not help; this looks like a memory leak.
All that memory is allocated through `node::ArrayBufferAllocator::Allocate`.
/cc @addaleax @nodejs/buffer