createWriteStream({resumable:false}) causes error to be swallowed and program to hang #312

Closed
giltayar opened this issue Jul 30, 2018 · 10 comments

Comments

@giltayar

Environment details

  • OS: MacOS and Linux
  • Node.js version: 10.6.0
  • npm version: 6.1.0
  • @google-cloud/storage version: 1.7.0

Steps to reproduce

  1. Run the code below (first change projectId and bucket in the code, and run npm install @google-cloud/storage @rauschma/stringio).
  2. It should fail with error code 429 (rateLimitExceeded), because we're writing the same content to the same path too many times, but instead the program never finishes. This is the problem. (If you always write the text to random paths, everything completes without a 429.)
  3. Comment out resumable: false and run it again
  4. It will fail with error code 429, as expected.

Code:

'use strict'
const Storage = require('@google-cloud/storage')
const {StringStream} = require('@rauschma/stringio')

const projectId = 'rendering-grid'
const bucket = 'test-storage-problem-can-delete'

async function main() {
  const storage = new Storage({
    projectId,
  })

  const put = async () => {
    await new Promise((resolve, reject) => {
      const writeStream = storage
        .bucket(bucket)
        .file('foo/bar')
        .createWriteStream({
          resumable: false,
          metadata: {
            contentType: 'text/plain',
          },
        })

      writeStream.on('finish', resolve).on('error', reject)

      const readStream = new StringStream('some debugging text')

      readStream.on('error', reject)
      readStream.pipe(writeStream)
    })
  }

  for (let i = 0; i < 10; ++i) {
    console.log('#### Run #', i + 1)
    await Promise.all([...Array(10)].map(() => put().then(() => process.stdout.write('.'))))
    console.log('')
  }
}

main().catch(console.error)

So {resumable: false} is causing the program to hang, I'm guessing because it's not reporting the error on the stream.
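
As a stop-gap for scripts like the one above, the upload can be raced against a deadline so the process at least fails instead of hanging forever. This is only a mitigation sketch, not a fix; the helper and the timeout value are made up:

// Mitigation sketch only: reject the upload promise after a deadline so the
// process doesn't hang indefinitely when the write stream emits neither
// 'finish' nor 'error'. The 60-second value is arbitrary.
const withTimeout = (promise, ms) =>
  Promise.race([
    promise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error(`upload timed out after ${ms} ms`)), ms)
    ),
  ])

// e.g. in the loop above: await withTimeout(put(), 60 * 1000)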

@stephenplusplus stephenplusplus added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Jul 30, 2018
@andreban

andreban commented Aug 6, 2018

I ran into the same issue and spent quite a while trying to figure out what was happening.

In my case, the contentType was being set to an invalid value, which was causing the error. The data is fetched from an origin server and uploaded to Cloud Storage, and that server was setting an invalid Content-Type.

My workaround was simply removing "resumable: false" and double-checking that the Content-Type was right.

Is it possible to raise the priority for this? IMO it's likely that others will run into this issue and see their servers just hang, and it's quite tricky to trace the problem back to its source.

Here's code to reproduce the error with an invalid Content-Type:

const fs = require('fs');
const config = require('./config/config');
const cloudStorage = require('@google-cloud/storage');

const CLOUD_BUCKET = config.get('CLOUD_BUCKET');
const storage = cloudStorage({
  projectId: config.get('GCLOUD_PROJECT')
});

const bucket = storage.bucket(CLOUD_BUCKET);

const file = bucket.file('test.png');
const readStream = fs.createReadStream('./test-image.png');

const metadata = {
  contentType: 'PNG' // Invalid Content-Type
};

const writeStream = file.createWriteStream({
  metadata: metadata,
  resumable: false
});

writeStream.on('error', err => {
  console.log(err.message);
});

writeStream.on('finish', () => {
  console.log('finish');
});

readStream.pipe(writeStream);
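
For reference, a minimal sketch of the workaround applied to that snippet (the 'image/png' value is just an example of a valid MIME type, not taken from the real application):

// Workaround sketch: use a valid MIME type and omit `resumable: false`, so the
// library uses its default (resumable) upload path and the error is reported
// on the returned stream.
const workingWriteStream = file.createWriteStream({
  metadata: {
    contentType: 'image/png' // a valid MIME type instead of 'PNG'
  }
  // no `resumable: false` here
});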

@andreban

andreban commented Aug 6, 2018

I took a look at file.js and I think there's a possible fix.

I think the issue is that the stream to which the error is being emitted is different from the one exposed to the user (i.e. the one where stream.on('error', ...) is called):

const fileWriteStream = duplexify();

But the stream that is being returned to the user is

const stream = streamEvents(
      pumpify([
        gzip ? zlib.createGzip() : through(),
        validateStream,
        fileWriteStream,
      ])
    );

I noticed that a little further down in the code, events emitted on fileWriteStream are replicated to stream. Example:

fileWriteStream.on('response', stream.emit.bind(stream, 'response'));

I tried doing the same thing for the error event, and it helps in my case (the code I posted previously now works):

fileWriteStream.on('error', stream.emit.bind(stream, 'error'));

Are there other clean-ups that should be done instead of simply forwarding the event?

@fhinkel
Contributor

fhinkel commented Sep 6, 2018

ping @stephenplusplus

@stephenplusplus stephenplusplus self-assigned this Sep 29, 2018
@stephenplusplus
Contributor

@andreban thank you for that investigation. It didn't resolve @giltayar's issue for me, unfortunately, and it seems unnecessary, as pumpify will destroy the whole "pumpified" stream if any of its inner streams emits an error.

What I see when I run the original script is:

$ node .
#### Run # 1
..........
#### Run # 2
{ Error: ESOCKETTIMEDOUT
    at ClientRequest.<anonymous> (/Users/stephen/dev/play/gissue-312/node_modules/request/request.js:816:19)
    at Object.onceWrapper (events.js:273:13)
    at ClientRequest.emit (events.js:182:13)
    at TLSSocket.emitRequestTimeout (_http_client.js:661:40)
    at Object.onceWrapper (events.js:273:13)
    at TLSSocket.emit (events.js:182:13)
    at TLSSocket.Socket._onTimeout (net.js:449:8)
    at ontimeout (timers.js:425:11)
    at tryOnTimeout (timers.js:289:5)
    at listOnTimeout (timers.js:252:5) code: 'ESOCKETTIMEDOUT', connect: false }

A long time (2 or 3 minutes) goes by before the timeout, but the error coming from the request library is eventually reported.

So, I'm not sure why there is no error, but also no success. I'll keep looking. I just thought I'd share where I'm at so far.

@zbjornson
Contributor

Think I found it: When this library tries to retry the request because of a 429 or any other error, the stream is already consumed and there's no data available to write to the socket. Then the socket times out expecting data.

Good job finding a way to consistently reproduce the error. I think this is the same issue plaguing everyone in #27 (which we hit on a daily basis).
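
To illustrate why a retry can't simply reuse the body (plain Node streams, not library code): a one-shot readable is consumed by the first pipe, so a "retried" pipe of the same source has no data left to send, which is why the retried request sits waiting for a body:

// Minimal illustration, not library code: the source is fully consumed by the
// first pipe, so the second pipe writes no data at all.
const {Writable} = require('stream');
const {StringStream} = require('@rauschma/stringio');

const source = new StringStream('abc'); // stand-in for the multipart body stream

const sink = label =>
  new Writable({
    write(chunk, enc, cb) {
      console.log(label, 'got', chunk.toString());
      cb();
    }
  });

source.pipe(sink('attempt 1')); // logs: attempt 1 got abc

// "Retry" 100 ms later: the source has already ended, so no data is ever
// written to the second sink.
setTimeout(() => source.pipe(sink('attempt 2')), 100); // logs nothing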

For the record, the new error since the "request" module was replaced by "teeny-request":

FetchError: network timeout at: https://www.googleapis.com/upload/storage/v1/b/zb-dev/o?uploadType=multipart&name=foo%2Fbar
    at Timeout.<anonymous> (/home/zbjornson/nodejs-storage-master/node_modules/node-fetch/lib/index.js:1338:13)
    at ontimeout (timers.js:498:11)
    at tryOnTimeout (timers.js:323:5)
    at Timer.listOnTimeout (timers.js:290:5)
  message: 'network timeout at: https://www.googleapis.com/upload/storage/v1/b/zb-dev/o?uploadType=multipart&name=foo%2Fbar',
  type: 'request-timeout'

The issue can be reduced to the following, bypassing a lot of the storage module's stream complexity:

const {Storage} = require("."); // nodejs-storage
const {StringStream} = require('@rauschma/stringio')
const client = new Storage({projectId: "..."});
for (let i = 0; i < 30; i++) {
        console.log(">>");
        const body = new StringStream("abc");
        client.request({
                uri: "https://www.googleapis.com/upload/storage/v1/b/zb-dev/o",
                method: "POST",
                qs: {
                        uploadType: "multipart",
                        name: "foo/bar"
                },
                multipart: [
                        {"Content-Type": "application/json", body: '{"contentType":"text/plain"}'},
                        {"Content-Type": "text/plain", body}
                ]
        }, (err, body, res) => {
                  console.log(err, body && body.id, res && res.statusCode)
        });
}
// ~10x 200 responses, then times out

Sequence of events:

  1. Full requests for all 30 attempts are sent. Body looks like this (note the STREAM BODY comment):
// socket#        timestamp characters
28 2018-10-12T10:35:05.697Z 'POST /upload/storage/v1/b/zb-dev/o?uploadType=multipart&name=foo%2Fbar HTTP/1.1\r\nUser-Agent: gcloud-node-storage/2.1.0\r\nx-goog-api-client: gl-node/8.12.0 gccl/2.1.0\r\nAuthorization: Bearer xxxxxxxxxxxxxx\r\nContent-Type: multipart/related; boundary=1737f7e9-422e-47be-b872-178edd262fe5\r\nAccept: */*\r\nAccept-Encoding: gzip,deflate\r\nConnection: close\r\nHost: www.googleapis.com\r\nTransfer-Encoding: chunked\r\n\r\n4a'
28 2018-10-12T10:35:05.697Z '\r\n'
28 2018-10-12T10:35:05.697Z '--1737f7e9-422e-47be-b872-178edd262fe5\r\nContent-Type: application/json\r\n\r\n'
28 2018-10-12T10:35:05.697Z '\r\n'
28 2018-10-12T10:35:05.697Z '1c'
28 2018-10-12T10:35:05.698Z '\r\n'
28 2018-10-12T10:35:05.698Z '{"contentType":"text/plain"}'
28 2018-10-12T10:35:05.698Z '\r\n'
28 2018-10-12T10:35:05.698Z '2'
28 2018-10-12T10:35:05.698Z '\r\n'
28 2018-10-12T10:35:05.698Z '\r\n'
28 2018-10-12T10:35:05.698Z '\r\n'
28 2018-10-12T10:35:05.698Z '44'
28 2018-10-12T10:35:05.698Z '\r\n'
28 2018-10-12T10:35:05.698Z '--1737f7e9-422e-47be-b872-178edd262fe5\r\nContent-Type: text/plain\r\n\r\n'
28 2018-10-12T10:35:05.698Z '\r\n'
28 2018-10-12T10:35:05.698Z '3'
28 2018-10-12T10:35:05.698Z '\r\n'
28 2018-10-12T10:35:05.698Z 'abc' # <--- STREAM BODY
28 2018-10-12T10:35:05.698Z '\r\n'
28 2018-10-12T10:35:05.698Z '2'
28 2018-10-12T10:35:05.698Z '\r\n'
28 2018-10-12T10:35:05.698Z '\r\n'
28 2018-10-12T10:35:05.698Z '\r\n'
28 2018-10-12T10:35:05.698Z '28'
28 2018-10-12T10:35:05.699Z '\r\n'
28 2018-10-12T10:35:05.699Z '--1737f7e9-422e-47be-b872-178edd262fe5--'
28 2018-10-12T10:35:05.699Z '\r\n'
28 2018-10-12T10:35:05.699Z '0\r\n\r\n'
# ~150 ms pass
28 2018-10-12T10:35:05.844Z 'socket.close' 779 785 # 779 bytes read, 785 written
  2. About 10 succeed.
  3. About 20 receive a 429 response within about 100 ms.
  4. Library retries. This time, the multipart body looks like this:
30 2018-10-12T10:35:07.927Z 'POST /upload/storage/v1/b/zb-dev/o?uploadType=multipart&name=foo%2Fbar HTTP/1.1\r\nUser-Agent: gcloud-node-storage/2.1.0\r\nx-goog-api-client: gl-node/8.12.0 gccl/2.1.0\r\nAuthorization: Bearer xxxxxxxxxxxxxxxxxxxxxxxxxx\r\nContent-Type: multipart/related; boundary=5eeaec7c-dc01-4125-9ee8-8d767010571a\r\nAccept: */*\r\nAccept-Encoding: gzip,deflate\r\nConnection: close\r\nHost: www.googleapis.com\r\nTransfer-Encoding: chunked\r\n\r\n4a'
30 2018-10-12T10:35:07.927Z '\r\n'
30 2018-10-12T10:35:07.927Z '--5eeaec7c-dc01-4125-9ee8-8d767010571a\r\nContent-Type: application/json\r\n\r\n'
30 2018-10-12T10:35:07.927Z '\r\n'
30 2018-10-12T10:35:07.927Z '1c'
30 2018-10-12T10:35:07.927Z '\r\n'
30 2018-10-12T10:35:07.927Z '{"contentType":"text/plain"}'
30 2018-10-12T10:35:07.927Z '\r\n'
30 2018-10-12T10:35:07.927Z '2'
30 2018-10-12T10:35:07.927Z '\r\n'
30 2018-10-12T10:35:07.927Z '\r\n'
30 2018-10-12T10:35:07.927Z '\r\n'
30 2018-10-12T10:35:07.928Z '44'
30 2018-10-12T10:35:07.928Z '\r\n'
30 2018-10-12T10:35:07.928Z '--5eeaec7c-dc01-4125-9ee8-8d767010571a\r\nContent-Type: text/plain\r\n\r\n'
30 2018-10-12T10:35:07.928Z '\r\n'
# one minute passes
30 2018-10-12T10:36:07.931Z 'socket.close' 0 719 # 0 bytes read, 719 bytes written

These retries appear to happen twice after 60-second timeouts, before finally timing out entirely.


This seems like a fundamental flaw with using streams as a datasource when the sink might require a retry, and I have no great ideas for how to fix it. Brainstorming:

  • Accept that streams can't be retried, and remove the retry attempts for non-resumable uploads.

  • Use the resumable upload mechanism and send chunks that reasonably fit in memory (ideally user-configurable) so that the chunk can be retried if needed. (For reference, Node.js's default highWaterMark is 16,384 B.) Even with keepalive, the overhead of this might be insurmountable. A 1 GB file in 16k chunks = 61k requests.
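
Expanding on the first option: if the library stops retrying non-resumable uploads, a caller that still needs retries could do them at the application level by recreating the source stream for every attempt, so each request gets a fresh body. A rough sketch (the helper name, attempt count, and backoff are all made up, not library API):

// Application-level retry sketch, not library code: each attempt gets a fresh
// source stream, so no attempt ever re-sends an already-consumed stream.
async function uploadWithRetry(file, makeSourceStream, attempts = 3) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await new Promise((resolve, reject) => {
        const writeStream = file.createWriteStream({resumable: false});
        writeStream.on('finish', resolve).on('error', reject);
        makeSourceStream().on('error', reject).pipe(writeStream);
      });
    } catch (err) {
      if (attempt === attempts) throw err;
      // naive exponential backoff before the next attempt
      await new Promise(r => setTimeout(r, 2 ** attempt * 1000));
    }
  }
}

// usage: uploadWithRetry(bucket.file('foo/bar'), () => new StringStream('some text'))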

@stephenplusplus
Contributor

stephenplusplus commented Oct 12, 2018

Awesome, thank you for this research! 🙏

Accept that streams can't be retried, and remove the retry attempts for non-resumable uploads.

Yes, sadly. Here's a related issue in the retry-request library about retrying POST requests: googleapis/retry-request#3 (comment).

I'm working on making this a single, un-retried request. I'll link you when I have a PR for review 👍

@stephenplusplus
Contributor

PR sent! googleapis/nodejs-common#268

@stephenplusplus stephenplusplus added the status: blocked Resolving the issue is dependent on other work. label Oct 15, 2018
@zbjornson
Contributor

Looks good to me!

Since that changes the default though, does the number of retries need to be changed anywhere else?

@ghost ghost removed the priority: p2 Moderately-important priority. Fix may not be included in next release. label Oct 16, 2018
@zbjornson
Contributor

@JustinBeckwith @stephenplusplus just want to make sure my question above doesn't get lost -- it looks like that PR changed the default number of retries, so do non-zero retry values need to be set in this and other modules (for non-streaming requests obviously)?

@stephenplusplus
Contributor

The PR didn't change the default for all methods, only util.makeWritableStream, where we set the number of retries to 0, since retrying a POST is never possible.

@google-cloud-label-sync google-cloud-label-sync bot added the api: storage Issues related to the googleapis/nodejs-storage API. label Jan 31, 2020
@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels Apr 6, 2020