
Reading an append blob to a string takes long and then returns HTTP 412 #51

Closed · petmat opened this issue Jan 4, 2019 · 2 comments
Labels: question (Further information is requested)

petmat commented Jan 4, 2019

Which service (blob, file, queue, table) does this issue concern?

blob

Which version of the SDK was used?

@azure/storage-blob@10.3.0

What's the Node.js/Browser version?

Node.js v10.14.1

What problem was encountered?

Following the example at https://github.com/Azure/azure-storage-js/blob/master/blob/samples/basic.sample.js, I was trying to read an append blob from my storage account into a string. The streamToString function took a really long time and then finally failed with an HTTP 412 error and an errorCode of undefined. I suspect the error has something to do with the fact that I am reading an append blob that is constantly getting more lines appended to it. It is a log file, and I would just like to read the current snapshot of it. I could not find any examples dealing with a scenario like mine. Any help would be appreciated!

The detailed error is below:

{ Error: Unexpected status code: 412
    at new RestError (C:\projects\xxx\RequestLogViewer\node_modules\@azure\ms-rest-js\dist\msRest.node.js:1397:28)
    at C:\projects\xxx\RequestLogViewer\node_modules\@azure\ms-rest-js\dist\msRest.node.js:1849:37
    at process._tickCallback (internal/process/next_tick.js:68:7)
  code: undefined,
  statusCode: 412,
  request:
   WebResource {
     streamResponseBody: true,
     url:
      'https://xxxstor.blob.core.windows.net/request-logs/2019-01-04.txt',
     method: 'GET',
     headers: HttpHeaders { _headersMap: [Object] },
     body: undefined,
     query: undefined,
     formData: undefined,
     withCredentials: false,
     abortSignal:
      a {
        _aborted: false,
        children: [],
        abortEventListeners: [Array],
        parent: undefined,
        key: undefined,
        value: undefined },
     timeout: 0,
     onUploadProgress: undefined,
     onDownloadProgress: undefined,
     operationSpec:
      { httpMethod: 'GET',
        path: '{containerName}/{blob}',
        urlParameters: [Array],
        queryParameters: [Array],
        headerParameters: [Array],
        responses: [Object],
        isXML: true,
        serializer: [Serializer] } },
  response:
   { body: undefined,
     headers: HttpHeaders { _headersMap: [Object] },
     status: 412 },
  body: undefined }

Steps to reproduce the issue?

Here is my code:

const {
  Aborter,
  BlobURL,
  ContainerURL,
  SharedKeyCredential,
  ServiceURL,
  StorageURL,
} = require('@azure/storage-blob');
const format = require('date-fns/format');

// Collect the stream's chunks and resolve with them joined into one string.
async function streamToString(readableStream) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    readableStream.on('data', (data) => {
      chunks.push(data.toString());
    });
    readableStream.on('end', () => {
      resolve(chunks.join(''));
    });
    readableStream.on('error', reject);
  });
}

async function run() {
  const accountName = 'xxxstor';
  const accountKey = 'omitted';
  const credential = new SharedKeyCredential(accountName, accountKey);
  const pipeline = StorageURL.newPipeline(credential);
  const serviceURL = new ServiceURL(
    `https://${accountName}.blob.core.windows.net`,
    pipeline
  );
  const containerName = 'request-logs';
  const containerURL = ContainerURL.fromServiceURL(serviceURL, containerName);
  const blobName = `${format(new Date(), 'YYYY-MM-DD[.txt]')}`;
  const blobURL = BlobURL.fromContainerURL(containerURL, blobName);
  console.log('Downloading blob...');
  const response = await blobURL.download(Aborter.none, 0);
  console.log('Reading response to string...');
  const body = await streamToString(response.readableStreamBody);
  console.log(body.length);
}

run().catch((err) => {
  console.error(err);
});

Have you found a mitigation/solution?

no

XiaoningLiu (Member)

@petmat You are right about "trying to read an append blob that is constantly getting more lines in it."

blobURL.download() downloads a blob into a stream with an HTTP GET request. When the stream ends unexpectedly (for example, because the network connection broke), a retry resumes reading the stream from the break point with a new HTTP GET request.

The second HTTP request sends the conditional header If-Match with the blob's ETag from the first response, to make sure the blob has not changed between the two requests. If it has changed, the service returns a 412 "condition not met" error. This strict strategy avoids data-integrity issues, such as the blob being completely overwritten by someone else mid-download. However, it also prevents you from reading a constantly updated log file whenever a retry happens.
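To illustrate, the resume request is roughly equivalent to re-issuing a conditional, offset download like this (a simplified sketch of the behavior described above, not the SDK's actual internals; bytesAlreadyRead and firstResponse are hypothetical names):

// Simplified sketch: the resume GET starts at the broken offset and is
// pinned to the first response's ETag; a changed blob yields HTTP 412.
const resumed = await blobURL.download(Aborter.none, bytesAlreadyRead, undefined, {
  blobAccessConditions: {
    modifiedAccessConditions: { ifMatch: firstResponse.eTag },
  },
});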

While I don't think this is a bug, we do need to make this scenario work for you. There are two solutions; please give them a try:

1. Snapshot the append blob first, and read from the snapshot blob.
2. Set maxRetryRequests to 0 in the blobURL.download() options and download a small range (for example, 4 MB) at a time. With no conditional retry there will be no 412 error either, but the returned stream may be incomplete when there are network issues; downloading a small range helps avoid that. Check the stream length after each read (see the sketch below).
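Here is a minimal sketch of option 2, assuming a 4 MB chunk size and a streamToBuffer helper defined inline (the total length is pinned up front, so lines appended after that point are simply ignored):

const { Aborter } = require('@azure/storage-blob');

const FOUR_MB = 4 * 1024 * 1024;

// Collect a readable stream's chunks into a single Buffer.
async function streamToBuffer(readableStream) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    readableStream.on('data', (data) => chunks.push(data));
    readableStream.on('end', () => resolve(Buffer.concat(chunks)));
    readableStream.on('error', reject);
  });
}

async function downloadInRanges(blobURL) {
  // Pin the length at the start; the append blob may keep growing.
  const props = await blobURL.getProperties(Aborter.none);
  const total = props.contentLength;
  const buffers = [];
  for (let offset = 0; offset < total; offset += FOUR_MB) {
    const count = Math.min(FOUR_MB, total - offset);
    const response = await blobURL.download(Aborter.none, offset, count, {
      maxRetryRequests: 0, // no conditional resume, hence no 412
    });
    const chunk = await streamToBuffer(response.readableStreamBody);
    if (chunk.length !== count) {
      // With retries disabled, a network hiccup can end the stream early.
      throw new Error(`Incomplete read at offset ${offset}`);
    }
    buffers.push(chunk);
  }
  return Buffer.concat(buffers).toString();
}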

XiaoningLiu self-assigned this Jan 7, 2019
XiaoningLiu added the question (Further information is requested) label Jan 7, 2019

petmat commented Jan 8, 2019

I went with solution number one and downloaded a snapshot of the blob with the following code:

// ...
const blobURL = BlobURL.fromContainerURL(containerURL, blobName);
console.log('Downloading blob...');
const snapshotResponse = await blobURL.createSnapshot(Aborter.none);
const snapshotURL = blobURL.withSnapshot(snapshotResponse.snapshot);
const response = await snapshotURL.download(Aborter.none, 0);
console.log('Reading response to string...', response.contentLength);
const body = await streamToString(response.readableStreamBody);
// ...

Downloading the blob did take some time, though (a couple of minutes), but then again the size of the blob is ~140 MB. I didn't explore solution number two because it seemed a bit contrived, but I'm happy with this 👍 Next I'm going to implement a strategy for deciding how often a new snapshot is created, and possibly for removing old ones (see the sketch below).
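For the cleanup part, removing a blob's old snapshots while keeping the base blob can be done with the deleteSnapshots option on delete (a minimal sketch; 'only' deletes the snapshots but leaves the base blob intact, whereas 'include' would delete both):

// Delete every snapshot of the blob but keep the base blob itself.
await blobURL.delete(Aborter.none, { deleteSnapshots: 'only' });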

Thank you!
