The request records in the CC-NEWS WARC files lack the HTTP protocol version:
GET /path
instead of
GET /path HTTP/1.1
This causes some WARC parsers to fail when processing the files; see https://groups.google.com/d/msg/common-crawl/hsb90GHq6to/Lv-9-nHAAQAJ.
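For consumers that need to read the affected files, a tolerant request-line parser is one possible workaround. The following is only a minimal sketch: `parse_request_line` and its `default_version` parameter are hypothetical names chosen for illustration and are not part of any particular WARC library. It accepts both forms of the request line shown above and reports a missing version as `None` unless a default is supplied.

```python
import re

# Method, request target, and an optional HTTP version token.
# The version group is optional so that "GET /path" is accepted too.
_REQUEST_LINE = re.compile(
    r"^(?P<method>[A-Z]+) (?P<target>\S+)(?: (?P<version>HTTP/\d\.\d))?$"
)


def parse_request_line(line, default_version=None):
    """Split an HTTP request line into (method, target, version).

    Hypothetical helper: `version` falls back to `default_version`
    when the request line carries no protocol version.
    """
    match = _REQUEST_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid HTTP request line: {line!r}")
    version = match.group("version") or default_version
    return match.group("method"), match.group("target"), version
```

Whether to substitute a default such as `HTTP/1.1` or to keep `None` is a design choice for the caller: the affected records do not state which version was actually used, so any default is an assumption.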
The fix in Stormcrawler (apache/incubator-stormcrawler#775) has been deployed to production; WARC files now contain the HTTP version in the request message.