WACZ reading / streaming #16

Open
matteocargnelutti opened this issue Mar 6, 2023 · 2 comments

Comments

@matteocargnelutti
Collaborator

(Suggested by @ikreymer)

Add a command and associated API for reading and streaming the contents of WACZ files, whether local or remote.

See: https://www.npmjs.com/package/unzipit
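unzipit is built around random access: it reads only the parts of a ZIP it needs, which is what makes reading a remote WACZ practical (for example over HTTP range requests). The Python sketch below illustrates that general technique and is not unzipit's API. It finds the End of Central Directory (EOCD) record in the archive's tail, then reads the central directory to list every entry with its local-header offset, all through a `read_range(offset, length)` callable. `read_range`, `list_entries`, and `Entry` are hypothetical names. ZIP64 is not handled, which matters for archives or entries larger than 4 GiB.

```python
import struct
from typing import Callable, Dict, NamedTuple

# Hypothetical reader: read_range(offset, length) -> bytes. Locally it can slice a
# file; for a remote WACZ it could send an HTTP "Range: bytes=a-b" request.
RangeReader = Callable[[int, int], bytes]

EOCD_SIG = b"PK\x05\x06"  # End of Central Directory record
CDIR_SIG = b"PK\x01\x02"  # central directory file header
EOCD_LEN = 22             # fixed EOCD size, excluding the trailing comment
CDIR_LEN = 46             # fixed central directory header size
MAX_COMMENT = 0xFFFF      # the archive comment is at most 65535 bytes


class Entry(NamedTuple):
    name: str
    method: int             # 0 = stored, 8 = deflate
    compressed_size: int
    uncompressed_size: int
    header_offset: int      # where this entry's local file header starts


def list_entries(read_range: RangeReader, total_size: int) -> Dict[str, Entry]:
    """List archive entries with two ranged reads (non-ZIP64 archives only)."""
    # 1. The EOCD record sits somewhere in the last 22 + 65535 bytes.
    tail_len = min(total_size, EOCD_LEN + MAX_COMMENT)
    tail = read_range(total_size - tail_len, tail_len)
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or len(tail) - pos < EOCD_LEN:
        raise ValueError("End of Central Directory record not found")
    _, _, _, _, count, cd_size, cd_offset, _ = struct.unpack(
        "<4sHHHHIIH", tail[pos:pos + EOCD_LEN]
    )
    if count == 0xFFFF or cd_size == 0xFFFFFFFF or cd_offset == 0xFFFFFFFF:
        raise NotImplementedError("ZIP64 archives are not handled in this sketch")

    # 2. Read the whole central directory at once and walk its headers.
    cd = read_range(cd_offset, cd_size)
    entries: Dict[str, Entry] = {}
    p = 0
    for _ in range(count):
        if cd[p:p + 4] != CDIR_SIG:
            raise ValueError("corrupt central directory")
        f = struct.unpack("<4s6H3I5H2I", cd[p:p + CDIR_LEN])
        flags, method = f[3], f[4]
        csize, usize = f[8], f[9]
        name_len, extra_len, comment_len = f[10], f[11], f[12]
        offset = f[16]
        raw_name = cd[p + CDIR_LEN:p + CDIR_LEN + name_len]
        # Bit 11 of the flags marks UTF-8 names; otherwise ZIP uses CP437.
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
        entries[name] = Entry(name, method, csize, usize, offset)
        p += CDIR_LEN + name_len + extra_len + comment_len
    return entries
```

For a remote WACZ, `read_range` could wrap `urllib.request` with a `Range` header, taking `total_size` from the `Content-Length` of a `HEAD` request; that wiring is left out to keep the sketch self-contained.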

@matteocargnelutti
Collaborator Author

(Suggested by @rebeccacremona)

This feature should allow for specialized / simplified extractions such as:

  • Extracting the CDX of a WACZ
  • Extracting the datapackage signature
  • Looking up a specific record in the underlying WARC(s)
  • etc.
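For a local file, Python's standard `zipfile` module is enough to sketch the extractions listed above. The helpers below are hypothetical and assume the layout described in the WACZ spec: WARCs under `archive/`, CDXJ indexes under `indexes/` (plain or gzipped), and `datapackage-digest.json` at the root. They also assume each CDXJ line has the shape `<urlkey> <timestamp> {json}` with `filename`, `offset`, and `length` keys, as pywb-style indexers write it. Lookup is a plain linear scan; a secondary `.idx` index for binary search is ignored.

```python
import gzip
import json
import zipfile
from typing import Dict, List, Optional, Tuple

# Assumed WACZ layout: WARCs in "archive/", indexes in "indexes/",
# and "datapackage-digest.json" at the root of the ZIP.
INDEX_SUFFIXES = (".cdx", ".cdxj", ".cdx.gz", ".cdxj.gz")


def extract_cdx(zf: zipfile.ZipFile) -> List[str]:
    """Return every CDX/CDXJ line found under indexes/ (gzip-aware)."""
    lines: List[str] = []
    for name in zf.namelist():
        if not (name.startswith("indexes/") and name.endswith(INDEX_SUFFIXES)):
            continue
        raw = zf.read(name)
        if name.endswith(".gz"):
            raw = gzip.decompress(raw)  # also handles multi-member (block) gzip
        lines.extend(line for line in raw.decode("utf-8").splitlines() if line)
    return lines


def read_digest(zf: zipfile.ZipFile) -> Dict:
    """Load datapackage-digest.json (the package hash and any signature data)."""
    return json.loads(zf.read("datapackage-digest.json"))


def parse_cdxj(line: str) -> Tuple[str, str, Dict]:
    """Split a CDXJ line of the assumed form '<urlkey> <timestamp> {json}'."""
    urlkey, timestamp, fields = line.split(" ", 2)
    return urlkey, timestamp, json.loads(fields)


def lookup_record(zf: zipfile.ZipFile, url: str) -> Optional[bytes]:
    """Return the WARC record bytes of the first CDXJ entry matching `url`."""
    for line in extract_cdx(zf):
        _, _, fields = parse_cdxj(line)
        if fields.get("url") != url:
            continue
        # Assumed CDXJ keys: "filename" (relative to archive/), "offset", "length".
        path = "archive/" + fields["filename"]
        offset, length = int(fields["offset"]), int(fields["length"])
        with zf.open(path) as warc:  # ZipExtFile supports seek() since Python 3.7
            warc.seek(offset)
            record = warc.read(length)
        # In a .warc.gz, each record is its own gzip member.
        return gzip.decompress(record) if path.endswith(".gz") else record
    return None
```

Usage would look like `lookup_record(zipfile.ZipFile("example.wacz"), "https://example.com/")`, where the filename is a placeholder. Verifying the signature itself is out of scope here; `read_digest` only surfaces the data to check.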

@ikreymer
Collaborator

ikreymer commented May 4, 2023

I previously prototyped a very simple ZIP (not WACZ) loader that can stream a single file out of a ZIP archive. I've mostly been using it myself, but recently put it up here: https://github.com/ikreymer/loadzip/blob/main/index.js

Agreed, WACZ-specific semantics would be very useful to have as well!
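Streaming one file out of a ZIP mostly comes down to reading the entry's local file header and then decompressing its data incrementally. A minimal Python sketch of that idea follows; it is not loadzip's code. `stream_entry` and `read_range` are hypothetical, and the compressed size and method are assumed to come from the central directory, since the local header may carry zero sizes when a data descriptor is used. Only stored (0) and deflate (8) entries are handled.

```python
import struct
import zlib
from typing import Callable, Iterator

RangeReader = Callable[[int, int], bytes]  # hypothetical: read_range(offset, length) -> bytes

LOCAL_SIG = b"PK\x03\x04"
LOCAL_HEADER_LEN = 30


def stream_entry(
    read_range: RangeReader,
    header_offset: int,
    compressed_size: int,
    method: int,
    chunk_size: int = 64 * 1024,
) -> Iterator[bytes]:
    """Yield an entry's decompressed bytes chunk by chunk, never buffering it all."""
    header = read_range(header_offset, LOCAL_HEADER_LEN)
    if header[:4] != LOCAL_SIG:
        raise ValueError("not a local file header")
    # Name and extra-field lengths live at bytes 26..29 of the local header and
    # can differ from the central directory's copy, so they are re-read here.
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    start = header_offset + LOCAL_HEADER_LEN + name_len + extra_len

    if method == 0:
        inflater = None  # stored: bytes pass through unchanged
    elif method == 8:
        inflater = zlib.decompressobj(-zlib.MAX_WBITS)  # raw deflate, no zlib header
    else:
        raise NotImplementedError(f"compression method {method} not supported")

    pos, end = start, start + compressed_size
    while pos < end:
        chunk = read_range(pos, min(chunk_size, end - pos))
        if not chunk:
            raise EOFError("archive ended before the entry's data did")
        pos += len(chunk)
        out = inflater.decompress(chunk) if inflater else chunk
        if out:
            yield out
    if inflater:
        tail = inflater.flush()
        if tail:
            yield tail
```

Because each iteration issues one bounded read, the same generator works whether `read_range` slices a local file or fetches byte ranges over HTTP.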
