Make principal web archive capture optional? #25

Open
matteocargnelutti opened this issue Oct 28, 2022 · 5 comments

Comments

@matteocargnelutti
Collaborator

matteocargnelutti commented Oct 28, 2022

Should it be possible to skip the web capture step?

Potential use case: capturing only the provenance summary, screenshot, PDF snapshot and extracted video for a given web page?

@matteocargnelutti matteocargnelutti changed the title Feature | Make principal web archive capture optional (?) Make principal web archive capture optional (?) Feb 19, 2023
@matteocargnelutti matteocargnelutti changed the title Make principal web archive capture optional (?) Make principal web archive capture optional? Mar 8, 2023
@edsu

edsu commented Jan 25, 2024

Is the idea that it would cut down on the amount of storage?

@mdellabitta

I can't address your question, but wanted to say: Nice to see you here, @edsu!

@matteocargnelutti
Collaborator Author

Hi @edsu!

> Is the idea that it would cut down on the amount of storage?

It is more to account for use cases that do not revolve around capturing HTTP exchanges in a WARC.
For example, some users might just want to make a PDF capture or screenshot of a web page using Scoop, and only care about that artifact.

@edsu

edsu commented Jan 25, 2024

But don't you need to do the HTTP exchanges to generate the screenshot?

@matteocargnelutti
Collaborator Author

@edsu Yes and no.

  • Yes: the HTTP exchanges will pass through the proxy as Scoop navigates to the page to take the screenshot
  • No: If I am only interested in the screenshot, I don't need to record these HTTP exchanges, and can also skip some intermediate steps, for example some of the browser behaviors.
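
To make the "yes and no" distinction concrete, here is a minimal sketch of how such a switch could decide which steps run. It does not use Scoop's actual API: the `CaptureOptions` shape, the `recordExchanges` flag, the step names, and `planCapture` are all hypothetical, and this proposal has not been implemented. The page is always loaded, since traffic passes through the proxy either way, while recording and the browser behaviors only run when a WARC is wanted.

```typescript
// Hypothetical sketch only: none of these names come from Scoop's real API.
interface CaptureOptions {
  recordExchanges: boolean;   // proposed switch: keep HTTP exchanges for a WARC?
  screenshot: boolean;
  pdfSnapshot: boolean;
  videoExtraction: boolean;
  provenanceSummary: boolean;
}

type Step =
  | "load-page"
  | "record-exchanges"
  | "browser-behaviors"
  | "screenshot"
  | "pdf-snapshot"
  | "video-extraction"
  | "provenance-summary";

// Returns the steps a capture would run for the given options.
function planCapture(options: CaptureOptions): Step[] {
  // Navigation always happens: the page must load for any artifact to exist.
  const steps: Step[] = ["load-page"];

  // Only record exchanges, and run the behaviors that exist mainly to
  // trigger extra requests, when the web archive itself is wanted.
  if (options.recordExchanges) {
    steps.push("record-exchanges", "browser-behaviors");
  }

  if (options.screenshot) steps.push("screenshot");
  if (options.pdfSnapshot) steps.push("pdf-snapshot");
  if (options.videoExtraction) steps.push("video-extraction");
  if (options.provenanceSummary) steps.push("provenance-summary");

  return steps;
}

// Screenshot-only capture: the page still loads, but nothing is recorded.
const screenshotOnly = planCapture({
  recordExchanges: false,
  screenshot: true,
  pdfSnapshot: false,
  videoExtraction: false,
  provenanceSummary: false,
});
console.log(screenshotOnly.join(", ")); // prints "load-page, screenshot"
```

Whether the browser behaviors should be tied to the same flag or get their own option is a design choice this sketch glosses over; some behaviors (for example scrolling) may still matter for a full-page screenshot.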
