Provide a setting to run a spider against a specific archive #29

Open
leewesleyv opened this issue Jan 21, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@leewesleyv
Collaborator

Ideally we want to be able to optionally invoke the spider to run against the most recent archive by passing a CLI option or environment variable (picked up in settings).
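As a rough illustration of the "environment variable picked up in settings" route, here is a minimal sketch. The names `TARGET_ARCHIVE` and `archive_target` are hypothetical and not part of any existing API; the only Scrapy behaviour relied on is that project settings are plain Python and can be overridden per run with `-s NAME=VALUE`.

```python
import os


def archive_target(environ=os.environ, default=None):
    """Return the archive the spider should run against, or `default`.

    Hypothetical helper: reads the (assumed) TARGET_ARCHIVE environment
    variable and treats an empty or whitespace-only value as "not set".
    """
    value = environ.get("TARGET_ARCHIVE", "").strip()
    return value or default


# In a project's settings.py (hypothetical setting name):
TARGET_ARCHIVE = archive_target()
```

With something like this in place, a run could be pointed at an archive either through the environment or with `scrapy crawl <spider> -s TARGET_ARCHIVE=...`, since command-line settings take precedence over project settings.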

@leewesleyv leewesleyv added the enhancement New feature or request label Jan 21, 2025
@wvengen
Member

wvengen commented Jan 21, 2025

Since the spider already knows where the object storage is located, it is relatively easy for it to figure this out. If a system talking to Scrapy needs to figure this out by itself, it needs to know the container storage details.

As an addition to this feature, one could also provide a date/timestamp to locate the last archive before that point (though that may depend on the configured storage path, so it could be tricky).
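A minimal sketch of that lookup, assuming the archive keys have already been listed from storage and that each key embeds a 14-digit `YYYYMMDDhhmmss` timestamp somewhere in its path. Both the naming scheme and the `latest_archive` helper are assumptions for illustration; the real layout depends on the configured storage path.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Assumption: each archive key contains a 14-digit timestamp (YYYYMMDDhhmmss).
_TIMESTAMP = re.compile(r"(\d{14})")


def latest_archive(keys: Iterable[str], before: Optional[datetime] = None) -> Optional[str]:
    """Return the key with the newest embedded timestamp.

    If `before` is given, only archives strictly older than it are considered.
    Keys without a parseable timestamp are skipped.
    """
    best_key, best_ts = None, None
    for key in keys:
        match = _TIMESTAMP.search(key)
        if match is None:
            continue
        try:
            ts = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
        except ValueError:
            continue  # 14 digits that do not form a valid date
        if before is not None and ts >= before:
            continue
        if best_ts is None or ts > best_ts:
            best_key, best_ts = key, ts
    return best_key
```

Calling `latest_archive(keys)` would cover the "most recent archive" case, while `latest_archive(keys, before=some_datetime)` covers the date-based variant. If the storage path template does not include a timestamp, the listing's last-modified metadata would have to be used instead.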

@wvengen wvengen changed the title Provide a CLI option or setting to run a spider against a specific archive Provide a setting to run a spider against a specific archive Jan 21, 2025
@leewesleyv leewesleyv self-assigned this Jan 28, 2025