Handling errbacks #158

Merged
merged 6 commits into from
Feb 14, 2024


pawelmhm
Member

No description provided.

radostyle and others added 6 commits January 11, 2024 16:54
The errback should never have defaulted to the spider's `parse`
method.  That default contradicts the Scrapy docs, which nowhere say
that exceptions are sent to the `parse` method.  The problem was
found because the error handling in the `process_spider_exception`
middleware method was never called, even though the Scrapy docs say
it should be.

The workaround to restore the previous behavior, where exceptions go
to the `parse` method, is to add `&errback=parse` to the request.
This makes it possible to change the non-standard behavior of
sending exceptions to the spider's `parse` method without
introducing a breaking change to scrapyrt.

It also adds some documentation of the existing behavior.
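
For illustration, the `errback=parse` workaround described above might look like this from a client's side. This is only a sketch: the host and port (`localhost:9080`), the spider name `example`, and the helper `build_crawl_url` are assumptions made for the example, not part of this pull request.

```python
from urllib.parse import urlencode

# Assumed for illustration: a ScrapyRT instance listening on localhost:9080.
BASE_URL = "http://localhost:9080/crawl.json"


def build_crawl_url(spider_name, url, errback=None):
    """Build a crawl request URL (hypothetical helper, not part of scrapyrt)."""
    params = {"spider_name": spider_name, "url": url}
    if errback is not None:
        # Opt in explicitly to routing request errors to a spider method.
        params["errback"] = errback
    return f"{BASE_URL}?{urlencode(params)}"


# No errback parameter: errors are not routed to `parse`.
default_url = build_crawl_url("example", "http://example.com")

# Explicit errback=parse: restores the earlier behavior described above.
legacy_url = build_crawl_url("example", "http://example.com", errback="parse")

print(default_url)
print(legacy_url)
```

Only the second URL carries `errback=parse`, so only that request asks for exceptions to be sent to the spider's `parse` method.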
Currently the application does not tell the user when they provide an invalid errback or callback method.  Scheduling the request and validating the spider callback and errback happen in a different thread from the one handling the API request, so simply raising the exception there does not reach the API request thread and a different mechanism is needed.  We already communicate other errors and responses by adding properties to the CrawlManager object, so it seems best to communicate this exception the same way, through a user_error property on the CrawlManager.  The exception can then be raised in the context of the API request.
Co-authored-by: Adrián Chaves <adrian@chaves.io>
Co-authored-by: Adrián Chaves <adrian@chaves.io>
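
The `user_error` approach described in the commit message above can be sketched roughly as follows. This is a simplified illustration using plain `threading`, not the actual scrapyrt code: `SpiderSketch`, `CrawlManagerSketch`, `schedule`, and `handle_api_request` are hypothetical names, and only the `user_error` property name comes from the commit message.

```python
import threading


class SpiderSketch:
    """Stand-in spider with a single valid callback (hypothetical)."""

    def parse(self, response):
        return response


class CrawlManagerSketch:
    """Hypothetical manager that records user errors instead of raising them."""

    def __init__(self, spider, callback_name):
        self.spider = spider
        self.callback_name = callback_name
        self.user_error = None  # set by the scheduling thread on invalid input

    def schedule(self):
        # Runs in the worker thread: an exception raised here would never
        # reach the thread handling the API request, so store it instead.
        callback = getattr(self.spider, self.callback_name, None)
        if not callable(callback):
            self.user_error = ValueError(
                f"Invalid spider callback: {self.callback_name!r}"
            )


def handle_api_request(callback_name):
    manager = CrawlManagerSketch(SpiderSketch(), callback_name)
    worker = threading.Thread(target=manager.schedule)
    worker.start()
    worker.join()
    # Back in the request thread: surface the stored error to the caller.
    if manager.user_error is not None:
        raise manager.user_error
    return {"status": "ok"}
```

Calling `handle_api_request("parse")` returns normally, while `handle_api_request("missing")` raises the stored `ValueError` in the caller's thread, which is the point of passing the error through a property.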
@pawelmhm pawelmhm merged commit f496cd3 into master Feb 14, 2024
8 checks passed
@pawelmhm pawelmhm deleted the handling-errbacks branch February 14, 2024 08:49

3 participants