
Retry mechanism for transient errors #49

Merged · 7 commits · Jul 3, 2024

Conversation

@Mews (Collaborator) commented Jun 29, 2024

Closes #39

Changes

  • Added an `is_transient_error` function inside `fetcher.py`;
    • I hardcoded the list of transient errors as [408, 502, 503, 504] since they're the most common ones, but let me know if I should add/remove any;
  • Modified `fetch_url` to retry fetching the URL when it encounters a transient error, until `retries` reaches 0;
  • Added tests covering the retry feature, both directly through the `fetch_url` function and via the `crawl` method;
  • Added a `max_retry_attempts` option to `CrawlSettings`;

P.S.: The wait times for consecutive retry attempts are just 1, 2, 3, ... seconds. Let me know if that's ok.
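The changes above can be sketched roughly as follows. This is a hypothetical illustration of the described behavior, not the actual `fetcher.py` code from the PR; the real function signatures and error handling may differ.

```python
import time
import urllib.error
import urllib.request
from typing import Optional

# Hardcoded transient status codes, as listed in the PR description.
TRANSIENT_ERRORS = [408, 502, 503, 504]


def is_transient_error(status_code: int) -> bool:
    """Return True for HTTP status codes treated as transient."""
    return status_code in TRANSIENT_ERRORS


def fetch_url(url: str, retries: int = 3) -> Optional[str]:
    """Fetch a URL, retrying on transient errors until retries reaches 0.

    Wait times between consecutive attempts grow linearly: 1, 2, 3, ... seconds.
    (Illustrative sketch; the crawler would take `retries` from the
    `max_retry_attempts` option on `CrawlSettings`.)
    """
    attempt = 0
    while True:
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read().decode()
        except urllib.error.HTTPError as exc:
            # Give up on non-transient errors or once retries are exhausted.
            if retries <= 0 or not is_transient_error(exc.code):
                return None
            retries -= 1
            attempt += 1
            time.sleep(attempt)  # wait 1s, then 2s, then 3s, ...
```

A 503 response, for example, would be retried with growing waits, while a 404 fails immediately.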

@Mews Mews requested a review from indrajithi June 29, 2024 19:53
@indrajithi indrajithi merged commit 8ed15c5 into DataCrawl-AI:master Jul 3, 2024
10 checks passed