how does worker pick a site after crash? #231

mishranitin2003 · 2021-10-06T15:26:45Z

Scenario: I have warcprox and a brozzler worker running on my local machine. Partway through archiving a website, the brozzler worker process is killed, for example with 'kill -9 <process_id>' or by closing the console session.
After both warcprox and the brozzler worker are restarted (on the same ports as before), the site is not picked up for crawling again. The reason is that the site's record in db('Brozzler').table('sites') still has claimed = true.

Query:

Is there a configuration property that can be set up so that the site can be picked by any single brozzler worker even if claimed=true?

nlevitt · 2021-10-07T00:24:02Z

If you wait an hour, it should start crawling again. See https://github.com/internetarchive/brozzler/blob/e23fa68d6/brozzler/frontier.py#L117. If you can't wait, you could set claimed=false in rethinkdb.
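As a rough illustration of the behavior described above (not brozzler's actual code), the rule can be sketched as a check that treats a claim as stale once it is more than an hour old. The field name `last_claimed`, the function name `is_claimable`, and the handling of a missing timestamp are assumptions made for this sketch; only `claimed` and the one-hour wait come from this thread.

```python
from datetime import datetime, timedelta, timezone

# One hour, matching the "wait an hour" advice above.
CLAIM_TIMEOUT = timedelta(hours=1)


def is_claimable(site, now):
    """Return True if a worker may claim this site record (hypothetical helper)."""
    if not site.get("claimed"):
        return True  # nobody holds the site
    last_claimed = site.get("last_claimed")  # assumed field name
    if last_claimed is None:
        return False  # claimed with no timestamp: assume it is still in use
    # A claim older than the timeout is treated as abandoned (e.g. a crashed worker).
    return now - last_claimed > CLAIM_TIMEOUT


now = datetime(2021, 10, 6, 17, 0, tzinfo=timezone.utc)
crashed = {"claimed": True, "last_claimed": now - timedelta(minutes=90)}
active = {"claimed": True, "last_claimed": now - timedelta(minutes=5)}
print(is_claimable(crashed, now))  # True
print(is_claimable(active, now))   # False
```

Under this sketch, a site left with claimed=true by a killed worker becomes eligible again on its own once the hour has passed, which is why setting claimed=false by hand is only needed if you cannot wait.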

mishranitin2003 · 2021-10-10T20:26:45Z

Thanks @nlevitt for your quick reply. The problem is deciding when to set claimed=false. Is there a specific reason for choosing 60 minutes, or is it arbitrary?
Would it be acceptable to make this 60-minute limit configurable? If so, please let me know and I can raise a PR for it. Also, should the branch be named after issue #231 or something else?

- Make the claim time limit configurable; it was hard-coded to 60 minutes. After a crash, nodes can come back fairly quickly.

nlevitt · 2021-10-12T20:56:35Z

@mishranitin2003 It's not random. It has to be high enough that you will never have one worker claim a site when another is legitimately working on it. The value should not be configurable.
