Crawler tools queuing individually rather than as a whole #482

Closed
mpcen opened this issue Aug 3, 2022 · 1 comment

Comments

@mpcen
Member

mpcen commented Aug 3, 2022

Evidence suggests that each tool writes its own message to the queue, instead of the results of all tools being aggregated first and queued as a single message. This puts a heavy load on the DB and drives up both cost and processing. Only $30k in credits is available before all services are suspended.
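To make the suspected difference concrete, here is a minimal sketch of the two queuing patterns. It is not the crawler's code: `run_tool`, `queue_per_tool`, `queue_aggregated`, the `other_tool` entry, and the component string are all hypothetical names used only for illustration, and a plain in-process `deque` stands in for the real queue.

```python
from collections import deque

# "licensee" and "reuse" are named in this issue; "other_tool" is a placeholder.
TOOLS = ["licensee", "reuse", "other_tool"]


def run_tool(tool, component):
    """Stand-in for a real tool run; returns a fake result record."""
    return {"tool": tool, "component": component}


def queue_per_tool(queue, component):
    """Suspected behavior: every tool result becomes its own queue message."""
    for tool in TOOLS:
        queue.append(run_tool(tool, component))


def queue_aggregated(queue, component):
    """Intended behavior: gather all tool results, then queue one message."""
    results = [run_tool(tool, component) for tool in TOOLS]
    queue.append({"component": component, "results": results})


per_tool = deque()
aggregated = deque()
queue_per_tool(per_tool, "example/component/1.0.0")
queue_aggregated(aggregated, "example/component/1.0.0")

# Message count grows with the number of tools in the first pattern only.
print(len(per_tool), len(aggregated))
```

Under these assumptions the per-tool pattern produces one message per tool for each component, while the aggregated pattern produces one message per component, so the downstream write load scales with the number of enabled tools.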

Temporary mitigation for the crawler service:

  • Disable the licensee and reuse tools, at the expense of potentially more manual curations for missing licenses.

@mpcen
Member Author

mpcen commented Sep 27, 2022

Linked to #475 (comment)

mpcen closed this as completed Sep 27, 2022
qtomlinson added a commit to qtomlinson/crawler that referenced this issue Nov 19, 2022
Using StorageBackedQueue for the local queue solved the problem of losing
local tool tasks (requests) during shutdown. This comes at a higher
cost than the in-memory queue, the same cost as the storage queue in
production.

Shutdowns and restarts are infrequent events. To reduce cost, use
InMemoryCrawlerQueue as the local queue instead for now.

Task: clearlydefined#482
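
The trade-off described in the commit message above can be sketched as follows. This is only an illustration under stated assumptions: `InMemoryQueue` and `FileBackedQueue` are hypothetical classes, not the crawler's `InMemoryCrawlerQueue` or `StorageBackedQueue`, and a local JSON-lines file stands in for the storage service.

```python
import json
import os
import tempfile
from collections import deque


class InMemoryQueue:
    """No storage operations, but contents are gone once the process exits."""

    def __init__(self):
        self._items = deque()

    def push(self, task):
        self._items.append(task)

    def __len__(self):
        return len(self._items)


class FileBackedQueue:
    """Survives a restart, but every push is a storage write (the added cost)."""

    def __init__(self, path):
        self._path = path
        self.writes = 0

    def push(self, task):
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(json.dumps(task) + "\n")
        self.writes += 1

    def __len__(self):
        if not os.path.exists(self._path):
            return 0
        with open(self._path, encoding="utf-8") as f:
            return sum(1 for _ in f)


path = os.path.join(tempfile.mkdtemp(), "queue.jsonl")
memory = InMemoryQueue()
durable = FileBackedQueue(path)
for i in range(3):
    memory.push({"id": i})
    durable.push({"id": i})

# Simulate a restart: fresh queue objects, same storage location.
memory_after = InMemoryQueue()
durable_after = FileBackedQueue(path)
print(len(memory_after), len(durable_after), durable.writes)
```

In this sketch the in-memory queue is empty after the simulated restart while the file-backed queue still holds its tasks, at the price of one write per task. Accepting occasional task loss on rare restarts in exchange for fewer storage operations is the choice the commit describes.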