Crawler tools queuing individually rather than as a whole #482

mpcen · 2022-08-03T23:56:39Z

Evidence suggests that each tool writes its own message to the queue, rather than the results of all tools being aggregated and queued once. This puts a huge load on the DB, causing outrageous costs and processing overhead. Only $30k in credits remains before all services are suspended.
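To make the suspected difference concrete, here is a minimal sketch of the two queuing patterns. It is not the crawler's actual code: `CountingQueue`, `queue_per_tool`, `queue_aggregated`, and the stub tool functions are all hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List


class CountingQueue:
    """Hypothetical stand-in for a message queue; records every push."""

    def __init__(self) -> None:
        self.messages: List[dict] = []

    def push(self, message: dict) -> None:
        self.messages.append(message)


# Stub tools (assumption): each takes a component name and returns a result.
TOOLS: Dict[str, Callable[[str], str]] = {
    "licensee": lambda component: f"licensee result for {component}",
    "reuse": lambda component: f"reuse result for {component}",
}


def queue_per_tool(component: str, queue: CountingQueue) -> None:
    """Suspected behavior: one queue message per tool."""
    for name, tool in TOOLS.items():
        queue.push({"component": component, "tool": name, "result": tool(component)})


def queue_aggregated(component: str, queue: CountingQueue) -> None:
    """Intended behavior: collect all tool results, then queue once."""
    results = {name: tool(component) for name, tool in TOOLS.items()}
    queue.push({"component": component, "results": results})


per_tool, aggregated = CountingQueue(), CountingQueue()
queue_per_tool("example-package", per_tool)
queue_aggregated("example-package", aggregated)
print(len(per_tool.messages), len(aggregated.messages))
```

In this sketch the per-tool pattern produces one message per tool for every component, while the aggregated pattern produces one message per component regardless of how many tools run.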

Temporary mitigation for the crawler service:

Disable the licensee and reuse tools, at the expense of potentially more manual curations for missing licenses.

mpcen · 2022-09-27T01:54:10Z

Linked to #475 (comment)

Using StorageBackedQueue for the local queue solved the problem of losing local tool tasks (requests) during shutdown. However, it costs more than an in-memory queue: the same as the storage queue in production. Shutdowns and restarts are infrequent events, so to reduce cost, use InMemoryCrawlerQueue as the local queue for now. Task: clearlydefined#482
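The trade-off can be sketched as follows. This is an illustration only, not the real StorageBackedQueue or InMemoryCrawlerQueue: the classes `InMemoryQueueSketch` and `FileBackedQueueSketch` are hypothetical, and a local JSON file stands in for the billed storage service.

```python
import json
import os
import tempfile
from collections import deque
from typing import Deque, List, Optional


class InMemoryQueueSketch:
    """Hypothetical in-memory queue: no storage writes, lost on restart."""

    def __init__(self) -> None:
        self._items: Deque[dict] = deque()
        self.storage_writes = 0  # stays 0: nothing leaves the process

    def push(self, task: dict) -> None:
        self._items.append(task)

    def pop(self) -> Optional[dict]:
        return self._items.popleft() if self._items else None


class FileBackedQueueSketch:
    """Hypothetical durable queue: every change is a (billable) storage write."""

    def __init__(self, path: str) -> None:
        self._path = path
        self.storage_writes = 0

    def _load(self) -> List[dict]:
        if not os.path.exists(self._path):
            return []
        with open(self._path) as handle:
            return json.load(handle)

    def _save(self, items: List[dict]) -> None:
        with open(self._path, "w") as handle:
            json.dump(items, handle)
        self.storage_writes += 1

    def push(self, task: dict) -> None:
        items = self._load()
        items.append(task)
        self._save(items)

    def pop(self) -> Optional[dict]:
        items = self._load()
        if not items:
            return None
        task = items.pop(0)
        self._save(items)
        return task


def restart_demo():
    """Push one task to each queue, then simulate a process restart."""
    path = os.path.join(tempfile.mkdtemp(), "queue.json")
    durable, volatile = FileBackedQueueSketch(path), InMemoryQueueSketch()
    durable.push({"tool": "licensee"})
    volatile.push({"tool": "licensee"})
    # A restart means fresh objects; only the file on disk survives.
    recovered = FileBackedQueueSketch(path).pop()
    lost = InMemoryQueueSketch().pop()
    return recovered, lost, durable.storage_writes, volatile.storage_writes
```

Calling `restart_demo()` shows the durable queue recovering its task after the simulated restart at the price of a storage write per push, while the in-memory queue incurs no writes but comes back empty.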

mpcen added High Priority Critical labels Aug 3, 2022

mpcen self-assigned this Aug 3, 2022

mpcen mentioned this issue Aug 3, 2022

temporarily turning off licensee and reuse tools until DB perf issues are resolved #481

Merged

mpcen closed this as completed Sep 27, 2022

qtomlinson mentioned this issue Nov 19, 2022

Use InMemoryCrawlerQueue as local queue #500

Closed
