Bump the indexing timeout value #68

higs4281 · 2024-02-12T15:23:48Z

Indexing on EXT Jenkins has become troublesome, possibly because of the increased load on the server from more aggressive security scanning and heavy morning cron jobs.

Two types of errors have interrupted indexing:

- Disconnects with S3
- Timeouts during OpenSearch indexing

These appear to be specific to EXT Jenkins, because our DEV Jenkins instance (zusa) runs the same pipeline code against the same data, at the same time of the morning, and rarely fails to finish.

The S3 interruptions are not very disruptive: downloads that already succeeded don't have to be rerun on a retry, so little time is lost.
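
The reason an S3 disconnect costs little can be sketched as a "skip what's already on disk" download loop. This is a minimal illustration, not the pipeline's actual code: `download_missing` and the `fetch(key, path)` callable are hypothetical stand-ins for whatever S3 client call the job really uses. Each file is written to a `.part` path and renamed only after it completes, so a download cut off by a disconnect is retried rather than mistaken for a finished file.

```python
import os
from typing import Callable, Iterable


def download_missing(keys: Iterable[str], dest_dir: str,
                     fetch: Callable[[str, str], None]) -> list[str]:
    """Download each key into dest_dir unless a finished copy already exists.

    `fetch(key, path)` is a hypothetical stand-in for the real S3 download.
    Returns the keys that were actually fetched on this pass.
    """
    fetched = []
    for key in keys:
        final_path = os.path.join(dest_dir, os.path.basename(key))
        if os.path.exists(final_path):
            continue  # completed on an earlier attempt, so skip it on retry
        tmp_path = final_path + ".part"
        fetch(key, tmp_path)  # may raise on a disconnect
        os.replace(tmp_path, final_path)  # only mark complete after success
        fetched.append(key)
    return fetched
```

On a retry, only the file that was interrupted and the ones after it are fetched again.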

The timeout error, however, is painful when indexing bombs late in the run, because a retry starts indexing over from 0. In the last week, one timeout stopped indexing after 4 million complaints had already been indexed, forcing the job to start over from the beginning.

This PR doubles the OpenSearch timeout value. That should not affect most runs, but it could rescue a run from the occasional late timeout.
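
A minimal sketch of what doubling a client-level timeout looks like, assuming the indexer builds an opensearch-py `OpenSearch` client. The PR does not state the old or new numbers, so `PREVIOUS_TIMEOUT_SECONDS = 30` and the helper `opensearch_client_kwargs` are hypothetical, for illustration only. In opensearch-py, the `timeout` keyword sets the default per-request timeout (in seconds) used by calls such as bulk indexing unless a call overrides it.

```python
# Hypothetical values: the PR only says the timeout was doubled,
# not what the previous or new numbers are.
PREVIOUS_TIMEOUT_SECONDS = 30
INDEXING_TIMEOUT_SECONDS = PREVIOUS_TIMEOUT_SECONDS * 2


def opensearch_client_kwargs(host: str,
                             timeout: int = INDEXING_TIMEOUT_SECONDS) -> dict:
    """Build keyword arguments for an opensearch-py client (hypothetical helper).

    `timeout` becomes the client's default per-request timeout in seconds.
    """
    return {"hosts": [host], "timeout": timeout}


# With opensearch-py installed (not executed here):
# from opensearchpy import OpenSearch
# client = OpenSearch(**opensearch_client_kwargs("https://localhost:9200"))
```

Raising only the ceiling leaves fast requests unaffected; a request that finishes in a few seconds returns just as quickly either way, which is why the longer value should not change most runs.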

I ran this morning's CCDB indexing using this branch, and it succeeded on the first try. That doesn't prove that the new value saved the run, but I think we should see if the new timeout reduces the churn.

Testing

In addition to test-running the new timeout value, I got the unit tests running again by upgrading Python to 3.11 and adjusting the tox configs.


higs4281 requested review from flacoman91 and imuchnik February 12, 2024 15:23

higs4281 added 4 commits February 12, 2024 13:41

Merge branch 'main' into increase-timeout-window

b22e370

set tox to use Python 3.11

fb48527

Merge branch 'main' into increase-timeout-window

afa4d61

Merge branch 'main' into increase-timeout-window

371e792

higs4281 merged commit d5ae4e7 into main Feb 12, 2024
2 checks passed
