Automatically kill stuck PR tests and report back #2440
A new Pull Request was created by @iarspider for branch master. @cmsbuild, @iarspider, @smuzaffar can you please review it and eventually sign? Thanks.
parse_jenkins_builds.py (outdated)

    if upload_unique_id:
        with urllib.request.urlopen(
            "http://localhost/SDT/jenkins-artifacts/pull-request-integration/{0}/prs_commits.txt".format(
@iarspider, why localhost? This job runs on the Jenkins controller, while prs_commits.txt is available on the cmssdt server.
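One way to address this is to avoid hardcoding the host and build the URL from a configurable base. The following is only a sketch: the `ARTIFACTS_BASE` value and the `prs_commits_url` helper are hypothetical names introduced for illustration, and the actual host for the cmssdt server should be taken from the job configuration.

```python
from urllib.parse import quote

# Assumption: public address of the cmssdt artifact server. Replace with the
# value used by the real job configuration.
ARTIFACTS_BASE = "https://cmssdt.cern.ch/SDT/jenkins-artifacts"


def prs_commits_url(upload_unique_id, base=ARTIFACTS_BASE):
    """Build the URL of prs_commits.txt for one PR test upload (hypothetical helper)."""
    return "{0}/pull-request-integration/{1}/prs_commits.txt".format(
        base.rstrip("/"), quote(str(upload_unique_id))
    )
```

The result can be passed to `urllib.request.urlopen` in place of the `localhost` URL, so the same code works whether it runs on the Jenkins controller or elsewhere.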
Main job: jenkins-elasticsearch-monitor, which triggers kill-stuck-pr-test, which in turn triggers the kill, the status update, and the comment. TODO: how to avoid multiple comments if two or more jobs for a single PR were stuck?
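A possible answer to the TODO is to group the stuck builds by PR before commenting, so that each PR receives a single comment listing all killed builds. This is a minimal sketch; the record layout (`pr`, `build` keys) and both function names are assumptions, not part of the existing scripts.

```python
from collections import defaultdict


def group_stuck_jobs_by_pr(stuck_jobs):
    """Map each PR identifier to the list of its stuck build numbers."""
    grouped = defaultdict(list)
    for job in stuck_jobs:
        # Assumed record layout: {"pr": <PR identifier>, "build": <build number>}
        grouped[job["pr"]].append(job["build"])
    return dict(grouped)


def build_comment(pr, builds):
    """Compose one comment covering every stuck build of a PR."""
    listed = ", ".join(str(b) for b in sorted(builds))
    return "Stuck test job(s) for {0} were killed: {1}".format(pr, listed)
```

The comment step would then iterate over the grouped dict and post once per key instead of once per stuck build.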
Pull request #2440 was updated.
a125238 to 29918e6
29918e6 to b27c572
b27c572 to a839828
a839828 to 482c173
92ae738 to 2c3d508
The main job runs every 30 minutes. We can add one more check: write the identifier of the stuck job to a temporary file, and if the job is still there after 30 minutes, don't try to reconnect the node and just kill the job.
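The check described above could look roughly like this. It is a sketch under assumptions: the state file is JSON, job identifiers are plain strings, and `split_by_previous_run` is a hypothetical helper rather than existing code.

```python
import json
import os


def split_by_previous_run(stuck_ids, state_path):
    """Return (to_reconnect, to_kill) and record stuck_ids for the next run.

    Jobs seen as stuck for the first time get a node reconnect attempt;
    jobs already recorded by the previous run are killed directly.
    """
    previous = set()
    if os.path.exists(state_path):
        with open(state_path) as fh:
            previous = set(json.load(fh))
    current = set(stuck_ids)
    to_kill = sorted(current & previous)
    to_reconnect = sorted(current - previous)
    # Overwrite the state so that jobs which recovered are forgotten.
    with open(state_path, "w") as fh:
        json.dump(sorted(current), fh)
    return to_reconnect, to_kill
```

Because the state is rewritten on every run, a job that recovered after a reconnect drops out of the file and would get a fresh reconnect attempt if it got stuck again later.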
Split from #2418. For now, the job list and the criteria for killing builds are hardcoded. We can discuss how configurable we want this job to be, for example a dict mapping job name and parameter filters to a timeout.
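To make the discussion concrete, such a dict could be shaped as below. All job names, parameter names, and timeout values here are made up for illustration, and `timeout_for` is a hypothetical lookup helper; the first matching rule wins, so more specific filters should be listed first.

```python
# Hypothetical configuration: job name -> ordered list of rules.
# Each rule pairs a parameter filter with a timeout in hours.
KILL_RULES = {
    "example-pr-tests": [
        {"params": {"TEST_TYPE": "relvals"}, "timeout_hours": 12},
        {"params": {}, "timeout_hours": 8},  # fallback for this job
    ],
}


def timeout_for(job_name, params, rules=KILL_RULES):
    """Return the timeout of the first rule whose filter matches, else None."""
    for rule in rules.get(job_name, []):
        if all(params.get(key) == value for key, value in rule["params"].items()):
            return rule["timeout_hours"]
    return None  # job is not monitored
```

A build would then be considered stuck when its running time exceeds `timeout_for(job_name, build_params)`, and jobs absent from the dict would be left alone.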