parallelize sequential flag with drmaa or multiple cores #133
base: main
Conversation
Also, when running on CUBIC you'll see the output for each job submitted, which kind of nukes your terminal, but Mark said that's just because he was debugging the scheduler and he'll take those out.
call = build_validator_call(tmpdirname,
                            nifti_head,
                            subj_consist)
# TMPDIR isn't networked (available on login + exec nodes), so use bids_dir
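For context, a rough sketch of how a network-visible working directory might be created instead of relying on $TMPDIR; the paths and names here are made up for illustration and aren't the PR's actual code:

import tempfile
from pathlib import Path

# Hypothetical sketch: $TMPDIR is local to each node, so the scratch directory
# is created under bids_dir, which both the login and exec nodes can see.
bids_dir = Path("/path/to/bids")  # assumed path, for illustration only
tmpdirname = tempfile.mkdtemp(dir=str(bids_dir), prefix=".cubids_tmp_")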
will this be ok if bids_dir is in datalad?
hmm good point, I hadn't thought about needing to unlock stuff. I admit it's very hacky, and it almost made me think this isn't a good problem to submit to the grid, since it requires so many temporary files that need to be on a network drive (not $TMPDIR), but I'm not sure what the best solution would be. Maybe we could use a user's home directory, say ~/.cubids, as the tmpdir?
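A minimal sketch of that idea, assuming a per-user ~/.cubids scratch directory (not something the PR currently does):

import tempfile
from pathlib import Path

# Hypothetical: write the temp scripts under a per-user scratch dir in $HOME,
# which is typically network-mounted on clusters, rather than under bids_dir.
scratch = Path.home() / ".cubids"
scratch.mkdir(exist_ok=True)
tmp = tempfile.NamedTemporaryFile(delete=False, dir=scratch, suffix=".sh")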
Is there a way to get a tmpdir on the compute node and copy the files into that?
I think it'd be possible to move more of the logic into the grid job so the scripts don't have to be written to a networked drive. But since it's impossible to connect the stdout of a grid job to the main process, the output ultimately has to be written to some file, and that file needs to be on a networked drive unless all the jobs, including the main process, run on the same exec node.
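Roughly, the job payload could look like the sketch below: it stages inputs into the node-local $TMPDIR, runs the validator there, and writes the output back to a results file on the network-mounted bids_dir so the main process can pick it up. The function name, the results path, and the validator invocation are all assumptions for illustration, not the PR's implementation.

import os
import shutil
import subprocess
from pathlib import Path

# Hypothetical job payload: copy inputs to the node-local $TMPDIR, run the
# validator there, then write the output to a network-mounted results file
# that the main (login-node) process can read after the job finishes.
def run_validator_job(input_files, results_path):
    local_dir = Path(os.environ.get("TMPDIR", "/tmp"))
    for f in input_files:
        shutil.copy(f, local_dir / Path(f).name)
    ret = subprocess.run(["bids-validator", str(local_dir), "--json"],
                         capture_output=True, text=True)
    # results_path lives on the networked bids_dir, not on the exec node
    Path(results_path).write_text(ret.stdout)
    return ret.returncode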
if ret.returncode != 0:
    logger.error("Errors returned "
                 "from validator run, parsing now")
this may break flake8
Is there a particular formatter, e.g. black or autopep8, that you're using for the project?
jids = []

for batch in build_drmaa_batch(queue):
    tmp = tempfile.NamedTemporaryFile(delete=False, dir=opts.bids_dir,
                                      prefix=".", suffix=".sh")
Is this something a user can customize? Or will they need to customize it? Does this work out of the box on CUBIC?
Not sure what would need to be customized? It does work out of the box on CUBIC. LSF also supports DRMAA, but PMACS set it up in a weird way and sounded uninterested in changing that when I asked :(
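For anyone skimming, here's roughly what the DRMAA submission loop under discussion looks like. build_drmaa_batch and the hidden .sh tempfiles come from the diff above; everything else (the wrapper function, treating each batch as a list of shell commands, the synchronize call) is assumed for illustration.

import os
import stat
import tempfile

import drmaa

# Hypothetical sketch of the submission loop: each batch of validator calls is
# written to a hidden shell script under bids_dir (network-mounted, so exec
# nodes can read it) and submitted through the scheduler's DRMAA interface.
def submit_batches(queue, bids_dir):
    jids = []
    with drmaa.Session() as session:
        for batch in build_drmaa_batch(queue):  # assumed: list of command strings
            tmp = tempfile.NamedTemporaryFile(delete=False, dir=bids_dir,
                                              prefix=".", suffix=".sh")
            tmp.write(("#!/bin/bash\n" + "\n".join(batch) + "\n").encode())
            tmp.close()
            os.chmod(tmp.name, stat.S_IRWXU)
            jt = session.createJobTemplate()
            jt.remoteCommand = tmp.name
            jids.append(session.runJob(jt))
            session.deleteJobTemplate(jt)
        # block until every submitted batch has finished
        session.synchronize(jids, drmaa.Session.TIMEOUT_WAIT_FOREVER)
    return jids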
Submitting validator jobs to the cluster (in batches) provides a massive speedup; running on multiple cores provides a more modest one. One drawback is that the progress bar is pretty much meaningless since jobs run asynchronously, and when qsub-ing jobs a bunch of tempfiles need to be created in a network-mounted directory (bids_dir is used) so they're visible to the exec nodes, which is a bit hacky.
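For comparison, the multi-core path can be sketched like this (a hypothetical illustration using concurrent.futures, not the PR's actual code), where each validator call is just a command list handed off to a worker process:

import subprocess
from concurrent.futures import ProcessPoolExecutor

# Hypothetical sketch of the multi-core fallback: run one validator call per
# chunk of the dataset across local cores instead of submitting grid jobs.
def run_validator(call):
    return subprocess.run(call, capture_output=True, text=True)

def validate_parallel(calls, n_cpus=4):
    with ProcessPoolExecutor(max_workers=n_cpus) as pool:
        return list(pool.map(run_validator, calls))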