-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Store result files to s3 #77
Conversation
a55dd1f
to
ad700a9
Compare
sdgym/benchmark.py
Outdated
@@ -149,11 +150,71 @@ def _score_with_timeout(timeout, synthesizer, metadata, metrics, iteration): | |||
return output | |||
|
|||
|
|||
def _write_to_cache_dir(cache_dir, name, dataset_name, iteration, run_id, scores, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about changing this approach and, instead of writing a method that handles all the different situations, just write a write_file(contents, path, aws_key, aws_secret)
function that knows whether to write in S3 or in a local file depending on the path prefix.
Additionally, a write_csv
could be added that just dumps the DataFrame as a CSV string and then calls write_file
.
With that, we can keep the code almost as it was before, doing:
base_path = str(cache_dir / f'{name}_{dataset_name}_{iteration}_{run_id}')
if scores is not None:
write_csv(scores, base_path + '_scores.csv')
if 'synthetic_data' in output:
synthetic_data = compress_pickle.dumps(output['synthetic_data'])
write_file(synthetic_data, base_path + '.data.gz')
if 'exception' in output:
write_file(output['exception'],base_path + '_error.txt')
sdgym/s3_utils.py
Outdated
@@ -0,0 +1,48 @@ | |||
import boto3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rename this module to s3.py
, as there is no other place where S3 is mentioned, so there is no ambiguity
5c29ab0
to
9c5536c
Compare
9c5536c
to
ea87801
Compare
) | ||
else: | ||
with open(path, write_mode) as f: | ||
if write_mode == 'w': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@csala I think you're right about writing text files with write mode w
instead of wb
. I updated it to special-case the gzip
files to write mode wb
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good! Let's wait for the rest of the PRs to be merged to this, and then this can go in
388014c
to
ea87801
Compare
If the user passes
cache_dir
as ans3://<bucket-name>/path/to/dir
the scores, synthetic data, and error files are stored in the corresponding path of the given bucket. Ifaws_key
andaws_secret
are provided, they will be used to authenticate the s3 requests.✅ Ran with aws credentials and a private s3 bucket, and verified that the results files were uploaded.
Resolve #81