
Store result files to s3 #77

Merged

katxiao merged 6 commits into master from store-results-to-s3 on May 19, 2021
Conversation

katxiao (Contributor) commented on May 15, 2021:

If the user passes cache_dir as an s3://<bucket-name>/path/to/dir path, the scores, synthetic data, and error files are stored in the corresponding path of the given bucket. If aws_key and aws_secret are provided, they are used to authenticate the S3 requests.

✅ Ran with AWS credentials and a private S3 bucket, and verified that the result files were uploaded.

Resolve #81
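
For illustration only, the prefix-based routing described here could be sketched roughly as follows; the helper names (is_s3_path, parse_s3_path) are hypothetical and not the PR's actual code:

    # Hypothetical sketch, not the PR's code: route on the path prefix.
    S3_PREFIX = 's3://'

    def is_s3_path(path):
        """Return True when the given cache_dir should be treated as S3."""
        return str(path).startswith(S3_PREFIX)

    def parse_s3_path(path):
        """Split 's3://<bucket-name>/path/to/dir' into (bucket, key prefix)."""
        bucket, _, key_prefix = str(path)[len(S3_PREFIX):].partition('/')
        return bucket, key_prefix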

katxiao requested a review from csala on May 15, 2021 00:14
katxiao force-pushed the store-results-to-s3 branch 4 times, most recently from a55dd1f to ad700a9, on May 17, 2021 23:54
@@ -149,11 +150,71 @@ def _score_with_timeout(timeout, synthesizer, metadata, metrics, iteration):
return output


def _write_to_cache_dir(cache_dir, name, dataset_name, iteration, run_id, scores,
csala (Contributor) commented:

I'm thinking about changing this approach: instead of writing a method that handles all the different situations, just write a write_file(contents, path, aws_key, aws_secret) function that knows whether to write to S3 or to a local file depending on the path prefix.
Additionally, a write_csv could be added that just dumps the DataFrame as a CSV string and then calls write_file.

With that, we can keep the code almost as it was before, doing:

base_path = str(cache_dir / f'{name}_{dataset_name}_{iteration}_{run_id}')
if scores is not None:
    write_csv(scores, base_path + '_scores.csv')
if 'synthetic_data' in output:
    synthetic_data = compress_pickle.dumps(output['synthetic_data'])
    write_file(synthetic_data, base_path + '.data.gz')
if 'exception' in output:
    write_file(output['exception'], base_path + '_error.txt')
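
A minimal sketch of the two proposed helpers, assuming boto3 and pandas; the exact argument handling in the final PR may differ:

    import boto3

    S3_PREFIX = 's3://'

    def write_file(contents, path, aws_key=None, aws_secret=None):
        # Route on the path prefix: upload to S3 or write a local file.
        if path.startswith(S3_PREFIX):
            bucket, _, key = path[len(S3_PREFIX):].partition('/')
            client = boto3.client(
                's3',
                aws_access_key_id=aws_key,
                aws_secret_access_key=aws_secret,
            )
            body = contents if isinstance(contents, bytes) else contents.encode('utf-8')
            client.put_object(Bucket=bucket, Key=key, Body=body)
        else:
            # Text (scores, errors) uses 'w'; gzip payloads need binary mode.
            write_mode = 'wb' if isinstance(contents, bytes) else 'w'
            with open(path, write_mode) as f:
                f.write(contents)

    def write_csv(data, path, aws_key=None, aws_secret=None):
        # Dump the DataFrame as a CSV string, then delegate to write_file.
        write_file(data.to_csv(index=False), path, aws_key, aws_secret)

With those two in place, the calling code above stays close to the original local-only version.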

@@ -0,0 +1,48 @@
import boto3
csala (Contributor) commented:

I would rename this module to s3.py; since S3 is not handled anywhere else, there is no ambiguity.

katxiao force-pushed the store-results-to-s3 branch 3 times, most recently from 5c29ab0 to 9c5536c, on May 18, 2021 21:01
katxiao force-pushed the store-results-to-s3 branch from 9c5536c to ea87801 on May 19, 2021 17:54
    )
else:
    with open(path, write_mode) as f:
        if write_mode == 'w':
katxiao (Contributor, Author) commented:
@csala I think you're right about writing text files with write mode w instead of wb. I updated it to special-case the gzip files to write mode wb.
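
As a sketch, the gzip special-casing mentioned here might boil down to choosing the mode from the file extension (illustrative, not the exact diff):

    # Illustrative: gzip payloads from compress_pickle are bytes, so only
    # '.gz' files get binary mode; scores and error text stay in text mode.
    write_mode = 'wb' if path.endswith('.gz') else 'w'
    with open(path, write_mode) as f:
        f.write(contents)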

csala (Contributor) left a comment:
All looks good! Let's wait for the rest of the PRs to be merged to this, and then this can go in

katxiao force-pushed the store-results-to-s3 branch from 388014c to ea87801 on May 19, 2021 18:40
katxiao changed the title from "Allow cache dir to be a s3 bucket" to "Store result files to s3" on May 19, 2021
katxiao merged commit cd9c411 into master on May 19, 2021
katxiao deleted the store-results-to-s3 branch on May 19, 2021 22:01
Successfully merging this pull request may close: Store cache contents into an S3 bucket (#81)