Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

fix gcs inputs #82

Merged
merged 4 commits into from
May 2, 2022
Merged

fix gcs inputs #82

merged 4 commits into from
May 2, 2022

Conversation

sharvenm
Copy link
Contributor

fix gcs inputs in generic file source
update sample dag inputs

fix gcs inputs in generic file source
update sample dag inputs
danielkulikov
danielkulikov previously approved these changes Apr 19, 2022
Copy link
Contributor

@eugenemiretsky eugenemiretsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we able to add unit tests?

dags/config/gcs.yaml Outdated Show resolved Hide resolved
Copy link

@rabdulatipoff rabdulatipoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DAGs config looks fine, though I'm not sure what using an empty bucket prefix would mean (would it be a filesystem or a Cloud Ctorage lookup?), also I'd use os.path routines instead of concatenating path strings manually -- looks fine otherwise

@soyarym
Copy link
Collaborator

soyarym commented Apr 20, 2022

DAGs config looks fine, though I'm not sure what using an empty bucket prefix would mean (would it be a filesystem or a Cloud Ctorage lookup?), also I'd use os.path routines instead of concatenating path strings manually -- looks fine otherwise

I agree.
@eugenemiretsky @sharvenm could you please give an example of bucket prefix we are expecting?

@sharvenm
Copy link
Contributor Author

@rabdulatipoff @soyarym an empty bucket prefix would mean that the file (blob since really we're only dealing with gcs for these buckets) we are searching for is not located inside any "folders". So if our file name was "users.csv" and our bucketname was "gcs-airflow"

The path to that blob should be: gs://gcs-airflow/users.csv

If instead the blob was located under "folders" (gcs doesn't actually have folders, but we can think of blob names like that) and the full path to the blob was: gs://gcs-airflow/some-folder/users.csv then we would need the gcs-prefix to be "some-folder"

I like the idea of using os.path.join so I've added that as you guys recommended.

@soyarym
Copy link
Collaborator

soyarym commented Apr 20, 2022

@sharvenm
Oh, okay, got it.
I guess if you like the idea of using os.path.join then maybe would better to use it in the other part of the script instead of '/' in the same cases?

Create class for unit tests
Modify to use os.path.join to list files
@sharvenm sharvenm force-pushed the hotfix/gcs-file-sources branch from 760d4d0 to f69a0d3 Compare April 20, 2022 21:13
Updating configs and code to utilize default inputs from the file_source_config data class
Updating password values to be uncommon phrases as airflow will mask sensitive data in the logs

Updating readme
@sharvenm sharvenm force-pushed the hotfix/gcs-file-sources branch from f609276 to 71c59a6 Compare April 21, 2022 18:34
@sharvenm sharvenm requested a review from eugenemiretsky April 21, 2022 19:33
@sharvenm sharvenm merged commit 7b8cb2b into main May 2, 2022
@sharvenm sharvenm deleted the hotfix/gcs-file-sources branch May 2, 2022 13:26
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants