Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

feat: S3 batch storage #1087

Merged
merged 28 commits into from
Nov 17, 2022
Merged

Conversation

jamielxcarter
Copy link
Contributor

@jamielxcarter jamielxcarter commented Oct 19, 2022

Closes #1070


📚 Documentation preview 📚: https://meltano-sdk--1087.org.readthedocs.build/en/1087/

@jamielxcarter jamielxcarter changed the title S3 batch storage feat: S3 batch storage Oct 19, 2022
pyproject.toml Outdated Show resolved Hide resolved
@aaronsteers
Copy link
Contributor

aaronsteers commented Oct 19, 2022

@jamielxcarter - This is awesome - I'm happy to see the implementation was so straightforward.

What is unclear to me as of yet is what instructions to provide to (1) users and (2) developers.

For developers, if they want to opt in to including the s3 extra, they probably would do so in poetry.toml - should we have a commented-out section in the cookiecutter for guidance?

And for users, especially if the developer has not already bundled the s3 extra, what's the recommendation to them for bundling the extra library during their install? Assuming a tool like pipx or meltano is auto-creating a virtual environment, it's not clear to me how to add the s3 library into the virtualenv appropriately. Perhaps?:

  1. For meltano users, should they append singer-sdk[s3] or fs-s3fs to the pip_url in meltano.yml.
  2. For pipx users, should they use pipx inject, as in pipx inject tap-mytap fs-s3fs or similar?

cc @edgarrmondragon

noxfile.py Outdated Show resolved Hide resolved
@edgarrmondragon
Copy link
Collaborator

@aaronsteers @jamielxcarter I think there's a couple of options to trickle down support for S3, and the distinction between

  • Developers: tap creators and maintainers
  • Users: users of the tap (via Meltano or otherwise)

is useful.

Option 1: All Users get the extra dependency

[tool.poetry.dependencies]
python = "<3.11,>=3.7.1"
singer-sdk = { version="0.11.1", extras = ["s3"] }

Option 2: Extras all the way down

Make the extra dependency a passthrough, so users would have to explicitly request the extra:

pipx install 'tap-github[s3]'

Doesn't work at the moment, unfortunately:

@aaronsteers
Copy link
Contributor

@aaronsteers @jamielxcarter I think there's a couple of options to trickle down support for S3, and the distinction between

  • Developers: tap creators and maintainers
  • Users: users of the tap (via Meltano or otherwise)

is useful.

Option 1: All Users get the extra dependency

[tool.poetry.dependencies]
python = "<3.11,>=3.7.1"
singer-sdk = { version="0.11.1", extras = ["s3"] }

Option 2: Extras all the way down

Make the extra dependency a passthrough, so users would have to explicitly request the extra:

pipx install 'tap-github[s3]'

Doesn't work at the moment, unfortunately:

Well said.

I think for taps and targets using AWS-inclined system like redshift and snowflake, the developer can choose option 1 and just ship it all the time for all users.

For the user opt-in scenario (meaning the tap doesn't bias towards aws/s3, but still the user might want/need to use S3 anyway), is it just the automatic pass-through that doesn't work? Or, said another way, could we explicitly re-declare the s3 in the boilerplate cookiecutter templates? This would presumably give all users the option, but of course the extra declaration could still be removed by a developer - or not included in the first place.

Would that work? And if developers remove it (or don't add it), the user would need to find another means to inject?

@jamielxcarter jamielxcarter requested a review from a team as a code owner October 28, 2022 15:16
@jamielxcarter
Copy link
Contributor Author

@aaronsteers @jamielxcarter I think there's a couple of options to trickle down support for S3, and the distinction between

  • Developers: tap creators and maintainers
  • Users: users of the tap (via Meltano or otherwise)

is useful.

Option 1: All Users get the extra dependency

[tool.poetry.dependencies]
python = "<3.11,>=3.7.1"
singer-sdk = { version="0.11.1", extras = ["s3"] }

Option 2: Extras all the way down

Make the extra dependency a passthrough, so users would have to explicitly request the extra:

pipx install 'tap-github[s3]'

Doesn't work at the moment, unfortunately:

Well said.

I think for taps and targets using AWS-inclined system like redshift and snowflake, the developer can choose option 1 and just ship it all the time for all users.

For the user opt-in scenario (meaning the tap doesn't bias towards aws/s3, but still the user might want/need to use S3 anyway), is it just the automatic pass-through that doesn't work? Or, said another way, could we explicitly re-declare the s3 in the boilerplate cookiecutter templates? This would presumably give all users the option, but of course the extra declaration could still be removed by a developer - or not included in the first place.

Would that work? And if developers remove it (or don't add it), the user would need to find another means to inject?

@aaronsteers @edgarrmondragon I think I've implemented the changes discussed by you both. Let me know if anything is missing!

@jamielxcarter jamielxcarter requested review from edgarrmondragon and removed request for cjohnhanson November 7, 2022 14:20
Copy link
Contributor

@aaronsteers aaronsteers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for applying the recent changes.

One question/suggestion below.

cc @edgarrmondragon to rereview.

@codecov
Copy link

codecov bot commented Nov 9, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.60%. Comparing base (c4e2301) to head (ef9cfcd).
Report is 1005 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1087   +/-   ##
=======================================
  Coverage   83.59%   83.60%           
=======================================
  Files          42       42           
  Lines        3908     3909    +1     
  Branches      665      665           
=======================================
+ Hits         3267     3268    +1     
  Misses        475      475           
  Partials      166      166           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@aaronsteers aaronsteers merged commit ca27fd5 into meltano:main Nov 17, 2022
@jamielxcarter jamielxcarter deleted the s3-batch-storage branch November 17, 2022 17:28
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Feature: Add s3:// as a supported storage root for batch files.
3 participants