-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
External backing storage demo #668
base: master
Are you sure you want to change the base?
External backing storage demo #668
Conversation
/gcbrun |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bennsimon for this change; just added some suggestions/questions.
docker/drivers-build/Dockerfile
Outdated
@@ -0,0 +1,2 @@ | |||
FROM busybox:1.36 | |||
COPY postgresql-42.6.0.jar /jdbcDrivers/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a github security policy not to check-in binary artifacts as they are not reviewable (details here). Can we find another way of copying PostgreSQL driver, e.g., fetching it from an official docker image during the image build process? (We are violating this in one particular case which should be fixed too but I don't want to add more to it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, i can fetch it from the official site during the build process.
docker/drivers-build/Dockerfile
Outdated
@@ -0,0 +1,2 @@ | |||
FROM busybox:1.36 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to build a new image, why not start from bitnami/spark
as base then copy the required driver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reduce the size of the image. When i use bitnami/spark
base the final image becomes more than a GB but if i use a minimal image as base the size becomes ~15mb.
I was thinking that since the only purpose of this image is to host all the drivers why not use a minimal image. It frees the need to maintain a parallel spark
image.
If we use bitnami/spark
as base we wont need to copy over the drivers but then we will need to maintain a parallel spark
image.
@@ -0,0 +1,43 @@ | |||
<!-- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just reuse/adapt docker/hive-site_example.xml
instead of creating a new one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sure will do.
Thanks again @bennsimon; I think we discussed this in other forums, can you please remind me if this is still needed and if yes, update it to be merged? It would be great to add a |
Hey @bashir2, yeah i think it will be needed to demonstrate that one can setup spark thriftserver with database other than derby database. I will add the README.md. |
4d366ae
to
70feb27
Compare
remove postgresql-42.6.0.jar
70feb27
to
bd6b026
Compare
Description of what I changed
E2E test
TESTED:
Tested on my local machine by running below:
Checklist: I completed these to help reviewers :)
I have read and will follow the review process.
I am familiar with Google Style Guides for the language I have coded in.
No? Please take some time and review Java and Python style guides.
My IDE is configured to follow the Google code styles.
No? Unsure? -> configure your IDE.
I have added tests to cover my changes. (If you refactored existing code that was well tested you do not have to add tests)
I ran
mvn clean package
right before creating this pull request and added all formatting changes to my commit.All new and existing tests passed.
My pull request is based on the latest changes of the master branch.
No? Unsure? -> execute command
git pull --rebase upstream master