Add shims for Spark 3.4.0 #5472

Merged (10 commits) on May 27, 2022

@firestarman (Collaborator) commented May 12, 2022

This PR adds shims for Spark 3.4.0.

The main changes are:

  • created the necessary classes for the 340 Spark shims, the Rapids shuffle manager, and the service provider.
  • created shims for the Parquet reading related changes; their names start with Parquet.
  • created a new class, ShimFilePartitionReaderFactory, as the parent of the Rapids PERFILE readers, to hide the changes in Spark 3.4.0.
  • added 340 to the build scripts.
  • fixed some build errors by adding shims.
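
The shim layering in the list above can be pictured with a small, language-neutral sketch. Everything in it is hypothetical: `ShimProvider`, `find_shim`, and the version strings are illustrative names, not the plugin's actual Scala classes. It only shows the general idea that each supported Spark version registers a provider and the loader picks the one that matches the running version.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ShimProvider:
    """Hypothetical stand-in for a per-version shim service provider."""
    name: str
    versions: Sequence[str]

    def matches_version(self, version: str) -> bool:
        # A provider claims a Spark version only if it was built for it.
        return version in self.versions


# Illustrative registry; a real build would discover providers dynamically.
PROVIDERS = [
    ShimProvider("spark330", ("3.3.0",)),
    ShimProvider("spark340", ("3.4.0",)),
]


def find_shim(version: str, providers: Sequence[ShimProvider] = PROVIDERS) -> ShimProvider:
    """Return the first provider that matches, or fail loudly."""
    for provider in providers:
        if provider.matches_version(version):
            return provider
    raise ValueError(f"no shim found for Spark version {version}")
```

Failing loudly on an unknown version is a deliberate choice in this sketch: running against the wrong shim could hide exactly the API differences the shims exist to absorb.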

closes #5128
closes #5495

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman (Collaborator, Author) commented May 12, 2022

For early review.
This is a draft because I am still figuring out the failing ITs; I will file follow-up issues accordingly.
[Updated] The failing ITs are tracked by #5480.

There are also 21 failing unit tests, which fail on 330 as well; they are tracked by #5457.

AnsiCastOpSuite:
- Write bytes to string *** FAILED ***
- Write shorts to string *** FAILED ***
- Write ints to string *** FAILED ***
- Write longs to string *** FAILED ***
- Write ints to long *** FAILED ***
- Write longs to int (values within range) *** FAILED ***
- Write longs to short (values within range) *** FAILED ***
- Write longs to byte (values within range) *** FAILED ***
- Write ints to short (values within range) *** FAILED ***
- Write ints to byte (values within range) *** FAILED ***
- Write shorts to byte (values within range) *** FAILED ***
- Write floats to long (values within range) *** FAILED ***
- Write floats to int (values within range) *** FAILED ***
- Write floats to short (values within range) *** FAILED ***
- Write floats to byte (values within range) *** FAILED ***
- Write doubles to long (values within range) *** FAILED ***
- Write doubles to int (values within range) *** FAILED ***
- Write doubles to short (values within range) *** FAILED ***
- Write doubles to byte (values within range) *** FAILED ***
- Copy ints to long *** FAILED ***
- Copy long to float *** FAILED ***

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman firestarman marked this pull request as draft May 12, 2022 07:15
@jlowe jlowe added this to the May 2 - May 20 milestone May 12, 2022
@sameerz sameerz added the build Related to CI / CD or cleanly building label May 12, 2022
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman (Collaborator, Author) commented:

This PR is blocked by #5457. Keeping it as a draft.

@firestarman firestarman requested a review from gerashegalov May 13, 2022 02:13
@firestarman firestarman marked this pull request as ready for review May 16, 2022 01:45
@firestarman (Collaborator, Author) commented:

A follow-up issue #5495

@firestarman (Collaborator, Author) commented:

build

@firestarman (Collaborator, Author) commented:

CI failed because the premerge build needs corresponding updates. Waiting for 22.08.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman firestarman changed the base branch from branch-22.06 to branch-21.08 May 23, 2022 11:49
@firestarman firestarman changed the base branch from branch-21.08 to branch-22.08 May 23, 2022 11:50
@firestarman firestarman requested a review from jlowe May 23, 2022 11:58
@firestarman (Collaborator, Author) commented:

A new follow-up #5589

@firestarman (Collaborator, Author) commented:

build

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman firestarman requested a review from jlowe May 24, 2022 02:33
@firestarman (Collaborator, Author) commented:

build

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman (Collaborator, Author) commented:

build

@jlowe (Contributor) left a comment:

There should be tests on 340+ to verify we're properly falling back on the limit-with-offset scenarios, as it's silent data corruption if we don't get that correct.
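
To illustrate why a missed fallback would be silent, here is a minimal sketch assuming the usual SQL meaning of `LIMIT n OFFSET m` (skip `m` rows, then keep `n`). The function names and the "accelerated"/"fallback" labels are hypothetical and do not reflect the plugin's real planning code; the point is only that an accelerated path that ignores the offset returns plausible-looking but wrong rows.

```python
def limit_with_offset(rows, limit, offset=0):
    """Reference semantics: skip `offset` rows, then keep `limit` rows."""
    return rows[offset:offset + limit]


def can_accelerate(offset):
    """Hypothetical check: only plain limits (offset == 0) are accelerated."""
    return offset == 0


def run_limit(rows, limit, offset=0):
    """Pick the accelerated path only when it is known to be correct."""
    if can_accelerate(offset):
        # Plain limit: taking the first `limit` rows is correct.
        return "accelerated", rows[:limit]
    # Offset present: use the reference implementation instead.
    return "fallback", limit_with_offset(rows, limit, offset)


def buggy_run_limit(rows, limit, offset=0):
    """What goes wrong if the offset is ignored: no error, wrong rows."""
    return "accelerated", rows[:limit]
```

A test for this kind of behaviour would assert both that the fallback path is chosen when an offset is present and that the returned rows match the reference semantics.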

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman (Collaborator, Author) commented:

Added tests for it.

@firestarman (Collaborator, Author) commented:

build

@firestarman firestarman requested a review from jlowe May 26, 2022 03:22
@firestarman firestarman merged commit 7b743c8 into NVIDIA:branch-22.08 May 27, 2022
@firestarman firestarman deleted the 340-shim branch May 27, 2022 02:04
HaoYang670 pushed a commit to HaoYang670/spark-rapids that referenced this pull request Jun 6, 2022
* Add shims for Spark 340

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Linked issue: [FEA] create Spark 3.4 shim
3 participants