
Fix GPU degrees function does not overflow [databricks] #5996

Merged
12 commits merged into NVIDIA:branch-22.08 on Jul 19, 2022

Conversation

@thirtiseven (Collaborator) commented Jul 13, 2022

On very large numbers, GpuToDegrees does not overflow and produce inf when the CPU version does. This appears to be an order-of-operations issue: the CPU version multiplies by 180 and then divides by pi, while for performance reasons we multiply by (180 / pi).

The toDegrees in Spark is a direct call to toDegrees in the Java math library, but since JDK 9 the Java implementation has used the same order of calculation as the current spark-rapids (changed from angrad * 180.0 / PI to angrad * (180.0 / PI)), so Spark's results differ depending on the JDK version in the runtime environment.
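The divergence can be reproduced with plain doubles, independent of Spark or the GPU (a minimal sketch; the input value here is just an illustration):

```python
import math
import sys

x = sys.float_info.max / 100   # very large, but still a finite double

# Pre-JDK-9 order: multiply by 180 first. The intermediate x * 180.0
# overflows to inf, so the final result is inf.
jdk8_style = x * 180.0 / math.pi

# JDK 9+ / spark-rapids order: multiply by the precomputed constant.
# 180.0 / math.pi is about 57.3, so the product stays finite here.
jdk9_style = x * (180.0 / math.pi)

print(math.isinf(jdk8_style), math.isinf(jdk9_style))  # True False
```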

Based on the discussion in #109, this PR keeps the original implementation, adds documentation to explain the difference in results, and skips test_degrees when testing with JDK version 8 and lower.

Signed-off-by: thirtiseven ntlihy@gmail.com

Fixes #109.

@thirtiseven thirtiseven self-assigned this Jul 13, 2022
@res-life (Collaborator)

There are 7 commits and some of them are not relevant; you can rebase -i branch-22.08 after pulling upstream/branch-22.08.

-      input.getBase.mul(multiplier)
+    withResource(Scalar.fromDouble(180d)) { multiplier =>
+      withResource(input.getBase.mul(multiplier)) { dividend =>
+        dividend.div(Scalar.fromDouble(Math.PI))
Collaborator

Resource leak, use withResource.

Collaborator

Use withResource to close Scalar.fromDouble(Math.PI)

Collaborator Author

ok

@thirtiseven (Collaborator Author): build

1 similar comment
@thirtiseven (Collaborator Author): build

@thirtiseven (Collaborator Author): build

@thirtiseven thirtiseven changed the title Fix GPU degreees function does not overflow Fix GPU degreees function does not overflow [databricks] Jul 14, 2022
@thirtiseven thirtiseven changed the title Fix GPU degreees function does not overflow [databricks] Fix GPU degrees function does not overflow [databricks] Jul 14, 2022
@thirtiseven (Collaborator Author): build

    withResource(Scalar.fromDouble(180d)) { multiplier =>
      withResource(Scalar.fromDouble(Math.PI)) { divisor =>
        input.getBase.mul(multiplier).div(divisor)
      }
    }
Collaborator

input.getBase.mul(multiplier) produces a resource that needs to be closed.
The 180d scalar can be closed as soon as possible to release some memory. Here it's trivial but this is a good practice.

Suggestion:

    val tmp = withResource(Scalar.fromDouble(180d)) { multiplier =>
      input.getBase.mul(multiplier)
    }

    // at here the 180d scalar is closed.
    withResource(tmp) { _ =>
      withResource(Scalar.fromDouble(Math.PI)) { pi =>
        tmp.div(pi)
      }
    }
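The early-close pattern in this suggestion can be sketched outside Scala as well. A rough Python analogue of the withResource idiom (the Resource class and names below are illustrative stand-ins, not the spark-rapids API):

```python
class Resource:
    """Stand-in for a GPU Scalar or ColumnVector that must be closed."""
    def __init__(self, name, log):
        self.name, self.log = name, log

    def close(self):
        self.log.append("closed " + self.name)

def with_resource(resource, body):
    # Run body(resource), then close the resource even if body raises,
    # mirroring what the Scala withResource helper does.
    try:
        return body(resource)
    finally:
        resource.close()

log = []

# The 180d scalar is closed right after the multiply, before the pi
# scalar is even created, releasing its memory as early as possible.
tmp = with_resource(Resource("scalar_180", log),
                    lambda s: "mul(" + s.name + ")")
result = with_resource(Resource("scalar_pi", log),
                       lambda s: "div(" + tmp + ", " + s.name + ")")

print(log)      # ['closed scalar_180', 'closed scalar_pi']
print(result)   # div(mul(scalar_180), scalar_pi)
```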

Collaborator Author

ok, revised.

Signed-off-by: thirtiseven <ntlihy@gmail.com>
@thirtiseven (Collaborator Author): build

revans2 previously approved these changes Jul 14, 2022
@@ -47,9 +47,14 @@ case class GpuAcos(child: Expression) extends CudfUnaryMathExpression("ACOS") {
 case class GpuToDegrees(child: Expression) extends GpuUnaryMathExpression("DEGREES") {

   override def doColumnar(input: GpuColumnVector): ColumnVector = {
-    withResource(Scalar.fromDouble(180d / Math.PI)) { multiplier =>
+    val tmp = withResource(Scalar.fromDouble(180d)) { multiplier =>
Collaborator

nit: Can we add a note so no one tries to optimize this code and make it simpler again?

@sameerz sameerz added the bug Something isn't working label Jul 14, 2022
Signed-off-by: thirtiseven <ntlihy@gmail.com>
Haoyang Li and others added 4 commits July 18, 2022 11:24
@thirtiseven (Collaborator Author): build

@@ -611,20 +612,22 @@ def test_radians(data_gen):
     assert_gpu_and_cpu_are_equal_collect(
         lambda spark : unary_op_df(spark, data_gen).selectExpr('radians(a)'))

 def is_before_jdk_9():
Collaborator

Can we add some java -version samples, like:

    # Allow these formats:
    # 1.8.0_72-ea
    # 9-ea
    # 9
    # 9.0.1

Collaborator Author

ok, will add some comments.

@thirtiseven (Collaborator Author): build

    jdk_version = jdk_version[2:]
    dot_pos = jdk_version.find('.')
    dash_pos = jdk_version.find('-')
    jdk_version = jdk_version[0:dot_pos if dot_pos != -1 else dash_pos]
@res-life (Collaborator) commented Jul 18, 2022

If jdk_version is "9", this will yield '', and int('') throws an exception.
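The failure mode described here is easy to reproduce (a standalone sketch of the slicing logic under review):

```python
# With a JDK-9-style version string there is no "1." prefix to strip,
# so the slicing below ends up with an empty string and int('') raises.
jdk_version = "9"

v = jdk_version[2:]        # "" -- "9" is only one character long
dot_pos = v.find('.')      # -1, no '.' in ""
dash_pos = v.find('-')     # -1, no '-' in ""
v = v[0:dot_pos if dot_pos != -1 else dash_pos]   # ""[0:-1] is still ""

try:
    int(v)
    raised = False
except ValueError:
    raised = True

print(repr(v), raised)  # '' True
```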

@@ -611,20 +612,30 @@ def test_radians(data_gen):
     assert_gpu_and_cpu_are_equal_collect(
         lambda spark : unary_op_df(spark, data_gen).selectExpr('radians(a)'))

 def get_java_version():
     jdk_version = check_output(['java', '-version'], stderr=STDOUT)
     jdk_version = jdk_version.split(b'\n')[0].split(b' ')[2].decode('utf-8')
Collaborator

This works well for the following version prints.

java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

openjdk version "1.8.0_302"
OpenJDK Runtime Environment (build 1.8.0_302-bre_2021_07_21_23_11-b00)
OpenJDK 64-Bit Server VM (build 25.302-b00, mixed mode)

Nit: the following code seems more readable:

        for line in output.splitlines():
            # E.g. openjdk version "1.8.0_212"
            line = line.strip()
            if 'version' in line:
                ver = line.split('version', 1)[1].strip()
                if ver.startswith('"') and ver.endswith('"'):
                    ver = ver[1:-1]
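Along the same lines, a fuller illustrative parser that tolerates both the pre-9 and 9+ version formats could look like this (a hypothetical sketch, not the code that was merged in this PR):

```python
def parse_java_major_version(ver):
    """Return the major version from a java.version-style string.

    Handles both formats discussed above, e.g.
    "1.8.0_302" -> 8, "9" -> 9, "9.0.1" -> 9, "11.0.2-ea" -> 11.
    """
    if ver.startswith('1.'):
        ver = ver[2:]          # pre-9 scheme: "1.8.0_302" -> "8.0_302"
    digits = ''
    for ch in ver:             # keep leading digits, stop at '.', '-', '_'
        if ch.isdigit():
            digits += ch
        else:
            break
    return int(digits)

print(parse_java_major_version("1.8.0_302"))  # 8
print(parse_java_major_version("9"))          # 9
print(parse_java_major_version("11.0.2-ea"))  # 11
```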

Collaborator Author

Thanks, revised.

Signed-off-by: thirtiseven <ntlihy@gmail.com>
res-life previously approved these changes Jul 18, 2022
@thirtiseven thirtiseven marked this pull request as ready for review July 18, 2022 09:41
@thirtiseven (Collaborator Author): build

@revans2 (Collaborator) left a comment

Can we add some docs to https://github.com/NVIDIA/spark-rapids/blob/branch-22.08/docs/compatibility.md? This is an odd corner case and it would be good to have users know what is happening.

@@ -611,20 +612,37 @@ def test_radians(data_gen):
     assert_gpu_and_cpu_are_equal_collect(
         lambda spark : unary_op_df(spark, data_gen).selectExpr('radians(a)'))

 def get_java_version():
Collaborator

nit: Would it be better to use the java bridge

spark._jvm.java.lang.Runtime.version()

And if that works should we move this to a common place, like with is_before_spark_320()...

I don't see a lot of others wanting to use this, but it would be nice to have it in a common place anyways.

Collaborator Author

Yes, the java bridge is better, thanks. I used System.getProperty("java.version") because Runtime.version() is only available since JDK 9.

Signed-off-by: thirtiseven <ntlihy@gmail.com>
@thirtiseven (Collaborator Author): build

Signed-off-by: thirtiseven <ntlihy@gmail.com>
Signed-off-by: thirtiseven <ntlihy@gmail.com>
@thirtiseven (Collaborator Author): build

@revans2 revans2 merged commit e945e3e into NVIDIA:branch-22.08 Jul 19, 2022
Labels
bug Something isn't working

Successfully merging this pull request may close these issues:
[BUG] GPU degreees function does not overflow

4 participants