[BUG] -0.0 vs 0.0 is a hot mess #294
Comments
I filed https://issues.apache.org/jira/browse/SPARK-32110 to document what I have found in Spark.
Some findings when compared against Apache Hive 3.x: in this regard, the only material difference between Hive and Spark SQL is that on equi-joins Hive does not normalize, and treats `-0.0` and `0.0` as distinct keys.
I filed rapidsai/cudf#6834 in cudf so we can work around things with bit-wise operations if possible. I believe that we should be able to make comparisons and sort match Spark exactly. Joins are going to be much harder, but we still might be able to do it. We need to be very careful with this, though. `-0.0` and the various NaN values are rather rare in real life. I am not sure the added performance cost is worth it for sort, and for joins I am especially concerned about what it would take to make them work.
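As a rough illustration of the bit-wise idea (the names here are hypothetical, not the actual spark-rapids or cudf code), a normalization pass could collapse `-0.0` to `0.0` and all NaN bit patterns to the canonical NaN before values are hashed or compared for equality:

```java
public class NormalizeSketch {
    // Hypothetical helper: map each value to a canonical representative so
    // that bit-wise hashing and equality afterwards group -0.0 with 0.0 and
    // all NaN payloads together.
    static double normalize(double d) {
        if (Double.isNaN(d)) {
            return Double.NaN; // one canonical NaN bit pattern
        }
        if (d == 0.0) {        // IEEE equality: true for both 0.0 and -0.0
            return 0.0;        // drop the sign bit of -0.0
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(Double.doubleToLongBits(normalize(-0.0)) ==
                           Double.doubleToLongBits(0.0)); // true
    }
}
```

Whether an extra pass like this is worth it is exactly the performance question above.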
This is related to #84 and is a superset of it.
Spark is a bit of a hot mess with its support for floating point `-0.0`.

Most SQL implementations normalize `-0.0` to `0.0`. Spark does this in the SQL parser, but not in the dataframe API. Spark also violates the IEEE spec in that `-0.0 != 0.0`. This is because Java's `Double.compare` and `Float.compare` treat `-0.0` as less than `0.0`.

This is true everywhere except for a few cases where Spark does normalize: equi-join keys and hash aggregate keys. Hive does not do this normalization; it always assumes they are different.
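This is easy to see with a plain JDK snippet (no Spark involved):

```java
public class CompareDemo {
    public static void main(String[] args) {
        // IEEE 754 primitive equality: -0.0 and 0.0 are equal
        System.out.println(-0.0 == 0.0);               // true
        // Java's total order: negative result means -0.0 sorts before 0.0
        System.out.println(Double.compare(-0.0, 0.0)); // -1
        // The underlying bit patterns differ, so bit-based hashing
        // also tells them apart
        System.out.println(Double.doubleToLongBits(-0.0) ==
                           Double.doubleToLongBits(0.0)); // false
    }
}
```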
cudf follows IEEE, where the two always end up being the same. This causes issues in sorts, comparison operators, and joins that are not equi-joins.
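To make the sort difference concrete, this is what Java's total order produces for boxed doubles; an IEEE-style comparator like cudf's would consider `-0.0` and `0.0` equal and could legally emit them in either relative order:

```java
import java.util.Arrays;

public class SortDemo {
    public static void main(String[] args) {
        // Arrays.sort on boxed Doubles uses Double.compareTo, i.e. Java's
        // total order: -0.0 lands strictly before 0.0 and NaN lands last.
        Double[] vals = {1.0, Double.NaN, 0.0, -0.0, -1.0};
        Arrays.sort(vals);
        System.out.println(Arrays.toString(vals)); // [-1.0, -0.0, 0.0, 1.0, NaN]
    }
}
```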
I will file something against Spark, but I don't have high hopes that anything will be fixed.