[SPARK-39749][SQL] Always use plain string representation on casting Decimal to String

### What changes were proposed in this pull request?

Currently, casting a decimal value as string type produces a string in exponential notation when the adjusted exponent is less than -6. This matches the behavior of `java.math.BigDecimal.toString`: https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html#toString

After this PR, the casting always uses plain string representation.
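The JDK behavior described above can be seen directly on `java.math.BigDecimal` (a plain JVM illustration, not Spark code): `toString` switches to scientific notation once the adjusted exponent drops below -6, while `toPlainString` (which this PR uses) never does.

```scala
// java.math.BigDecimal rendering of a value whose adjusted exponent is -7
// (precision 3, scale 9: 3 - 9 - 1 = -7, i.e. less than -6).
val d = new java.math.BigDecimal("0.000000123")

println(d.toString)      // scientific notation: 1.23E-7
println(d.toPlainString) // plain representation: 0.000000123
```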

### Why are the changes needed?

1. The current behavior is not compliant with the ANSI SQL standard.
<img width="918" alt="image" src="https://user-images.githubusercontent.com/1097932/178395756-baecbe90-7a5f-4b4c-b63c-9f1fdf656107.png">
<img width="603" alt="image" src="https://user-images.githubusercontent.com/1097932/178395567-fa5b6877-ff08-48b5-b715-243c954d6bbc.png">

2. It differs from databases such as PostgreSQL, Oracle, and MS SQL Server.
3. The current behavior may surprise users, since it only kicks in when the adjusted exponent is less than -6. The following query returns `false` by default (when ANSI SQL mode is off), because `0.000000123` is converted to the string `1.23E-7`:
```sql
select '0.000000123' in (0.000000123);
```

### Does this PR introduce _any_ user-facing change?

Yes, after changes, Spark SQL always uses plain string representation on casting Decimal to String. To restore the legacy behavior, which uses scientific notation if the adjusted exponent is less than -6, set `spark.sql.legacy.castDecimalToString.enabled` to `true`.
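A minimal sketch of the before/after behavior in `spark-sql`, assuming a build that includes this change (the flag name is the one introduced by this PR):

```sql
-- New default: plain string representation
SELECT CAST(0.000000123 AS STRING);  -- 0.000000123

-- Restore the legacy scientific-notation behavior
SET spark.sql.legacy.castDecimalToString.enabled=true;
SELECT CAST(0.000000123 AS STRING);  -- 1.23E-7
```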

### How was this patch tested?

Unit test

Closes #37160 from gengliangwang/decimalToString.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
gengliangwang committed Jul 13, 2022
1 parent a79c91e commit c621df2
Showing 5 changed files with 27 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/sql-migration-guide.md
@@ -26,6 +26,7 @@ license: |

- Since Spark 3.4, Number or Number(\*) from Teradata will be treated as Decimal(38,18). In Spark 3.3 or earlier, Number or Number(\*) from Teradata will be treated as Decimal(38, 0), in which case the fractional part will be removed.
- Since Spark 3.4, v1 database, table, permanent view and function identifier will include 'spark_catalog' as the catalog name if database is defined, e.g. a table identifier will be: `spark_catalog.default.t`. To restore the legacy behavior, set `spark.sql.legacy.v1IdentifierNoCatalog` to `true`.
- Since Spark 3.4, the results of casting Decimal values as String type will not contain exponential notations. To restore the legacy behavior, which uses scientific notation if the adjusted exponent is less than -6, set `spark.sql.legacy.castDecimalToString.enabled` to `true`.

## Upgrading from Spark SQL 3.2 to 3.3

@@ -512,6 +512,7 @@ case class Cast(
TimestampFormatter.getFractionFormatter(ZoneOffset.UTC)

private val legacyCastToStr = SQLConf.get.getConf(SQLConf.LEGACY_COMPLEX_TYPES_TO_STRING)
private val legacyCastDecimalToStr = SQLConf.get.getConf(SQLConf.LEGACY_DECIMAL_TO_STRING)
// The brackets that are used in casting structs and maps to strings
private val (leftBracket, rightBracket) = if (legacyCastToStr) ("[", "]") else ("{", "}")

@@ -625,6 +626,8 @@ case class Cast(
case DayTimeIntervalType(startField, endField) =>
buildCast[Long](_, i => UTF8String.fromString(
IntervalUtils.toDayTimeIntervalString(i, ANSI_STYLE, startField, endField)))
case _: DecimalType if !legacyCastDecimalToStr =>
buildCast[Decimal](_, d => UTF8String.fromString(d.toPlainString))
case _ => buildCast[Any](_, o => UTF8String.fromString(o.toString))
}

@@ -1475,6 +1478,8 @@ case class Cast(
$evPrim = UTF8String.fromString($iu.toDayTimeIntervalString($c, $style,
(byte)${i.startField}, (byte)${i.endField}));
"""
case _: DecimalType if !legacyCastDecimalToStr =>
(c, evPrim, _) => code"$evPrim = UTF8String.fromString($c.toPlainString());"
case _ =>
(c, evPrim, evNull) => code"$evPrim = UTF8String.fromString(String.valueOf($c));"
}
@@ -3697,6 +3697,17 @@ object SQLConf {
.booleanConf
.createWithDefault(false)

val LEGACY_DECIMAL_TO_STRING =
buildConf("spark.sql.legacy.castDecimalToString.enabled")
.internal()
.doc("When true, casting decimal values as string will use scientific notation if an " +
"exponent is needed, which is the same with the method java.math.BigDecimal.toString(). " +
"Otherwise, the casting result won't contain an exponent field, which is compliant to " +
"the ANSI SQL standard.")
.version("3.4.0")
.booleanConf
.createWithDefault(false)

val LEGACY_PATH_OPTION_BEHAVIOR =
buildConf("spark.sql.legacy.pathOptionBehavior.enabled")
.internal()
@@ -225,6 +225,8 @@ final class Decimal extends Ordered[Decimal] with Serializable {

override def toString: String = toBigDecimal.toString()

def toPlainString: String = toBigDecimal.bigDecimal.toPlainString

def toDebugString: String = {
if (decimalVal.ne(null)) {
s"Decimal(expanded, $decimalVal, $precision, $scale)"
@@ -1305,4 +1305,12 @@ abstract class CastSuiteBase extends SparkFunSuite with ExpressionEvalHelper {
Cast(child, DecimalType.USER_DEFAULT), it)
}
}

test("SPARK-39749: cast Decimal to string") {
val input = Literal.create(Decimal(0.000000123), DecimalType(9, 9))
checkEvaluation(cast(input, StringType), "0.000000123")
withSQLConf(SQLConf.LEGACY_DECIMAL_TO_STRING.key -> "true") {
checkEvaluation(cast(input, StringType), "1.23E-7")
}
}
}
