
[SPARK-49549][SQL] Assign a name to the error conditions _LEGACY_ERROR_TEMP_3055, 3146 #48288

Closed
mrk-andreev wants to merge 1 commit into master from mrk-andreev:SPARK-49549

Conversation

mrk-andreev (Contributor) commented Sep 27, 2024

What changes were proposed in this pull request?

Choose a proper name for the error conditions _LEGACY_ERROR_TEMP_3055 and _LEGACY_ERROR_TEMP_3146 defined in core/src/main/resources/error/error-conditions.json. The names should be short but complete (see the examples in error-conditions.json).

Add a test that triggers the error from user code if such a test does not already exist. Check the exception fields with checkError(), which verifies only the meaningful error fields and avoids any dependency on the error message text. That way, tech editors can change the error format in error-conditions.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error to checkError(). A minimal sketch of such a checkError() assertion is shown below.
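
A minimal sketch of such an assertion, assuming a test suite that provides sql(), intercept and checkError(); the SQL text, catalog path and suite context are illustrative, not the exact test added in this PR, while the condition and parameters mirror the ones discussed in the review below:

// Assumes a catalog function long_add_mismatch_magic that is registered
// without a produceResult implementation (hypothetical setup).
checkError(
  exception = intercept[org.apache.spark.sql.AnalysisException] {
    sql("SELECT testcat.ns.long_add_mismatch_magic(1L, 2L)").collect()
  },
  condition = "SCALAR_FUNCTION_NOT_FULLY_IMPLEMENTED",
  parameters = Map("scalarFunc" -> "`long_add_mismatch_magic`"))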

Why are the changes needed?

These changes are needed because Spark 4 introduces a new approach with user-friendly error messages.
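
For illustration, assuming the message template agreed on below and the standard [CONDITION] rendering of Spark errors, a user hitting this case would now see something roughly like "[SCALAR_FUNCTION_NOT_FULLY_IMPLEMENTED] ScalarFunction `long_add_mismatch_magic` not implements or overrides method 'produceResult(InternalRow)'." instead of an opaque _LEGACY_ERROR_TEMP_3055 code.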

Does this PR introduce any user-facing change?

Yes, if user's code depends on error condition names.

How was this patch tested?

Unit tests

Was this patch authored or co-authored using generative AI tooling?

No

@@ -182,7 +182,7 @@ object V2ExpressionUtils extends SQLConfHelper with Logging {
ApplyFunctionExpression(scalarFunc, arguments)
case _ =>
throw new AnalysisException(
errorClass = "_LEGACY_ERROR_TEMP_3055",
errorClass = "SCALAR_FUNCTION_NOT_FULLY_IMPLEMENTED",
messageParameters = Map("scalarFunc" -> scalarFunc.name()))
Member

Since scalarFunc.name is an id, let's quote it with toSQLId().

Contributor Author

Replaced with

messageParameters = Map("scalarFunc" -> toSQLId(scalarFunc.name())))

},
"SCALAR_FUNCTION_NOT_FULLY_IMPLEMENTED" : {
"message" : [
"ScalarFunction <scalarFunc> neither implements magic method nor override 'produceResult'"
Member

I think we should explicitly say what we expect: a method with the name produceResult which accepts a parameter of the type InternalRow

Contributor Author

Replaced with

"ScalarFunction <scalarFunc> not implements or overrides method 'produceResult(InternalRow)'."

@@ -3975,6 +3975,16 @@
],
"sqlState" : "21000"
},
"SCALAR_FUNCTION_NOT_COMPATIBLE" : {
"message" : [
"Cannot find a compatible ScalarFunction#produceResult"
Member

Can we suggest to users how to fix the issue?

Contributor Author

Replaced with

"ScalarFunction <scalarFunc> not overrides method 'produceResult(InternalRow)' with custom implementation."

condition = "_LEGACY_ERROR_TEMP_3055",
parameters = Map("scalarFunc" -> "long_add_mismatch_magic"),
condition = "SCALAR_FUNCTION_NOT_FULLY_IMPLEMENTED",
parameters = Map(),
Member

there is no scalarFunc parameter?

Contributor Author

Fixed with

parameters = Map("scalarFunc" -> "`long_add_mismatch_magic`"),

@mrk-andreev force-pushed the SPARK-49549 branch 2 times, most recently from 513c953 to 943845a on October 8, 2024 at 20:30
@@ -149,7 +149,7 @@ public interface ScalarFunction<R> extends BoundFunction {
* @return a result value
*/
default R produceResult(InternalRow input) {
- throw new SparkUnsupportedOperationException("_LEGACY_ERROR_TEMP_3146");
+ throw new SparkUnsupportedOperationException("SCALAR_FUNCTION_NOT_COMPATIBLE");
Member

The error condition requires the parameter scalarFunc. Could you pass it here, please.

Contributor Author

Fixed with

throw new SparkUnsupportedOperationException(
            "SCALAR_FUNCTION_NOT_COMPATIBLE",
            Map.of("scalarFunc", QuotingUtils.quoteIdentifier(name()))
    );

I used QuotingUtils.quoteIdentifier instead of toSQLId because that method is unavailable from this location.

@MaxGekk (Member) left a comment

LGTM except for one comment. Also, update the PR's description:

  • remove the link to JIRA ticket
  • error-classes.json -> error-conditions.json

Comment on lines 156 to 157
"SCALAR_FUNCTION_NOT_COMPATIBLE",
Map.of("scalarFunc", QuotingUtils.quoteIdentifier(name()))
Member

Could you fix the indentation? It should be 2 spaces.

Contributor Author

> Could you fix the indentation? It should be 2 spaces.

Fixed

> LGTM except for one comment. Also, update the PR's description:
> • remove the link to JIRA ticket
> • error-classes.json -> error-conditions.json

Fixed

@mrk-andreev force-pushed the SPARK-49549 branch 2 times, most recently from 4ff4cf7 to 186c5bc on October 9, 2024 at 11:36
@MaxGekk (Member) commented Oct 9, 2024

@mrk-andreev Could you fix the failed tests:

[info] - Error conditions are correctly formatted *** FAILED *** (103 milliseconds)
[info]   "...023"
[info]     },
[info]     "SCALAR_[FUNCTION_NOT_COMPATIBLE" : {
[info]       "message" : [
[info]         "ScalarFunction <scalarFunc> not overrides method 'produceResult(InternalRow)' with custom implementation."
[info]       ],
[info]       "sqlState" : "42K0O"
[info]     },

You should reformat error-conditions.json by the command:

$ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "core/testOnly *SparkThrowableSuite -- -t \"Error conditions are correctly formatted\""
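
(Running the suite with SPARK_GENERATE_GOLDEN_FILES=1 regenerates error-conditions.json in the canonical formatting the test checks, so the file does not have to be re-indented by hand.)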

@MaxGekk (Member) commented Oct 9, 2024

Waiting for CI. @mrk-andreev Could you re-trigger the failed GitHub action:
Run / Build modules: api, catalyst, hive-thriftserver

@mrk-andreev (Contributor Author) commented Oct 9, 2024

@MaxGekk fixed.


@MaxGekk (Member) commented Oct 10, 2024

+1, LGTM. Merging to master.
Thank you, @mrk-andreev.

@MaxGekk closed this in f0498f0 on Oct 10, 2024
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
[SPARK-49549][SQL] Assign a name to the error conditions _LEGACY_ERROR_TEMP_3055, 3146

Closes apache#48288 from mrk-andreev/SPARK-49549.

Authored-by: Mark Andreev <mark.andreev@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>