[SPARK-52065][SQL] Produce another plan tree with output columns (name, data type, nullability) in plan change logging #50852
+89
−18
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
We propose to add another tree string which focuses to produce output columns with data type and nullability. This will be shown in plan change logger, along with existing tree string plan.
For example, here is a one of example from plan change logging:
In some cases, it's not even feasible to evaluate the output of the node. (e.g. Project with Star expression) In that case, we will simply put
<output=unresolved>
since it's mostly due to UnresolvedException.For example,
Why are the changes needed?
We recently got into very tricky issue (nullability change broke stateful operator) which required custom debug logging on plan change logging. This is because of lack of visibility for the output columns, especially their nullability, in our tree string of the plan.
Ideally, we shouldn't have two different tree strings and just make a fix to the existing tree string, but in many cases, current tree string is long enough so that we had to restrict the number of fields to show, hence we think it's better to have a separate tree plan for it.
Does this PR introduce any user-facing change?
Yes, when they change SQL config for plan change logger log level to their visible log level in log4j2 config. The application of this change is at least opt-in instead of opt-out.
(If we are changing the existing tree string, it will change many places.)
How was this patch tested?
Modified UT to cover the change.
Was this patch authored or co-authored using generative AI tooling?
No.