[SNAP-2225] Different results of nearly identical queries, due to join order #971
Conversation
Handled join order in optimization phase.
Conflicts: core/src/main/scala/org/apache/spark/sql/internal/SnappySessionState.scala
A few comments and a suggestion to refactor common code. Looks good otherwise.
p.numPartitions, p.numBuckets, p.tableBuckets)
override def checkHashPartitioning(partitioning: Partitioning): Option[
    (Seq[Expression], Int, Int)] = partitioning match {
  case p: HashPartitioning => Some(p.expressions, p.numPartitions, p.numBuckets)
  case _ => None
}
Same question. What do buckets change here? Since this won't work in the smart connector, what is the difference in behaviour?
Yes. In connector mode the scan always happens bucket-wise. When the scan starts, the "linkBucketsToPartitions" flag determines the number of scan partitions. This functionality could also be handled at the optimizer level, so that it works for both connector and embedded modes.
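A minimal, self-contained sketch of that flag-driven choice (all names here — `ScanConfig`, `linkBucketsToPartitions`, `coresPerExecutor`, `numExecutors` — are illustrative assumptions, not the actual SnappyData API):

```scala
// Illustrative only: how the number of scan partitions might be chosen
// depending on whether buckets are linked to partitions.
case class ScanConfig(linkBucketsToPartitions: Boolean,
    numBuckets: Int, coresPerExecutor: Int, numExecutors: Int)

def numScanPartitions(c: ScanConfig): Int =
  if (c.linkBucketsToPartitions) {
    // linked: one scan partition per bucket
    c.numBuckets
  } else {
    // delinked: bound the partition count by the cores
    // assigned across the executors
    math.min(c.numBuckets, c.coresPerExecutor * c.numExecutors)
  }
```

With delinking, a table with many small buckets no longer forces one task per bucket; the scan parallelism follows the available cores instead.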
  case _ => Nil
  }
}
The above two methods look similar to those in HashJoinStrategies (apart from the Nil vs Int.MaxValue difference). They can be refactored into a base trait.
Or, once this is applied, that code in HashJoinStrategies can be removed and substituted by a simpler comparison (or this can determine the colocated join here itself and inject a separate "ColocatedJoin" plan which can be resolved appropriately in HashJoinStrategies).
Refactored into a single trait.
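A toy sketch of what such a shared trait could look like (the types below are stand-ins for the real Spark/Snappy classes, and the trait name is assumed; this is not the actual refactoring in the PR):

```scala
// Stand-ins for the real Spark catalyst types, for illustration only.
sealed trait Partitioning
case class HashPartitioning(expressions: Seq[String],
    numPartitions: Int, numBuckets: Int) extends Partitioning
case object UnknownPartitioning extends Partitioning

// The common matching logic both strategies could mix in, so the
// only per-strategy difference left is what they do on None
// (e.g. fall back to Nil vs Int.MaxValue).
trait HashPartitioningCheck {
  def checkHashPartitioning(
      partitioning: Partitioning): Option[(Seq[String], Int, Int)] =
    partitioning match {
      case p: HashPartitioning =>
        Some((p.expressions, p.numPartitions, p.numBuckets))
      case _ => None
    }
}
```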
val withCoalesce = snc.sql(s"select P.OrderRef, P.description from " +
  s"$t1 P JOIN $t2 R ON P.OrderId = R.OrderId" +
  s" AND coalesce(P.OrderRef,0) = coalesce(R.OrderRef,0)")
// TODO Why is an exchange needed for coalesce?
Good question. It should not.
The semanticEqual() check between an attribute and its Coalesce returns false. We need to override the semanticEqual method for some of these functions.
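A self-contained illustration of the failing comparison (toy expression classes, not Spark's; the structural equality below is a stand-in for the real semantic-equality check):

```scala
// Toy expression tree: semantic equality here is purely structural.
sealed trait Expr {
  def semanticEquals(other: Expr): Boolean = this == other
}
case class Attr(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class Coalesce(children: Seq[Expr]) extends Expr

val ref = Attr("OrderRef")
val padded = Coalesce(Seq(ref, Lit(0)))
// The trees differ structurally, so the check fails and an exchange
// gets planned, even though for non-null keys the two expressions
// produce the same value.
```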
@rishitesh Can you also add a test to check pruning for update/delete after these changes? If they still do not prune, track it in a separate ticket.
@sumwale There are already tests for update/select pruning. I have added one for delete as well.
Changes proposed in this pull request
Removed the OrderlessHashpartition class:
a) Handled bucket linking in TableExec and the table scan. Removed the tableBuckets param from the partitioning class. If delinking is enabled, numPartitions is simply kept as numBuckets.
b) Also removed the custom changes in HashPartitioning. We no longer store bucket information in HashPartitioning. Instead, based on the "linkPartitionToBucket" flag, the number of partitions is determined to be either numBuckets or the number of cores assigned to the executors.
This will also help in enabling delinkPartition in smart connector mode, which will be tracked in a separate ticket.
c) Fixed a couple of issues with partition pruning.
d) Fixed an issue with deleteFrom in IndextTest of the test suite.
e) Removed ClusterSnappyJoinSuite, as the join order optimization should be applied in both core and cluster.
Patch testing
precheckin
ReleaseNotes.txt changes
NA
Other PRs
TIBCOSoftware/snappy-spark#95