Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[SNAP-2225] Different results of nearly identical queries , due to join order #971

Merged
merged 20 commits into from
Mar 19, 2018

Conversation

rishitesh
Copy link
Contributor

@rishitesh rishitesh commented Feb 23, 2018

Changes proposed in this pull request

Removed OrderlessHashpartition class.
a) Handled bucket linking in TableExec and table scan. Removed the tableBucket param from partitioning class. If delink is enabled just keep numPartition as numBuckets.

b) Also removed custom changes in HashPartition. We won't store bucket information in HashPartitioning. Instead based on the flag "linkPartitionToBucket" we can determine the number of partitions to be either numBuckets or num cores assigned to the executor.
This will help in enabling delinkPartition in Smart connector mode as well. This will be tracked with a separate ticket.

c) Fixed couple of issues with partition pruning.
d) Fixed one issue with deleteFrom from IndextTest of the test suite.
e) Removed ClusterSnappyJoinSuite, as both in core & cluster join order optimization should be applied.

Patch testing

precheckin

ReleaseNotes.txt changes

NA

Other PRs

TIBCOSoftware/snappy-spark#95

Copy link
Contributor

@sumwale sumwale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments and suggestion to refactor common code. Looks good otherwise.

p.numPartitions, p.numBuckets, p.tableBuckets)
override def checkHashPartitioning(partitioning: Partitioning): Option[
(Seq[Expression], Int, Int)] = partitioning match {
case p: HashPartitioning => Some(p.expressions, p.numPartitions, p.numBuckets)
case _ => None
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question. What does buckets here change? Since this won't work in smart connector so what is the difference in behaviour?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, In connector mode the scan will happen bucket wise always. When scan starts based on "linkBucketsToPartitions" flag it will determine number of scan partitions. This functionality can also be taken up at optimizer level, so that it works for both connector and embedded mode.

case _ => Nil
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above two methods look similar to those in HashJoinStrategies (apart from Nil vs Int.MaxValue difference). Can be refactored into a base trait.

Or perhaps once this is applied, then that code in HashJoinStrategies can be removed and substituted by a simpler comparison (or this can determine colocated join here itself and inject a separate "ColocatedJoin" plan which can be resolved appropriately in HashJoinStrategies).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored into a single trait.

val withCoalesce = snc.sql(s"select P.OrderRef, P.description from " +
s"$t1 P JOIN $t2 R ON P.OrderId = R.OrderId" +
s" AND coalesce(P.OrderRef,0) = coalesce(R.OrderRef,0)")
// TODO Why an exchange is needed for coalesce.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. It should not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semanticEqual() check for an attribute & its Coalesce returns false. We need to override semanticEqual method for some of these functions.

@sumwale
Copy link
Contributor

sumwale commented Mar 8, 2018

@rishitesh Can you also add a test to check pruning for update/delete after these changes? If they still do not prune then track in a separate ticket.

@rishitesh
Copy link
Contributor Author

@sumwale There are already test for update/select pruning. I have added one for delete as well.

@rishitesh rishitesh merged commit a021f6a into master Mar 19, 2018
@rishitesh rishitesh deleted the SNAP-2225 branch March 19, 2018 05:53
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants