[SNAP-2225] Different results of nearly identical queries, due to join order #971
Conversation
Handled join order in optimization phase.
Conflicts: core/src/main/scala/org/apache/spark/sql/internal/SnappySessionState.scala
A few comments and a suggestion to refactor common code. Looks good otherwise.
p.numPartitions, p.numBuckets, p.tableBuckets)
override def checkHashPartitioning(partitioning: Partitioning): Option[
    (Seq[Expression], Int, Int)] = partitioning match {
  case p: HashPartitioning => Some(p.expressions, p.numPartitions, p.numBuckets)
  case _ => None
}
Same question. What do buckets change here? Since this won't work in the smart connector, what is the difference in behaviour?
Yes. In connector mode the scan always happens bucket-wise. When the scan starts, the "linkBucketsToPartitions" flag determines the number of scan partitions. This functionality could also be handled at the optimizer level, so that it works for both connector and embedded modes.
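A minimal, self-contained sketch of that flag-driven choice (all names here — `ScanConfig`, `linkBucketsToPartitions`, `coresPerExecutor`, `numExecutors` — are illustrative assumptions, not the actual SnappyData API):

```scala
// Illustrative only: how the number of scan partitions might be chosen
// depending on whether buckets are linked to partitions.
case class ScanConfig(linkBucketsToPartitions: Boolean,
    numBuckets: Int, coresPerExecutor: Int, numExecutors: Int)

def numScanPartitions(c: ScanConfig): Int =
  if (c.linkBucketsToPartitions) {
    // linked: one scan partition per bucket
    c.numBuckets
  } else {
    // delinked: bound the partition count by the cores
    // assigned across the executors
    math.min(c.numBuckets, c.coresPerExecutor * c.numExecutors)
  }
```

With delinking, a table with many small buckets no longer forces one task per bucket; the scan parallelism follows the available cores instead.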
  case _ => Nil
  }
}
The above two methods look similar to those in HashJoinStrategies (apart from the Nil vs Int.MaxValue difference). They can be refactored into a base trait.
Or, once this is applied, that code in HashJoinStrategies can be removed and substituted by a simpler comparison (or this can determine the colocated join here itself and inject a separate "ColocatedJoin" plan which can be resolved appropriately in HashJoinStrategies).
Refactored into a single trait.
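A toy sketch of what such a shared trait could look like (the types below are stand-ins for the real Spark/Snappy classes, and the trait name is assumed; this is not the actual refactoring in the PR):

```scala
// Stand-ins for the real Spark catalyst types, for illustration only.
sealed trait Partitioning
case class HashPartitioning(expressions: Seq[String],
    numPartitions: Int, numBuckets: Int) extends Partitioning
case object UnknownPartitioning extends Partitioning

// The common matching logic both strategies could mix in, so the
// only per-strategy difference left is what they do on None
// (e.g. fall back to Nil vs Int.MaxValue).
trait HashPartitioningCheck {
  def checkHashPartitioning(
      partitioning: Partitioning): Option[(Seq[String], Int, Int)] =
    partitioning match {
      case p: HashPartitioning =>
        Some((p.expressions, p.numPartitions, p.numBuckets))
      case _ => None
    }
}
```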
val withCoalesce = snc.sql(s"select P.OrderRef, P.description from " +
  s"$t1 P JOIN $t2 R ON P.OrderId = R.OrderId" +
  s" AND coalesce(P.OrderRef,0) = coalesce(R.OrderRef,0)")
// TODO Why is an exchange needed for coalesce?
Good question. It should not.
The semanticEqual() check between an attribute and its Coalesce returns false. We need to override the semanticEqual method for some of these functions.
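A self-contained illustration of the failing comparison (toy expression classes, not Spark's; the structural equality below is a stand-in for the real semantic-equality check):

```scala
// Toy expression tree: semantic equality here is purely structural.
sealed trait Expr {
  def semanticEquals(other: Expr): Boolean = this == other
}
case class Attr(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class Coalesce(children: Seq[Expr]) extends Expr

val ref = Attr("OrderRef")
val padded = Coalesce(Seq(ref, Lit(0)))
// The trees differ structurally, so the check fails and an exchange
// gets planned, even though for non-null keys the two expressions
// produce the same value.
```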
@rishitesh Can you also add a test to check pruning for update/delete after these changes? If they still do not prune, track it in a separate ticket.
@sumwale There are already tests for update/select pruning. I have added one for delete as well.
Changes proposed in this pull request
Removed the OrderlessHashpartition class:
a) Handled bucket linking in TableExec and the table scan. Removed the tableBuckets param from the partitioning class. If delinking is enabled, numPartitions is simply kept as numBuckets.
b) Also removed the custom changes in HashPartitioning. We no longer store bucket information in HashPartitioning. Instead, based on the "linkPartitionToBucket" flag, the number of partitions is determined to be either numBuckets or the number of cores assigned to the executors.
This will also help in enabling delinkPartition in smart connector mode, which will be tracked in a separate ticket.
c) Fixed a couple of issues with partition pruning.
d) Fixed an issue with deleteFrom in IndextTest of the test suite.
e) Removed ClusterSnappyJoinSuite, as the join order optimization should be applied in both core and cluster.
Patch testing
precheckin
ReleaseNotes.txt changes
NA
Other PRs
TIBCOSoftware/snappy-spark#95