[SPARK-35357][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities #32485

Closed

ebonnal
Contributor

@ebonnal ebonnal commented May 9, 2021

What changes were proposed in this pull request?

Overload the methods PageRank.runWithOptions and PageRank.runWithOptionsWithPreviousPageRank with a normalized parameter that describes "whether or not to normalize the rank sum". The existing signatures are kept, so no user-facing signature breaks.

Why are the changes needed?

https://issues.apache.org/jira/browse/SPARK-35357

When a graph contains a non-negligible proportion of sinks, algorithms based on incremental updates of ranks can gain precision for free if they are allowed to manipulate non-normalized ranks.
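To illustrate the point above, here is a minimal pure-Python sketch, not the GraphX implementation. It assumes the usual static update rule (rank = resetProb + (1 - resetProb) * sum of incoming contributions) and a normalization that rescales the ranks to sum to the number of vertices. The names step and normalize and the 4-vertex graph are hypothetical.

```python
def step(ranks, edges, reset_prob=0.15):
    # One update, assumed form: reset_prob + (1 - reset_prob) * incoming contributions.
    out_degree = {v: 0 for v in ranks}
    for src, _ in edges:
        out_degree[src] += 1
    incoming = {v: 0.0 for v in ranks}
    for src, dst in edges:
        incoming[dst] += ranks[src] / out_degree[src]
    return {v: reset_prob + (1.0 - reset_prob) * incoming[v] for v in ranks}


def normalize(ranks):
    # Rescale so the ranks sum to the number of vertices (assumed convention).
    factor = len(ranks) / sum(ranks.values())
    return {v: r * factor for v, r in ranks.items()}


# Hypothetical graph: a 3-cycle plus vertex 4, which has no outgoing edge (a sink).
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
start = {v: 1.0 for v in (1, 2, 3, 4)}

after_two = step(step(start, edges), edges)
# The sink absorbs rank without passing it on, so the sum drops below 4.0.
rank_sum = sum(after_two.values())

# Two more iterations, once resuming from raw ranks and once from normalized ranks.
from_raw = normalize(step(step(after_two, edges), edges))
from_normalized = normalize(step(step(normalize(after_two), edges), edges))
gap = max(abs(from_raw[v] - from_normalized[v]) for v in start)
print(rank_sum, gap)
```

Because the update is affine rather than linear, rescaling the ranks in the middle of a run is not undone by a final rescale. In this toy model, resuming from normalized ranks therefore gives a different answer than resuming from raw ranks, and only the raw resume matches an uninterrupted run.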

Does this PR introduce any user-facing change?

No

How was this patch tested?

By adding a unit test that verifies that, even on a graph containing a sink, both of the following scenarios produce the same result:

a)

  • Run 6 iterations of PageRank in a row using PageRank.runWithOptions with normalization enabled

b)

  • Run 2 iterations using PageRank.runWithOptions with normalization disabled
  • Resume from preRankGraph1 and run 2 more iterations using PageRank.runWithOptionsWithPreviousPageRank with normalization disabled
  • Finally, resume from preRankGraph2 and run 2 more iterations using PageRank.runWithOptionsWithPreviousPageRank with normalization enabled
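The two scenarios can be mimicked with a toy model. This is a sketch in plain Python under stated assumptions, not the actual unit test or the GraphX code. The function run_pagerank, its previous argument (standing in for a resumed rank graph), and the example graph are all hypothetical.

```python
def run_pagerank(edges, vertices, num_iter, normalized, previous=None, reset_prob=0.15):
    """Toy static PageRank; `previous` plays the role of a resumed rank graph."""
    ranks = dict(previous) if previous is not None else {v: 1.0 for v in vertices}
    out_degree = {v: 0 for v in vertices}
    for src, _ in edges:
        out_degree[src] += 1
    for _ in range(num_iter):
        incoming = {v: 0.0 for v in vertices}
        for src, dst in edges:
            incoming[dst] += ranks[src] / out_degree[src]
        ranks = {v: reset_prob + (1.0 - reset_prob) * incoming[v] for v in vertices}
    if normalized:
        # Assumed convention: rescale so the ranks sum to the number of vertices.
        factor = len(vertices) / sum(ranks.values())
        ranks = {v: r * factor for v, r in ranks.items()}
    return ranks


# Hypothetical graph with one sink (vertex 4 has no outgoing edge).
vertices = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]

# Scenario a): 6 iterations in a row, normalization enabled.
scenario_a = run_pagerank(edges, vertices, 6, normalized=True)

# Scenario b): 2 + 2 iterations without normalization, then 2 more with it.
pre_rank_1 = run_pagerank(edges, vertices, 2, normalized=False)
pre_rank_2 = run_pagerank(edges, vertices, 2, normalized=False, previous=pre_rank_1)
scenario_b = run_pagerank(edges, vertices, 2, normalized=True, previous=pre_rank_2)

max_diff = max(abs(scenario_a[v] - scenario_b[v]) for v in vertices)
print(max_diff)
```

In this model the two scenarios agree because normalization is applied only once, after the final iteration, in both cases.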

…nk with a 'normalized' parameter to trigger or not the normalization
@ebonnal ebonnal changed the title [WIP][GRAPHX] Allow to turn off the normalization applied in the end of static PageRank utilities [WIP][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities May 9, 2021
@ebonnal ebonnal marked this pull request as ready for review May 9, 2021 17:48
@ebonnal ebonnal changed the title [WIP][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities [SPARK-35357][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities May 9, 2021
@HyukjinKwon
Member

ok to test

@SparkQA

SparkQA commented May 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42856/

@SparkQA

SparkQA commented May 10, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42856/

@SparkQA

SparkQA commented May 10, 2021

Test build #138334 has finished for PR 32485 at commit 60482b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

I think it's fine. cc @srowen FYI

Member

@srowen srowen left a comment

Looks OK, only one tiny comment about 'since'

@SparkQA

SparkQA commented May 11, 2021

Test build #138375 has finished for PR 32485 at commit 5a52408.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ebonnal
Contributor Author

ebonnal commented May 11, 2021

Thank you @Ayushsunny @HyukjinKwon @srowen for the review 🙏.
I have applied the requested changes.

@SparkQA

SparkQA commented May 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42898/

@SparkQA

SparkQA commented May 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42898/

@ebonnal ebonnal requested a review from srowen May 11, 2021 14:18
@srowen srowen closed this in 402375b May 12, 2021
@srowen
Member

srowen commented May 12, 2021

Merged to master

@ebonnal ebonnal deleted the make-pagerank-normalization-optional branch May 12, 2021 14:41
@ebonnal ebonnal restored the make-pagerank-normalization-optional branch May 12, 2021 14:41
@ebonnal ebonnal deleted the make-pagerank-normalization-optional branch May 12, 2021 14:41
@ebonnal ebonnal restored the make-pagerank-normalization-optional branch May 12, 2021 14:44