Hi @rusty1s, I'm doing link prediction based on your rgcn_link_pred.py example, and I'm a little confused about how the negative sampling should be implemented. According to the negative_sampling function used in this example, it might even sample "negative" edges that are already in the graph as positive edges, for example edges contained in the validation or test set. Is that intended? Thank you!
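For reference, this is roughly the kind of head/tail-corruption sampling I mean. It is only a sketch with my own names (negative_sampling, edge_index, num_nodes), not the exact code from the example:

```python
import torch

def negative_sampling(edge_index: torch.Tensor, num_nodes: int) -> torch.Tensor:
    """Corrupt either the head or the tail of each positive edge at random.

    Because the replacement nodes are drawn uniformly, some corrupted edges
    may coincide with true positive edges, including edges held out for
    validation or testing.
    """
    num_edges = edge_index.size(1)
    neg_edge_index = edge_index.clone()

    # Decide per edge whether to corrupt the head (row 0) or the tail (row 1).
    corrupt_head = torch.rand(num_edges) < 0.5
    corrupt_tail = ~corrupt_head

    neg_edge_index[0, corrupt_head] = torch.randint(
        num_nodes, (int(corrupt_head.sum()),))
    neg_edge_index[1, corrupt_tail] = torch.randint(
        num_nodes, (int(corrupt_tail.sum()),))
    return neg_edge_index
```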
Hi, and thanks for your interest. You are right that negative_sampling might sample positive edges contained in the validation and test set. However, we are not allowed to acknowledge their existence during training in order to prevent any data leakage. Although this seems counter-intuitive at first glance, it is to be expected that such noise in the learning signal is averaged out during optimization.

Related to your second issue: yes, this is a trade-off between runtime efficiency and correctness. In general, we do not see a decrease in performance when a positive edge is occasionally sampled as a negative during training.