Why doesn't identify_collinear consider the statistical significance of the Pearson coefficient? #39

Open
EugeniaKoKo opened this issue Mar 11, 2020 · 0 comments

Comments

@EugeniaKoKo

In the identify_collinear method, I noticed that the p-value of the Pearson coefficient is not taken into account.
As a result, features can be removed because of a correlation that is not statistically significant.

This can be fixed simply by adding a p-value check for each identified correlation:

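from scipy import stats

# pearsonr returns (correlation coefficient, two-sided p-value)
_, pvalue = stats.pearsonr(data[feat1], data[feat2])
if pvalue < 0.05:
    ...  # only here is the correlation treated as significant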

A parameter for the significance threshold could also be added, so that 0.01 can be used instead of 0.05.
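To make the proposal concrete, here is a minimal standalone sketch of the check. It is not the library's code: the function name `significant_collinear_pairs` and the parameters `correlation_threshold` and `alpha` are hypothetical, and it assumes numeric columns with no missing values.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats


def significant_collinear_pairs(data, correlation_threshold=0.9, alpha=0.05):
    """Return (feat1, feat2, r, pvalue) for column pairs whose Pearson
    correlation is both strong (|r| > correlation_threshold) and
    statistically significant (pvalue < alpha).

    Hypothetical helper for illustration only.
    """
    pairs = []
    for feat1, feat2 in combinations(data.columns, 2):
        r, pvalue = stats.pearsonr(data[feat1], data[feat2])
        if abs(r) > correlation_threshold and pvalue < alpha:
            pairs.append((feat1, feat2, r, pvalue))
    return pairs


# Synthetic data: "b" is almost a copy of "a", "c" is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = pd.DataFrame({
    "a": x,
    "b": x + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})

pairs = significant_collinear_pairs(data, correlation_threshold=0.9, alpha=0.01)
for feat1, feat2, r, pvalue in pairs:
    print(feat1, feat2, round(r, 4), pvalue)
```

The p-value check matters mostly for small samples, where a high correlation coefficient can arise by chance. With many rows, a correlation above a high threshold will almost always be significant as well.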
