Skip to content

Fix mrmr working with categoricals #1311

Merged
merged 4 commits into from
Jul 10, 2023
Merged

Fix mrmr working with categoricals #1311

merged 4 commits into from
Jul 10, 2023

Conversation

alex-hse-repository
Copy link
Collaborator

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use Numpy format for all the methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Proposed Changes

  • Cast all categoricals to floats during redundancy computation in mrmr.

Closing issues

closes #1310

@codecov-commenter
Copy link

codecov-commenter commented Jul 10, 2023

Codecov Report

Merging #1311 (2911e30) into master (e486a24) will decrease coverage by 32.17%.
The diff coverage is 0.00%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@             Coverage Diff             @@
##           master    #1311       +/-   ##
===========================================
- Coverage   88.20%   56.03%   -32.17%     
===========================================
  Files         186      197       +11     
  Lines       11807    12126      +319     
===========================================
- Hits        10414     6795     -3619     
- Misses       1393     5331     +3938     
Impacted Files Coverage Δ
etna/analysis/feature_selection/mrmr_selection.py 26.66% <0.00%> (-73.34%) ⬇️

... and 134 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@github-actions
Copy link

github-actions bot commented Jul 10, 2023

@github-actions github-actions bot temporarily deployed to pull request July 10, 2023 11:49 Inactive
@@ -82,6 +82,14 @@ def mrmr(
redundancy_table = pd.DataFrame(np.inf, index=all_features, columns=all_features)
top_k = min(top_k, len(all_features))

# can't compute correlation of categorical column with the others
cat_cols = regressors.dtypes[regressors.dtypes == "category"].index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't it be easier to cast all regressors to float?

try:
regressors[cat_col] = regressors[cat_col].astype(float)
except ValueError:
raise ValueError(f"{cat_col} column cannot be cast to float type! Please, use encoders.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should test this error.

@github-actions github-actions bot temporarily deployed to pull request July 10, 2023 13:05 Inactive
@@ -32,6 +33,12 @@ def df_with_regressors() -> Dict[str, pd.DataFrame]:
regressor = df_regressors_useless[df_regressors_useless["segment"] == segment]["target"].values
df_exog[f"regressor_useless_{i}"] = regressor

# useless categorical regressor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regressor -> regressors?



@pytest.mark.parametrize("fast_redundancy", [True, False])
def test_mrmr_with_categorical_regressor(df_with_regressors, fast_redundancy):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_mrmr_with_categorical_regressor -> test_mrmr_with_castable_categorical_regressor?

@github-actions github-actions bot temporarily deployed to pull request July 10, 2023 14:01 Inactive
@Mr-Geekman Mr-Geekman merged commit 0225a51 into master Jul 10, 2023
@Mr-Geekman Mr-Geekman deleted the issue-1310 branch July 10, 2023 15:12
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Error on using MRMRFeatureSelectionTransform with fast_redundancy=False
3 participants