predict() on a mlp with nnet double names the output with .pred_ #174

Closed
mouli3c3 opened this issue May 1, 2019 · 4 comments · Fixed by #225
Labels
bug an unexpected problem or unintended behavior

Comments

mouli3c3 commented May 1, 2019

This problem is similar to the already-closed issue #107, but it occurs with mlp() using the nnet engine.


library(tidymodels)
#> -- Attaching packages ------------------------------------------------- tidymodels 0.0.2 --
#> v broom     0.5.1       v purrr     0.3.2  
#> v dials     0.0.2       v recipes   0.1.5  
#> v dplyr     0.8.0.1     v rsample   0.0.4  
#> v ggplot2   3.1.0       v tibble    2.1.1  
#> v infer     0.4.0       v yardstick 0.0.3  
#> v parsnip   0.0.2
#> -- Conflicts ---------------------------------------------------- tidymodels_conflicts() --
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()
data(credit_data)

set.seed(7075)
data_split <- initial_split(credit_data, strata = "Status", p = 0.75)

credit_train <- training(data_split)
credit_test  <- testing(data_split)
credit_rec <- 
  recipe(Status ~ ., data = credit_train) %>%
  step_knnimpute(Home, Job, Marital, Income, Assets, Debt) %>%
  step_dummy(all_nominal(), -Status) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors()) %>%
  prep(training = credit_train, retain = TRUE)

test_normalized <- bake(credit_rec, new_data = credit_test, all_predictors())

set.seed(57974)
nnet_fit <- set_engine(mlp("classification", hidden_units = 10), "nnet") %>%
  fit(Status ~ ., data = juice(credit_rec))

glm_fit <- set_engine(logistic_reg(), "glm") %>%
  fit(Status ~ ., data = juice(credit_rec))

# Issue: predict() on the nnet fit doubles the .pred_ prefix
glimpse(predict(nnet_fit, new_data = test_normalized, type = "prob"))
#> Observations: 1,113
#> Variables: 2
#> $ .pred_.pred_bad  <dbl> 0.5608545, 0.7023505, 0.3303682, 0.4221877, 0...
#> $ .pred_.pred_good <dbl> 0.4391455, 0.2976495, 0.6696318, 0.5778123, 0...

# No issue: predict() on the glm fit names the columns as expected
glimpse(predict(glm_fit, new_data = test_normalized, type = "prob"))
#> Observations: 1,113
#> Variables: 2
#> $ .pred_bad  <dbl> 0.04675355, 0.94317298, 0.24316454, 0.06970005, 0.0...
#> $ .pred_good <dbl> 0.95324645, 0.05682702, 0.75683546, 0.93029995, 0.9...
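Until the naming is fixed, one possible workaround is to collapse the repeated prefix after calling predict(). This is only a minimal base R sketch: `fix_pred_names()` is a hypothetical helper, not part of parsnip, and the mock data frame merely stands in for the real prediction output.

```r
# Hypothetical helper (not part of parsnip): collapse a repeated ".pred_"
# prefix at the start of each column name down to a single ".pred_".
fix_pred_names <- function(preds) {
  names(preds) <- sub("^(\\.pred_)+", ".pred_", names(preds))
  preds
}

# Mock predictions standing in for
# predict(nnet_fit, new_data = test_normalized, type = "prob")
mock_preds <- data.frame(
  .pred_.pred_bad  = c(0.56, 0.70),
  .pred_.pred_good = c(0.44, 0.30),
  check.names = FALSE
)

fixed <- fix_pred_names(mock_preds)
names(fixed)
```

Columns that already carry a single prefix are left unchanged, so the same helper could be applied to the glm predictions without effect.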
@topepo topepo added the bug an unexpected problem or unintended behavior label May 1, 2019
topepo (Member) commented May 1, 2019

SIDM (same issue, different model)

patr1ckm (Contributor) commented Oct 29, 2019

I can confirm that this can be closed. Running

data(credit_data)
nnet_fit <- set_engine(mlp("classification", hidden_units = 10), "nnet") %>%
  fit(Status ~ ., data = credit_data)

glm_fit <- set_engine(logistic_reg(), "glm") %>%
  fit(Status ~ ., data = credit_data)

produces:

> glimpse(predict(nnet_fit, new_data = credit_data, type = "prob"))
Observations: 4,454
Variables: 2
$ .pred_V1 <dbl> 0.3419620, 0.3419620, 0.3392285, 0.3387520, 0.4335137, 0.2995662, 0.2995662, 0.3010878, 0.4102205, 0.5224852, 0.33…
$ .pred_V2 <dbl> 0.6580380, 0.6580380, 0.6607715, 0.6612480, 0.5664863, 0.7004338, 0.7004338, 0.6989122, 0.5897795, 0.4775148, 0.66…

> glimpse(predict(glm_fit, new_data = credit_data, type = "prob"))
Observations: 4,454
Variables: 2
$ .pred_bad  <dbl> 0.24860098, 0.11323173, 0.56131606, 0.21922027, 0.14454134, 0.03888827, 0.04857814, 0.03515797, 0.23389520, 0.80…
$ .pred_good <dbl> 0.75139902, 0.88676827, 0.43868394, 0.78077973, 0.85545866, 0.96111173, 0.95142186, 0.96484203, 0.76610480, 0.19…

However, note that the column names still differ by model type here: the nnet fit returns .pred_V1 and .pred_V2, while the glm fit returns .pred_bad and .pred_good.

mouli3c3 (Author) commented

I see that nnet_fit$lvl and glm_fit$lvl both report the target levels as "bad" and "good". I'm not sure whether predict() is intended to produce different column names for different models.
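Since both fits store the same levels, the generic .pred_V1/.pred_V2 names could be relabeled by hand in the meantime. A minimal base R sketch, assuming (this is not confirmed anywhere in this thread) that the probability columns come back in the same order as the stored levels; `rename_prob_cols()` is a hypothetical helper, not a parsnip function, and the mock data frame stands in for the real predictions.

```r
# Hypothetical helper (not part of parsnip): rename probability columns
# using the outcome levels. ASSUMPTION: column order matches level order.
rename_prob_cols <- function(preds, lvl) {
  stopifnot(ncol(preds) == length(lvl))
  names(preds) <- paste0(".pred_", lvl)
  preds
}

# Mock predictions standing in for
# predict(nnet_fit, new_data = credit_data, type = "prob")
mock_preds <- data.frame(
  .pred_V1 = c(0.34, 0.52),
  .pred_V2 = c(0.66, 0.48)
)

# In the thread, nnet_fit$lvl is reported as "bad" "good"
renamed <- rename_prob_cols(mock_preds, c("bad", "good"))
names(renamed)
```

If the column order did not match the level order, this would silently mislabel the probabilities, so it is worth checking against a few known cases first.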

github-actions bot commented Mar 8, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 8, 2021