
ordinal regression model type & polr engine #6


Merged: 14 commits, Apr 21, 2025


@corybrunson (Owner) commented Nov 4, 2024

This PR addresses #4 by introducing a single model type for ordinal regression and a single deployable engine. My thinking is that we should complete the implementation of one engine before beginning another.

Model type

The model type is ordinal_reg(), per this suggestion. However, as noted in the NEWS, this could be replaced with separate ordinal_*() types for different model structures, per this suggestion.
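For reference, here is a minimal restatement of the cumulative link model that the 'polr' engine fits. It assumes the parametrization documented for MASS::polr() (note the minus sign on the linear predictor, and that the intercepts live in the cutpoints ζ rather than in β). The link CDFs for loglog and cloglog are my reading of the MASS source rather than something stated in this PR:

```latex
\Pr(Y \le k \mid \mathbf{x}) = F\left(\zeta_k - \mathbf{x}^\top\boldsymbol{\beta}\right),
\qquad k = 1, \dots, K-1, \qquad \zeta_1 < \zeta_2 < \cdots < \zeta_{K-1}

F(z) =
\begin{cases}
  1 / (1 + e^{-z}) & \text{logistic} \\
  \Phi(z)          & \text{probit} \\
  \exp(-e^{-z})    & \text{loglog} \\
  1 - \exp(-e^{z}) & \text{cloglog} \\
  1/2 + \arctan(z)/\pi & \text{cauchit}
\end{cases}
```

With K = 3 levels of Sat (Low, Medium, High), the two "Intercepts" in the fitted output later in this thread (Low|Medium and Medium|High) are ζ₁ and ζ₂, and a positive coefficient shifts probability toward higher satisfaction.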

Engine

The model type comes with one engine, 'polr', which invokes MASS::polr(). The engine has one tuning parameter, ordinal_link, which mimics survival_link and is passed to the method argument of polr(). The engine also provides "class" and "prob" prediction types; confidence intervals for predictions seem not to be implemented in {MASS}. The engine is registered on load.
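To make the mapping concrete without the fork branches, here is a sketch that calls MASS::polr() directly (not the {ordered} engine). It assumes only {MASS}: the `method` argument selects the link, and predict() on a polr fit offers `type = "class"` and `type = "probs"`, which are what the engine's "class" and "prob" types would wrap. The aggregated `housing` data are fit with `weights = Freq`, as in the MASS documentation.

```r
library(MASS)

# Link chosen via `method` (the argument ordinal_link is meant to map onto)
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
            data = housing, method = "probit")

# Class predictions: a factor with the levels of Sat
pred_class <- predict(fit, newdata = housing, type = "class")

# Probability predictions: a matrix with one column per level of Sat
pred_prob <- predict(fit, newdata = housing, type = "probs")

head(pred_class)
head(pred_prob)
```

As far as I can tell from the MASS source, the class prediction is simply the level with the highest predicted probability, so the two types carry the same information at different granularity.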

The ordinal_reg branch of {ordered} is coordinated with same-named (ordinal_reg) branches of {parsnip} and {dials}. In {parsnip}, the model type is registered on load, a basic update() method is provided, and several other short files or code chunks analogous to those for other model types are included. In {dials}, the ordinal_link tuning parameter is defined.

NB: I am not sure I successfully synchronized ordinal_link to method; in particular, the polr_engine_args tibble is a bit mysterious to me. A unit test with hyperparameter optimization still needs to be written. Edit: See the example in a comment below.

Documentation

Package documentation was added to 'ordered-package.R' so that illustrative examples, including of {ordinalForest}, could be included there.

NB: I was unable to install the dependencies needed to knit 'aaa.Rmd', so I wrote 'ordinal_reg_polr.md' by hand.

@corybrunson (Owner, Author) commented:

Here is a complete analysis using the housing data from {MASS}. Note that all three fork branches ({ordered}, {parsnip}, and {dials}) must be installed, not just {ordered}. The data are disaggregated (one row per respondent) for this illustration, but they would make a good use case for frequency-informed sampling/partitioning that works directly on the aggregated counts.
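As an aside on the frequency point: for model fitting alone, disaggregation is not required, because polr() accepts case weights. The sketch below assumes only {MASS} and checks that fitting the aggregated table with `weights = Freq` reproduces the fit on the disaggregated rows (same likelihood, so agreement up to optimizer tolerance). It does not address frequency-informed partitioning, which is the open question here; the full tidymodels analysis follows.

```r
library(MASS)

# Aggregated data: one row per covariate pattern, counts in Freq
fit_weighted <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

# Disaggregated data: one row per respondent, Freq (column 5) dropped
housing_long <- housing[rep(seq_len(nrow(housing)), housing$Freq), -5]
fit_long <- polr(Sat ~ Infl + Type + Cont, data = housing_long)

# Compare estimates and deviances side by side
cbind(weighted = coef(fit_weighted), disaggregated = coef(fit_long))
c(weighted = deviance(fit_weighted), disaggregated = deviance(fit_long))
```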

library(tidymodels)
library(ordered)

# disaggregated data & partition
house_data <-
  MASS::housing[rep(seq(nrow(MASS::housing)), MASS::housing$Freq), -5]
house_split <- initial_split(house_data, prop = .8)
house_train <- training(house_split)
house_test <- testing(house_split)

# tunable model & analysis specification
house_rec <- recipe(Sat ~ Infl + Type + Cont, data = house_train)
house_spec <- ordinal_reg() |>
  set_engine("polr") |>
  set_args(method = tune())
house_tune <- extract_parameter_set_dials(house_spec)
(house_grid <- grid_regular(house_tune, levels = Inf))
#> # A tibble: 5 × 1
#>   method  
#>   <chr>   
#> 1 logistic
#> 2 probit  
#> 3 loglog  
#> 4 cloglog 
#> 5 cauchit

# hyperparameter (link function) optimization
house_res <- tune_grid(
  house_spec,
  preprocessor = house_rec,
  resamples = vfold_cv(house_train),
  grid = house_grid,
  metrics = metric_set(accuracy, roc_auc)
)
(house_link <- select_best(house_res, metric = "accuracy"))
#> # A tibble: 1 × 2
#>   method   .config             
#>   <chr>    <chr>               
#> 1 logistic Preprocessor1_Model1

# final fit
house_prep <- prep(house_rec)
house_final <- finalize_model(house_spec, house_link)
(house_fit <- fit(house_final, formula(house_prep), data = house_train))
#> parsnip model object
#> 
#> Call:
#> MASS::polr(formula = Sat ~ Infl + Type + Cont, data = data, method = ~"logistic")
#> 
#> Coefficients:
#>    InflMedium      InflHigh TypeApartment    TypeAtrium   TypeTerrace 
#>     0.5103368     1.2315652    -0.4973120    -0.2740917    -0.9533085 
#>      ContHigh 
#>     0.3576051 
#> 
#> Intercepts:
#>  Low|Medium Medium|High 
#>  -0.4677984   0.7202062 
#> 
#> Residual Deviance: 2803.47 
#> AIC: 2819.47

# evaluation
house_pred_class <- predict(house_fit, new_data = house_test, type = "class")
bind_cols(house_test, house_pred_class) |>
  accuracy(truth = Sat, estimate = .pred_class)
#> # A tibble: 1 × 3
#>   .metric  .estimator .estimate
#>   <chr>    <chr>          <dbl>
#> 1 accuracy multiclass     0.528
house_pred_prob <- predict(house_fit, new_data = house_test, type = "prob")
bind_cols(house_test, house_pred_prob) |>
  roc_auc(truth = Sat, starts_with(".pred_"))
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 roc_auc hand_till      0.652

Created on 2024-11-04 with reprex v2.1.1
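For readers who cannot install the three fork branches, the link comparison at the heart of the reprex can be approximated with {MASS} alone. This is a sketch, not a substitute for tune_grid(): it uses a single 80/20 holdout split instead of 10-fold cross-validation on the training set, the seed is arbitrary, and it scores only accuracy, so its numbers will not match the reprex output above.

```r
library(MASS)

set.seed(2024)  # arbitrary seed, for a reproducible split
house_long <- housing[rep(seq_len(nrow(housing)), housing$Freq), -5]
train_idx <- sample(nrow(house_long), size = round(0.8 * nrow(house_long)))
long_train <- house_long[train_idx, ]
long_test <- house_long[-train_idx, ]

# The five values of polr()'s `method` argument (the tuning grid above)
links <- c("logistic", "probit", "loglog", "cloglog", "cauchit")

# Fit one model per link and score its class predictions on the holdout set
link_accuracy <- sapply(links, function(link) {
  fit <- polr(Sat ~ Infl + Type + Cont, data = long_train, method = link)
  pred <- predict(fit, newdata = long_test, type = "class")
  mean(pred == long_test$Sat)
})

link_accuracy
names(which.max(link_accuracy))
```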

@topepo (Collaborator) commented Nov 6, 2024

I'll try to review this later today. My first thought is that the bare skeleton of ordinal_reg() should live in parsnip so that they can use our "enhanced" engine documentation.

@corybrunson (Owner, Author) commented:

@topepo, could work on this be resumed for a minimal CRAN submission in the next several months? I will be joining a project in June and hope to use this package. :)

@corybrunson corybrunson changed the base branch from main to ordinal_reg April 21, 2025 18:56
@mattwarkentin mattwarkentin merged commit 8fc08fc into corybrunson:ordinal_reg Apr 21, 2025
0 of 12 checks passed
@corybrunson corybrunson deleted the ordinal_reg branch April 21, 2025 19:19

3 participants