Feature Request: ability to pass dataframe to validation argument of xgboost #765

Open
joeycouse opened this issue Jul 12, 2022 · 2 comments
Labels
feature a feature request or enhancement

Comments

@joeycouse
Related to #760

The current implementation of the validation parameter in boost_tree only sets the proportion of the training data to use as the validation set. It would be great to be able to pass a data frame as the validation argument as well. This would be really helpful when the data have a grouping structure and you want to test whether the model generalizes to different groups, and it would align parsnip's capabilities with xgb.train().
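For reference, a minimal sketch of the current proportion-based behaviour, assuming the parsnip, xgboost, and modeldata packages are installed. The 0.2 proportion, the 100 trees, and the `penguins_complete` name are illustrative choices, not values from this issue.

```r
library(parsnip)
library(modeldata)

data("penguins", package = "modeldata")

# Drop rows with missing values to keep the sketch simple (illustrative choice)
penguins_complete <- na.omit(penguins)

# validation = 0.2 asks the xgboost engine to hold out 20% of the training
# rows for early stopping; stop_iter = 5 stops after 5 rounds without
# improvement on that held-out portion
spec <-
  boost_tree(mode = "regression", trees = 100, stop_iter = 5) |>
  set_engine("xgboost", validation = 0.2)

fit_obj <- fit(spec, body_mass_g ~ ., data = penguins_complete)
```

Because the held-out rows are a proportion of the training data, there is no way here to choose which rows (for example, which species) end up in the validation set.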

Not a great example, but just for demonstration:

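library(dplyr)
library(parsnip)
library(modeldata)

data("penguins")

train <- 
  penguins |> 
  filter(species %in% c("Gentoo", "Adelie"))

# Hold out the remaining species as the validation set
valid <-
  penguins |> 
  filter(!(species %in% c("Gentoo", "Adelie")))

# Proposed interface (not currently supported): pass a data frame to validation
boost_tree(mode = "regression",
           mtry = 3,
           tree_depth = 2,
           stop_iter = 5) |> 
  set_engine("xgboost", validation = valid)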
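For comparison, here is a rough sketch of how an explicit validation set can be passed when calling xgboost directly. It assumes a version of the xgboost R package in which `xgb.train()` takes the evaluation sets through `watchlist` (newer releases name this argument `evals`). The predictor columns, the outcome, and the `to_dmatrix()` helper are illustrative assumptions, not part of the request.

```r
library(xgboost)
library(modeldata)

data("penguins", package = "modeldata")
penguins_complete <- na.omit(penguins)

# Illustrative choice of numeric predictors; body_mass_g is the outcome
predictors <- c("bill_length_mm", "bill_depth_mm", "flipper_length_mm")
is_train <- penguins_complete$species %in% c("Gentoo", "Adelie")

# Hypothetical helper: convert a data frame to an xgb.DMatrix
to_dmatrix <- function(df) {
  xgb.DMatrix(data = as.matrix(df[, predictors]), label = df$body_mass_g)
}

dtrain <- to_dmatrix(penguins_complete[is_train, ])
dvalid <- to_dmatrix(penguins_complete[!is_train, ])

# Early stopping monitors the last element of the list, here the held-out
# species. Assumption: the argument is called `watchlist` in the installed
# xgboost version (it is `evals` in newer releases).
bst <- xgb.train(
  params = list(objective = "reg:squarederror", max_depth = 2),
  data = dtrain,
  nrounds = 100,
  watchlist = list(train = dtrain, valid = dvalid),
  early_stopping_rounds = 5,
  verbose = 0
)
```

This is the kind of user control over the validation rows that the proportion-only argument does not currently expose.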

@simonpcouch
Contributor

Thanks for the issue! This is an interesting idea and one that we ought to consider. xgboost and lightgbm's interfaces for validation sets allow for a lot of user control, but we'd need to think carefully about what a tidymodels-esque interface might feel like here.

This won't be at the top of our to-do list for now, but we'll leave this open as a possible future extension. :)