add variables and special model formulas #34

With upcoming hierarchical models, GAMs, and others, we need to make the workflow interface smoother.

Currently, the interface is not intuitive in a few ways:

 * There is the usual formula (from `add_formula()`) and the model formula (passed to the `formula` argument of `add_model()`).
 * The default formula processing can eliminate the specific variables that you want to keep intact (e.g., a factor is replaced by its dummy variables).

Historically, the model formula has always done several things: specify the variables in the model, create encodings for them, and then hand them off to the model with the appropriate analysis roles (e.g., outcome or predictor).
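As a rough base R illustration (not the `workflows` internals, which go through `hardhat`), the three jobs a formula does can be seen on a small made-up data set standing in for `sleepstudy`:

```r
# Toy data standing in for lme4::sleepstudy (values are made up)
toy <- data.frame(
  Reaction = c(250, 260, 300, 310),
  Days     = c(0, 1, 0, 1),
  Subject  = factor(c("a", "a", "b", "b"))
)
f <- Reaction ~ Days + Subject

# 1. Which variables are involved
vars <- all.vars(f)

# 2. Roles: terms() records that there is a response (outcome)
#    and which terms are predictors
tt <- terms(f)
has_outcome <- attr(tt, "response") == 1
predictor_labels <- attr(tt, "term.labels")

# 3. Encodings: the factor `Subject` is replaced by an indicator column
encoded <- model.matrix(f, data = toy)
colnames(encoded)
```

The last step is where the original `Subject` column disappears: only `Subjectb` survives as a 0/1 column.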

## Example

For example, if there were a `parsnip` hierarchical model to fit via Stan or `lme4`, a user's first attempt might be:

```r
library(tidymodels)

data(sleepstudy, package = "lme4")

mod <- linear_reg() %>% set_engine("stan_glmer")

wflow_0 <- 
  workflow() %>% 
  # Won't work: the basic formula method can't handle `||` terms and makes dummy variables
  add_formula(Reaction ~ Days + (Days || Subject)) %>% 
  add_model(mod)
```

Calling `fit()` on this workflow generates the error:

```
Error in Days || Subject : invalid 'y' type in 'x || y'
```

(The error message itself could be more informative.)
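A likely explanation, sketched in base R under the assumption that the formula ends up being processed by something like `model.frame()`: in a plain formula, `(Days || Subject)` is not special syntax, so R evaluates it as an ordinary scalar logical OR. The data are made up, and the exact error text depends on the R version, so the sketch only captures whether an error occurs:

```r
toy <- data.frame(
  Reaction = c(250, 260, 300, 310),
  Days     = c(0, 1, 0, 1),
  Subject  = factor(c("a", "a", "b", "b"))
)

# `||` inside a plain formula is just R's logical OR, so model.frame()
# tries to evaluate `Days || Subject` as an ordinary expression and fails.
msg <- tryCatch(
  {
    model.frame(Reaction ~ Days + (Days || Subject), data = toy)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Special random-effects syntax like `||` only has meaning to the modeling function (e.g., `lme4`), which is why it should not pass through the generic formula preprocessing.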


Looking around, the user finds the `formula` argument of `add_model()`:

```r
wflow_1 <- 
  workflow() %>% 
  # Make a simple formula for processing the data 
  add_formula(Reaction ~ Days + Subject) %>% 
  # Then add another formula to give to the model: 
  add_model(mod, formula = Reaction ~ Days + (Days || Subject))
```

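That fails with the error:

```
Error in eval(predvars, data, env) : object 'Subject' not found
```

This happens because `add_formula()` converts `Subject` into dummy variables, so the original `Subject` column no longer exists when the model formula is evaluated.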
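The second failure can be reproduced in miniature with base R. Here `model.matrix()` stands in for the default formula preprocessing (an assumption for illustration), and the data are made up: once `Subject` has been expanded into indicator columns, a later formula that mentions `Subject` has nothing to find.

```r
toy <- data.frame(
  Reaction = c(250, 260, 300, 310),
  Days     = c(0, 1, 0, 1),
  Subject  = factor(c("a", "a", "b", "b"))
)

# Step 1: the "preprocessing" formula expands Subject into indicator columns
processed <- as.data.frame(model.matrix(Reaction ~ Days + Subject, data = toy))
processed$Reaction <- toy$Reaction
names(processed)  # contains "Subjectb" but no "Subject"

# Step 2: the model formula still refers to the original `Subject` column
msg <- tryCatch(
  {
    model.frame(Reaction ~ Days + Subject, data = processed)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```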

## Current solution

After a lot more searching, two options turn up that are kludgy but work:

```r
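# Option 1: a formula blueprint that does not create indicator (dummy) variables.
# (Newer hardhat versions spell this as `indicators = "none"`.)
bp <- hardhat::default_formula_blueprint(indicators = FALSE)
wflow_2 <- 
  workflow() %>% 
  add_formula(Reaction ~ ., blueprint = bp) %>% 
  add_model(mod, formula = Reaction ~ Days + (Days || Subject))

# Option 2: a recipe with no steps, which passes the columns through unchanged
wflow_3 <- 
  workflow() %>% 
  add_recipe(recipe(Reaction ~ ., data = sleepstudy)) %>% 
  add_model(mod, formula = Reaction ~ Days + (Days || Subject))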
```

We can make this interface much better and more intuitive.

## Proposals

Some straw-man proposals:

First, let's make a function that lets users tell the model which data to use (and perhaps give the variables limited roles) without doing any preprocessing:

```r
wflow_4 <- 
  workflow() %>% 
  # Add in the data by processing through only `model.frame()` or equivalent. 
  # No other in-line functions used; just as-is:
  add_variables_asis(Reaction ~ .) %>% 
  add_model(mod, formula = Reaction ~ Days + (Days || Subject))
```
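In base R terms, the proposed `add_variables_asis()` would behave roughly like calling `model.frame()` alone: the variables are selected, but factors are not expanded, so the model formula can still refer to `Subject`. The helper `variables_asis()` is hypothetical, and `lm()` with a fixed-effects formula stands in for a hierarchical engine so that the sketch runs without `lme4` or Stan:

```r
toy <- data.frame(
  Reaction = c(250, 260, 300, 310, 255, 305),
  Days     = c(0, 1, 0, 1, 2, 2),
  Subject  = factor(c("a", "a", "b", "b", "a", "b"))
)

# Hypothetical helper: keep the variables named in the formula as-is,
# with no dummy variables or other encodings
variables_asis <- function(formula, data) {
  model.frame(formula, data = data)
}

vars_data <- variables_asis(Reaction ~ ., data = toy)
stopifnot(is.factor(vars_data$Subject))  # still the original factor

# The model formula can still use the original `Subject` column
fit <- lm(Reaction ~ Days + Subject, data = vars_data)
names(coef(fit))
```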

Having two formulas might be confusing. Basic `tidyselect` tools could be used instead: 

```r
wflow_5 <- 
  workflow() %>% 
  # If formulas are confusing, we could use tidyselect functions
  add_variables(one_of("Reaction", "Days", "Subject")) %>% 
  add_model(mod, formula = Reaction ~ Days + (Days || Subject))
```
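The tidyselect alternative amounts to choosing columns by name and, optionally, tagging their roles, with no formula involved. Below is a base R stand-in, not `tidyselect` itself; `select_variables()` and its `outcome`/`predictors` arguments are hypothetical:

```r
toy <- data.frame(
  Reaction = c(250, 260, 300, 310),
  Days     = c(0, 1, 0, 1),
  Subject  = factor(c("a", "a", "b", "b")),
  Notes    = c("x", "y", "z", "w")
)

# Hypothetical helper: pick columns by name and record their roles,
# leaving every selected column exactly as it is in the data
select_variables <- function(data, outcome, predictors) {
  vars <- c(outcome, predictors)
  unknown <- setdiff(vars, names(data))
  if (length(unknown) > 0) {
    stop("Unknown variables: ", paste(unknown, collapse = ", "))
  }
  list(
    data = data[, vars, drop = FALSE],
    outcome = outcome,
    predictors = predictors
  )
}

sel <- select_variables(toy, outcome = "Reaction", predictors = c("Days", "Subject"))
names(sel$data)
```

Because nothing is encoded, `Subject` reaches the model as a factor, and unrelated columns such as `Notes` are simply dropped.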

Even though the same result can be achieved with current code, the existing methods are neither intuitive nor well documented in `workflows`.

Second, even though the model formula is tied to the model, it might be better to have a separate _add_ function that attaches a model formula to a model specification: 

```r
wflow_6 <- 
  workflow() %>% 
  add_variables(one_of("Reaction", "Days", "Subject")) %>% 
  add_model(mod) %>% 
  add_model_formula(Reaction ~ Days + (Days || Subject))
```
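One way to read the `add_model_formula()` proposal is that the workflow stores the selected variables and the model formula in separate slots and only combines them at fit time. The sketch below is hypothetical: plain lists and `wf_*()` functions stand in for the workflow object and its verbs, and `lm()` with a fixed-effects formula stands in for a hierarchical engine so that it runs without `lme4` or Stan:

```r
# Hypothetical stand-ins for the proposed verbs (not the workflows API)
wf_new <- function() list(variables = NULL, model_fn = NULL, model_formula = NULL)

wf_add_variables <- function(wf, vars) {
  wf$variables <- vars
  wf
}

wf_add_model <- function(wf, model_fn) {
  wf$model_fn <- model_fn
  wf
}

wf_add_model_formula <- function(wf, formula) {
  wf$model_formula <- formula
  wf
}

wf_fit <- function(wf, data) {
  # Variables are passed through as-is: no dummy variables are created here
  model_data <- data[, wf$variables, drop = FALSE]
  # The model formula goes straight to the modeling function
  wf$model_fn(wf$model_formula, data = model_data)
}

toy <- data.frame(
  Reaction = c(250, 260, 300, 310, 255, 305),
  Days     = c(0, 1, 0, 1, 2, 2),
  Subject  = factor(c("a", "a", "b", "b", "a", "b"))
)

wf <- wf_new()
wf <- wf_add_variables(wf, c("Reaction", "Days", "Subject"))
wf <- wf_add_model(wf, lm)
wf <- wf_add_model_formula(wf, Reaction ~ Days + Subject)
fit <- wf_fit(wf, toy)
names(coef(fit))
```

Keeping the two pieces apart means the model formula is never used to preprocess the data, so special terms like `(Days || Subject)` only need to be understood by the modeling function.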

A few people might want to add input: @jaredlander, @beckmart, @monicathieu, @billdenney, @emitanaka, and @Athanasiamo
