Swap out patsy for formulae #463
Cool. Thanks @ksolarski, just a quick reply from my phone... Don't do this for the synthetic control because I have an in-progress PR that will change it. It won't have a formula input. But can I just get some clarification... does this change the API? Can we get the exact same functionality? If not, let's think again. Will try to look at the code properly when I can 👍🏻
I can't find where I saw it in the I'm not 100% sure that this is a problem, and apologies I can't find the relevant part in the docs. But does my concern make sense? |
You're right, Patsy has the power of preserving the transformation / encoding of variables through However, the Patsy repo suggests migrating to https://github.com/matthewwardrop/formulaic instead, which is capable of "reusing the encoding choices made during conversion of one data-set on other datasets." (see https://matthewwardrop.github.io/formulaic/latest/). There's also a migration guide from Patsy to Formulaic, so switching would be easy. It also supports many operators: https://matthewwardrop.github.io/formulaic/latest/guides/grammar/ Did you check out this library before? What do you think about using this instead of formulae?
@drbenvincent any strong opinions about using |
Sorry for the delayed response @ksolarski. So as far as I understand, Right now there are no use-cases for hierarchical modelling. That might change in the future, though I don't have any specific use cases in mind. So I guess the only choice at the moment is |
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #463      +/-   ##
==========================================
- Coverage   94.66%   94.66%   -0.01%
==========================================
  Files          32       32
  Lines        2195     2194       -1
==========================================
- Hits         2078     2077       -1
  Misses        117      117
@drbenvincent Yes, from the docs it seems that no hierarchical models are allowed in the

import pandas as pd
from formulaic import model_matrix

# Create a training dataset
train_data = pd.DataFrame(
    {
        "feature1": ["A", "B", "C", "D"],
        "target": [0, 1, 0, 1],
    }
)

# Create a test dataset
test_data = pd.DataFrame(
    {
        "feature1": [
            "A",  # In training
            "D",  # In training
            "E",  # Not in training
        ],
        "target": [0, 1, 0],
    }
)

# Generate the model matrix for the training data
train_matrix = model_matrix("target ~ 0 + feature1", train_data)

# Print the training matrix
print("Training Matrix:")
print(train_matrix)

# Use the same spec to transform the test data
test_matrix = model_matrix(spec=train_matrix.model_spec, data=test_data)

# Print the test matrix - see that columns are properly aligned from the training data transformation
print("\nTest Matrix:")
print(test_matrix)

Is that the problem you had in mind or something else?
Solving issue #386
Starting with DiD, will continue with other methods if you agree with the general design @drbenvincent
Seems like the key practical difference between formulae and patsy is the lack of a build_design_matrices method in formulae. The user then has to provide the formula again.

📚 Documentation preview 📚: https://causalpy--463.org.readthedocs.build/en/463/
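
For context on what build_design_matrices provides in patsy, here is a minimal sketch with made-up data. The column names `y`, `x` and `g` are hypothetical and this is not code from the PR. The design info captured at fit time is reused on new data, so the formula does not need to be passed again:

```python
import numpy as np
import pandas as pd
from patsy import build_design_matrices, dmatrices

# Hypothetical training data; the mean of x is 1.5
train = pd.DataFrame(
    {
        "y": [1.0, 2.0, 3.0, 4.0],
        "x": [0.0, 1.0, 2.0, 3.0],
        "g": ["a", "b", "a", "b"],
    }
)

# Build outcome and design matrices; the design info remembers the
# categorical levels of g and the training mean used by center(x)
y_train, X_train = dmatrices("y ~ 1 + center(x) + g", train)

# Hypothetical new data: no outcome column and no formula needed
new = pd.DataFrame({"x": [10.0], "g": ["b"]})

# Reuse the stored design info to build the matrix for the new data
(X_new,) = build_design_matrices([X_train.design_info], new)

print(X_train.design_info.column_names)
print(np.asarray(X_new))
```

Any replacement library would need an equivalent way to carry the fitted encoding over to new data if the existing behaviour is to be preserved.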