Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Simplify code for linear transformers #332

Open
mattansb opened this issue Dec 12, 2022 · 1 comment
Open

Simplify code for linear transformers #332

mattansb opened this issue Dec 12, 2022 · 1 comment
Assignees

Comments

@mattansb
Copy link
Member

mattansb commented Dec 12, 2022

All of the following are linear transformers:

  • center()
  • standardize()
  • slide()
  • reverse()
  • rescale()
  • normalize()

So hypothetically we should be able to simplify their code to a common underlying function. Idially, each function would compute the needed shift and scale values, and these will be passed to standardize(..., center = <shift>, scale = <scale>).
This will have 0 effect on usability, but will make maintenance of the code much easier.

Function Supports weights? Shift Scale
center() mean(x) or median(x) 1
standardize() mean(x) or median(x) sd(x) or mad(x)
slide() min(x) - lowest 1
reverse() -(max(x)+min(x)) -1
rescale() min(x)*<scale>-to[1] (to[2]-to[1])/(max(x)-min(x))
normalize() same as rescale(x,to=c(0,1)+c(-1,1)*include_bounds)
where include_bounds is a number [0,1] or 0.5/length(x)

In any place where a reference= arg is provided, replace x with reference above.


All functions would have to deal with missing values, non-finite values, and labels.

@strengejacke
Copy link
Member

I looked at the code, and I'm not sure if there's really much we can simplify. Due to the possible transformation inside formulas, we have the dw_transformer class, which requires different attributes (depending on the transformation function), so we need these information anyway, and can't "standardize" this code. Furthermore, there are a few exceptions that we must handle anyway, e.g. reversing for factors (where min()/max() don't work).

You can try to improve things here, but my first impression is that not much simplification is possible beyond what we already did.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

2 participants