Skip to content

Feature preprocessors, Loss strategies #86

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged

Conversation

ravinkohli
Copy link
Contributor

Added the following components from old autopytorch code:

  1. Feature Preprocessors like Kitchen sinks, FastICA, KernelPCA, RandomKitchenSinks, Nystroem, PolynomialFeatures, PowerTransformer, TruncatedSVD
  2. Weighted Loss strategies for binary and multiclass classification

@ravinkohli ravinkohli changed the title ADD Weighted loss Feature preprocessors, Loss strategies Feb 5, 2021
# add only child hyperparameters of early_preprocessor choices
for name in preprocessor.choices:
updates = self._get_search_space_updates(prefix=name)
config_space = available_[name].get_hyperparameter_search_space(dataset_properties, # type:ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why type ignore here?

Copy link
Contributor Author

@ravinkohli ravinkohli Feb 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so as get_hyperparameter_search_space of the base component only has dataset properties as a parameter, it gives a "too many arguments for get_hyperparameter search space" error. If I add a **kwargs to the base component, it gives signature invalid for every class inheriting from base component. On searching on google, I found this, where they are suggesting to use # type: ignore

@franchuterivera franchuterivera merged commit d33388b into automl:refactor_development Feb 9, 2021
nabenabe0928 pushed a commit to nabenabe0928/Auto-PyTorch that referenced this pull request Feb 19, 2021
* ADD Weighted loss

* Now?

* Fix tests, flake, mypy

* Fix tests

* Fix mypy

* change back sklearn requirement

* Assert for fast ica sklearn bug

* Forgot to add skip

* Fix tests, changed num only data to float

* removed fast ica

* change num only dataset

* Increased number of features in num only

* Increase timeout for pytest

* ADD tensorboard to requirement

* Fix bug with small_preprocess

* Fix bug in pytest execution

* Fix tests

* ADD error is raised if default not in include

* Added dynamic search space for deciding n components in feature preprocessors, add test for pipeline include

* Moved back to random configs in tabular test

* Added floor and ceil and handling of logs

* Fix flake

* Remove TruncatedSVD from cs if num numerical ==1

* ADD flakyness to network accuracy test

* fix flake

* remove cla to pytest
franchuterivera added a commit to franchuterivera/Auto-PyTorch that referenced this pull request Jun 14, 2021
* New refactor code. Initial push

* Allow specifying the network type in include (automl#78)

* Allow specifying the network type in include

* Fix test flake 8

* fix test api

* increased time for func eval in cros validation

* Addressed comments

Co-authored-by: Ravin Kohli <kohliravin7@gmail.com>

* Search space update (automl#80)

* Added Hyperparameter Search space updates

* added test for search space update

* Added Hyperparameter Search space updates

* added test for search space update

* Added hyperparameter search space updates to network, trainer and improved check for search space updates

* Fix mypy, flake8

* Fix tests and silly mistake in base_pipeline

* Fix flake

* added _cs_updates to dummy component

* fixed indentation and isinstance comment

* fixed silly error

* Addressed comments from fransisco

* added value error for search space updates

* ADD tests for setting range of config space

* fic utils search space update

* Make sure the performance of pipeline is at least 0.8

* Early stop fixes

* Network Cleanup (automl#81)

* removed old supported_tasks dictionary from heads, added some docstrings and some small fixes

* removed old supported_tasks attribute and updated doc strings in base backbone and base head components

* removed old supported_tasks attribute from network backbones

* put time series backbones in separate files, add doc strings and refactored search space arguments

* split image networks into separate files, add doc strings and refactor search space

* fix typo

* add an intial simple backbone test similar to the network head test

* fix flake8

* fixed imports in backbones and heads

* added new network backbone and head tests

* enabled tests for adding custom backbones and heads, added required properties to base head and base backbone

* First documentation

* Default to ubuntu-18.04

* Comment enhancements

* Feature preprocessors, Loss strategies (automl#86)

* ADD Weighted loss

* Now?

* Fix tests, flake, mypy

* Fix tests

* Fix mypy

* change back sklearn requirement

* Assert for fast ica sklearn bug

* Forgot to add skip

* Fix tests, changed num only data to float

* removed fast ica

* change num only dataset

* Increased number of features in num only

* Increase timeout for pytest

* ADD tensorboard to requirement

* Fix bug with small_preprocess

* Fix bug in pytest execution

* Fix tests

* ADD error is raised if default not in include

* Added dynamic search space for deciding n components in feature preprocessors, add test for pipeline include

* Moved back to random configs in tabular test

* Added floor and ceil and handling of logs

* Fix flake

* Remove TruncatedSVD from cs if num numerical ==1

* ADD flakyness to network accuracy test

* fix flake

* remove cla to pytest

* Validate the input to autopytorch

* Bug fixes after rebase

* Move to new scikit learn

* Remove dangerous convert dtype

* Try to remove random float error again and make data pickable

* Tets pickle on versions higher than 3.6

* Tets pickle on versions higher than 3.6

* Comment fixes

* Adding tabular regression pipeline (automl#85)

* removed old supported_tasks dictionary from heads, added some docstrings and some small fixes

* removed old supported_tasks attribute and updated doc strings in base backbone and base head components

* removed old supported_tasks attribute from network backbones

* put time series backbones in separate files, add doc strings and refactored search space arguments

* split image networks into separate files, add doc strings and refactor search space

* fix typo

* add an intial simple backbone test similar to the network head test

* fix flake8

* fixed imports in backbones and heads

* added new network backbone and head tests

* enabled tests for adding custom backbones and heads, added required properties to base head and base backbone

* adding tabular regression pipeline

* fix flake8

* adding tabular regression pipeline

* fix flake8

* fix regression test

* fix indentation and comments, undo change in base network

* pipeline fitting tests now check the expected output shape dynamically based on the input data

* refactored trainer tests, added trainer test for regression

* remove regression from mixup unitest

* use pandas unique instead of numpy

* [IMPORTANT] added proper target casting based on task type to base trainer

* adding tabular regression task to api

* adding tabular regression example, some small fixes

* new/more tests for tabular regression

* fix mypy and flake8 errors from merge

* fix issues with new weighted loss and regression tasks

* change tabular column transformer to use net fit_dictionary_tabular fixture

* fixing tests, replaced num_classes with output_shape

* fixes after merge

* adding voting regressor wrapper

* fix mypy and flake

* updated example

* lower r2 target

* address comments

* increasing timeout

* increase number of labels in test_losses because it occasionally failed if one class was not in the labels

* lower regression lr in score test until seeding properly works

* fix randomization in feature validator test

* Make sure the performance of pipeline is at least 0.8

* Early stop fixes

* Network Cleanup (automl#81)

* removed old supported_tasks dictionary from heads, added some docstrings and some small fixes

* removed old supported_tasks attribute and updated doc strings in base backbone and base head components

* removed old supported_tasks attribute from network backbones

* put time series backbones in separate files, add doc strings and refactored search space arguments

* split image networks into separate files, add doc strings and refactor search space

* fix typo

* add an intial simple backbone test similar to the network head test

* fix flake8

* fixed imports in backbones and heads

* added new network backbone and head tests

* enabled tests for adding custom backbones and heads, added required properties to base head and base backbone

* First documentation

* Default to ubuntu-18.04

* Comment enhancements

* Feature preprocessors, Loss strategies (automl#86)

* ADD Weighted loss

* Now?

* Fix tests, flake, mypy

* Fix tests

* Fix mypy

* change back sklearn requirement

* Assert for fast ica sklearn bug

* Forgot to add skip

* Fix tests, changed num only data to float

* removed fast ica

* change num only dataset

* Increased number of features in num only

* Increase timeout for pytest

* ADD tensorboard to requirement

* Fix bug with small_preprocess

* Fix bug in pytest execution

* Fix tests

* ADD error is raised if default not in include

* Added dynamic search space for deciding n components in feature preprocessors, add test for pipeline include

* Moved back to random configs in tabular test

* Added floor and ceil and handling of logs

* Fix flake

* Remove TruncatedSVD from cs if num numerical ==1

* ADD flakyness to network accuracy test

* fix flake

* remove cla to pytest

* Validate the input to autopytorch

* Bug fixes after rebase

* Move to new scikit learn

* Remove dangerous convert dtype

* Try to remove random float error again and make data pickable

* Tets pickle on versions higher than 3.6

* Tets pickle on versions higher than 3.6

* Comment fixes

* [REFACTORING]: no change in the functionalities, inputs, returns

* Modified an error message

* [Test error fix]: Fixed the error caused by flake8

* [Test error fix]: Fixed the error caused by flake8

* FIX weighted loss issue (automl#94)

* Changed tests for losses and how weighted strategy is handled in the base trainer

* Addressed comments from francisco

* Fix training test

* Re-arranged tests and moved test_setup to pytest

* Reduced search space for dummy forward backward pass of backbones

* Fix typo

* ADD Doc string to loss function

* Logger enhancements

* show_models

* Move to spawn

* Adding missing logger line

* Feedback from comments

* ADD_109

* No print allow

* [PR response]: deleted unneeded changes from merge and fixed the doc-string.

* fixed the for loop in type_check based on samuel's review

* deleted blank space pointed out by flake8

* Try no autouse

* handle nans in categorical columns (automl#118)

* handle nans in categorical columns

* Fixed error in self dtypes

* Addressed comments from francisco

* Forgot to commit

* Fix flake

* Embedding layer (automl#91)

* work in progress

* in progress

* Working network embedding

* ADD tests for network embedding

* Removed ordinal encoder

* Removed ordinal encoder

* Add seed for test_losses for reproducibility

* Addressed comments

* fix flake

* fix test import training

* ADD_109

* No print allow

* Fix tests and move to boston

* Debug issue with python 3.6

* Debug for python3.6

* Run only debug file

* work in progress

* in progress

* Working network embedding

* ADD tests for network embedding

* Removed ordinal encoder

* Removed ordinal encoder

* Addressed comments

* fix flake

* fix test import training

* Fix tests and move to boston

* Debug issue with python 3.6

* Run only debug file

* Debug for python3.6

* print paths of parent dir

* Trying to run examples

* Trying to run examples

* Add success model

* Added parent directory for printing paths

* Try no autouse

* print log file to see if backend is saving num run

* Setup logger in backend

* handle nans in categorical columns (automl#118)

* handle nans in categorical columns

* Fixed error in self dtypes

* Addressed comments from francisco

* Forgot to commit

* Fix flake

* try without embeddings

* work in progress

* in progress

* Working network embedding

* ADD tests for network embedding

* Removed ordinal encoder

* Removed ordinal encoder

* Addressed comments

* fix flake

* fix test import training

* Fix tests and move to boston

* Debug issue with python 3.6

* Run only debug file

* Debug for python3.6

* work in progress

* in progress

* Working network embedding

* ADD tests for network embedding

* print paths of parent dir

* Trying to run examples

* Trying to run examples

* Add success model

* Added parent directory for printing paths

* print log file to see if backend is saving num run

* Setup logger in backend

* try without embeddings

* no embedding for python 3.6

* Deleted debug example

* Fix test for evaluation

* Deleted utils file

Co-authored-by: chico <francisco.rivera.valverde@gmail.com>

* Fixes to address automlbenchmark problems

* Fix trajectory file output

* modified the doc-string in TransformSubset in base_dataset.py

* change config_id to config_id+1 (automl#129)

* move to a minimization problem (automl#113)

* move to a minimization problem

* Fix missing test loss file

* Missed regression

* More robust test

* Try signal timeout

* Kernel PCA failures

* Feedback from Ravin

* Better debug msg

* Feedback from comments

* Doc string request

* Feedback from comments

* Enhanced doc string

* FIX_123 (automl#133)

* FIX_123

* Better debug msg

* at least 1 config in regression

* Return self in _fit()

* Adds more examples to customise AutoPyTorch. (automl#124)

* 3 examples plus doc update

* Forgot the examples

* Added example for resampling strategy

* Update example worflow

* Fixed bugs in example and resampling strategies

* Addressed comments

* Addressed comments

* Addressed comments from shuhei, better documentation

* [Feat] Better traditional pipeline cutoff time (automl#141)

* [Feat] Better traditional pipeline cutoff time

* Fix unit testing

* Better failure msg

* bug fix catboost

* Feedback from Ravin

* First batch of feedback from comments

* Missed examples

* Syntax fix

* Hyperparameter Search Space updates now with constant and include ability (automl#146)

* In progress, add_hyperparameter

* Added SearchSpace working functionality

* Working search space update with test for __choice__ and fix flake

* fixed mypy bug and bug in making constant float hyperparameters

* Add test for fitting pipeline with constant updates

* fix flake

* bug in int for feature preprocessors and minor bugs in hyperparameter search space fixed

* Forgot to add a file

* Addressed comments, better documentation and better tests for search space updates

* Fix flake

* [Bug] Fix random halt problems on traditional pipelines (automl#147)

* [feat] Fix random halt problems on traditional pipelines

* Documentation update

* Fix flake

* Flake due to kernel pca errors

* Run history traditional (automl#121)

* In progress, issue with failed traditional

* working traditional classifiers

* Addressed comments from francisco

* Changed test loop in test_api

* Add .autopytorch runs back again

* Addressed comments, better documentation and dict for runhistory

* Fix flake

* Fix tests and add additional run info for crossval

* fix tests for train evaluator and api

* Addressed comments

* Addressed comments

* Addressed comments from shuhei, removed deleting from additioninfo

* [FIX] Enables backend to track the num run  (automl#162)

* AA_151

* doc the peek attr

* [ADD] Relax constant pipeline performance

* [Doc] First push of the developer documentation (automl#127)

* First push of the developer documentation

* Feedback from Ravin

* Document scikit-learn develop guide

* Feedback from Ravin

* Delete extra point

* Refactoring base dataset splitting functions (automl#106)

* [Fork from automl#105] Made CrossValFuncs and HoldOutFuncs class to group the functions

* Modified time_series_dataset.py to be compatible with resampling_strategy.py

* [fix]: back to the renamed version of CROSS_VAL_FN from temporal SplitFunc typing.

* fixed flake8 issues in three files

* fixed the flake8 issues

* [refactor] Address the francisco's comments

* [refactor] Adress the francisco's comments

* [refactor] Address the doc-string issue in TransformSubset class

* [fix] Address flake8 issues

* [fix] Fix flake8 issue

* [fix] Fix mypy issues raised by github check

* [fix] Fix a mypy issue

* [fix] Fix a contradiction in holdout_stratified_validation

Since stratified splitting requires to shuffle by default
and it raises error in the github check,
I fixed this issue.

* [fix] Address the francisco's review

* [fix] Fix a mypy issue tabular_dataset.py

* [fix] Address the francisco's comment about the self.dataset_name

Since we would to use the dataset name which does not have any name,
I decided to get self.dataset_name back to Optional[str].

* [fix] Fix mypy issues

* [Fix] Refactor development reproducibility (automl#172)

* [Fix] pass random state to randomized algorithms

* [Fix] double instantiation of random state

* [fix] Flaky for sample configuration

* [FIX] Runtime warning

* [FIX] hardcoded budget

* [FIX] flake

* [Fix] try forked

* [Fix] try forked

* [FIX] budget

* [Fix] missing random_state in trainer

* [Fix] overwrite in random_state

* [FIX] fix seed in splits

* [Rebase]

* [FIX] Update cv score after split num change

* [FIX] CV split

* [ADD] Extra visualization example (automl#189)

* [ADD] Extra visualization example

* Update docs/manual.rst

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* Update docs/manual.rst

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* [Fix] missing version

* Update examples/tabular/40_advanced/example_visualization.py

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* [FIX] make docs more clear to the user

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* [Fix] docs links (automl#201)

* [Fix] docs links

* Update README.md

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* Update examples check

* Remove tmp in examples

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* [Refactor] Use the backend implementation from automl common (automl#185)

* [ADD] First push to enable common backend

* Fix unit test

* Try public https

* [FIX] conftest prefix

* [fix] unit test

* [FIX] Fix fixture in score

* [Fix] pytest collection

* [FIX] flake

* [FIX] regression also!

* Update README.md

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* Update .gitmodules

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* [FIX] Regression time

* Make flaky in case memout doesn't happen

* Refacto development automl common backend debug (#2)

* [ADD] debug information

* [FIX] try fork for more stability

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* [DOC] Adds documentation to the abstract evaluator (automl#160)

* DOC_153

* Changes from Ravin

* [FIX] improve clarity of msg in commit

* [FIX] Update Readme (automl#208)

* Reduce run time of the test  (automl#205)

* In progress, changing te4sts

* Reduce time for tests

* Fix flake in tests

* Patch train in other tests also

* Address comments from shuhei and fransisco:

* Move base training to pytest

* Fix flake in tests

* forgot to pass n_samples

* stupid error

* Address comments from shuhei, remove hardcoding and fix bug in dummy eval function

* Skip ensemble test for python >=3.7 and introduce random state for feature processors

* fix flake

* Remove example workflow

* Remove  from __init__ in feature preprocessing

* [refactor] Getting dataset properties from the dataset object (automl#164)

* Use get_required_dataset_info of the dataset when needing required info for getting dataset requirements

* Fix flake

* Fix bug in getting dataset requirements

* Added doc string to explain dataset properties

* Update doc string in utils pipeline

* Change ubuntu version in docs workflow (automl#237)

* Add dist check worflow (automl#238)

* [feature] Greedy Portfolio (automl#200)

* initial configurations added

* In progress, adding flag in search function

* Adds documentation, example and fixes setup.py

* Address comments from shuhei, change run_greedy to portfolio_selection

* address comments from fransisco, movie portfolio to configs

* Address comments from fransisco, add tests for greedy portfolio and tests

* fix flake tests

* Simplify portfolio selection

* Update autoPyTorch/optimizer/smbo.py

Co-authored-by: Francisco Rivera Valverde <44504424+franchuterivera@users.noreply.github.com>

* Address comments from fransisco, path exception handling and test

* fix flake

* Address comments from shuhei

* fix bug in setup.py

* fix tests in base trainer evaluate, increase n samples and add seed

* fix tests in base trainer evaluate, increase n samples (fix)

Co-authored-by: Francisco Rivera Valverde <44504424+franchuterivera@users.noreply.github.com>

* [ADD] Forkserver as default multiprocessing strategy (automl#223)

* First push of forkserver

* [Fix] Missing file

* [FIX] mypy

* [Fix] renam choice to init

* [Fix] Unit test

* [Fix] bugs in examples

* [Fix] ensemble builder

* Update autoPyTorch/pipeline/components/preprocessing/image_preprocessing/normalise/__init__.py

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* Update autoPyTorch/pipeline/components/preprocessing/image_preprocessing/normalise/__init__.py

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* Update autoPyTorch/pipeline/components/preprocessing/tabular_preprocessing/encoding/__init__.py

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* Update autoPyTorch/pipeline/components/preprocessing/image_preprocessing/normalise/__init__.py

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* Update autoPyTorch/pipeline/components/preprocessing/tabular_preprocessing/feature_preprocessing/__init__.py

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* Update autoPyTorch/pipeline/components/preprocessing/tabular_preprocessing/scaling/__init__.py

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* Update autoPyTorch/pipeline/components/setup/network_head/__init__.py

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* Update autoPyTorch/pipeline/components/setup/network_initializer/__init__.py

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* Update autoPyTorch/pipeline/components/setup/network_embedding/__init__.py

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* [FIX] improve doc-strings

* Fix rebase

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* [ADD] Get incumbent config (automl#175)

* In progress get_incumbent_results

* [Add] get_incumbent_results to base task, changed additional info in abstract evaluator, and  tests

* In progress addressing fransisco's comment

* Proper check for include_traditional

* Fix flake

* Mock search of estimator

* Fixed path of run history test_api

* Addressed comments from Fransisco, making better tests

* fix flake

* After rebase fix issues

* fix flake

* Added debug information for API

* filtering only successful runs in get_incumbent_results

* Address comments from fransisco

* Revert changes made to run history assertion in base taks #1257

* fix flake issue

* [ADD] Coverage calculation (automl#224)

* [ADD] Coverage calculation

* [Fix] Flake8

* [fix] rebase artifacts

* [Fix] smac reqs

* [Fix] Make traditional test robust

* [Fix] unit test

* [Fix] test_evaluate

* [Fix] Try more time for cross validation

* Fix mypy post rebase

* Fix unit test

* [ADD] Pytest schedule (automl#234)

* add schedule for pytests workflow

* Add ref to development branch

* Add scheduled test

* update schedule workflow to run on python 3.8

* omit test, examples, workflow from coverage and remove unnecessary code from schedule

* Fix call for python3.8

* Fix call for python3.8 (2)

* fix code cov call in python 3.8

* Finally fix cov call

* [fix] Dropout bug fix (automl#247)

* fix dropout bug

* fix dropout shape discrepancy

* Fix unit test bug

* Add tests for dropout shape asper comments from fransisco

* Fix flake

* Early stop on metric

* Enable long run regression

Co-authored-by: Ravin Kohli <kohliravin7@gmail.com>
Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>
Co-authored-by: bastiscode <sebastian.walter98@gmail.com>
Co-authored-by: nabenabe0928 <shuhei.watanabe.utokyo@gmail.com>
Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants