Data Driven Testing #117

janmotl · 2018-09-04T15:35:45Z

I was thinking about applying data driven testing (ddt).

Why ddt:

test_encoders contains a lot of copy-pasted code
test_estimators already loops over all encoders
ddt, in comparison to a simple loop, does not stop on the first error (unless you tell it to do so) and reports results for each subtest (not just for the whole loop)
ddt makes it easy to quickly increase test coverage
ddt makes it easy to test that all encoders in the library fulfill some basic requirements

But of course, there are disadvantages:

It makes it slightly harder to understand and write tests
I have encountered behaviors (two, to be exact) that could be considered valid but break uniform behaviour of the library. I do not know what to do - adjust the tests or the encoders?

Example of the code.

Questions:

Should I continue in the effort to port portable tests under ddt while preserving the encoder specific tests?
If yes, is it ok to make a branch instead of a pull request? I hope it is going to be easier (for me) to maintain synchronicity between the development and master branch.

The text was updated successfully, but these errors were encountered:

wdm0006 · 2018-09-04T18:54:14Z

I think this makes sense (the testing part), and think in this context a feature branch sounds fine. I'd caution though that we should build these tests alongside the existing test suite for now, and merge in regularly, rather than have a super-long-lived development branch.

janmotl · 2018-09-06T17:04:36Z

Pushed the solution into tests branch.

Changes:

runtime decreased from 90 seconds to 38 seconds on my laptop (mostly because of decreasing the dataset size from 1000 rows to 100 rows)
line test coverage of encoders increased from ~80% to ~94% (may vary based on the used definition)

Two tests fail. They are described in issues #120 and #121.

janmotl · 2018-09-11T18:20:11Z

By mistake, I have pushed into 'master' instead of 'tests' branch. I deeply apologize. I suggest to revert to commits 2394855 or b1993e0.

Current state: All tests now pass. At least on my computer.

Errors reported by TravisCI:

Failure: ImportError (No module named unittest2) Resolvable by adding dependency into TravisCI. Somewhere.
ValueError: minlength must be positive No idea (and I am using pandas 0.23.4 just like TravisCI).
TypeError: __init__() got an unexpected keyword argument 'handle_unknown' No idea.

CircleCI fails as well.

I am not able to resolve these issues.

wdm0006 · 2018-09-14T18:20:13Z

CircleCI always fails, its just pushing docs. the failure is just that there are no tests, so thats ok.

TravisCI installs the packages in the install_requires in setup.py, but thats not appropriate for unittest2, just add that directly into the install.sh script in ci_scripts directory.

on (2), not sure.

On (3), not sure.

Do you have tracebacks for either of them?

janmotl · 2018-09-15T13:26:08Z

The issues with TravisCI were caused by the used versions of dependencies (my computer was using different versions than TravisCI). I reconfigured TravisCI to use the oldest versions specified in requirements.txt. But it turned out that BinaryEncoder does not work with pandas < 0.20.0rc1.
But pandas 0.20.1, which is in conda, requires numpy > 1.11. And the oldest scipy that supports numpy 1.11 is 0.17. Similarly, the oldest scikit-learn that supports numpy 1.11 and is in conda is 0.17.1. I have also slightly bumped patsy and statsmodels to versions available in conda in order to make the minimal requirements testable.

Overview of the changes:

Former minimal requirements:

numpy>=1.8.0
scikit-learn>=0.15.0
scipy>=0.9
statsmodels>=0.6.0<=0.8.0
pandas>=0.15.0
patsy>=0.4.0

Former versions used by TravisCI:

numpy 1.10.4
scipy 0.17.0
scikit-learn 0.20rc1
statsmodels 0.8.0
pandas 0.23.4
patsy 0.5.0

Current minimal requirements:

numpy>=1.11.1
scikit-learn>=0.17.1
scipy>=0.17.0
statsmodels>=0.6.1<=0.8.0
pandas>=0.20.1
patsy>=0.4.1

Current versions used by TravisCI:

numpy 1.11.1            # the oldest compatible with pandas 0.20.1
scikit-learn 0.17.1     # oldest compatible with numpy 1.11
scipy 0.17.0            # oldest compatible with numpy 1.11
statsmodels 0.6.1       # oldest compatible with numpy 1.11
pandas 0.20.1           # minimal working with BinaryEncoder
patsy 0.4.1             # oldest compatible with numpy 1.11

If older versions of libraries have to be supported, BinaryEncoder must be modified.

Currently, TravisCI reports only one error:

======================================================================
ERROR: category_encoders.tests.test_encoders.set_testing_parameters
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/util.py", line 620, in newfunc
    return func(*arg, **kw)
TypeError: set_testing_parameters() takes exactly 1 argument (0 given)

janmotl · 2018-09-16T14:43:55Z

The last issue in TravisCI was resolved by removing a * import. Now I get it why everyone says not to use * imports...

@wdm0006, will you take care of CircleCI?

janmotl · 2018-10-08T20:36:47Z

CircleCI is failing during sudo apt-get update with:

Err http://ppa.launchpad.net precise/main Sources
  404  Not Found

Err http://ppa.launchpad.net precise/main amd64 Packages
  404  Not Found

Err http://ppa.launchpad.net precise/main i386 Packages
  404  Not Found

Ign http://ppa.launchpad.net precise/main Translation-en_US

Ign http://ppa.launchpad.net precise/main Translation-en

Fetched 7,938 kB in 9s (852 kB/s)
W: GPG error: https://apt.dockerproject.org ubuntu-precise Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY F76221572C52609D
W: Failed to fetch http://ppa.launchpad.net/pdoes/ppa/ubuntu/dists/precise/main/source/Sources  404  Not Found

W: Failed to fetch http://ppa.launchpad.net/pdoes/ppa/ubuntu/dists/precise/main/binary-amd64/Packages  404  Not Found

W: Failed to fetch http://ppa.launchpad.net/pdoes/ppa/ubuntu/dists/precise/main/binary-i386/Packages  404  Not Found

E: Some index files failed to download. They have been ignored, or old ones used instead.

sudo apt-get update returned exit code 100

Action failed: sudo apt-get update

Knowing nothing better, I would suggest to check /etc/apt/sources.list and /etc/apt/sources.list.d/*.list and remove sources that no longer exists. Unfortunately, I think I do not have the necessary privileges to do the change.

wdm0006 · 2018-10-14T14:10:11Z

I'll take a look at this, thanks for the investigation work @janmotl

janmotl added the enhancement label Sep 4, 2018

JohnnyC08 mentioned this issue Jan 4, 2019

Ordinal encoder support new handle unknown handle missing #153

Merged

janmotl mentioned this issue May 7, 2019

Coverage status in README.md is broken #189

Closed

janmotl closed this as completed May 7, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data Driven Testing #117

Data Driven Testing #117

janmotl commented Sep 4, 2018

wdm0006 commented Sep 4, 2018

janmotl commented Sep 6, 2018

janmotl commented Sep 11, 2018

wdm0006 commented Sep 14, 2018

janmotl commented Sep 15, 2018

janmotl commented Sep 16, 2018

janmotl commented Oct 8, 2018

wdm0006 commented Oct 14, 2018

Data Driven Testing #117

Data Driven Testing #117

Comments

janmotl commented Sep 4, 2018

wdm0006 commented Sep 4, 2018

janmotl commented Sep 6, 2018

janmotl commented Sep 11, 2018

wdm0006 commented Sep 14, 2018

janmotl commented Sep 15, 2018

janmotl commented Sep 16, 2018

janmotl commented Oct 8, 2018

wdm0006 commented Oct 14, 2018