Skip to content

Data Driven Testing #117

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
janmotl opened this issue Sep 4, 2018 · 8 comments
Closed

Data Driven Testing #117

janmotl opened this issue Sep 4, 2018 · 8 comments

Comments

@janmotl
Copy link
Collaborator

janmotl commented Sep 4, 2018

I was thinking about applying data driven testing (ddt).

Why ddt:

  1. test_encoders contains a lot of copy-pasted code
  2. test_estimators already loops over all encoders
  3. ddt, in comparison to a simple loop, does not stop on the first error (unless you tell it to do so) and reports results for each subtest (not just for the whole loop)
  4. ddt makes it easy to quickly increase test coverage
  5. ddt makes it easy to test that all encoders in the library fulfill some basic requirements

But of course, there are disadvantages:

  1. It makes it slightly harder to understand and write tests
  2. I have encountered behaviors (two, to be exact) that could be considered valid but break uniform behaviour of the library. I do not know what to do - adjust the tests or the encoders?

Example of the code.

Questions:

  1. Should I continue in the effort to port portable tests under ddt while preserving the encoder specific tests?
  2. If yes, is it ok to make a branch instead of a pull request? I hope it is going to be easier (for me) to maintain synchronicity between the development and master branch.
@wdm0006
Copy link
Collaborator

wdm0006 commented Sep 4, 2018

I think this makes sense (the testing part), and think in this context a feature branch sounds fine. I'd caution though that we should build these tests alongside the existing test suite for now, and merge in regularly, rather than have a super-long-lived development branch.

@janmotl
Copy link
Collaborator Author

janmotl commented Sep 6, 2018

Pushed the solution into tests branch.

Changes:

  1. runtime decreased from 90 seconds to 38 seconds on my laptop (mostly because of decreasing the dataset size from 1000 rows to 100 rows)
  2. line test coverage of encoders increased from ~80% to ~94% (may vary based on the used definition)

Two tests fail. They are described in issues #120 and #121.

@janmotl
Copy link
Collaborator Author

janmotl commented Sep 11, 2018

By mistake, I have pushed into 'master' instead of 'tests' branch. I deeply apologize. I suggest to revert to commits 2394855 or b1993e0.

Current state: All tests now pass. At least on my computer.

Errors reported by TravisCI:

  1. Failure: ImportError (No module named unittest2) Resolvable by adding dependency into TravisCI. Somewhere.
  2. ValueError: minlength must be positive No idea (and I am using pandas 0.23.4 just like TravisCI).
  3. TypeError: __init__() got an unexpected keyword argument 'handle_unknown' No idea.

CircleCI fails as well.

I am not able to resolve these issues.

@wdm0006
Copy link
Collaborator

wdm0006 commented Sep 14, 2018

CircleCI always fails, its just pushing docs. the failure is just that there are no tests, so thats ok.

TravisCI installs the packages in the install_requires in setup.py, but thats not appropriate for unittest2, just add that directly into the install.sh script in ci_scripts directory.

on (2), not sure.

On (3), not sure.

Do you have tracebacks for either of them?

@janmotl
Copy link
Collaborator Author

janmotl commented Sep 15, 2018

The issues with TravisCI were caused by the used versions of dependencies (my computer was using different versions than TravisCI). I reconfigured TravisCI to use the oldest versions specified in requirements.txt. But it turned out that BinaryEncoder does not work with pandas < 0.20.0rc1.
But pandas 0.20.1, which is in conda, requires numpy > 1.11. And the oldest scipy that supports numpy 1.11 is 0.17. Similarly, the oldest scikit-learn that supports numpy 1.11 and is in conda is 0.17.1. I have also slightly bumped patsy and statsmodels to versions available in conda in order to make the minimal requirements testable.

Overview of the changes:

Former minimal requirements:

numpy>=1.8.0
scikit-learn>=0.15.0
scipy>=0.9
statsmodels>=0.6.0<=0.8.0
pandas>=0.15.0
patsy>=0.4.0

Former versions used by TravisCI:

numpy 1.10.4
scipy 0.17.0
scikit-learn 0.20rc1
statsmodels 0.8.0
pandas 0.23.4
patsy 0.5.0

Current minimal requirements:

numpy>=1.11.1
scikit-learn>=0.17.1
scipy>=0.17.0
statsmodels>=0.6.1<=0.8.0
pandas>=0.20.1
patsy>=0.4.1

Current versions used by TravisCI:

numpy 1.11.1            # the oldest compatible with pandas 0.20.1
scikit-learn 0.17.1     # oldest compatible with numpy 1.11
scipy 0.17.0            # oldest compatible with numpy 1.11
statsmodels 0.6.1       # oldest compatible with numpy 1.11
pandas 0.20.1           # minimal working with BinaryEncoder
patsy 0.4.1             # oldest compatible with numpy 1.11

If older versions of libraries have to be supported, BinaryEncoder must be modified.

Currently, TravisCI reports only one error:

======================================================================
ERROR: category_encoders.tests.test_encoders.set_testing_parameters
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/util.py", line 620, in newfunc
    return func(*arg, **kw)
TypeError: set_testing_parameters() takes exactly 1 argument (0 given)

@janmotl
Copy link
Collaborator Author

janmotl commented Sep 16, 2018

The last issue in TravisCI was resolved by removing a * import. Now I get it why everyone says not to use * imports...

@wdm0006, will you take care of CircleCI?

@janmotl
Copy link
Collaborator Author

janmotl commented Oct 8, 2018

CircleCI is failing during sudo apt-get update with:

Err http://ppa.launchpad.net precise/main Sources
  404  Not Found

Err http://ppa.launchpad.net precise/main amd64 Packages
  404  Not Found

Err http://ppa.launchpad.net precise/main i386 Packages
  404  Not Found

Ign http://ppa.launchpad.net precise/main Translation-en_US

Ign http://ppa.launchpad.net precise/main Translation-en

Fetched 7,938 kB in 9s (852 kB/s)
W: GPG error: https://apt.dockerproject.org ubuntu-precise Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY F76221572C52609D
W: Failed to fetch http://ppa.launchpad.net/pdoes/ppa/ubuntu/dists/precise/main/source/Sources  404  Not Found

W: Failed to fetch http://ppa.launchpad.net/pdoes/ppa/ubuntu/dists/precise/main/binary-amd64/Packages  404  Not Found

W: Failed to fetch http://ppa.launchpad.net/pdoes/ppa/ubuntu/dists/precise/main/binary-i386/Packages  404  Not Found

E: Some index files failed to download. They have been ignored, or old ones used instead.

sudo apt-get update returned exit code 100

Action failed: sudo apt-get update

Knowing nothing better, I would suggest to check /etc/apt/sources.list and /etc/apt/sources.list.d/*.list and remove sources that no longer exists. Unfortunately, I think I do not have the necessary privileges to do the change.

@wdm0006
Copy link
Collaborator

wdm0006 commented Oct 14, 2018

I'll take a look at this, thanks for the investigation work @janmotl

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

2 participants