
Add random_state option to benchmark function #160

Closed
utf opened this issue Jan 15, 2019 · 3 comments

Comments

@utf
Member

utf commented Jan 15, 2019

It would be nice to support the random_state parameter of pandas.DataFrame.sample() for pipeline benchmarking.

E.g. implemented for this line:

testdf, traindf = np.split(df.sample(frac=1), [int(test_spec * len(df))])

This would allow random but deterministic sampling of the dataframe when choosing the test/train split, so you can benchmark two models on the same dataframe and automatically get the same test/train split. We could just overload the benchmark test_spec variable again (e.g. if test_spec is an int or numpy.random.RandomState object, use it as the random_state argument).
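
A minimal sketch of the idea, assuming a standalone helper rather than the project's actual benchmark function. The name split_test_train and its test_frac argument are hypothetical, and iloc slicing stands in for the np.split call quoted above:

```python
import pandas as pd


def split_test_train(df, test_frac, random_state=None):
    """Shuffle df and split it into (test, train) parts.

    Hypothetical helper for illustration; not the project's benchmark code.
    random_state is forwarded to pandas.DataFrame.sample(), which accepts
    an int or a numpy.random.RandomState.
    """
    shuffled = df.sample(frac=1, random_state=random_state)
    n_test = int(test_frac * len(df))
    return shuffled.iloc[:n_test], shuffled.iloc[n_test:]


df = pd.DataFrame({"x": range(10), "y": range(10, 20)})

# Two calls with the same seed give the same shuffle, hence the same split.
test_a, train_a = split_test_train(df, 0.2, random_state=42)
test_b, train_b = split_test_train(df, 0.2, random_state=42)
same_split = test_a.index.equals(test_b.index) and train_a.index.equals(train_b.index)
```

With random_state left as None, each call shuffles differently, which matches the behaviour of the line quoted above.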

utf added the ugrads label Jan 15, 2019
@ardunn
Contributor

ardunn commented Jan 15, 2019

sounds good to me!

@ardunn
Contributor

ardunn commented Jan 26, 2019

@utf the current benchmarking implementation has you pass a sklearn KFold (or StratifiedKFold) object to the benchmarking function. The KFold object accepts a random_state param, so this issue is essentially closed.
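
For reference, a small sketch of how a seeded KFold gives reproducible folds. The toy array X and the helper fold_indices are made up for illustration, and the call into the benchmarking function itself is not shown since its signature is not given in this thread. Note that random_state only takes effect when shuffle=True:

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data standing in for a featurized dataframe (illustrative only).
X = np.arange(20).reshape(10, 2)


def fold_indices(seed):
    """Return the (train, test) index lists produced by a seeded KFold."""
    kfold = KFold(n_splits=5, shuffle=True, random_state=seed)
    return [(train.tolist(), test.tolist()) for train, test in kfold.split(X)]


# Two KFold objects built with the same seed yield identical folds,
# so two models benchmarked with them see the same test/train splits.
folds_a = fold_indices(0)
folds_b = fold_indices(0)
```

StratifiedKFold takes the same shuffle and random_state arguments, but its split() also needs the target labels.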

ardunn closed this as completed Jan 26, 2019
@utf
Member Author

utf commented Jan 26, 2019

Sounds great
