Add random_state option to benchmark function #160

utf · 2019-01-15T02:53:36Z

It would be nice to support the random_state parameter of pandas.DataFrame.sample() for pipeline benchmarking.

E.g. implemented for this line:

testdf, traindf = np.split(df.sample(frac=1), [int(test_spec * len(df))])

This would allow random but deterministic sampling of the dataframe when choosing the test/train split, so you can benchmark two models on the same dataframe and automatically get the same test/train split. We could just overload the benchmark test_spec variable again (e.g. if test_spec is an int or numpy.random.RandomState object, use it as the random_state argument).
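A minimal sketch of the idea, assuming a standalone helper rather than the project's actual benchmark code. The function name `split_test_train` and the parameter `test_frac` are hypothetical. It passes `random_state` through to `DataFrame.sample()` and slices with `iloc` instead of `np.split`:

```python
import pandas as pd


def split_test_train(df, test_frac, random_state=None):
    """Shuffle df and split it into (test, train) frames.

    Hypothetical helper for illustration only. random_state is forwarded
    to DataFrame.sample(), which accepts an int or a
    numpy.random.RandomState object.
    """
    # frac=1 returns every row in shuffled order; a fixed seed makes
    # the shuffle reproducible.
    shuffled = df.sample(frac=1, random_state=random_state)
    n_test = int(test_frac * len(shuffled))
    return shuffled.iloc[:n_test], shuffled.iloc[n_test:]


df = pd.DataFrame({"x": range(10)})
test_a, train_a = split_test_train(df, 0.2, random_state=0)
test_b, train_b = split_test_train(df, 0.2, random_state=0)
# The same seed gives the same rows in both calls.
same_split = test_a.index.equals(test_b.index)
```

With the same integer seed, two calls on the same dataframe return identical test and train rows, which is what makes a comparison between two models fair.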


ardunn · 2019-01-15T15:13:27Z

sounds good to me!

ardunn · 2019-01-26T00:20:49Z

@utf The current benchmarking implementation has you pass a sklearn KFold (or StratifiedKFold) object to the benchmarking function. The KFold object accepts a random_state parameter, so this issue is essentially closed.
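For illustration, a seeded KFold can be built as below. This is only a sketch of the scikit-learn side. The benchmarking function's own signature is not shown in this thread, so the call that would receive `kfold` is left out. Note that `random_state` only takes effect when `shuffle=True`.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy feature matrix with 10 samples; stands in for a real dataframe.
X = np.arange(20).reshape(10, 2)

# shuffle=True is required for random_state to have any effect.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

# With an integer seed, repeated calls to split() yield the same folds.
folds_a = [test.tolist() for _, test in kfold.split(X)]
folds_b = [test.tolist() for _, test in kfold.split(X)]
```

Passing the same seeded object to two benchmark runs should therefore give both models identical folds.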

utf · 2019-01-26T00:24:13Z

Sounds great

utf added the ugrads label Jan 15, 2019

ardunn mentioned this issue Jan 25, 2019

MatPipe code needs revamp #166

Closed

ardunn added the priority label Jan 25, 2019

ardunn closed this as completed Jan 26, 2019
