[Feature Proposal] Diverse Mini Batch Active Learning #119

Open
damienlancry opened this issue Jan 20, 2021 · 1 comment

@damienlancry
Contributor

Hello, I noticed there is a strong focus on uncertainty-based and information-density-based sampling techniques, which is very nice. But in batch-mode active learning, where several data points are sent to the oracle at the same time, it is often desirable for those points to be diverse, to avoid redundancy and maximise the improvement of the model. Several techniques have been designed for this; one of the most recent, and also one of the simplest, is Diverse Mini-Batch Active Learning.

TL;DR: compute the uncertainty with a chosen metric (e.g. margin, entropy, ...) and prefilter the `n_instances * beta` most uncertain data points (`beta` is a prefiltering parameter, typically 10, 50 or 100). Then perform k-means clustering on those prefiltered points with `n_instances` clusters and select the points closest to the centroids.
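To make the steps above concrete, here is a minimal sketch using NumPy and scikit-learn. It is not the implementation mentioned in this issue, and the function names (`margin_uncertainty`, `diverse_mini_batch`) and their signatures are hypothetical. It assumes margin-based uncertainty, class probabilities supplied as an array, and plain (unweighted) k-means.

```python
import numpy as np
from sklearn.cluster import KMeans


def margin_uncertainty(proba):
    """Return 1 - (top1 - top2) per row, so larger values mean more uncertain."""
    ordered = np.sort(proba, axis=1)
    return 1.0 - (ordered[:, -1] - ordered[:, -2])


def diverse_mini_batch(X_pool, proba, n_instances, beta=10, random_state=0):
    """Hypothetical helper: return pool indices of a diverse, uncertain batch."""
    uncertainty = margin_uncertainty(proba)

    # Step 1: prefilter the n_instances * beta most uncertain points.
    n_candidates = min(len(X_pool), n_instances * beta)
    candidate_idx = np.argsort(-uncertainty)[:n_candidates]
    X_cand = X_pool[candidate_idx]

    # Step 2: cluster the prefiltered points into n_instances clusters.
    kmeans = KMeans(n_clusters=n_instances, n_init=10,
                    random_state=random_state).fit(X_cand)

    # Step 3: in each cluster, keep the member closest to its centroid.
    dist = kmeans.transform(X_cand)  # shape: (n_candidates, n_instances)
    selected = []
    for k in range(n_instances):
        members = np.flatnonzero(kmeans.labels_ == k)
        if members.size == 0:  # defensive: skip a cluster with no members
            continue
        closest = members[np.argmin(dist[members, k])]
        selected.append(candidate_idx[closest])
    return np.array(selected)
```

Picking the closest member within each cluster, rather than the nearest candidate to each centroid overall, keeps the returned indices distinct. In practice `proba` would come from something like a classifier's `predict_proba` on the unlabeled pool.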

It is quite simple to implement and gives good results. I already have an implementation ready if you are interested in a PR.

@cosmic-cortex
Member

Hi!

Sure, this would be awesome! Open the PR and I'll take a look at it. (Just a disclaimer, I am quite busy with other projects at this time, so it might take me 2-3 weeks to review it. Sorry in advance :D)
