junkmechanic/active-learning-vector-retrieval
This file documents the workings of the vector retrieval part of the active learning system.

The editable source files have the extension .pyx. They are compiled to .so (shared object) files, which are used as binaries. The main point of entry to the system is the Python script 'al_get_vectors.py'.

The following must be specified:

'vector_file' : This should contain all the vectors, one per line.

'confidence_file' : This should contain the classifier's classification confidence for each vector, at the same index as that vector in the vector_file.

'num_clusters' : This should be the maximum number of clusters that would ever be retrieved from the system. Choose this number carefully: if it is changed, the clusters are computed again.

'num_candidates' : This is the number of vectors you want for the current round of annotations. This should be less than num_clusters. Ideally, after all the rounds of annotations, the num_candidates values from all rounds should add up to num_clusters.

After each round of annotations, a new set of confidence scores should be provided to the system. The system reuses the previously computed cluster medoids and computes a new score for each medoid based on the updated confidence scores. It then returns the top 'n' medoids, where 'n' is 'num_candidates'.

Note that the system does not remember the candidates from previous rounds, so two consecutive rounds might overlap in their candidates. This is because the system does not remove the previously returned top medoids from the dataset, so with the new confidence scores these medoids might appear in the top ranks again. However, if the classifier is doing a good job, there should generally not be a big overlap between the candidate lists of two consecutive rounds.
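The re-ranking behaviour described above can be illustrated with a small Python sketch. It is not the code in the .pyx files. The function name top_candidates, the sample data, and the scoring rule (lowest confidence ranks first) are all assumptions made for illustration, since this file does not state how the score is computed from the confidence.

```python
def top_candidates(medoid_indices, confidences, num_candidates):
    """Return the num_candidates medoids with the lowest confidence.

    Hypothetical helper: the ranking rule (lowest confidence first,
    ties broken by index) is an assumption, not the system's scoring.
    """
    if num_candidates > len(medoid_indices):
        raise ValueError("num_candidates cannot exceed the number of medoids")
    # Rank the fixed medoids by the current confidence scores.
    ranked = sorted(medoid_indices, key=lambda i: (confidences[i], i))
    return ranked[:num_candidates]


# Medoids are computed once; here they are made-up indices into vector_file.
medoids = [0, 3, 5, 8]

# Round 1: one confidence value per vector, same index as in vector_file.
round1_conf = [0.9, 0.5, 0.5, 0.2, 0.5, 0.6, 0.5, 0.5, 0.95, 0.5]
round1 = top_candidates(medoids, round1_conf, 2)

# Round 2: the classifier was retrained, so new confidences are supplied.
# The medoids are unchanged and none are removed between rounds.
round2_conf = [0.9, 0.5, 0.5, 0.85, 0.5, 0.55, 0.5, 0.5, 0.4, 0.5]
round2 = top_candidates(medoids, round2_conf, 2)

# A medoid can be returned in both rounds because nothing is remembered.
overlap = sorted(set(round1) & set(round2))
print(round1, round2, overlap)  # prints: [3, 5] [8, 5] [5]
```

In this made-up data, medoid 3 drops out of the second round because its confidence improved, while medoid 5 appears in both rounds because its confidence stayed low. A caller that wants to avoid re-annotating the same vectors would have to track earlier candidates itself.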
About
A vector retrieval mechanism for the active learning framework based on K-medoid clustering with parallelism written in Cython