Identifying Mislabeled Data using the Area Under the Margin Ranking

A PyTorch implementation of the research paper "Identifying Mislabeled Data using the Area Under the Margin Ranking" (AUM Ranking).

Original paper: https://arxiv.org/pdf/2001.10528v4

This technique can be used to identify mislabeled or difficult samples in a dataset. These samples can then be relabeled or removed to improve the final performance of a model trained on the data.
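As a rough illustration of the idea: in the AUM paper, the margin of a sample at one epoch is the logit of its assigned label minus the largest logit among the other classes, and the AUM is the average of that margin over the training epochs. Samples with a low or negative AUM are the candidates for being mislabeled. The sketch below uses plain Python lists and made-up logits. The names margin, area_under_margin, rank_by_aum and logits_history are hypothetical and chosen for this example only. It is not the API of aum_ranking.py, which may be organised differently.

```python
from statistics import mean


def margin(logits, assigned_label):
    """Assigned-class logit minus the largest logit among the other classes."""
    assigned_logit = logits[assigned_label]
    largest_other = max(
        value for index, value in enumerate(logits) if index != assigned_label
    )
    return assigned_logit - largest_other


def area_under_margin(logits_per_epoch, assigned_label):
    """Average the margin of one sample over all recorded epochs."""
    return mean(margin(logits, assigned_label) for logits in logits_per_epoch)


def rank_by_aum(aum_by_sample):
    """Return sample ids sorted from lowest AUM (most suspicious) to highest."""
    return sorted(aum_by_sample, key=aum_by_sample.get)


# Hypothetical data: per-epoch logits (three classes, three epochs) and the
# assigned label for each sample. Both samples are labeled as class 0.
logits_history = {
    "sample_a": ([[2.0, 0.5, 0.1], [3.0, 0.2, 0.0], [4.0, 0.1, -0.5]], 0),
    "sample_b": ([[0.2, 1.5, 0.1], [0.1, 2.5, 0.0], [0.0, 3.0, -0.5]], 0),
}

aum_by_sample = {
    sample_id: area_under_margin(history, label)
    for sample_id, (history, label) in logits_history.items()
}

# Lowest AUM first: these are the samples to inspect, relabel or remove.
ranking = rank_by_aum(aum_by_sample)
print(ranking)
```

Here the model consistently prefers class 1 for sample_b even though it is labeled as class 0, so its margin is negative at every epoch and it is ranked first for inspection.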

Project structure

identify_mislabeled_data.ipynb is an example showing how to apply AUM Ranking to identify mislabeled samples in a dataset. It outputs TensorBoard logs to runs/, which can be viewed with tensorboard --logdir runs/.
aum_ranking.py contains all the code specific to AUM Ranking.
models.py defines the ResNet-32 model used in the AUM paper.
test_aum_ranking.py contains tests for aum_ranking.py.

Setup

1. Virtual environment

Ensure Python is installed, then create a virtual environment and activate it.

2. Install PyTorch packages

With the virtual environment activated, run

pip install -r requirements_pytorch.txt [--index-url INDEX_URL]

Specify --index-url only if the installation guide at https://pytorch.org/get-started/locally/ advises one for your setup.

3. Install remaining packages

Now run

pip install -r requirements_main.txt

to install the remaining packages.

You should now be able to run identify_mislabeled_data.ipynb.

4. (Optional) Install dev packages

If you want to be able to run the tests, then run

pip install -r requirements_dev.txt

to install pytest.

Run the tests with the command pytest . (the trailing full stop is part of the command and refers to the current directory).

Repository: AlexKubiesa/area-under-the-margin-ranking
