
Mislabeled Instances Found #166

Closed
mueller91 opened this issue Nov 19, 2020 · 5 comments

Comments

@mueller91

Hi Everybody,

In a recent publication of mine, we surveyed popular data sets and looked into finding mislabeled instances.
For Fashion-MNIST, we found a large number of mislabeled / incorrect instances (i.e. cases where the automated cutting failed).
See table 7 on the last page. It contains 64 instances, but we found a lot more.
While it may or may not be dramatic for training, this may be disadvantageous for evaluation (since it skews accuracy scores).

If you're interested in fixing / looking into this, let me know.

Best
Nicolas

I added some examples (taken directly from the paper). The heading indicates the label (which is incorrect as far as I can see) and the instance number in the training set; the flagged indices are listed below and can be viewed with the snippet that follows.

Flagged training-set indices: 45592, 16691, 28264, 33982, 40513, 42018
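
For convenience, a small sketch to display those indices together with the labels they currently carry (assuming the standard Keras loader, which keeps the original training-set order):

```python
# Sketch (not from the paper): display the flagged training instances with the
# labels they currently carry. Assumes the standard Keras Fashion-MNIST loader,
# which keeps the original training-set order.
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import fashion_mnist

(x_train, y_train), _ = fashion_mnist.load_data()
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

indices = [45592, 16691, 28264, 33982, 40513, 42018]
fig, axes = plt.subplots(1, len(indices), figsize=(12, 2.5))
for ax, idx in zip(axes, indices):
    ax.imshow(x_train[idx], cmap="gray")
    ax.set_title(f"{idx}\n{class_names[y_train[idx]]}", fontsize=8)
    ax.axis("off")
plt.tight_layout()
plt.show()
```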

@kashif
Collaborator

kashif commented Nov 20, 2020

Thanks @mueller91 for the insightful paper. It's true that these datasets have mislabels, as the process that generated the original labels had humans in the loop, and errors creep in that way. Other times the classes in question are very similar and require a domain expert to disambiguate between them.

Also, looking briefly at your paper's table 7, the classes in question do appear to be the ones that look visually similar. When you write that "we found a large number of mislabeled / incorrect instances", do you mean that of the images your algorithm flagged, a large number turned out to be mislabeled?

@mueller91
Author

mueller91 commented Nov 20, 2020

The algorithm sorts all instances in the training data by descending likelihood that they are mislabeled. So it does not assign a binary label, but returns a scalar likelihood and gives you an ordered list of instances to review, which makes the best use of your time/manpower budget for reviewing (since obviously you don't want to review the whole data set).

Table 7 includes 64 instances we found that way. What we did was the following:
We ran the algorithm, obtained the indices, and looked at the first 180 of them. Within these, 64 were clearly mislabeled, about 35 percent. (This does not include ambiguous instances, i.e. we only list an item in the table when we were fairly certain of a mislabel.)
Usually, the rate of 'hits' declines as you go through the instances, so in the first 1000 instances we'd expect to find less than 35% mislabeled instances in total. The idea is to keep reviewing until your budget/time/manpower is exhausted.
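
A rough sketch of that budgeted review loop, using a plain classifier's out-of-fold loss as a stand-in for the mislabel likelihood (this is not the scoring method from the paper):

```python
# Sketch of the budgeted review workflow described above: score every training
# instance with a mislabel-likelihood proxy, sort in descending order, and
# review only as many as the budget allows. The proxy used here (out-of-fold
# cross-entropy of a simple classifier) is a stand-in, not the paper's method.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from tensorflow.keras.datasets import fashion_mnist

(x_train, y_train), _ = fashion_mnist.load_data()
X = x_train.reshape(len(x_train), -1) / 255.0

# Out-of-fold probabilities, so an instance's own (possibly wrong) label
# does not influence its score.
proba = cross_val_predict(
    LogisticRegression(max_iter=200), X, y_train, cv=3, method="predict_proba"
)

# Higher loss on the assigned label = more suspicious.
scores = -np.log(proba[np.arange(len(y_train)), y_train] + 1e-12)
review_order = np.argsort(-scores)

budget = 180  # review until time/manpower is exhausted
print("Review these first:", review_order[:budget])
```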

@hanxiao
Collaborator

hanxiao commented Nov 21, 2020

Thanks for your research & report. Unfortunately, we will always keep this dataset as it was originally published in 2017.

since obviously you don't want to review the whole data set

As the creator of Fashion-MNIST, I can share with you the reviewing procedure. I did review the whole dataset before publishing it. What I did: after having an initial label-mapping thanks to the Zalando article database, I wrote a small program to lay out 100 images from the same class and picked out the anomalies one by one, manually. This process was repeated until I and my colleagues were satisfied with all training and test images. A tedious procedure, sure, but is it impossible? No.
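
A rough sketch of what such a per-class grid review could look like (hypothetical, not the original program):

```python
# Hypothetical sketch of the per-class grid review described above (not the
# original program): lay out 100 images that share a label and eyeball the
# anomalies, one page at a time.
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import fashion_mnist

(x_train, y_train), _ = fashion_mnist.load_data()

def review_class(label, page=0, grid=10):
    """Show one 10x10 page of training images currently labeled `label`."""
    idx = np.where(y_train == label)[0]
    page_idx = idx[page * grid * grid:(page + 1) * grid * grid]
    fig, axes = plt.subplots(grid, grid, figsize=(10, 10))
    for ax, i in zip(axes.ravel(), page_idx):
        ax.imshow(x_train[i], cmap="gray")
        ax.axis("off")
    fig.suptitle(f"class {label}, page {page}")
    plt.show()

review_class(label=6, page=0)  # page through e.g. the "Shirt" class
```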

Mislabelling exists by nature in all datasets, as you showed in your paper (all datasets). If humans made their best reasonable effort when producing the dataset, then algorithms have to live with it.

Over the years, there have been quite a few published papers using the Fashion-MNIST dataset as a benchmark, not only for classification but also in adversarial learning and many other domains. Fashion-MNIST will stay as it is.

hanxiao closed this as completed Nov 21, 2020
@kashif
Collaborator

kashif commented Nov 21, 2020

@mueller91 also, kindly have a look at the dataset explorer here: https://observablehq.com/@stwind/exploring-fashion-mnist, which shows all the samples in the dataset and lets you explore the images flagged by your algorithm.

@mueller91
Author

@hanxiao thank you for your statement.

@kashif that's a great tool, thank you!
