Getting a ton of WARNING messages: "Currently no support in Processor for returning problematic ids" #759

Closed
johann-petrak opened this issue May 6, 2021 · 6 comments

@johann-petrak
Contributor

Using the latest FARM, installed with pip install farm, I am getting a large number of WARNING messages in the log:

" WARNING - farm.data_handler.processor - Currently no support in Processor for returning problematic ids"

What does this mean and is there anything I can do about it?

@johann-petrak johann-petrak added the question Further information is requested label May 6, 2021
@Timoeller
Contributor

Hey @johann-petrak cool to see you using FARM again 😄

This warning does not mean that there are problematic input samples per se; it means that some processors do not yet have the functionality to report them.

For the QA processor we can return problematic samples during preprocessing; for others, e.g. TextClassificationProcessor and its derivatives, we cannot. See https://github.com/deepset-ai/FARM/blob/master/farm/data_handler/processor.py#L675
If you want to improve FARM in this respect I see two options:

  1. [quick win] Change the message to Info or reduce the number of times it is displayed.
  2. [correct but difficult] Implement the problematic id check for the processor you need.
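
Until the message is changed in FARM itself, option 1 can be approximated on the user side with Python's standard logging module. The following is a minimal sketch, not part of FARM: the logger name is taken from the warning text quoted in this issue, and OncePerMessageFilter is a hypothetical helper. Note that the filter only de-duplicates within a single process, so with multiprocessing each worker may still emit the message once.

```python
import logging


class OncePerMessageFilter(logging.Filter):
    """Let each distinct log message through only once per process.

    Hypothetical helper for illustration; not part of FARM.
    """

    def __init__(self):
        super().__init__()
        self._seen = set()

    def filter(self, record):
        message = record.getMessage()
        if message in self._seen:
            return False  # drop repeats of a message already logged
        self._seen.add(message)
        return True


# Logger name as it appears in the warning line quoted in this issue.
processor_logger = logging.getLogger("farm.data_handler.processor")
processor_logger.addFilter(OncePerMessageFilter())

# Alternative: hide warnings from this logger entirely by raising its threshold.
# processor_logger.setLevel(logging.ERROR)
```

Raising the threshold is the blunter of the two choices, since it also hides any other warnings that the same logger emits.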

@johann-petrak
Contributor Author

Sorry, TBH what I do not understand so far is much more basic: what is actually meant by "problematic input sample", i.e. which error conditions make a sample problematic? And which error conditions can actually occur when converting samples in the TextClassificationProcessor?
Apparently the only way these ids can bubble up is through an exception in self._sample_to_features(sample=sample)?

@Timoeller
Contributor

An exception in _sample_to_features could be a good start. In general, a problematic input sample is one that cannot be correctly converted to a PyTorch tensor, so the term is rather general.
We would like input processing to be stable, so catching exceptions on input specifics is a way forward. Of course, we also want to return the IDs of those problematic samples later.
For an example, please have a look at how Question Answering is converted, e.g. here.
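
The idea of catching exceptions per sample and returning the ids of the failures can be sketched as follows. This is an illustration under stated assumptions, not FARM's implementation: sample_to_features and convert_samples are hypothetical stand-ins, and samples are assumed to be plain dicts with "id" and "text" keys.

```python
import logging

logger = logging.getLogger(__name__)


def sample_to_features(sample):
    """Stand-in for a per-sample conversion step (hypothetical, not FARM's code)."""
    text = sample["text"]
    if not isinstance(text, str) or not text.strip():
        raise ValueError("empty or non-string text")
    return {"length": len(text.split())}


def convert_samples(samples):
    """Convert all samples and collect the ids of those that fail."""
    features = []
    problematic_ids = set()
    for sample in samples:
        try:
            features.append(sample_to_features(sample))
        except Exception as exc:
            # Keep processing stable: record the failure instead of aborting.
            logger.debug("Could not convert sample %s: %s", sample["id"], exc)
            problematic_ids.add(sample["id"])
    return features, problematic_ids


samples = [
    {"id": "a", "text": "a valid document"},
    {"id": "b", "text": ""},
    {"id": "c", "text": None},
]
features, problematic_ids = convert_samples(samples)
```

Returning the ids alongside the features lets the caller decide whether to drop, repair, or report the affected inputs.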

I think the message is currently not very informative and also pops up once per process, so option 1 would already improve FARM.

@Timoeller Timoeller self-assigned this May 8, 2021
@Timoeller
Contributor

Hey @johann-petrak would you be interested in contributing the quick improvement I proposed as method 1?

Method 1: Change the message to Info and/or reduce the number of times it is displayed.

@johann-petrak
Contributor Author

johann-petrak commented May 19, 2021

I think the only reasonable way to do this is to move the warning into the constructor.

Since the processor instance is pickled and replicated in many other processes, there is no practical and easy way for those processes to figure out whether any of them is the first to emit the warning.

Since the warning is really about the TextClassificationProcessor implementation, emitting it from the constructor makes sense as well, I think.
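
A minimal sketch of why the constructor is a suitable place, assuming the instance reaches the workers via pickle: unpickling restores the instance's attributes without calling __init__ again, so a warning logged there is emitted once, where the object is created. DemoProcessor, its convert method, and the logger name are hypothetical and only stand in for the real processor class.

```python
import logging
import pickle

logger = logging.getLogger("demo.processor")


class DemoProcessor:
    """Hypothetical stand-in for a processor without problematic-id support."""

    def __init__(self, max_seq_len=128):
        self.max_seq_len = max_seq_len
        # Emitted once, when the instance is created in the main process.
        logger.warning("Currently no support in Processor for returning problematic ids")

    def convert(self, dicts):
        # No warning here: this is the part that would run in every worker.
        return [d["text"][: self.max_seq_len] for d in dicts]


processor = DemoProcessor()

# Round trip through pickle, as happens when the instance is sent to a worker.
# Unpickling restores the attributes without running __init__ again.
clone = pickle.loads(pickle.dumps(processor))
```

The clone behaves like the original but its creation logs nothing, which is what keeps the message from repeating per worker.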

@Timoeller
Contributor

Makes sense to put it there! Would you like to create a PR?

Timoeller pushed a commit that referenced this issue May 20, 2021
* Fix for #759

* Remove "problematic id" warning from subclass