Getting a ton of WARNING messages: "Currently no support in Processor for returning problematic ids" #759

johann-petrak · 2021-05-06T17:28:29Z

Using the latest FARM, installed with pip install farm, I am getting a large number of WARNING messages in the log:

" WARNING - farm.data_handler.processor - Currently no support in Processor for returning problematic ids"

What does this mean and is there anything I can do about it?

Timoeller · 2021-05-07T15:35:11Z

Hey @johann-petrak cool to see you using FARM again 😄

So this warning does not mean that there are problematic input samples per se, only that some processors have no functionality in place for reporting them.

For the QA processor we can return problematic samples during preprocessing; for others, e.g. TextClassificationProcessor and its derivatives, we cannot. See https://github.com/deepset-ai/FARM/blob/master/farm/data_handler/processor.py#L675
If you want to improve FARM in this respect I see two options:

[quick win] Change the message to Info or reduce the number of times it is displayed.
[correct but difficult] Implement the problematic id check for the processor you need.
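
Until such a change lands in the library, the noise can also be reduced on the user side. The following is only a sketch of a workaround, not part of FARM: it attaches a standard logging filter to the logger named in the quoted log line (farm.data_handler.processor) so that each distinct message is shown once. The class name OncePerMessageFilter is made up for illustration. The filter state lives in one process only, so each worker process that installs it would still print the message once.

```python
import logging


class OncePerMessageFilter(logging.Filter):
    """Let each distinct log message through only the first time it is seen."""

    def __init__(self):
        super().__init__()
        self._seen = set()

    def filter(self, record):
        message = record.getMessage()
        if message in self._seen:
            return False  # drop the repeat
        self._seen.add(message)
        return True


# Logger name taken from the log line quoted in this issue.
processor_logger = logging.getLogger("farm.data_handler.processor")
processor_logger.addFilter(OncePerMessageFilter())
```

A cruder alternative is processor_logger.setLevel(logging.ERROR), which hides every warning from that module, including ones that might matter.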

johann-petrak · 2021-05-07T17:08:43Z

Sorry, TBH so far what I do not understand is much more basic: what is actually meant by "problematic input sample", i.e. which error conditions make a sample problematic? And which error conditions can actually occur when converting samples in the TextClassificationProcessor?
Apparently the only way these ids can bubble up is through an exception in self._sample_to_features(sample=sample)?

Timoeller · 2021-05-08T14:36:08Z

An exception in _sample_to_features could be a good start. In general, a problematic input sample is one that cannot be converted to a PyTorch tensor in the correct way, so the term is rather general.
We would like input processing to be stable, so catching exceptions on input specifics is a way forward. Of course we want to return the IDs of those problematic samples later.
For an example, please have a look at how Question Answering is converted, e.g. here.
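
The idea of catching exceptions per sample and returning the affected IDs could look roughly like the sketch below. This is not FARM's code: convert_samples, toy_features and the (id, sample) pair layout are hypothetical stand-ins chosen for illustration, and a real processor would have to decide which exception types count as "problematic".

```python
def convert_samples(samples, sample_to_features):
    """Convert samples, collecting the ids of those that fail instead of raising.

    `samples` is an iterable of (sample_id, sample) pairs and
    `sample_to_features` is any callable; both are hypothetical stand-ins,
    not FARM's actual data structures.
    """
    features = []
    problematic_ids = set()
    for sample_id, sample in samples:
        try:
            features.append(sample_to_features(sample))
        except Exception:
            # Keep going and remember which sample could not be converted.
            problematic_ids.add(sample_id)
    return features, problematic_ids


def toy_features(sample):
    """Toy converter that rejects empty text, standing in for real featurization."""
    if not sample["text"].strip():
        raise ValueError("empty text")
    return {"length": len(sample["text"].split())}
```

Calling convert_samples with a mix of valid and empty texts returns the features of the valid ones plus the set of IDs that failed, so processing stays stable and the caller can still report what was skipped.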

I think currently the message is not really informative and also pops up per process. So option 1 would already improve FARM.

Timoeller · 2021-05-19T16:13:07Z

Hey @johann-petrak would you be interested in contributing the quick improvement I proposed as method 1?

Method 1: Change the message to Info and/or reduce the number of times it is displayed.

johann-petrak · 2021-05-19T16:39:07Z

I think the only reasonable way to do this is to move the warning into the constructor.

Since the processor instance is pickled and replicated in many many other processes, there is no (practical and easy) way for those processes to figure out if any of them is the first to emit the warning.

Since the warning is really about the TextClassificationProcessor implementation, emitting it from the constructor makes sense as well, I think.
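
To illustrate why the constructor is a convenient place, here is a small sketch using a made-up ToyProcessor (not FARM's class) and a simulated set of workers. It relies on the fact that unpickling an ordinary Python object restores its state without calling __init__ again, so a warning emitted there appears once in the parent process rather than in every replica.

```python
import logging
import pickle

logger = logging.getLogger("toy.processor")


class ToyProcessor:
    """Hypothetical stand-in for a processor; not FARM's class."""

    def __init__(self):
        # Emitted once, when the processor is constructed in the parent process.
        logger.warning("Currently no support in Processor for returning problematic ids")

    def dataset_from_dicts(self, dicts):
        # No warning here, so replicated copies stay quiet however often this runs.
        return [len(d["text"]) for d in dicts]


def simulate_workers(processor, n_workers, dicts):
    """Round-trip the processor through pickle once per simulated worker."""
    payload = pickle.dumps(processor)
    results = []
    for _ in range(n_workers):
        worker_copy = pickle.loads(payload)  # __init__ is not called again
        results.append(worker_copy.dataset_from_dicts(dicts))
    return results
```

The simulation runs in a single process for simplicity; with real worker processes the same pickling behaviour applies, which is what makes the constructor a place where the message is emitted only once.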

Timoeller · 2021-05-19T19:51:15Z

Makes sense to put it there! Would you like to create a PR?

johann-petrak added the question Further information is requested label May 6, 2021

Timoeller self-assigned this May 8, 2021

Timoeller pushed a commit that referenced this issue May 20, 2021

Fix for #759 (#772)

0357634

* Fix for #759 * Remove "problematic id" warning from subclass

johann-petrak closed this as completed May 20, 2021
