
WRONG metrics implementation when calculating the image level F1 score! #30

Closed
SunnyHaze opened this issue Apr 18, 2023 · 6 comments
@SunnyHaze

SunnyHaze commented Apr 18, 2023

There is a severe error in the calculation of the image-level F1 score. The problem can be found exactly here:

MVSS-Net/common/utils.py

Lines 44 to 45 in cc2aed7

spe = true_neg / (true_neg + false_pos + 1e-6)
f1 = 2 * sen * spe / (sen + spe)

The correct metric would be F1 = 2 * true_pos / (2 * true_pos + false_pos + false_neg + eps), where eps is a numerical stability factor.

Alternatively, compute it from Precision and Recall rather than Specificity:

F1 = 2 * Precision * Recall / (Precision + Recall), where Precision = TP / (TP + FP) and Recall = TP / (TP + FN).
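To make the difference concrete, here is a minimal sketch (the function names and counts below are hypothetical, for illustration only, and not taken from the repository) that puts the two formulas side by side:

def image_level_score_as_in_repo(tp, tn, fp, fn, eps=1e-6):
    # What utils.py currently reports: the harmonic mean of sensitivity and specificity.
    sen = tp / (tp + fn + eps)
    spe = tn / (tn + fp + eps)
    return 2 * sen * spe / (sen + spe)

def standard_f1(tp, tn, fp, fn, eps=1e-6):
    # The standard F1: the harmonic mean of precision and recall.
    return 2 * tp / (2 * tp + fp + fn + eps)

# Hypothetical confusion-matrix counts for one test set:
tp, tn, fp, fn = 50, 900, 30, 20
print(image_level_score_as_in_repo(tp, tn, fp, fn))  # ~0.82
print(standard_f1(tp, tn, fp, fn))                   # ~0.67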

Your implementation of the pixel-level F1 score, on the other hand, is correct; you can refer to this part of the code:

MVSS-Net/common/utils.py

Lines 57 to 59 in cc2aed7

f1 = 2 * true_pos / (2 * true_pos + false_pos + false_neg + 1e-6)
precision = true_pos / (true_pos + false_pos + 1e-6)
recall = true_pos / (true_pos + false_neg + 1e-6)

I hope you will recalculate and revise the image-level F1 results, or at least publish a corrected table of metrics on GitHub. Otherwise it will be very unfair to future research, because this way of calculating F1 yields inflated scores.

To help create a better research environment, we hope you will take research integrity seriously. Thank you very much for your inspiring work!

@kostino

kostino commented Apr 18, 2023

Adding on to the points made by @SunnyHaze:
Concretely, the difference between the computed metric and the true F1 lies here:
[figure omitted: derivation contrasting the harmonic mean of sensitivity and specificity with the standard F1]
As we can see, depending on the class distribution of the dataset we evaluate on, the reported result can be far from the true F1 value. In particular, in real-world scenarios where naturally TN >> TP, it will produce a value significantly higher than the true F1.
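For instance, with hypothetical counts where authentic images dominate (the numbers are purely illustrative):

tp, fn, fp, tn = 10, 40, 40, 10000      # hypothetical counts with TN >> TP

sen = tp / (tp + fn)                    # 0.20
spe = tn / (tn + fp)                    # ~0.996
reported = 2 * sen * spe / (sen + spe)  # ~0.33, harmonic mean of sen and spe
true_f1 = 2 * tp / (2 * tp + fp + fn)   # 0.20, standard F1
print(reported, true_f1)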

@Chenxr1999
Collaborator

(Quoting @SunnyHaze's original report above.)

As stated in Section 4.1 of the paper,

For the pixel-level manipulation detection, following previous works, we compute pixel-level precision and recall, and report their F1. For image-level manipulation detection, in order to measure the miss detection rate and false alarm rate, we report sensitivity, specificity and their F1.

For pixel-level evaluation, in order to compare with previous works, we calculate F1 as the harmonic mean of precision and recall. For image-level evaluation, F1 is calculated as the harmonic mean of sensitivity and specificity: sensitivity and specificity are not affected by the distribution of positive and negative samples in the test data, so a more reliable evaluation conclusion can be obtained.
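To make the distribution argument concrete, here is a small sketch (the sensitivity/specificity values and test-set sizes are hypothetical): a classifier with fixed sensitivity and specificity keeps those two numbers unchanged across test sets, while its precision and standard F1 shift with the positive/negative ratio.

def precision_and_standard_f1(n_pos, n_neg, sen=0.8, spe=0.9):
    # A classifier with fixed sensitivity and specificity, applied to a test
    # set containing n_pos positive and n_neg negative images.
    tp, fn = sen * n_pos, (1 - sen) * n_pos
    tn, fp = spe * n_neg, (1 - spe) * n_neg
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, f1

print(precision_and_standard_f1(1000, 1000))  # balanced set:       precision ~0.89, F1 ~0.84
print(precision_and_standard_f1(100, 10000))  # negative-dominated: precision ~0.07, F1 ~0.14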

@SunnyHaze
Author

Thank you for your reply!

we report sensitivity, specificity and their F1.

The F1 score is a specialized term that refers to the harmonic mean of precision and recall (Wikipedia). Readers with a solid mathematical background are very unlikely to read the "F1" in the sentence above as the harmonic mean of sensitivity and specificity. A better choice would be to describe it explicitly as "the harmonic mean of sensitivity and specificity".

You say this gives a more reliable conclusion, but the paper does not make clear how this particular value is computed, which invites confusion. If this inflated metric is not pointed out clearly, it will cause considerable trouble for future Image Manipulation Detection reviewers and make it difficult to fairly evaluate the current SOTA models, because they will only glance at the image-level classification table and assume, "this is the ordinary F1 score."

I hope you can seriously consider this issue and maintain academic honesty, and I will further investigate this matter.

@li-xirong

This is not an issue and has nothing to do with academic honesty. Case closed.

@SunnyHaze
Author

6

@erliufashi

In fact, the harmonic mean of sensitivity and specificity should not be used at all; their plain arithmetic mean should be used instead, which has a special name: balanced accuracy (BA).
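For reference, a minimal sketch of balanced accuracy (the labels below are made up; balanced_accuracy_score is scikit-learn's helper, used only as a cross-check and not part of this repository):

from sklearn.metrics import balanced_accuracy_score

# Hypothetical image-level labels: 1 = manipulated, 0 = authentic.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 2
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # 4
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1

ba = 0.5 * (tp / (tp + fn) + tn / (tn + fp))        # arithmetic mean of sen and spe, ~0.73
print(ba, balanced_accuracy_score(y_true, y_pred))  # both ~0.73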
