Averaging of CE Metrics #11

ChantalMP · 2023-08-09T15:46:07Z

Hi,

thanks for sharing your work.

As I understand it, Table 6 in your paper reports example-based metrics (an F1 score computed for every report, then averaged), not micro or macro F1. Is that correct?

How did you find out that the other papers you compare against also use example-based F1 rather than micro or macro F1?

Any hint would be appreciated.

Thanks in advance! :)

fuying-wang · 2024-02-29T15:09:09Z

Hi,

Thanks very much for the awesome work. I have the same question. The baseline results in Table 6 appear to be the same as those in the original papers. However, judging by the R2Gen code, they seem to use macro- or micro-averaged CE metrics.

anicolson · 2024-02-29T21:36:21Z

Hi ChantalMP and fuying-wang,

Thank you for pointing this out. Our reported results in Table 6 are indeed averaged over each example (example-based CE metrics). The results for the other methods are taken from their respective papers. We found it difficult to determine how the CE scores were averaged in those papers, as this detail was not included. Because papers prior to R2Gen stated which averaging method they used (macro- or micro-averaging), we assumed that papers such as R2Gen, which do not mention it, were using neither. Instead, we assumed they were averaging over all examples (this may have been a bad assumption).

To alleviate this discrepancy, we made sure to report how we averaged our results. Hopefully, this can be avoided in future papers on the topic. Unfortunately, we may have made the mistake of comparing against a different averaging strategy.

If they indeed used micro- or macro-averaging, and not averaging over each example, then the micro- and macro-averaged results for CvT2DistilGPT2 can be found here: https://github.com/aehrc/cvt2distilgpt2?tab=readme-ov-file#results.
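For readers unfamiliar with the distinction, here is a minimal sketch of the three averaging strategies discussed in this thread. It is not the evaluation code of CvT2DistilGPT2 or R2Gen; the helper names and the toy label matrices are hypothetical, and treating an undefined F1 (no positives in either truth or prediction) as 0 is an assumption. Rows stand for reports and columns for CE labels.

```python
def f1(tp, fp, fn):
    # F1 from raw counts; returns 0.0 when undefined (assumed convention).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def counts(true, pred):
    # True positives, false positives, false negatives for two binary vectors.
    tp = sum(1 for t, p in zip(true, pred) if t and p)
    fp = sum(1 for t, p in zip(true, pred) if not t and p)
    fn = sum(1 for t, p in zip(true, pred) if t and not p)
    return tp, fp, fn


def example_f1(y_true, y_pred):
    # Example-based: one F1 per report (row), then the mean over reports.
    scores = [f1(*counts(t, p)) for t, p in zip(y_true, y_pred)]
    return sum(scores) / len(scores)


def macro_f1(y_true, y_pred):
    # Macro: one F1 per label (column), then the mean over labels.
    cols_true, cols_pred = zip(*y_true), zip(*y_pred)
    scores = [f1(*counts(t, p)) for t, p in zip(cols_true, cols_pred)]
    return sum(scores) / len(scores)


def micro_f1(y_true, y_pred):
    # Micro: pool the counts over all reports and labels, then one F1.
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        a, b, c = counts(t, p)
        tp, fp, fn = tp + a, fp + b, fn + c
    return f1(tp, fp, fn)


# Hypothetical toy data: 3 reports, 3 labels.
y_true = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]

print(f"example-based F1: {example_f1(y_true, y_pred):.3f}")  # 0.778
print(f"macro F1:         {macro_f1(y_true, y_pred):.3f}")    # 0.667
print(f"micro F1:         {micro_f1(y_true, y_pred):.3f}")    # 0.750
```

The same predictions give three different scores, which is why numbers obtained under different averaging strategies are not directly comparable.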

fuying-wang · 2024-03-01T05:07:56Z

Hi,

Thanks very much for the detailed clarification! I also noticed that R2Gen and other papers didn't mention their averaging method, which confused me as well. Apart from this, your code and detailed results are awesome!
