We need some methods/scripts to evaluate parsing performance. We probably want to do two things: a) replicate previous work that uses parseval so that we can easily report previous results (see table 3 in http://www.cc.gatech.edu/~jeisenst/papers/ji-acl-2014.pdf), and b) implement a more appropriate metric based on precision/recall of relations between spans, not just precision/recall of (labeled or unlabeled) spans as in parseval (a rough sketch follows the requirements below). See discussion from @sagae below.
- The metrics should report unlabeled and labeled performance.
- The metrics should use the 18 coarse relations from Carlson et al.'s (2001) "Building a Discourse-tagged Corpus in the Framework of Rhetorical Structure Theory."
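For point (a), here is a minimal sketch of parseval-style span scoring, assuming each tree has already been flattened into its constituent spans (EDU index ranges, optionally paired with a relation label for the labeled variant). The function and variable names are illustrative only, not part of the existing code:

```python
# Sketch of parseval-style precision/recall over spans (the data shape is an
# assumption, not the repo's API): each tree is given as a collection of
# (start, end) spans, or (start, end, relation) tuples for labeled scoring.
from collections import Counter

def parseval_scores(gold_spans, pred_spans):
    """Return (precision, recall, f1) over (optionally labeled) spans."""
    gold, pred = Counter(gold_spans), Counter(pred_spans)
    matched = sum((gold & pred).values())  # respects duplicate spans
    precision = matched / sum(pred.values()) if pred else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Unlabeled example over EDU index ranges:
gold = [(1, 3), (1, 5), (4, 5)]
pred = [(1, 3), (1, 5), (3, 5)]
print(parseval_scores(gold, pred))  # roughly (0.67, 0.67, 0.67)
```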
Discussion from @sagae:

Looking at Fig 1 in http://www.isi.edu/~marcu/papers/sigdialbook2002.pdf, there are nine rhetorical relations, represented by the labeled directed arcs (same-unit is just a side effect of the annotation, not a discourse relation). We really should be looking at precision and recall of the relations represented by these labeled arcs. So we would be looking for the relations in that figure, and precision and recall would be computed in the usual way; successful identification of a relation requires the correct spans, the correct direction of the arrow, and the correct label. That list doesn't include 22-23 <- 24-25 : same-unit, but the parser does need to get this right to form the 22-25 span, so it's taken into account implicitly, which I think is the right way.
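A rough sketch of what relation-level scoring could look like under those criteria, assuming each arc is represented as a (source_span, target_span, relation) triple such as ((22, 23), (24, 25), "same-unit"): the direction of the arrow is encoded by the tuple order, and the unlabeled variant simply drops the relation. The names here are assumptions for illustration, not existing code:

```python
# Sketch of relation-level precision/recall over labeled directed arcs.
# Assumed arc format: (source_span, target_span, relation), where spans are
# (start, end) EDU index pairs and tuple order encodes the arrow's direction.
def relation_scores(gold_arcs, pred_arcs, labeled=True):
    strip = (lambda a: a) if labeled else (lambda a: (a[0], a[1]))
    gold = {strip(a) for a in gold_arcs}
    pred = {strip(a) for a in pred_arcs}
    matched = len(gold & pred)  # same spans, same direction, same label
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Per the discussion above, a same-unit arc like ((22, 23), (24, 25), "same-unit")
# can be left out of the arc sets and checked implicitly via the combined (22, 25) span.
```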
Commit 12c5b59 implements the basic functionality for doing parseval, but it's not complete. Some edge cases still need to be dealt with (e.g., same-unit relations). See the TODO comments in the code.
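For the same-unit edge case, one possible (and deliberately simplified) treatment, not necessarily what the TODOs have in mind: collapse spans joined by a same-unit arc into a single combined span and drop the same-unit arcs before scoring, so same-unit is only checked implicitly through the larger span, as suggested above. The names below are made up for illustration:

```python
# Simplified sketch: merge spans linked by same-unit into one combined span
# and drop the same-unit arcs themselves (handles one pair per segment only;
# chains of three or more same-unit segments would need an extra pass).
def merge_same_unit(arcs):
    combined, kept = {}, []
    for src, tgt, rel in arcs:
        if rel.lower() == "same-unit":
            merged = (min(src[0], tgt[0]), max(src[1], tgt[1]))
            combined[src] = combined[tgt] = merged
        else:
            kept.append((src, tgt, rel))
    # Rewrite surviving arcs so they refer to the merged spans.
    return [(combined.get(s, s), combined.get(t, t), r) for s, t, r in kept]
```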