
parsing evaluation metrics #2

Open

mheilman opened this issue Jun 3, 2014 · 2 comments

Comments

@mheilman (Contributor) commented Jun 3, 2014

We need some methods/scripts to evaluate parsing performance. We probably want to do two things: (a) replicate previous work that uses PARSEVAL so that we can easily compare against previously reported results (see Table 3 in http://www.cc.gatech.edu/~jeisenst/papers/ji-acl-2014.pdf), and (b) implement a more appropriate metric based on precision/recall of relations between spans, not just precision/recall of (labeled or unlabeled) spans as in PARSEVAL. See the discussion from @sagae below; a sketch of such a metric follows the list.

  • The metrics should report unlabeled and labeled performance
  • The metrics should use the 18 coarse relations from Carlson et al.'s (2001) "Building a Discourse-tagged Corpus in the Framework of Rhetorical Structure Theory."
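To make metric (b) concrete, here is a minimal sketch of relation-level precision/recall. It assumes each relation is encoded as a (satellite_span, nucleus_span, label) triple, where spans are (start, end) EDU index pairs; the representation and function name are illustrative assumptions, not the repository's actual API.

```python
def precision_recall_f1(gold, predicted, labeled=True):
    """Score predicted relations against gold relations.

    Relations are hypothetical (satellite_span, nucleus_span, label)
    triples. A predicted relation counts as correct only if both spans
    (and thus the direction of the arc) match a gold relation; when
    `labeled` is True, the relation label must match as well.
    """
    if not labeled:
        # Unlabeled variant: drop the label, keep spans and direction.
        gold = {(sat, nuc) for sat, nuc, _ in gold}
        predicted = {(sat, nuc) for sat, nuc, _ in predicted}
    else:
        gold, predicted = set(gold), set(predicted)

    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Mapping each fine-grained label to one of the 18 coarse Carlson et al. classes before scoring would just be a dictionary lookup applied to the label field of each triple.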

Discussion from @sagae

Looking at Fig. 1 in http://www.isi.edu/~marcu/papers/sigdialbook2002.pdf, there are nine rhetorical relations, represented by the labeled directed arcs (same-unit is just a side effect of the annotation, not a discourse relation). We really should be looking at precision and recall of the relations represented by these labeled arcs. So we would be looking for:

16 <- 17-26 : example
17-21 <- 22-26 : elaboration-additional
17-18 <- 19-21 : explanation-argumentative
22-25 <- 26 : consequence-s
17 <- 18 : attribution
19-20 <- 21 : attribution
19 <- 20 : elaboration-object-attribute-embedded
22 <- 23 : attribution-embedded
24 <- 25 : purpose

Precision and recall would be computed in the usual way; successful identification of a relation requires the correct spans, the correct direction of the arrow, and the correct label. The list doesn't include 22-23 <- 24-25 : same-unit, but the parser does need to get this right to form the 22-25 span, so it's taken into account implicitly, which I think is the right way to handle it.
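To make these matching criteria concrete, the nine relations above could be encoded for the sketch in the first comment as follows; the triple encoding is the same assumed one, and the mislabeled prediction is invented purely for illustration.

```python
# Gold relations from Fig. 1, as (satellite_span, nucleus_span, label)
# triples; "16 <- 17-26 : example" means EDUs 17-26 are the satellite
# and EDU 16 is the nucleus.
gold = {
    ((17, 26), (16, 16), "example"),
    ((22, 26), (17, 21), "elaboration-additional"),
    ((19, 21), (17, 18), "explanation-argumentative"),
    ((26, 26), (22, 25), "consequence-s"),
    ((18, 18), (17, 17), "attribution"),
    ((21, 21), (19, 20), "attribution"),
    ((20, 20), (19, 19), "elaboration-object-attribute-embedded"),
    ((23, 23), (22, 22), "attribution-embedded"),
    ((25, 25), (24, 24), "purpose"),
}

# A hypothetical parse that mislabels one relation scores 8/9 on labeled
# precision/recall but 9/9 unlabeled, since spans and direction still match.
predicted = (gold - {((18, 18), (17, 17), "attribution")}) \
    | {((18, 18), (17, 17), "elaboration-additional")}
print(precision_recall_f1(gold, predicted, labeled=True))   # ~(0.889, 0.889, 0.889)
print(precision_recall_f1(gold, predicted, labeled=False))  # (1.0, 1.0, 1.0)
```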

@mheilman (Contributor, Author) commented Jul 2, 2014

Commit 12c5b59 implements the basic functionality for PARSEVAL, but it's not complete. Some edge cases still need to be dealt with (e.g., same-unit relations). See the TODO comments in the code.
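For reference while the edge cases get sorted out, here is a minimal sketch of the labeled-span extraction that PARSEVAL needs, independent of the code in commit 12c5b59. It assumes trees are nested (label, children) tuples with integer EDU indices as leaves; that representation is a hypothetical simplification, not the repo's actual data structure.

```python
def labeled_spans(tree):
    """Collect a (start_edu, end_edu, label) triple for every internal node."""
    spans = set()

    def walk(node):
        if isinstance(node, int):          # leaf: a single EDU
            return node, node
        label, children = node             # children are ordered left to right
        bounds = [walk(child) for child in children]
        start, end = bounds[0][0], bounds[-1][1]
        spans.add((start, end, label))
        return start, end

    walk(tree)
    return spans

# Example with a same-unit node, the edge case noted above:
tree = ("elaboration", [1, ("same-unit", [2, 3])])
print(labeled_spans(tree))
# -> {(1, 3, 'elaboration'), (2, 3, 'same-unit')} (set order may vary)
```

PARSEVAL precision/recall then compares the gold and predicted span sets, dropping the label from each triple for the unlabeled variant. Whether to count same-unit nodes as spans, skip them, or merge their children is exactly the kind of decision the TODOs leave open.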

@mheilman (Contributor, Author) commented

The paper about the HILDA system (http://dad.uni-bielefeld.de/index.php/dad/article/viewFile/591/1187) says to see Marcu (2000), pp. 143–144, for a discussion of how PARSEVAL was adapted. (I'm still waiting to get the book from interlibrary loan.)

Marcu (2000) = The Theory and Practice of Discourse Parsing and Summarization
