Evaluation and comparison of the RASP system and the Stanford unlexicalized PCFG parser

This project has been carried on as part of the final assignment of the L95 module. It aims at studying two parsing models: the RASP system (Briscoe et al., 2006) and the unlexicalized PCFG parser from the Stanford NLP tools.

It is composed of multiple elements:

a PDF report that describes all the work done, presents results and offers analyses about models and results
a directory data that gathers two files with the sentences studied in this work
a directory parsers_output that gathers parsers outputs for the previous sentences
a directory gold_standards that gathers gold standards for the previous sentences in the Grammatical Relation format and the CoNLL format
an XML file eval.xml, useful for the MaltEval evaluation system

RASP system

To run the RASP system, you first need to download the RASP software and place the corresponding directory in the main directory of the project. Then, you can parse sentences with the RASP system thanks to the following command line:

RASP/scripts/rasp.sh -m < data/handout_sentences.txt > parsers_outputs/rasp_handout.txt

It is possible to change the output format with different parser options. Here are some examples:

-p'-og': grammatical relations (default output format)
-p'-ow': weighted grammatical relations
-p'-nk': outputs at most k parses for each sentence

Stanford unlexicalized PCFG parser

Once again, to run the unlexicalized PCFG parser from the Stanford NLP tools, you need to download the Stanford tool and place the corresponding directory in the main directory of the project. Then, you can parse sentences with the following command line:

Stanford_parser/lexparser.sh data/handout_sentences.txt

Parsed sentences are printed directly in the console.

It is possible to change the output format by changing the outputFormat option in the lexparser.sh file. In order to obtain CoNLL dependencies, I set the output format option to conll2007. Other output formats are available, such as penn or typedDependencies.

MaltEval tool

The MaltEval tool is an efficient tool to compare files in the CoNLL format, and therefore evaluate the performance of a parser by comparison with gold standards. To use this tool, you need to download the MaltEval software and place the corresponding directory in the main directory of the project. Then, you can compare two CoNLL files with the following command line:

java -jar MaltEval/lib/MaltEval.jar -e eval.xml -s parsers_outputs/stanford_output.conll -g gold_standards/gold_standards.conll

Statistics about the parser's performance are directly displayed in the console.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
data		data
gold_standards		gold_standards
parsers_outputs		parsers_outputs
README.md		README.md
eval.xml		eval.xml
report.pdf		report.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Evaluation and comparison of the RASP system and the Stanford unlexicalized PCFG parser

RASP system

Stanford unlexicalized PCFG parser

MaltEval tool

About

Releases

Packages

alodieboissonnet/ParsingAlgorithms

Folders and files

Latest commit

History

Repository files navigation

Evaluation and comparison of the RASP system and the Stanford unlexicalized PCFG parser

RASP system

Stanford unlexicalized PCFG parser

MaltEval tool

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages