Generating the Most Learnable Dependency Scheme

This page provides a script that converts the constituency format used in the Penn Treebank into dependency trees. It was developed at the Hebrew University of Jerusalem by Roy Schwartz.
A complete description of the scheme appears in Schwartz et al., 2012. The code is a redistribution of the pennconverter script, generously provided by Richard Johansson and based on Johansson and Nugues, 2007.

For ease of use, we removed most of the configuration options, so the script can generate only the most learnable scheme described in Schwartz et al., 2012. For a more flexible version, use the original pennconverter script.

Download

The jar file is available here

Usage

Users of this code are requested to cite the publications listed above. By downloading this code, users agree to this license.

Requirements and Installation

Java 5 or later is required. No compilation or installation is needed.

Running

To run, type:

java -jar learnable_pennconverter.jar
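
When the converter is called from a larger pipeline, the same command can be wrapped in a few lines of Python. The sketch below is an illustration and not part of the distribution: it assumes `java` is on the PATH and that the jar sits in the working directory, and it relies on the defaults listed under Command Line Options (input from STDIN, output to STDOUT). The names `base_command` and `convert_text` are hypothetical.

```python
import subprocess

JAR = "learnable_pennconverter.jar"  # assumption: the jar is in the working directory


def base_command(jar_path=JAR):
    """Return the command from the Running section as an argument list."""
    return ["java", "-jar", jar_path]


def convert_text(trees, jar_path=JAR):
    """Pipe Penn Treebank bracketed trees through the converter.

    With no options the script reads constituency trees from STDIN and
    writes dependency trees to STDOUT, so the result comes back as a string.
    """
    result = subprocess.run(
        base_command(jar_path),
        input=trees,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Calling `convert_text` with the contents of a Penn Treebank `.mrg` file would return the converted sentences as one string. Keeping the argument list in its own function makes it possible to inspect the command without launching Java.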

Command Line Options

`-f FILE`	read (constituency) input from FILE (default: STDIN)
`-t FILE`	write (dependency) output to FILE (default: STDOUT)
`-log FILE`	write log messages to FILE (default: no messages)
`-verbosity N`	set verbosity level in log file to N (0, 1, or 2; default: 0)
`-stopOnError[=true|false]`	terminate if an error is encountered
`-format[=conllx|conll2008|tab]`	set the output format
`-help`	print a help message
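
For scripted use, the options above can be assembled programmatically. The following is a minimal sketch, not part of the distribution: `build_command` is a hypothetical helper, the jar path is an assumption, and the `-name=value` spelling for `-stopOnError` and `-format` is read directly off the option list rather than checked against the tool. Options left as `None` are omitted, so the converter's own defaults apply.

```python
FORMATS = ("conllx", "conll2008", "tab")  # values listed for -format


def build_command(input_file=None, output_file=None, log_file=None,
                  verbosity=None, stop_on_error=None, output_format=None,
                  jar_path="learnable_pennconverter.jar"):
    """Assemble an argument list from the documented options only."""
    if verbosity is not None and verbosity not in (0, 1, 2):
        raise ValueError("verbosity must be 0, 1, or 2")
    if output_format is not None and output_format not in FORMATS:
        raise ValueError("output_format must be one of %s" % (FORMATS,))

    cmd = ["java", "-jar", jar_path]
    if input_file is not None:
        cmd += ["-f", str(input_file)]        # constituency input
    if output_file is not None:
        cmd += ["-t", str(output_file)]       # dependency output
    if log_file is not None:
        cmd += ["-log", str(log_file)]
    if verbosity is not None:
        cmd += ["-verbosity", str(verbosity)]
    if stop_on_error is not None:
        cmd.append("-stopOnError=" + ("true" if stop_on_error else "false"))
    if output_format is not None:
        cmd.append("-format=" + output_format)
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. For instance, `build_command("wsj.mrg", "wsj.conll", output_format="conllx")` describes a run that reads `wsj.mrg` and writes `wsj.conll`; both file names are placeholders.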

Important

The program is unable to handle treebanks in languages other than English.

References

Roy Schwartz, Omri Abend and Ari Rappoport. Learnability-based Syntactic Annotation Design. In Proceedings of COLING 2012 (long paper).

Richard Johansson and Pierre Nugues. Extended Constituent-to-dependency Conversion for English. In Proceedings of NODALIDA 2007.