This page provides a script that converts the constituency format used in the Penn Treebank into dependency trees.
It was developed at the Hebrew University of Jerusalem by Roy Schwartz.
A complete description of the scheme appears in the paper Schwartz et al., 2012.
This code is a redistribution of the pennconverter script, generously provided by Richard Johansson, and based on the paper Johansson and Nugues, 2007.
For ease of use, we removed most of the configuration options, so the script generates only the most learnable scheme described in Schwartz et al., 2012.
For a more flexible version of the script, use the original pennconverter script.
java -jar learnable_pennconverter.jar
-f FILE | read (constituency) input from FILE (default: STDIN) |
-t FILE | write (dependency) output to FILE (default: STDOUT) |
-log FILE | write log messages to FILE (default: no messages) |
-verbosity N | set verbosity level in log file to N (0, 1, or 2; default: 0) |
-stopOnError[=true|false] | terminate if an error is encountered |
-format[=conllx|conll2008|tab] | output format |
-help | show help message |
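
The options above can be combined into a single run and the result read back for inspection. The following Python sketch is only an illustration and is not part of the distribution. The helper names (build_command, convert, read_conllx) and the file names in the usage note are hypothetical. It assumes that the format value is attached with "=", as the option listing suggests, and that conllx output follows the standard CoNLL-X layout of tab-separated columns, with the token ID in column 1, the word form in column 2, the head index in column 7 and the dependency label in column 8.

```python
import subprocess

JAR = "learnable_pennconverter.jar"  # jar name as given in the usage line


def build_command(input_path, output_path, fmt="conllx", jar=JAR):
    """Return the argument list for one converter run (nothing is executed)."""
    return [
        "java", "-jar", jar,
        "-f", input_path,    # constituency input file
        "-t", output_path,   # dependency output file
        f"-format={fmt}",    # assumption: value attached with '='
    ]


def convert(input_path, output_path, fmt="conllx"):
    """Run the converter; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(input_path, output_path, fmt), check=True)


def read_conllx(lines):
    """Group CoNLL-X lines into sentences of (id, form, head, deprel) tuples.

    Assumes the standard CoNLL-X column order; sentences are separated
    by blank lines.
    """
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        current.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences
```

For example, convert("wsj.mrg", "wsj.conll") would run the jar on a hypothetical treebank file, and read_conllx(open("wsj.conll")) would then yield one list of tuples per sentence, where a head index of 0 marks the root token.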
Roy Schwartz, Omri Abend and Ari Rappoport. Learnability-based Syntactic Annotation Design. In Proceedings of COLING 2012 (long paper).
Richard Johansson and Pierre Nugues. Extended Constituent-to-dependency Conversion for English. In Proceedings of NODALIDA 2007.