Reader for CoNLL-2012 using Ontonotes-5.0 documents.
Generates .conll files by reading the .skel files from the CoNLL-2012 dataset and the documents from Ontonotes-5.0.
- In the script setup_training.sh, change ontonotes_path to the directory containing Ontonotes-5.0. This directory should contain four items: data/, docs/, tools/, and index.html.
- Run the script, replacing OUTPUT_DIR with the path where the processed files will be kept:
bash setup_training.sh OUTPUT_DIR
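
Before running the script, it may help to confirm that the Ontonotes-5.0 directory has the layout described above. The following is a minimal sketch, not part of setup_training.sh; the function name missing_ontonotes_items is hypothetical, and the check only tests that the four items exist, not that their contents are complete.

```python
from pathlib import Path

# The four items this README says the Ontonotes-5.0 directory should contain.
EXPECTED_ITEMS = ("data", "docs", "tools", "index.html")


def missing_ontonotes_items(ontonotes_path):
    """Return the expected items that are absent from ontonotes_path.

    Hypothetical helper for illustration; an empty list means the
    directory has all four expected top-level entries.
    """
    root = Path(ontonotes_path)
    return [name for name in EXPECTED_ITEMS if not (root / name).exists()]
```

Calling missing_ontonotes_items with the same directory that ontonotes_path points to should return an empty list when the layout matches; any names it returns are the entries to look for.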
After the script finishes, OUTPUT_DIR will contain:
(a) a conll-2012 directory, with the generated .conll files under its v4 directory.
(b) the files train.english.jsonlines, dev.english.jsonlines, and test.english.jsonlines, which hold the training, development, and test data for CoNLL-2012.
The code is from Kenton Lee.
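
To inspect the generated .jsonlines files, a small reader along these lines may be enough. This is an illustrative sketch and not part of the original code: the function name read_jsonlines is hypothetical, and it assumes only the general JSON Lines convention of one JSON value per line, without assuming any particular fields inside each record.

```python
import json


def read_jsonlines(path):
    """Parse a JSON Lines file: one JSON value per non-empty line.

    Hypothetical helper for illustration; returns the parsed records
    as a list, in file order.
    """
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records
```

For example, len(read_jsonlines(path)) with path set to the train.english.jsonlines file inside OUTPUT_DIR gives the number of records in the training split.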