This repository contains code and models for replicating results from the following publication:
- Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction
- Yi Luan, Luheng He, Mari Ostendorf, Hannaneh Hajishirzi
- In EMNLP 2018
Part of the codebase is extended from lsgn and e2e-coref.
Requirements:

- Python 2.7
- TensorFlow 1.8.0
- pyhocon (for parsing the configurations)
- tensorflow_hub (for loading ELMo)
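In case it helps, a minimal way to install the Python dependencies (anything beyond the packages and versions listed above is an assumption about your environment):

```bash
# Assumes an existing Python 2.7 environment.
pip install tensorflow==1.8.0 pyhocon tensorflow-hub
```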
- Download GloVe embeddings and the data: `./scripts/fetch_required_data.sh`
- Build custom kernels: `./scripts/build_custom_kernels.sh` (please make adjustments to the script based on your OS/gcc version)
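For reference, custom TF 1.x kernels are typically compiled along these lines; a sketch, assuming the kernel source is `coref_kernels.cc` as in e2e-coref (depending on how your TensorFlow wheel was built, gcc 5 and newer may additionally need `-D_GLIBCXX_USE_CXX11_ABI=0`):

```bash
# Query compile/link flags from the installed TensorFlow, then build the shared op.
TF_CFLAGS=$(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))')
TF_LFLAGS=$(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))')
g++ -std=c++11 -shared coref_kernels.cc -o coref_kernels.so -fPIC $TF_CFLAGS $TF_LFLAGS -O2
```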
- Some of our models are trained with ELMo embeddings. We use the ELMo model loaded by `tensorflow_hub`.
- It is recommended to cache the ELMo embeddings for training and validation efficiency. Modify the corresponding filenames and run `python generate_elmo.py` to generate the ELMo embeddings.
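`generate_elmo.py` handles this in the repo; purely for orientation, here is a minimal sketch of caching ELMo activations with `tensorflow_hub` (the cache filename, HDF5 layout, and input sentences are hypothetical, not the repo's exact format):

```python
import h5py
import tensorflow as tf
import tensorflow_hub as hub

# Load ELMo from TF Hub (TF 1.x Module API).
elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=False)

tokens_ph = tf.placeholder(tf.string, [None, None])
lengths_ph = tf.placeholder(tf.int32, [None])
ops = elmo(inputs={"tokens": tokens_ph, "sequence_len": lengths_ph},
           signature="tokens", as_dict=True)
# word_emb is 512-d; duplicate it to match the 1024-d LSTM layers, then
# stack the three layers into [batch, time, 1024, 3].
word_emb = tf.concat([ops["word_emb"], ops["word_emb"]], axis=-1)
layers = tf.stack([word_emb, ops["lstm_outputs1"], ops["lstm_outputs2"]], axis=-1)

sentences = [["ELMo", "embeddings", "are", "cached", "."]]  # hypothetical input
with tf.Session() as sess, h5py.File("elmo_cache.hdf5", "w") as fout:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    for i, sent in enumerate(sentences):
        emb = sess.run(layers, feed_dict={tokens_ph: [sent],
                                          lengths_ph: [len(sent)]})
        fout.create_dataset(str(i), data=emb[0])  # [time, 1024, 3]
```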
- Experiment configurations are found in `experiments.conf`.
- The parameter `main_metrics` can be selected from coref, ner, relation, or any combination of the three, such as `coref_ner_relation`, which indicates the averaged F1 score over coref, ner, and relation. The model is tuned and saved based on the resulting averaged F1 score.
- The parameters `ner_weight`, `coref_weight`, and `relation_weight` are the weights for the multi-task objective. If a weight is set to 0, the corresponding task is not trained (see the configuration sketch below).
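As an illustration, a hypothetical experiment block in `experiments.conf` might combine these parameters as follows (the parameter names come from the documentation above; the experiment name and values are made up):

```hocon
my_ner_relation_experiment {
  main_metrics = ner_relation  # tune/save on the averaged NER and relation F1
  ner_weight = 1.0
  relation_weight = 1.0
  coref_weight = 0             # coreference is not trained
}
```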
- Choose an experiment that you would like to run, e.g. `scientific_best_ner`.
- For a single-machine experiment, run the following two commands in parallel: `python singleton.py <experiment>` and `python evaluator.py <experiment>`.
- Results are stored in the `logs` directory and can be viewed via TensorBoard (see the command after this list).
- For final evaluation of the checkpoint with the maximum dev F1, run `python test_single.py <experiment>`.
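The usual TensorBoard invocation should work here (the port choice is arbitrary):

```bash
tensorboard --logdir logs --port 6006
```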
- The code does not use GPUs by default. Instead, it looks for the `GPU` environment variable, which it treats as shorthand for `CUDA_VISIBLE_DEVICES` (see the example after this list).
- The evaluator should not be run on GPUs, since evaluating full documents does not fit within GPU memory constraints.
- The training runs indefinitely and needs to be terminated manually.
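For example, to train `scientific_best_ner` on GPU 0 while the evaluator runs on the CPU (launching the two processes from separate shells is assumed):

```bash
# GPU is shorthand for CUDA_VISIBLE_DEVICES; the trainer uses GPU 0.
GPU=0 python singleton.py scientific_best_ner

# In a second shell: the evaluator stays on the CPU.
python evaluator.py scientific_best_ner
```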
- Define the output path in `experiments.conf` as `output_path`; the system will write the results for `eval_path` to `output_path`. The output file is also a json file with the same format as `eval_path`. Then run `python write_single.py <experiment>`.
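For orientation, one line of such a json file might look like the following; this is a hypothetical document in the SciERC-style format (spans are document-level token indices), and the exact keys are determined by your `eval_path` data:

```json
{"doc_key": "X96-1001",
 "sentences": [["MORPA", "is", "a", "fully", "implemented", "parser", "."]],
 "ner": [[[0, 0, "Method"], [5, 5, "Method"]]],
 "relations": [[[0, 0, 5, 5, "HYPONYM-OF"]]],
 "clusters": []}
```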
- The best models can be downloaded from here (Best NER Model, Best Coref Model, Best Relation Model); unzip and put the model under `./logs`. For making predictions or testing results, use the same commands as in the previous steps.