Which person discovered the class of Swinhoe's Crake?

Source code for ":hatched_chick:Crake:owl:: Causal-Enhanced Table-Filler for Question Answering over Large Scale Knowledge Base" to appear at NAACL 2022 Findings

KBQA pipelines require specific KB servers, KB linking APIs, and similar infrastructure. Despite our efforts to release all of our code and provide instructions, the process is rather complicated to reproduce from scratch and may contain erroneous edge cases. We apologize in advance for any issues :) Please feel free to reach out to us with any questions! :crossed_fingers:

To train a model for GSG

  • Run the training script below
cd src_main/QG_TF_NE_multitask
python main.py
  • See ../data/checkpoint/QG_TF_NE_cmtl_fcn_no_pretrain/qg_tf_ne_dev.pth for the saved checkpoint
  • A well-trained GSG model can be found here

To train a model for RE

  • Run the preprocessing and training scripts below
cd src_main/RE
# Preprocessing
python preprocess_data.py
# Training
python main.py
  • See ../data/checkpoint/RE_Roberta_small_samp/roberta_dev_latest.pth for the saved checkpoint
  • A well-trained RE model can be found here

To run end-to-end evaluations

Build a KB server

dbpedia_2016-04.nt
infobox_properties_en.ttl
infobox_properties_mapped_en.ttl
instance_types_dbtax_ext_en.ttl
instance_types_en.ttl
instance_types_lhd_ext_en.ttl
instance_types_sdtyped_dbo_en.ttl
instance_types_transitive_en.ttl
labels_en.ttl
mappingbased_literals_en.ttl
mappingbased_objects_en.ttl
mappingbased_objects_uncleaned_en.ttl
persondata_en.ttl
skos_categories_en.ttl
specific_mappingbased_properties_en.ttl
topical_concepts_en.ttl
uri_same_as_iri_en.ttl
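
Before loading the dumps into the KB server, it may help to confirm that every file listed above is present. The following is a minimal sketch, not part of this repository: the helper name `missing_dumps` and the directory `dbpedia_dumps` are hypothetical and should be adapted to your setup.

```python
from pathlib import Path

# File names copied from the list above.
EXPECTED_DUMPS = [
    "dbpedia_2016-04.nt",
    "infobox_properties_en.ttl",
    "infobox_properties_mapped_en.ttl",
    "instance_types_dbtax_ext_en.ttl",
    "instance_types_en.ttl",
    "instance_types_lhd_ext_en.ttl",
    "instance_types_sdtyped_dbo_en.ttl",
    "instance_types_transitive_en.ttl",
    "labels_en.ttl",
    "mappingbased_literals_en.ttl",
    "mappingbased_objects_en.ttl",
    "mappingbased_objects_uncleaned_en.ttl",
    "persondata_en.ttl",
    "skos_categories_en.ttl",
    "specific_mappingbased_properties_en.ttl",
    "topical_concepts_en.ttl",
    "uri_same_as_iri_en.ttl",
]


def missing_dumps(dump_dir):
    """Return the expected dump files that are not present in dump_dir."""
    root = Path(dump_dir)
    return [name for name in EXPECTED_DUMPS if not (root / name).is_file()]


if __name__ == "__main__":
    # "dbpedia_dumps" is a hypothetical location; change it to your own.
    missing = missing_dumps("dbpedia_dumps")
    if missing:
        print("Missing dump files:")
        for name in missing:
            print("  " + name)
    else:
        print("All", len(EXPECTED_DUMPS), "dump files found.")
```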

Build a DBpedia Lookup API

Run the GSG stage

cd src_main/pipeline
  • Modify the code in tf_ne_qg_link.py as shown below so that the Lookup API and KB server clients point to the IP address and port on which each server is running
{
    ...
    'kb_host_ip': '111.111.222.222',
    'kb_host_port': 1234,
    'lookup_url': 'http://111.111.222.222:5678/lookup-application/api/search',
    ...
}
  • Run the GSG stage using the model trained before
python tf_ne_qg_link.py
  • The intermediate GSG results are written to ../result/NEQG_inference_results/test_neqg_link_1219_cmtl.json. A sample file is also provided in the supplementary data.
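
If the GSG stage fails to connect, one possible first check is whether the KB server and the Lookup API are reachable at the addresses set in the configuration. The sketch below assumes only the three keys shown in the example above (`kb_host_ip`, `kb_host_port`, `lookup_url`); the helpers `endpoints` and `is_reachable` are hypothetical and do not exist in this repository. It tests TCP reachability only, not whether the services answer queries correctly.

```python
import socket
from urllib.parse import urlparse


def endpoints(config):
    """Extract (name, host, port) for the KB server and the Lookup API."""
    lookup = urlparse(config["lookup_url"])
    # Fall back to the scheme's default port if the URL does not give one.
    lookup_port = lookup.port or (443 if lookup.scheme == "https" else 80)
    return [
        ("kb", config["kb_host_ip"], int(config["kb_host_port"])),
        ("lookup", lookup.hostname, lookup_port),
    ]


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder values from the example above; replace with your own.
    config = {
        "kb_host_ip": "111.111.222.222",
        "kb_host_port": 1234,
        "lookup_url": "http://111.111.222.222:5678/lookup-application/api/search",
    }
    for name, host, port in endpoints(config):
        state = "reachable" if is_reachable(host, port, timeout=1.0) else "NOT reachable"
        print(f"{name}: {host}:{port} is {state}")
```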

Run the RE stage

  • First, copy the trained RE model to the RE model server directory
cp src_main/data/checkpoint/RE_Roberta_small_samp/roberta_dev_latest.pth re_model_server/data/checkpoint_en/RE_Roberta/
  • Start the RE server in the background on localhost
cd re_model_server
python server.py
  • Modify the code in pipeline.py as shown below so that the RE model server and KB server clients point to the IP address and port on which each server is running
{
    ...
    'kb_host_ip': '111.222.111.222',
    'kb_host_port': 9275,
    're_host_ip': 'localhost',
    're_host_port': 9305,
    ...
}
  • Run the RE stage using the model trained before, which also gives the end-to-end evaluation results
python pipeline.py
  • The results can be found in ../result/RE_inference_results/test_re_1219_cmtl.json. A sample file is also provided in the supplementary data.
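
Since pipeline.py depends on the RE model server already accepting connections, it can be useful to wait for the server's port before launching the RE stage. The following is a minimal sketch under that assumption: `wait_for_port` is a hypothetical helper, not part of this repository, and the host and port are the example values from the configuration above.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds or timeout elapses.

    Returns True once the port accepts a connection, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Example values from the RE stage configuration; adjust as needed.
    if wait_for_port("localhost", 9305, timeout=5.0):
        print("RE server is accepting connections; pipeline.py can be started.")
    else:
        print("RE server did not come up in time; check the server.py output.")
```

A longer `timeout` may be appropriate if the server needs time to load the checkpoint.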