update ee
shengyumao committed Jun 28, 2023
1 parent 6fb6067 commit 318103e
Showing 6 changed files with 60 additions and 39 deletions.
30 changes: 20 additions & 10 deletions example/ee/standard/README.md
@@ -12,6 +12,7 @@
```bash
python==3.8
pip install -r requirements.txt
pip install hydra-core==1.3.1 # ignore the conflict with deepke
```

## Download Code
@@ -30,30 +31,39 @@ Follow the instruction [here](./data/DuEE/README.md)

## Train

Modify the parameters in `./conf/train.yaml`.
Modify the parameters in `./conf/train.yaml`. Select a different dataset by setting `data_name`, and change `model_name_or_path` to match the dataset.

- Trigger
- Trigger (Event Detection or Trigger Classification)

First, train the trigger classification model and predict the trigger of each instance.

Set `task_name` to `trigger`.
Select a different dataset by setting `data_name`.
Then run the following command:

```bash
python run.py
```

Prediction runs after training, and the result is written to `exp/xx/trigger/xxx/eval_pred.json`.

- Role
Here we train the event argument extraction model with the gold trigger.
Then train the event argument extraction model with the gold trigger.
- Role (Event Arguments Extraction)
Next, train the event argument extraction model; here it is trained with the gold trigger.
Set `task_name` to `role`.
Select a different dataset by setting `data_name`.
Then run the following command:

```bash
python run.py
```
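
The two training stages above can be scripted, as a minimal sketch. It assumes that `run.py` is a Hydra entry point accepting `key=value` command-line overrides for the fields in `./conf/train.yaml` (the README itself only describes editing the YAML file), and that `model_name_or_path` may be given as a model name rather than a local path. `MODEL_FOR_DATASET`, `build_train_commands` and `train_all` are hypothetical helper names, not part of DeepKE.

```python
import subprocess
import sys

# Hypothetical mapping, taken from the comment in conf/train.yaml:
# English BERT for ACE, Chinese BERT for DuEE.
MODEL_FOR_DATASET = {
    "ACE": "bert-base-uncased",
    "DuEE": "bert-base-chinese",
}


def build_train_commands(data_name, python=sys.executable):
    """Return the trigger and role training commands, in pipeline order."""
    if data_name not in MODEL_FOR_DATASET:
        raise ValueError(f"unknown data_name: {data_name!r}")
    model = MODEL_FOR_DATASET[data_name]
    return [
        [python, "run.py", f"data_name={data_name}",
         f"model_name_or_path={model}", f"task_name={task}"]
        for task in ("trigger", "role")
    ]


def train_all(data_name, dry_run=True):
    """Print (or, with dry_run=False, execute) both training stages."""
    for cmd in build_train_commands(data_name):
        print(" ".join(cmd))
        if not dry_run:
            # Assumption: run.py accepts Hydra-style overrides.
            subprocess.run(cmd, check=True)
```

Calling `train_all("DuEE")` only prints the two commands; pass `dry_run=False` from `example/ee/standard` to actually run them. The trigger stage comes first because its predictions are what the pipeline prediction step consumes later.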

## Predict
## Predict (Event Arguments Extraction)

Trigger prediction was already run during training, and its result is in the `output_dir`. Here we predict the event arguments using the predicted triggers.

Modify the parameters in `./conf/predict.yaml`. Set `model_name_or_path` to the path of the trained role model, and set `do_pipeline_predict=True` to run the pipeline prediction.

Trigger prediction was already run during training, and its result is in the `output_dir`. Here we predict the event arguments using the predicted triggers.
Modify the parameters in `./conf/predict.yaml`.
Then run the following command:
```bash
python predict.py
```
```
The final result is written to `eval_pred.json` in the role model folder.
30 changes: 20 additions & 10 deletions example/ee/standard/README_CN.md
@@ -12,6 +12,7 @@
```bash
python==3.8
pip install -r requirements.txt
pip install hydra-core==1.3.1 # ignore the conflict with deepke
```

## Clone Code
@@ -31,30 +32,39 @@ cd DeepKE/example/ee/standard

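## Train

Modify the model parameters in `./conf/train.yaml`.
Modify the training parameters in `./conf/train.yaml`. Select a different dataset by changing the `data_name` parameter, and set `model_name_or_path` to the corresponding model.

Event extraction training has two parts: the first trains the trigger classification model, and the second trains the event role extraction model.

- Trigger

First, extract the trigger of each instance.

Set `task_name` to `trigger`.
Select a different dataset by changing the `data_name` parameter.
Then run the following command: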

```bash
python run.py
```

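- Event role
Here we train the event argument extraction model with the gold trigger.

Trigger prediction runs after training, and the result is saved in `exp/xx/trigger/xxx/eval_pred.json`.

- Event arguments (Event Arguments Extraction)
Next, train the event role extraction model; here the event argument extraction model is trained with the gold trigger.
Set `task_name` to `role`.
Select a different dataset by changing the `data_name` parameter.
Then run the following command: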

```bash
python run.py
```

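## Predict
## Predict (Event Arguments Extraction)

Trigger prediction is completed during training, and its result is in `output_dir`. Here we use the predicted triggers to extract the event arguments.
Modify the model parameters in `./conf/predict.yaml`.
Trigger prediction is completed during training, and its result is in `output_dir`. Here we use the predicted triggers to run the pipeline event argument extraction.
Modify the model parameters in `./conf/predict.yaml`. Set `model_name_or_path` to the path of the trained event argument extraction model, and set `do_pipeline_predict=True` to run the pipeline event extraction.
Then run the following command: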

```bash
python predict.py
```
```
The final prediction result will be in `eval_pred.json` under the directory of the role model.
8 changes: 4 additions & 4 deletions example/ee/standard/conf/predict.yaml
@@ -1,8 +1,8 @@
defaults:
- train

data_name: ACE # [ACE, DuEE]
model_name_or_path: ./exp/ACE/role/bert-base-uncased
data_name: DuEE # [ACE, DuEE]
model_name_or_path: ./exp/DuEE/role/bert-base-chinese
task_name: role # the trigger prediction is done during the training process.
do_train: False
do_eval: True
@@ -12,5 +12,5 @@ do_pipeline_predict: True
overwrite_cache: True


dev_trigger_pred_file: ./exp/ACE/trigger/bert-base-uncased/eval_pred.json
test_trigger_pred_file: ./exp/ACE/trigger/bert-base-uncased/test_pred.json
dev_trigger_pred_file: ./exp/DuEE/trigger/bert-base-chinese/eval_pred.json # change to your pred file of trigger classification
test_trigger_pred_file: ./exp/DuEE/trigger/bert-base-chinese/test_pred.json
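
The three paths in `predict.yaml` change together when the dataset changes. The sketch below derives them from a dataset name and a model name, assuming the `exp/<data_name>/<task_name>/<model>` layout that the old and new values in this hunk suggest; `pipeline_predict_paths` is a hypothetical helper, not part of DeepKE, and a trained model saved elsewhere would need its own paths.

```python
import posixpath


def pipeline_predict_paths(data_name, model_name, exp_root="./exp"):
    """Build the path values used in conf/predict.yaml for one dataset.

    Assumes outputs live under <exp_root>/<data_name>/<task>/<model_name>.
    """
    trigger_dir = posixpath.join(exp_root, data_name, "trigger", model_name)
    return {
        # Trained role (event argument) model to load for prediction.
        "model_name_or_path": posixpath.join(exp_root, data_name, "role", model_name),
        # Trigger predictions written during trigger training.
        "dev_trigger_pred_file": posixpath.join(trigger_dir, "eval_pred.json"),
        "test_trigger_pred_file": posixpath.join(trigger_dir, "test_pred.json"),
    }


if __name__ == "__main__":
    for key, value in pipeline_predict_paths("DuEE", "bert-base-chinese").items():
        print(f"{key}: {value}")
```

Printing the result for `DuEE` and `bert-base-chinese` gives the new values in this hunk, and `ACE` with `bert-base-uncased` gives the old ones.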
12 changes: 6 additions & 6 deletions example/ee/standard/conf/train.yaml
@@ -1,6 +1,6 @@
data_name: ACE # [ACE, DuEE]
model_name_or_path: ../../../../models/bert-base-uncased # [bert-base-uncased, bert-base-chinese] english for ace, chinese for duee
task_name: role # [trigger, role]
data_name: DuEE # [ACE, DuEE]
model_name_or_path: /newdisk1/msy/models/bert-base-chinese # [bert-base-uncased, bert-base-chinese] english for ace, chinese for duee
task_name: trigger # [trigger, role]
model_type: bertcrf
do_train: True
do_eval: True
@@ -19,16 +19,16 @@ per_gpu_eval_batch_size: 16
gradient_accumulation_steps: 1
max_seq_length: 256
max_grad_norm: 1.0
num_train_epochs: 10
max_steps: 5000
num_train_epochs: 5
max_steps: 500
warmup_steps: 0
logging_steps: 500
save_steps: 500
eval_all_checkpoints: False
no_cuda: False
n_gpu: 0
overwrite_output_dir: True
overwrite_cache: False
overwrite_cache: True
seed: 42
fp16: False
fp16_opt_level: "01"
12 changes: 7 additions & 5 deletions example/ee/standard/predict.py
@@ -46,8 +46,8 @@ def main(args):
args.data_dir = os.path.join(args.cwd, "./data/" + args.data_name + "/" + args.task_name)
args.tag_path = os.path.join(args.cwd, "./data/" + args.data_name + "/schema")
args.model_name_or_path = os.path.join(args.cwd, args.model_name_or_path)
args.dev_trigger_pred_file = os.path.join(args.cwd, args.dev_trigger_pred_file) if args.do_pipeline_predict else None
args.test_trigger_pred_file = os.path.join(args.cwd, args.test_trigger_pred_file) if args.do_pipeline_predict else None
args.dev_trigger_pred_file = os.path.join(args.cwd, args.dev_trigger_pred_file) if args.do_pipeline_predict and args.task_name=="role" else None
args.test_trigger_pred_file = os.path.join(args.cwd, args.test_trigger_pred_file) if args.do_pipeline_predict and args.task_name=="role" else None
args.do_predict = True if args.data_name == "ACE" else False

# Setup CUDA, GPU & distributed training
@@ -75,8 +75,10 @@ def main(args):
args.model_type = args.model_type.lower()
config_class, model_class, tokenizer_class = MODEL_CLASSES[args.model_type]

config = config_class.from_pretrained(args.model_name_or_path)
tokenizer = tokenizer_class.from_pretrained(args.model_name_or_path, do_lower_case=args.do_lower_case)
model = model_class.from_pretrained(args.model_name_or_path)
model = model_class.from_pretrained(args.model_name_or_path, config=config)
logger.info(f"label_nums:{config.num_labels}")
model.to(device)

pad_token_label_id = -100
@@ -91,15 +93,15 @@ def main(args):

raw_path = "/".join(args.data_dir.split("/")[:-1])
if args.do_eval:
if args.dev_trigger_pred_file is not None:
if args.task_name=="role" and args.dev_trigger_pred_file is not None:
processor.process_dev_with_pred_trigger(args, raw_path, "dev_with_pred_trigger.tsv")
eval_examples = processor.get_examples(os.path.join(args.data_dir, "dev_with_pred_trigger.tsv"), "dev")
else:
eval_examples = processor.get_examples(os.path.join(args.data_dir, "dev.tsv"), "dev")
eval_dataset = load_and_cache_examples(args, eval_examples , tokenizer, labels, pad_token_label_id, mode="dev")

if args.do_predict:
if args.test_trigger_pred_file is not None:
if args.task_name=="role" and args.test_trigger_pred_file is not None:
processor.process_test_with_pred_trigger(args, raw_path, "test_with_pred_trigger.tsv")
test_examples = processor.get_examples(os.path.join(args.data_dir, "test_with_pred_trigger.tsv"), "test")
else:
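
The effect of the added `task_name=="role"` checks in `predict.py` can be isolated in a small standalone sketch. The two functions below are hypothetical helpers written for illustration and do not exist in `predict.py`; they only mirror the conditions visible in the hunks above.

```python
import os


def resolve_trigger_pred_file(cwd, pred_file, do_pipeline_predict, task_name):
    """Return the absolute trigger prediction path, or None when unused.

    Mirrors the updated condition: predicted triggers are only consumed
    by the role task when pipeline prediction is enabled.
    """
    if do_pipeline_predict and task_name == "role":
        return os.path.join(cwd, pred_file)
    return None


def choose_dev_file(task_name, dev_trigger_pred_file):
    """Pick the dev file name that the evaluation branch would read."""
    if task_name == "role" and dev_trigger_pred_file is not None:
        return "dev_with_pred_trigger.tsv"
    return "dev.tsv"


if __name__ == "__main__":
    pred = "./exp/DuEE/trigger/bert-base-chinese/eval_pred.json"
    for task in ("trigger", "role"):
        resolved = resolve_trigger_pred_file("/work", pred, True, task)
        print(task, resolved, choose_dev_file(task, resolved))
```

With `do_pipeline_predict` left at `True`, running the script for `task_name` set to `trigger` now falls back to the plain `dev.tsv` instead of trying to build `dev_with_pred_trigger.tsv` from trigger predictions.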
7 changes: 3 additions & 4 deletions example/ee/standard/requirements.txt
@@ -1,11 +1,10 @@
torch==1.10.0
transformers==4.26.0
hydra-core==1.3.1
tensorboardx==2.4
tensorboardx==2.5.1
lxml==4.9.1
beautifulsoup4==4.9.3
bs4==0.0.1
stanza==1.2
sentencepiece==0.1.95
ipdb==0.13.9
deepke
ipdb==0.13.11
deepke==2.2.3
