Prepare dev_random pipeline
Eszti committed Nov 18, 2024
1 parent 0a1813b commit 44d69ab
Showing 4 changed files with 49 additions and 3 deletions.
16 changes: 13 additions & 3 deletions tuw_nlp/sem/hrg/Documentation.md
@@ -63,7 +63,7 @@ python steps/predict/predict.py -d $DATA_DIR -c pipeline/config/predict_100.json
Once all sentences are predicted, we [merge](steps/predict/merge.py) them into one json per model.

```bash
python steps/predict/merge.py -d $DATA_DIR -c pipeline/config/merge_100.json
python steps/predict/merge.py -d $DATA_DIR -c pipeline/config/merge_100.json
```

#### Run the whole predict pipeline on dev
@@ -81,10 +81,20 @@ python pipeline/pipeline.py -d $DATA_DIR -c pipeline/config/pipeline_dev_300.jso

### Create random predictions for comparison

We implement a [random extractor](random/random_extractor.py) that uses the [artefacts](random/train_stat) of the training dataset (distribution of the number of extractions per sentence, and distribution of labels per length of the sentence) and assures that the predicate is a verb.
We implement a [random extractor](steps/random/random_extractor.py) that uses the [artefacts](pipeline/output/artefacts) of the training dataset (distribution of the number of extractions per sentence, and distribution of labels per length of the sentence) and assures that the predicate is a verb.

```bash
# TBD
# Extract artefacts
python steps/random/artefacts.py -d $DATA_DIR -c pipeline/config/artefacts_train.json

# Get random extractions
python steps/random/random_extractor.py -d $DATA_DIR -c pipeline/config/random_dev.json

# Merge the extractions
python steps/predict/merge.py -d $DATA_DIR -c pipeline/config/merge_dev_random.json

# Or run as a pipeline
python pipeline/pipeline.py -d $DATA_DIR -c pipeline/config/pipeline_dev_random.json
```

### Evaluate the predictions
14 changes: 14 additions & 0 deletions tuw_nlp/sem/hrg/pipeline/config/merge_dev_random.json
@@ -0,0 +1,14 @@
{
"in_dir": "dev_random",
"k": 10,
"bolinas_chart_filters":
[
"boa",
"argidx"
],
"postprocess":
[
""
],
"out_dir": "dev_extractions"
}
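
The `in_dir` of this merge config matches the `out_dir` that `random_dev.json` gains in the same commit (`dev_random`). A minimal sketch of a check for that kind of hand-off is given below. It assumes that each step reads the previous step's `out_dir` and that all directories resolve against the same `$DATA_DIR`; neither is confirmed by the commit. The helper `broken_links` is hypothetical and not part of the repository, and only the keys visible in this diff are reproduced.

```python
# Only the directory keys visible in this commit are reproduced here;
# the real config files contain further keys.
RANDOM_DEV = {"in_dir": "dev_preproc", "out_dir": "dev_random"}
MERGE_DEV_RANDOM = {"in_dir": "dev_random", "out_dir": "dev_extractions"}


def broken_links(configs):
    """Return (index, expected, found) for each step whose in_dir
    differs from the previous step's out_dir.

    Hypothetical helper: assumes steps are chained strictly in order.
    """
    problems = []
    for i in range(1, len(configs)):
        expected = configs[i - 1]["out_dir"]
        found = configs[i]["in_dir"]
        if expected != found:
            problems.append((i, expected, found))
    return problems


if __name__ == "__main__":
    # An empty list means every step reads what the previous step wrote.
    print(broken_links([RANDOM_DEV, MERGE_DEV_RANDOM]))
```

A check like this would not cover steps that read from elsewhere, for example the artefacts that the random step locates through `artefact_prefix`.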
20 changes: 20 additions & 0 deletions tuw_nlp/sem/hrg/pipeline/config/pipeline_dev_random.json
@@ -0,0 +1,20 @@
{
"steps":
[
{
"step_name": "artefacts",
"script_name": "artefacts",
"config": "artefacts_train.json"
},
{
"step_name": "random",
"script_name": "random",
"config": "random_dev.json"
},
{
"step_name": "merge_random",
"script_name": "merge",
"config": "merge_dev_random.json"
}
]
}
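
The three steps in this config correspond to the three manual commands added to Documentation.md. The sketch below makes that correspondence explicit by expanding each step into a command line. The `SCRIPT_PATHS` table is an assumption inferred from those documented commands, and `commands_for` is a hypothetical helper; how `pipeline/pipeline.py` actually resolves `script_name` is not shown in this commit.

```python
import json

# Assumed mapping from "script_name" to script path, inferred from the
# manual commands in Documentation.md. The real lookup may differ.
SCRIPT_PATHS = {
    "artefacts": "steps/random/artefacts.py",
    "random": "steps/random/random_extractor.py",
    "merge": "steps/predict/merge.py",
}

# Same content as pipeline_dev_random.json in this commit.
PIPELINE_CONFIG = json.loads("""
{
  "steps": [
    {"step_name": "artefacts", "script_name": "artefacts",
     "config": "artefacts_train.json"},
    {"step_name": "random", "script_name": "random",
     "config": "random_dev.json"},
    {"step_name": "merge_random", "script_name": "merge",
     "config": "merge_dev_random.json"}
  ]
}
""")


def commands_for(pipeline, data_dir, config_dir="pipeline/config"):
    """Return one argv list per pipeline step, in execution order."""
    commands = []
    for step in pipeline["steps"]:
        script = SCRIPT_PATHS[step["script_name"]]
        config = f"{config_dir}/{step['config']}"
        commands.append(["python", script, "-d", data_dir, "-c", config])
    return commands


if __name__ == "__main__":
    # Print the equivalent manual commands without running them.
    for argv in commands_for(PIPELINE_CONFIG, "$DATA_DIR"):
        print(" ".join(argv))
```

Because `step_name` and `script_name` are separate keys, the same script can appear under different step names, as `merge` does here under `merge_random`.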
2 changes: 2 additions & 0 deletions tuw_nlp/sem/hrg/pipeline/config/random_dev.json
@@ -1,5 +1,7 @@
{
"in_dir": "dev_preproc",
"out_dir": "dev_random",
"artefact_prefix": "artefacts_train",
"models":
[
"boa",