Below are several examples to illustrate how to use ASReview-datatools. Make
sure to have installed asreview-datatools and ASReview LAB v1.1 or higher.
Allowed data formats are described in the ASReview documentation.
ASReview converts the labeling decisions in RIS files to a binary variable:
irrelevant as 0 and relevant as 1. Records marked as unseen or with
missing labeling decisions are converted to -1.
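The mapping described above can be sketched in a few lines of Python. This is only an illustration of the 1 / 0 / -1 convention, not ASReview's own code; the label strings and the helper name to_binary_label are assumptions made for the example.

```python
from typing import Optional

# Hypothetical label strings for illustration; the exact text stored in a
# RIS file depends on how the labels were exported.
LABEL_TO_INT = {"irrelevant": 0, "relevant": 1}


def to_binary_label(raw: Optional[str]) -> int:
    """Map a textual labeling decision to 1, 0, or -1 (unseen or missing)."""
    if raw is None:
        return -1
    # Anything that is not a known decision is treated as unlabeled.
    return LABEL_TO_INT.get(raw.strip().lower(), -1)


labels = [to_binary_label(x) for x in ["relevant", "irrelevant", "unseen", None, ""]]
print(labels)  # [1, 0, -1, -1, -1]
```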
Assume you are working on a systematic review and you want to update the
review with newly available records. The original data is stored in
MY_LABELED_DATASET.csv and the file contains a column with the labeling
decisions. To update the systematic review, you run the original search
query again but with a new date. You save the newly found records in
SEARCH_UPDATE.ris.
In the command line interface (CLI), navigate to the directory where the dataset(s) are stored:
cd Parent_directory
The original data and the newly found records are stored in different file
formats (CSV and RIS). You can convert files to the same file format using the
convert script. For example, to convert SEARCH_UPDATE.ris to CSV format,
run the following command in the directory where the dataset(s) are stored:
asreview data convert SEARCH_UPDATE.ris SEARCH_UPDATE.csv
Duplicate records can be removed with the dedup script. The algorithm
removes duplicates using the Digital Object Identifier (DOI) and the title
plus abstract.
asreview data dedup SEARCH_UPDATE.csv -o SEARCH_UPDATE_DEDUP.csv
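To give an idea of what deduplicating on DOI and on title plus abstract means, here is a minimal Python sketch. It is not the implementation used by the dedup script. The text normalization (lowercasing, dropping punctuation) and the field names doi, title, and abstract are assumptions made for the example.

```python
import re


def _norm(text):
    """Lowercase and reduce everything except letters and digits to spaces."""
    return re.sub(r"[^a-z0-9]+", " ", (text or "").lower()).strip()


def dedup(records):
    """Keep the first record seen for each DOI and each title+abstract pair."""
    seen_dois, seen_texts, kept = set(), set(), []
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        text = _norm(rec.get("title")) + "|" + _norm(rec.get("abstract"))
        if doi and doi in seen_dois:
            continue  # same DOI as an earlier record
        if text != "|" and text in seen_texts:
            continue  # same title and abstract as an earlier record
        if doi:
            seen_dois.add(doi)
        if text != "|":
            seen_texts.add(text)
        kept.append(rec)
    return kept


records = [
    {"doi": "10.1000/a", "title": "Active learning", "abstract": "Screening."},
    {"doi": "10.1000/A", "title": "Different title", "abstract": "Other."},
    {"doi": None, "title": "Active Learning!", "abstract": "screening"},
    {"doi": None, "title": "Unique", "abstract": ""},
]
unique = dedup(records)
print(len(unique))  # 2
```

The second record is dropped because its DOI matches the first one after lowercasing, and the third because its normalized title and abstract match the first one.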
If you want to see descriptive info on your input datasets, run these commands:
asreview data describe MY_LABELED_DATASET.csv -o MY_LABELED_DATASET_description.json
asreview data describe SEARCH_UPDATE_DEDUP.csv -o SEARCH_UPDATE_description.json
The results will be exported to MY_LABELED_DATASET_description.json
and SEARCH_UPDATE_description.json.
Use the compose script to add SEARCH_UPDATE_DEDUP.csv to MY_LABELED_DATASET.csv:
asreview data compose updated_search.csv -l MY_LABELED_DATASET.csv -u SEARCH_UPDATE_DEDUP.csv
The flag -l means the labels in MY_LABELED_DATASET.csv will be kept.
The flag -u means all records from SEARCH_UPDATE_DEDUP.csv will be
added as unlabeled to the composed dataset.
If a record exists in both datasets, the record containing a label is kept by default; see the default conflict resolving strategy. To keep both records (with and without label), use
asreview data compose updated_search.csv -l MY_LABELED_DATASET.csv -u SEARCH_UPDATE_DEDUP.csv -c keep
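The difference between the default behavior and -c keep can be illustrated with a small Python sketch. It only mirrors the two outcomes described above and is not the compose script's actual logic. The id field used to match records and the function name are hypothetical.

```python
def compose_labeled_unlabeled(labeled, unlabeled, keep_both=False):
    """Combine labeled and unlabeled records, matched on a hypothetical 'id'."""
    composed = [dict(rec) for rec in labeled]
    labeled_ids = {rec["id"] for rec in labeled}
    for rec in unlabeled:
        if rec["id"] in labeled_ids and not keep_both:
            continue  # default: only the labeled copy is kept
        composed.append({**rec, "label": -1})  # -1 marks an unlabeled record
    return composed


labeled = [{"id": 1, "label": 1}, {"id": 2, "label": 0}]
unlabeled = [{"id": 2}, {"id": 3}]

print(len(compose_labeled_unlabeled(labeled, unlabeled)))                  # 3
print(len(compose_labeled_unlabeled(labeled, unlabeled, keep_both=True)))  # 4
```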
The composed dataset will be exported to updated_search.csv.
To see descriptive info on the composed dataset:
asreview data describe updated_search.csv -o updated_search_description.json
The result will be exported to updated_search_description.json.
The partly labeled data, updated_search.csv, can be uploaded to ASReview LAB
in Oracle mode. The labels will be recognized by ASReview and used to train
the first iteration of the model, and you can continue screening all
unlabeled records found in the new search.
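If you update a review regularly, the convert, dedup, and compose steps of this tutorial can be scripted. The sketch below builds the same commands in Python and prints them. The helper names update_commands and run_all are made up for the example, and actually running the commands requires asreview-datatools and the input files in the working directory.

```python
import subprocess
from pathlib import Path


def update_commands(labeled_csv, update_ris, output_csv):
    """Return the argument lists for convert, dedup, and compose, in order."""
    stem = Path(update_ris).stem
    update_csv = str(Path(update_ris).with_suffix(".csv"))
    dedup_csv = str(Path(update_ris).with_name(stem + "_DEDUP.csv"))
    return [
        ["asreview", "data", "convert", update_ris, update_csv],
        ["asreview", "data", "dedup", update_csv, "-o", dedup_csv],
        ["asreview", "data", "compose", output_csv,
         "-l", labeled_csv, "-u", dedup_csv],
    ]


def run_all(commands):
    """Run each command in turn and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = update_commands(
    "MY_LABELED_DATASET.csv", "SEARCH_UPDATE.ris", "updated_search.csv"
)
for cmd in commands:
    print(" ".join(cmd))
# run_all(commands)  # uncomment once the input files are in place
```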
Assume you have just executed a search query for a systematic review and you
want to use a pre-defined set of relevant and irrelevant records as training
data. The search results are stored in SEARCH_RESULTS.ris, and the records
you already know to be relevant/irrelevant are saved in
PRIOR_RELEVANT.ris and PRIOR_IRRELEVANT.ris, respectively.
In the command line interface (CLI), navigate to the directory where the dataset(s) are stored:
cd Parent_directory
If you want to see descriptive info on your input datasets, run these commands:
asreview data describe SEARCH_RESULTS.ris -o SEARCH_RESULTS_description.json
asreview data describe PRIOR_RELEVANT.ris -o PRIOR_RELEVANT_description.json
asreview data describe PRIOR_IRRELEVANT.ris -o PRIOR_IRRELEVANT_description.json
The results will be exported to SEARCH_RESULTS_description.json,
PRIOR_RELEVANT_description.json and PRIOR_IRRELEVANT_description.json.
To create one dataset with labels only for the training data to be used in ASReview, run:
asreview data compose search_with_priors.ris -u SEARCH_RESULTS.ris -r PRIOR_RELEVANT.ris -i PRIOR_IRRELEVANT.ris
The flag -r means all records from PRIOR_RELEVANT.ris will be added as
relevant records to the composed dataset.
The flag -i means all records from PRIOR_IRRELEVANT.ris will be added
as irrelevant.
The flag -u means all other records from SEARCH_RESULTS.ris will be
added as unlabeled.
If any duplicate records exist across the datasets, by default the order of keeping labels is:
- relevant
- irrelevant
- unlabeled
You can configure the behavior in resolving conflicting labels by setting the
hierarchy differently. To do so, pass the letters r (relevant), i
(irrelevant), and u (unlabeled) in any order to the --hierarchy option,
for example --hierarchy uir.
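The effect of the hierarchy can be sketched in Python as follows. This is an illustration under the assumption that the first letter of the hierarchy has the highest priority, as in the default order relevant, irrelevant, unlabeled. The function name resolve and the use of plain record ids are hypothetical.

```python
LABEL_OF = {"r": 1, "i": 0, "u": -1}


def resolve(sources, hierarchy="riu"):
    """sources maps 'r', 'i', 'u' to lists of record ids; return {id: label}."""
    if sorted(hierarchy) != ["i", "r", "u"]:
        raise ValueError("hierarchy must be a permutation of 'r', 'i', 'u'")
    resolved = {}
    # Walk from lowest to highest priority so that higher priority overwrites.
    for letter in reversed(hierarchy):
        for rec_id in sources.get(letter, []):
            resolved[rec_id] = LABEL_OF[letter]
    return resolved


sources = {"r": ["a", "b"], "i": ["b", "c"], "u": ["a", "c", "d"]}
print(resolve(sources))         # a and b relevant, c irrelevant, d unlabeled
print(resolve(sources, "uir"))  # a, c, d unlabeled, b irrelevant
```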
The composed dataset will be exported to search_with_priors.ris.
To see descriptive info on the composed dataset:
asreview data describe search_with_priors.ris -o search_with_priors_description.json
The result will be exported to search_with_priors_description.json in the
output folder.
The partly labeled data, search_with_priors.ris, can be uploaded to
ASReview LAB in Oracle mode. The labels will be recognized by ASReview and
used to train the first iteration of the model, and you can continue
screening all unlabeled records found in the search results.
Assume you want to use the simulation mode of ASReview but the data is not stored in one single file containing the metadata and labeling decisions as required by ASReview.
Suppose the following files are available:
- SCREENED.ris: all records that were screened
- RELEVANT.ris: the subset of relevant records after manually screening all the records.
You need to compose the files into a single file where all records from
RELEVANT.ris are relevant and all other records are irrelevant.
In the command line interface (CLI), navigate to the directory where the dataset(s) are stored:
cd Parent_directory
If you want to see descriptive info on your input datasets, run these commands:
asreview data describe SCREENED.ris -o SCREENED_description.json
asreview data describe RELEVANT.ris -o RELEVANT_description.json
The results will be exported to SCREENED_description.json
and RELEVANT_description.json.
Use the compose script to compose a new dataset from SCREENED.ris
and RELEVANT.ris:
asreview data compose screened_with_labels.ris -i SCREENED.ris -r RELEVANT.ris
The flag -r means all records from RELEVANT.ris will be added as
relevant to the composed dataset.
The flag -i means all other records from SCREENED.ris will be added as
irrelevant.
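The labeling that results from combining -r and -i in this way can be sketched in Python. The function name and the use of plain record ids are assumptions for the example; the compose script itself matches full records rather than ids.

```python
def label_for_simulation(screened_ids, relevant_ids):
    """Return {record_id: label}: 1 for relevant records, 0 for all others."""
    labels = {rec_id: 0 for rec_id in screened_ids}
    # Relevant records overwrite the irrelevant label. A relevant id that is
    # missing from the screened list is still added.
    labels.update({rec_id: 1 for rec_id in relevant_ids})
    return labels


labels = label_for_simulation(["a", "b", "c", "d"], ["b", "d"])
n_relevant = sum(labels.values())
print(f"{n_relevant} relevant out of {len(labels)} records")
```

Checking these counts against the output of the describe command is a quick way to confirm that the composed file contains the number of relevant records you expect.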
The composed dataset will be exported to screened_with_labels.ris.
To see descriptive info on the composed dataset:
asreview data describe screened_with_labels.ris -o screened_with_labels_description.json
The result will be exported to screened_with_labels_description.json.
The resulting file screened_with_labels.ris can be uploaded to ASReview LAB
in Simulation mode. This allows you to simulate the screening procedure of
the systematic review as if it were carried out using ASReview LAB.