Step 1: Generate free-text reports from the graph with a script, based on manually defined rules.
Step 2: Evaluate report quality with NLP metrics using a script.
This script depends on the RATCHET library; you may need to put it in RATCHET's nlp_metrics folder, so that the directory looks like
RATCHET/nlp_metrics/eval.py
Extra: we also provide a demo script here showing how poorly NLP metrics can reflect clinical correctness.
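To illustrate the point (this is a self-contained sketch, not the demo script itself): n-gram metrics such as BLEU reward surface overlap, so a generated report that drops a single negation word can score near-perfectly while stating the clinically opposite finding. The sentences below are made up for illustration.

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference,
    with counts clipped as in BLEU's modified precision."""
    def ngrams(tokens, n):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    cand_ngrams = ngrams(candidate.split(), n)
    if not cand_ngrams:
        return 0.0
    ref_counts = Counter(ngrams(reference.split(), n))
    cand_counts = Counter(cand_ngrams)
    overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    return overlap / len(cand_ngrams)

# Ground-truth report vs. a generation that only drops the word "no":
reference = "there is no evidence of pneumothorax or pleural effusion"
candidate = "there is evidence of pneumothorax or pleural effusion"

print(ngram_precision(candidate, reference, 1))  # 1.0 - perfect unigram overlap
print(round(ngram_precision(candidate, reference, 2), 2))  # 0.86
```

Despite asserting the opposite diagnosis, the candidate gets a perfect unigram precision and a high bigram precision, which is why the entity-level evaluation below is needed.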
Step 1: Given a report, first parse it into clinical entities. For this, we use CheXpert to extract the entities from the reports.
Step 2: Select the entities of interest and evaluate classification performance against the ground truth with a script.
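The comparison in Step 2 amounts to per-label classification metrics. A minimal sketch, assuming each report has been reduced to a binary presence vector over the selected entities (the label names and toy data below are illustrative, not the actual CheXpert output format):

```python
# Hypothetical entities of interest; substitute the labels you selected.
LABELS = ["cardiomegaly", "edema", "pneumothorax"]

def per_label_metrics(y_true, y_pred):
    """Precision, recall, and F1 per label from parallel lists of
    binary vectors (one vector per report, one slot per label)."""
    metrics = {}
    for j, name in enumerate(LABELS):
        tp = sum(t[j] and p[j] for t, p in zip(y_true, y_pred))
        fp = sum((not t[j]) and p[j] for t, p in zip(y_true, y_pred))
        fn = sum(t[j] and (not p[j]) for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        metrics[name] = {"precision": prec, "recall": rec, "f1": f1}
    return metrics

# Toy example: two reports, three labels each.
gt = [[1, 0, 1], [0, 1, 0]]
pred = [[1, 0, 0], [0, 1, 0]]
for name, m in per_label_metrics(gt, pred).items():
    print(name, m)
```

On the toy data, cardiomegaly and edema are recovered perfectly while the missed pneumothorax yields zero recall, which is exactly the kind of error the NLP metrics above gloss over.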
For the classification results generated by Prior-RadGraphFormer, performance is evaluated with inference.py and radgraph_eval.py.
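Graph predictions are typically scored as a set match between predicted and ground-truth tuples. The sketch below is an illustrative set-based micro-F1, not the actual logic of radgraph_eval.py; the tuple format and data are assumed for the example:

```python
def set_f1(pred, gold):
    """Micro F1 between two sets of tuples, e.g. (entity, label) pairs
    or (head, relation, tail) triplets: a prediction counts as a true
    positive only if it matches a ground-truth tuple exactly."""
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

gold = {("pleural effusion", "present"), ("pneumothorax", "absent")}
pred = {("pleural effusion", "present"), ("edema", "present")}
print(set_f1(pred, gold))  # 1 match out of 2 predicted / 2 gold -> 0.5
```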