# NCER | Clinical NER Benchmark

## Installation

```bash
git clone https://github.com/WadoodAbdul/clinical_ner_benchmark.git
cd clinical_ner_benchmark
pip install -e .
```
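
As a quick sanity check (a minimal sketch, not part of the upstream docs), the imports used in the Usage section below should resolve after an editable install:

```python
# Sanity check: these imports mirror the Usage section below and should
# succeed once `pip install -e .` has completed.
import clinical_ner
from clinical_ner.models import SpanExtractor
from clinical_ner.evaluation import Evaluator
```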

## Usage

```python
from clinical_ner.models import SpanExtractor
from clinical_ner.evaluation import Evaluator
from clinical_ner.benchmarks import NCER

model_name = "alvaroalon2/biobert_diseases_ner"

benchmark = NCER

# The config below is model- and dataset-specific; it should contain an entry
# for every dataset in the loaded benchmark.
dataset_wise_config = {
    "NCBI": {"label_normalization_map": {"DISEASE": "condition"}}
}

# Load a predefined model (for a custom implementation, see
# https://github.com/WadoodAbdul/clinical_ner_benchmark/blob/master/docs/custom_model_implementation.md).
model = SpanExtractor.from_predefined(model_name)

evaluator = Evaluator(model, benchmark=benchmark, dataset_wise_config=dataset_wise_config)
evaluator.run()
```
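
Each `label_normalization_map` translates a model's native tag set into the benchmark's canonical entity types; above, the BioBERT model's `DISEASE` label is mapped to `condition`. Since the config should cover every dataset in the loaded benchmark, a fuller version would look something like the sketch below (dataset names other than `NCBI` are hypothetical placeholders, not real benchmark datasets):

```python
# Hedged sketch: one entry per dataset in the loaded benchmark.
# "NCBI" comes from the example above; "ANOTHER_DATASET" is a placeholder.
dataset_wise_config = {
    "NCBI": {"label_normalization_map": {"DISEASE": "condition"}},
    "ANOTHER_DATASET": {"label_normalization_map": {"DISEASE": "condition"}},
}
```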

## Advanced Usage

### Using a custom model

Custom models should inherit from the `GenericSpanExtractor` or `SpanExtractor` abstract classes.

```python
from clinical_ner.models import GenericSpanExtractor
from clinical_ner.models.span_dataclasses import NERSpans
from clinical_ner.evaluation import Evaluator
from clinical_ner.benchmarks import NCER


class MyCustomModel(GenericSpanExtractor):
    def extract_spans_from_chunk(self, text: str, **kwargs) -> NERSpans:
        """
        Extract spans from sequences of any length.

        Args:
            text: The text from which spans should be extracted.
            **kwargs: Additional arguments to pass to the encoder.

        Returns:
            The NERSpans.
        """
        ...


model = MyCustomModel()
benchmark = NCER

# The config below is model- and dataset-specific.
dataset_wise_config = {
    "dataset_name": {"label_normalization_map": {"DISEASE": "condition"}}
}

evaluator = Evaluator(model, benchmark=benchmark, dataset_wise_config=dataset_wise_config)
evaluator.run()
```
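
For a concrete (if toy) picture of the contract, a rule-based subclass might look like the sketch below. The regex logic is ordinary Python, but the `NERSpans(...)` construction is an assumption about the dataclass's fields rather than the library's documented API; check `clinical_ner/models/span_dataclasses.py` and the custom-model docs linked below for the real interface:

```python
import re

from clinical_ner.models import GenericSpanExtractor
from clinical_ner.models.span_dataclasses import NERSpans


class RegexConditionExtractor(GenericSpanExtractor):
    """Toy extractor that tags a fixed vocabulary of condition mentions."""

    VOCAB = re.compile(r"\b(cancer|diabetes|asthma)\b", re.IGNORECASE)

    def extract_spans_from_chunk(self, text: str, **kwargs) -> NERSpans:
        spans = [
            # (start_offset, end_offset, surface_text, normalized_label)
            (m.start(), m.end(), m.group(0), "condition")
            for m in self.VOCAB.finditer(text)
        ]
        # ASSUMPTION: the field names below are illustrative only; adapt
        # them to the actual NERSpans dataclass definition.
        return NERSpans(parent_text=text, spans=spans)
```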

More information on custom implementations can be found in [docs/custom_model_implementation.md](https://github.com/WadoodAbdul/clinical_ner_benchmark/blob/master/docs/custom_model_implementation.md).


## Documentation

| Documentation | Description |
|---------------|-------------|
| 📋 Datasets | Overview of available datasets |
| 📋 Metrics | Overview of available metrics |
| 📈 Leaderboard | The interactive leaderboard of the benchmark |
| 🤖 Submit to leaderboard | How to submit a model to the leaderboard |
| 👩‍🔬 Reproducing results | How to reproduce the results on the leaderboard |
| 👩‍💻 [Custom model implementation](https://github.com/WadoodAbdul/clinical_ner_benchmark/blob/master/docs/custom_model_implementation.md) | How to add a custom model to run the evaluation pipeline |

## Citing

```bibtex
@misc{abdul2024namedclinicalentityrecognition,
      title={Named Clinical Entity Recognition Benchmark},
      author={Wadood M Abdul and Marco AF Pimentel and Muhammad Umar Salman and Tathagata Raha and Clément Christophe and Praveen K Kanithi and Nasir Hayat and Ronnie Rajan and Shadab Khan},
      year={2024},
      eprint={2410.05046},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.05046},
}
```