
Using medaCy: Tutorials and Workflows

This directory contains common workflows for using medaCy.

Table of contents

  1. How medaCy Works
  2. Building a medaCy Pipeline
  3. Pre-trained Models
  4. Distributing Trained Models
  5. Interaction with spaCy

How medaCy Works

MedaCy combines the text-processing power of spaCy with state-of-the-art research tools and techniques in medical text mining. It consists of a set of fast pipelines, each specialized for learning specific types of medical entities and relations. A pipeline is a stackable, interchangeable set of PipelineComponents: small code blocks that each overlay one feature onto the text being processed.
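To make the "stackable and interchangeable" idea concrete, here is a minimal pure-Python sketch. It assumes a token can be modeled as a dict of feature values and a component as a function that adds one feature to every token. All names (`tokenize`, `lowercase_overlay`, `is_numeric_overlay`, `run_pipeline`) are hypothetical illustrations, not medaCy's API, and whitespace tokenization stands in for spaCy's tokenizer.

```python
# Hypothetical sketch: a pipeline as an ordered list of components, each of
# which overlays one feature onto every token. These names are NOT medaCy's API.
from typing import Callable, Dict, List

Token = Dict[str, object]                  # a token: feature name -> value
Component = Callable[[List[Token]], None]  # a component mutates tokens in place

def tokenize(text: str) -> List[Token]:
    # Whitespace splitting is a simplification; medaCy relies on spaCy here.
    return [{"text": word} for word in text.split()]

def lowercase_overlay(tokens: List[Token]) -> None:
    # Overlay a lowercased form of each token.
    for tok in tokens:
        tok["lower"] = str(tok["text"]).lower()

def is_numeric_overlay(tokens: List[Token]) -> None:
    # Overlay a flag marking purely numeric tokens.
    for tok in tokens:
        tok["is_numeric"] = str(tok["text"]).isdigit()

def run_pipeline(text: str, components: List[Component]) -> List[Token]:
    tokens = tokenize(text)
    for component in components:  # components are applied in order
        component(tokens)
    return tokens

tokens = run_pipeline("Take 1 capsule", [lowercase_overlay, is_numeric_overlay])
print(tokens[1])  # {'text': '1', 'lower': '1', 'is_numeric': True}
```

Because each component only adds its own feature, components can be swapped out or reordered freely as long as none of them needs another's output.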

Pipeline Components

You can develop your own PipelineComponents and Pipelines by implementing the BaseOverlayer and BasePipeline interfaces, respectively. Alternatively, use the components already implemented in medaCy. Some of the more powerful components require outside software; for example, the MetaMapOverlayer interfaces with MetaMap to overlay rich medical concept information onto text. Components are chained (stacked) in a pipeline, and a component can depend on the outputs of earlier components to function. Under the hood, a medaCy PipelineComponent is a wrapper over a spaCy component that adds utilities specific to training, using, and distributing medical-domain text-processing models.
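The point that components "can depend on the outputs of previous components" can be illustrated with a small sketch in which each component declares the features it requires and provides, and the pipeline checks the ordering up front. `Overlayer`, `requires`, `provides`, `LowercaseOverlayer`, `DrugLexiconOverlayer`, and `Pipeline` are hypothetical names chosen for illustration; this is not the actual BaseOverlayer or BasePipeline interface.

```python
# Hypothetical sketch of component dependencies. These classes are
# illustrative only and do NOT reproduce medaCy's BaseOverlayer/BasePipeline.
from abc import ABC, abstractmethod
from typing import Dict, List, Set

class Overlayer(ABC):
    requires: Set[str] = set()  # features earlier components must provide
    provides: Set[str] = set()  # features this component overlays

    @abstractmethod
    def overlay(self, tokens: List[Dict[str, object]]) -> None: ...

class LowercaseOverlayer(Overlayer):
    provides = {"lower"}

    def overlay(self, tokens):
        for tok in tokens:
            tok["lower"] = str(tok["text"]).lower()

class DrugLexiconOverlayer(Overlayer):
    requires = {"lower"}  # depends on LowercaseOverlayer's output
    provides = {"in_drug_lexicon"}

    def __init__(self, lexicon: Set[str]):
        self.lexicon = lexicon

    def overlay(self, tokens):
        for tok in tokens:
            tok["in_drug_lexicon"] = tok["lower"] in self.lexicon

class Pipeline:
    def __init__(self, components: List[Overlayer]):
        # Verify every component's requirements are met by earlier components.
        available: Set[str] = set()
        for comp in components:
            missing = comp.requires - available
            if missing:
                raise ValueError(f"{type(comp).__name__} is missing {sorted(missing)}")
            available |= comp.provides
        self.components = components

    def __call__(self, text: str) -> List[Dict[str, object]]:
        tokens: List[Dict[str, object]] = [{"text": w} for w in text.split()]
        for comp in self.components:
            comp.overlay(tokens)
        return tokens

pipeline = Pipeline([LowercaseOverlayer(), DrugLexiconOverlayer({"advil"})])
print([t["text"] for t in pipeline("Take Advil daily") if t["in_drug_lexicon"]])  # ['Advil']
```

Checking dependencies when the pipeline is built, rather than when text is processed, means a mis-ordered pipeline fails immediately instead of partway through a document.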

Utilizing Pre-trained NER models

To run a medaCy pre-trained model over your own data, install the package associated with the model by following the links below. Models officially supported by medaCy all start with the prefix medacy_model. For example, assuming you have medaCy installed:

Run:

pip install git+https://github.com/NLPatVCU/medaCy_model_clinical_notes.git

then the code snippet

import medacy_model_clinical_notes
model = medacy_model_clinical_notes.load()
model.predict("The patient was prescribed 1 capsule of Advil for 5 days.")

will output:

[
    ('Drug', 40, 45, 'Advil'),
    ('Dosage', 27, 28, '1'), 
    ('Form', 29, 36, 'capsule'), 
    ('Duration', 46, 56, 'for 5 days')
]
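In the output above, each prediction appears to be a (label, start, end, text) tuple whose character offsets are end-exclusive, like Python slices. The check below uses only the example sentence and the tuples shown above, so it runs without medaCy installed. It confirms that slicing the input text with each (start, end) pair recovers the matched span.

```python
# Text and predictions copied from the example above; no medaCy install needed.
text = "The patient was prescribed 1 capsule of Advil for 5 days."
predictions = [
    ('Drug', 40, 45, 'Advil'),
    ('Dosage', 27, 28, '1'),
    ('Form', 29, 36, 'capsule'),
    ('Duration', 46, 56, 'for 5 days'),
]

# start is inclusive and end is exclusive, so text[start:end] is the span.
for label, start, end, span in predictions:
    assert text[start:end] == span, (label, start, end, span)

# Sorting by start offset lists the entities in reading order.
in_order = [label for label, _, _, _ in sorted(predictions, key=lambda p: p[1])]
print(in_order)  # ['Dosage', 'Form', 'Drug', 'Duration']
```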

NOTE: If you are running prediction over many files at once, it is advisable to use medaCy's bulk prediction functionality.

List of medaCy pre-trained models

| Application | Training Dataset | Entities |
| --- | --- | --- |
| Clinical Notes | N2C2 2018 | Drug, Form, Route, ADE, Reason, Frequency, Duration, Dosage, Strength |
| EPA Systematic Reviews | TAC SRIE 2018 | Species, Celline, Dosage, Group, etc. |
| Nanomedicine Drug Labels | END | Nanoparticle, Company, Adverse Reaction, Active Ingredient, Surface Coating, etc. |

Sharing your medaCy models

MedaCy models can be packaged and shared with anyone (or no one!) with ease. See this example for details.
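Because officially supported models are installable packages whose names start with the medacy_model prefix, one way to see which models are available locally is to scan the import path for that prefix. This sketch relies only on that naming convention and the standard library's `pkgutil.iter_modules`. It does not use any medaCy API, and the helper names `filter_model_names` and `installed_medacy_models` are hypothetical.

```python
# Sketch: list importable top-level packages that follow the medacy_model
# naming convention. Uses only the standard library, not medaCy itself.
import pkgutil
from typing import Iterable, List

MODEL_PREFIX = "medacy_model"  # prefix used by officially supported models

def filter_model_names(names: Iterable[str], prefix: str = MODEL_PREFIX) -> List[str]:
    # Keep only names with the model prefix, sorted for stable output.
    return sorted(name for name in names if name.startswith(prefix))

def installed_medacy_models() -> List[str]:
    # pkgutil.iter_modules() with no argument scans every entry on sys.path.
    return filter_model_names(module.name for module in pkgutil.iter_modules())

# Prints whichever medacy_model* packages are importable in this environment.
print(installed_medacy_models())
```

A model found this way could then be loaded the same way as in the clinical notes example, by importing the package and calling its `load()` function.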

How medaCy uses spaCy

spaCy is an open-source Python package, built with Cython, for fast text processing. MedaCy combines spaCy's memory-efficient text-processing architecture with tools, ideas, and principles from both machine learning and medical computational linguistics to provide a unified framework that helps researchers and practitioners alike advance medical text mining.