This directory contains common workflows for using medaCy:
- How medaCy Works
- Building a medaCy Pipeline
- Pre-trained Models
- Distributing Trained Models
- Interaction with spaCy
MedaCy combines the text-processing power of spaCy with state-of-the-art research tools and techniques in medical text mining. It consists of a set of fast pipelines, each specialized for learning specific types of medical entities and relations. A pipeline is a stackable, interchangeable set of PipelineComponents: small code blocks that each overlay a feature onto the text being processed.
Custom PipelineComponents and custom Pipelines can be developed by extending the BaseOverlayer and BasePipeline classes, respectively. Alternatively, use the components already implemented in medaCy. Some of the more powerful components require outside software; one example is the MetaMapOverlayer, which interfaces with MetaMap to overlay rich medical concept information onto text. Components are chained (stacked) in a pipeline, and a component can depend on the outputs of earlier components in order to function. In the underlying implementation, a medaCy PipelineComponent is a wrapper over a spaCy component that adds a number of utilities for facilitating the training, use, and distribution of medical-domain text processing models.
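The stacking and dependency behavior described above can be illustrated with a minimal, library-free sketch. Everything in it (`Doc`, `Component`, `LowercaseOverlayer`, `UnitOverlayer`, `Pipeline`) is a hypothetical name invented for illustration; it is not medaCy's or spaCy's API, and the real BaseOverlayer and BasePipeline classes may look quite different.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class Doc:
    """Toy document: a token list plus named feature layers (hypothetical)."""
    tokens: List[str]
    features: Dict[str, list] = field(default_factory=dict)


class Component:
    """Hypothetical stand-in for a pipeline component; not medaCy's API."""
    name: str = "component"
    dependencies: FrozenSet[str] = frozenset()

    def __call__(self, doc: Doc) -> Doc:
        raise NotImplementedError


class LowercaseOverlayer(Component):
    """Overlays a lowercased copy of every token."""
    name = "lower"

    def __call__(self, doc: Doc) -> Doc:
        doc.features[self.name] = [t.lower() for t in doc.tokens]
        return doc


class UnitOverlayer(Component):
    """Flags tokens that look like dosage forms or units.

    Depends on the output of LowercaseOverlayer.
    """
    name = "is_unit"
    dependencies = frozenset({"lower"})
    UNITS = {"mg", "ml", "capsule", "tablet"}

    def __call__(self, doc: Doc) -> Doc:
        doc.features[self.name] = [t in self.UNITS for t in doc.features["lower"]]
        return doc


class Pipeline:
    """Runs components in insertion order and checks dependencies up front."""

    def __init__(self) -> None:
        self.components: List[Component] = []

    def add(self, component: Component) -> "Pipeline":
        available = {c.name for c in self.components}
        missing = component.dependencies - available
        if missing:
            raise ValueError(f"{component.name} requires {sorted(missing)} earlier in the pipeline")
        self.components.append(component)
        return self

    def __call__(self, tokens: List[str]) -> Doc:
        doc = Doc(tokens=list(tokens))
        for component in self.components:
            doc = component(doc)
        return doc


pipeline = Pipeline().add(LowercaseOverlayer()).add(UnitOverlayer())
doc = pipeline(["1", "Capsule", "of", "Advil"])
print(doc.features["is_unit"])
```

Checking dependencies when a component is added, rather than when the pipeline runs, is one possible design choice: a mis-ordered pipeline fails immediately instead of partway through processing a document.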
To run a pre-trained medaCy model over your own data, install the package associated with the model by following the links below. Models officially supported by medaCy all start with the prefix medacy_model. For example, assuming you have medaCy installed:
Run:
pip install git+https://github.com/NLPatVCU/medaCy_model_clinical_notes.git
then the code snippet
import medacy_model_clinical_notes
model = medacy_model_clinical_notes.load()
model.predict("The patient was prescribed 1 capsule of Advil for 5 days.")
will output:
[
('Drug', 40, 45, 'Advil'),
('Dosage', 27, 28, '1'),
('Form', 29, 36, 'capsule'),
('Duration', 46, 56, 'for 5 days')
]
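Judging from the example output above, each prediction appears to be a `(label, start, end, text)` tuple in which `start` and `end` are character offsets into the input string and `end` is exclusive. That reading is an inference from this one example, not a documented guarantee. Under that assumption, a small helper (the name `check_and_sort` is hypothetical) can verify the offsets against the input and return the entities in reading order:

```python
from typing import List, Tuple

# (label, start, end, text): layout inferred from the example output above.
Entity = Tuple[str, int, int, str]


def check_and_sort(text: str, entities: List[Entity]) -> List[Entity]:
    """Verify that each span matches its offsets, then sort by position."""
    for label, start, end, span in entities:
        if text[start:end] != span:
            raise ValueError(
                f"{label}: offsets {start}-{end} give {text[start:end]!r}, expected {span!r}"
            )
    return sorted(entities, key=lambda e: (e[1], e[2]))


text = "The patient was prescribed 1 capsule of Advil for 5 days."
predicted = [
    ("Drug", 40, 45, "Advil"),
    ("Dosage", 27, 28, "1"),
    ("Form", 29, 36, "capsule"),
    ("Duration", 46, 56, "for 5 days"),
]

ordered = check_and_sort(text, predicted)
for label, start, end, span in ordered:
    print(f"{start:>3}-{end:<3} {label:<9} {span}")
```

A check like this is mainly useful when the text is pre-processed before prediction, since any change to the string would shift the offsets.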
NOTE: To run prediction over many files at once, use the bulk prediction functionality rather than calling predict on each text individually.
Application | Dataset Trained Over | Entities |
---|---|---|
Clinical Notes | N2C2 2018 | Drug, Form, Route, ADE, Reason, Frequency, Duration, Dosage, Strength |
EPA Systematic Reviews | TAC SRIE 2018 | Species, Celline, Dosage, Group, etc. |
Nanomedicine Drug Labels | END | Nanoparticle, Company, Adverse Reaction, Active Ingredient, Surface Coating, etc. |
MedaCy models can be packaged and shared with anyone (or no one!) with ease. See this example for details.
SpaCy is an open-source Python package, built with Cython, that allows for lightning-fast text processing. MedaCy combines spaCy's memory-efficient text processing architecture with tools, ideas, and principles from both machine learning and medical computational linguistics to provide a unified framework for researchers and practitioners alike to advance medical text mining.