Evaluating semantic similarity methods for the comparison of text derived phenotypes and clinical outcome prediction

Here we present an adaptable pipeline for deriving and comparing patient phenotypes directly from clinical text, in the form of discharge summaries, test results and clinical notes from the MIMIC-III database. Using the semantic similarity of phenotypes, we demonstrate the potential of text-derived phenotype profiles for predicting diagnoses.

Requirements

Python and Java

These notebooks require Python 3 (developed with Python 3.8.6) and Jupyter Notebook. Java is also required to run Komenti and SML from these notebooks.
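Before opening the notebooks, it can help to confirm that the interpreter and the external tools are visible from the current environment. The following is a minimal sketch, not part of the repository: the function name `check_environment` and the minimum version `(3, 8)` are assumptions chosen for illustration, based only on the development version stated above.

```python
import shutil
import sys

# Assumption: the notebooks were developed with Python 3.8.6, so 3.8 is used
# here as an illustrative lower bound. It is not a documented requirement.
MIN_PYTHON = (3, 8)


def check_environment(min_python=MIN_PYTHON):
    """Report whether Python is recent enough and where java/jupyter live.

    The path entries are None when the executable is not found on PATH.
    """
    return {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "java_path": shutil.which("java"),
        "jupyter_path": shutil.which("jupyter"),
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

A `None` for `java_path` suggests that Komenti and SML will not start from the notebooks until Java is installed or added to `PATH`.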

Semantic Measures Library (SML)

The SML toolkit (.jar), which can be downloaded from the SML website, should be placed in the working directory.
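As a rough sketch of how the jar might be located and turned into a command line, the helper below searches a directory for a `.jar` file whose name contains "sml". The function names, the file-name heuristic and the `config.xml` file name are hypothetical. The `-t sm -xmlconf` arguments in the usage example reflect our recollection of the toolkit's command-line usage and should be treated as an assumption to verify against the SML documentation and the 'Similarity with SML' notebook.

```python
import glob
import os


def find_sml_jar(directory="."):
    """Return the first .jar in `directory` whose name contains 'sml', else None.

    Assumption: the downloaded toolkit keeps 'sml' in its file name.
    """
    for path in sorted(glob.glob(os.path.join(directory, "*.jar"))):
        if "sml" in os.path.basename(path).lower():
            return path
    return None


def build_sml_command(jar_path, toolkit_args):
    """Assemble an argument list suitable for subprocess.run."""
    return ["java", "-jar", jar_path, *toolkit_args]


if __name__ == "__main__":
    jar = find_sml_jar()
    if jar is None:
        print("No SML jar found in the working directory.")
    else:
        # Assumption: these flags and the config file name are illustrative.
        print(" ".join(build_sml_command(jar, ["-t", "sm", "-xmlconf", "config.xml"])))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting.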

MIMIC-III

Access to the MIMIC data files requires completing an ethics course. The files can then be downloaded from the MIMIC website, as detailed in the 'Install and setup' notebook.

Notebooks

Install and setup: Relevant downloads, patient sampling and text annotation. Recreate our sample of 1000 patients from MIMIC or create a new sample, extract the text for each patient and apply Komenti to annotate phenotypes.
Annotation preprocessing: Extract phenotypes from the annotations and build patient phenotype profiles, in preparation for using the Semantic Measures Library.
GenerateXML/Generate XML configuration: Produce custom XML files comprising all similarity measures available through SML (listed in 'measures_revised'), split into IC-based, non-IC-based and direct groupwise measures.
Similarity with SML: Guidance for running SML from the command line to compare patient phenotypes.
Results performance: Evaluation of the similarity measures for predicting primary diagnosis, producing metrics files that can be used for plotting.
Results figures: Plot ROC curves for individual measures, and histograms of the evaluation metrics across all similarity measures.
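To make the idea of a patient phenotype profile concrete, here is a minimal sketch under stated assumptions. It treats the annotation output as `(patient_id, phenotype_id)` pairs, which is a simplification and not the actual Komenti output format. It uses Jaccard overlap as a simple stand-in for a direct groupwise measure, not one of the SML measures evaluated in the notebooks. All identifiers in the example are placeholders.

```python
from collections import defaultdict


def build_profiles(annotations):
    """Group phenotype identifiers by patient.

    `annotations` is an iterable of (patient_id, phenotype_id) pairs.
    Repeated mentions of the same phenotype collapse into one set entry.
    """
    profiles = defaultdict(set)
    for patient_id, phenotype_id in annotations:
        profiles[patient_id].add(phenotype_id)
    return dict(profiles)


def jaccard(profile_a, profile_b):
    """Jaccard overlap of two phenotype sets (0.0 when both are empty)."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


if __name__ == "__main__":
    # Placeholder patients and phenotype identifiers, for illustration only.
    rows = [
        ("patient_1", "HP:A"),
        ("patient_1", "HP:B"),
        ("patient_2", "HP:B"),
        ("patient_1", "HP:A"),  # duplicate mention, ignored by the set
    ]
    profiles = build_profiles(rows)
    print(jaccard(profiles["patient_1"], profiles["patient_2"]))
```

Unlike this set overlap, the ontology-aware measures in SML also take the position of terms in the ontology hierarchy into account, which is why the pipeline hands the profiles to SML.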

A version of all notebooks is available in 'Notebooks_with_output' with our outputs visible for reference.
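For readers who want an intuition for the kind of metric behind the ROC curves, the sketch below computes the area under the ROC curve from similarity scores and binary labels. The setup is an assumption made for illustration: each score belongs to a pair of patients, and the label is 1 when the pair shares a primary diagnosis. The notebooks may frame the evaluation differently, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example scores
    higher than a randomly chosen negative one, with ties counted as 0.5.
    Runs in O(n_pos * n_neg), which is fine for small illustrative inputs.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("Both classes must be present to compute AUC.")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Made-up scores: label 1 = pair shares a primary diagnosis (assumption).
    print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

An AUC of 0.5 corresponds to a measure that ranks matching and non-matching pairs no better than chance, and 1.0 to a perfect ranking.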
