Biomedical Search Engine, Big Data Analytics (EECS E6893) Final Project, Columbia University, New York, NY

Sapphirine/biomedical_search_engine

Biomedical Search Engine

This search engine employs several big data concepts to make the Unified Medical Language System (UMLS) Knowledge Sources accessible to any user. It has three main components: query handling, classification, and visualization. A user can search for a medical term to retrieve a classification of the term determined by Mahout Naive Bayes, relevant information (definition, symptoms, etc.), and a visualization of neighboring/related medical concepts built from IBM System G’s graph storage and a Plotly graph. The system is written in PHP, HTML, Java, and Python. This repository contains all the files required to deploy the search engine in your own environment.

Dependencies

  1. Relational Database Management System (RDBMS); we recommend MySQL Server.
  2. Web server; we recommend Apache.
  3. Java JDK 8.
  4. PHP.
  5. IBM System G.
  6. Python.
  7. Python Packages: python-igraph, json, and plotly.
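
Before deploying, it can help to confirm that the Python packages listed above are importable by the interpreter the web server will use. The following is a minimal sketch, not part of the repository; it assumes that python-igraph is imported under the module name igraph, and the helper name missing_modules is hypothetical. Note that json ships with the Python standard library.

```python
import importlib.util

# Import names for the Python packages listed in the dependencies.
# Assumption: python-igraph is imported as "igraph"; json is part of
# the standard library, so it should always be found.
REQUIRED_MODULES = ["igraph", "json", "plotly"]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    else:
        print("All required Python packages are importable.")
```

Run the script with the same Python interpreter that the web server invokes, since a package installed for one interpreter may not be visible to another.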

Steps to deploy the system.

  1. Sign up for a license at the UMLS Terminology Services.
  2. Create a database named umls in your RDBMS to host the UMLS schemas. For instance, in MySQL: CREATE DATABASE IF NOT EXISTS umls CHARACTER SET utf8 COLLATE utf8_unicode_ci
  3. Read the UMLS Tutorial and the UMLS Reference Manual to become familiar with the system requirements, then load the Metathesaurus and Semantic Network Knowledge Sources into the umls database created in step 2.
  4. Run the file named Normalize_UMLS.sql in the MySQL directory. This creates a database named sandbox, a normalized subset of the umls database that improves performance.
  5. Read the IBM System G gShell overview.
  6. Replace the line [file_location] in the file contained in the SYSTEMG directory with the location of the concept.txt and relationship.txt files created with the MySQL queries. Pass the modified file to gShell (gShell interactive < filename) to load the concepts, semantics, and their relationships into System G.
  7. In the PHP directory, edit the following files to configure your database credentials: mysqlconnect_umls.php and mysqlconnect_sandbox.php
  8. Create an account at plot.ly and edit the file in the PYTHON directory to enter your account's username and key on the following line: py.sign_in('user', 'key').
  9. Copy the contents of the PHP, JAVA, and PYTHON directories to the subdirectory of your web server's root directory from which you want the system to be accessed.
  10. In your browser, go to this subdirectory and append "/lookup.php" to the URL; you should then be able to start using the system.
  11. The behavior of the classifier can be changed by modifying the file in the JAVA/src directory; to do this, you will need to clone the Hadoop and Mahout repositories.
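
The placeholder substitution in step 6 can be scripted instead of edited by hand. The sketch below is only an illustration: it assumes the placeholder appears in the gShell file literally as the text [file_location], and the function names and any file paths passed to them are hypothetical, since the name of the file in the SYSTEMG directory is not stated here.

```python
from pathlib import Path

# Assumption: the gShell script contains this literal placeholder text.
PLACEHOLDER = "[file_location]"


def fill_file_location(script_text, data_dir):
    """Return script_text with every placeholder replaced by data_dir."""
    if PLACEHOLDER not in script_text:
        # Fail loudly rather than silently writing an unchanged script.
        raise ValueError("placeholder %s not found" % PLACEHOLDER)
    return script_text.replace(PLACEHOLDER, str(data_dir))


def write_filled_script(template_path, output_path, data_dir):
    """Read the template, substitute the placeholder, and write the result."""
    text = Path(template_path).read_text(encoding="utf-8")
    filled = fill_file_location(text, data_dir)
    Path(output_path).write_text(filled, encoding="utf-8")
```

The resulting file can then be passed to gShell as described in step 6. Writing to a separate output file keeps the original template intact in case the data location changes later.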

Note: Please make sure that Apache has read and write privileges for the location where you installed System G. If you encounter any other problems and can't find a solution, please feel free to contact [email protected] for help with MySQL, System G, and the classifier, [email protected] for visualization, or [email protected] for PHP-related issues.
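
One way to check the privileges mentioned in the note is a short script run as the web server's user. This is a rough sketch under assumptions: the function name check_read_write is hypothetical, and the result only reflects the user who runs the script, so it is meaningful for Apache only when executed under Apache's account.

```python
import os


def check_read_write(path):
    """Return (readable, writable) for path as seen by the current user."""
    return os.access(path, os.R_OK), os.access(path, os.W_OK)


if __name__ == "__main__":
    import sys

    # Pass the System G installation directory as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    readable, writable = check_read_write(target)
    print("%s: readable=%s writable=%s" % (target, readable, writable))
```

A path that does not exist reports both values as False, which also catches a mistyped installation directory.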

Project by Jose Alvarado-Guzman (jaa2220), Josh Jacobson (jj2807), and Mohammad Zaryab (mz2517).
