Principles of Machine Learning for Bioinformatics

This four-day course introduces a selection of machine learning methods used in bioinformatic analyses with a focus on gene expression data. Topics covered include: unsupervised learning, dimensionality reduction and clustering; feature selection and extraction; and supervised learning methods for classification (e.g., random forests, support vector machines, knn, etc.) and regression (with an emphasis on regularization methods appropriate for high-dimensional problems). Participants have the opportunity to apply these methods as implemented in R and python to publicly available data.

Course materials

Lecture notes are provided in three different formats.

I'd recommend following along using either the R- or Python-versions of the Jupyter notebooks: to do this, you should download the entire repository into a directory on your personal computer---you will need the data files in the data directory as well as the accessory script files included in the repository in order to run the code in the notebooks. (This also means that all of the script files and notebooks should be housed in the same directory, together with the data directory; this should already be in the correct structure if you clone it directly from github).

pdf document (R version)

maclearn-2020.pdf

Jupyter notebook (R version)

For a tutorial in how to use Jupyter notebooks with R, see: https://www.datacamp.com/community/blog/jupyter-notebook-r

Jupyter notebook (python version)

For a tutorial in how to use Jupyter notebooks with Python, see: https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook

data

The directories data contains two example data sets (described in the lecture notes). The remaining files in the repository are small R or python scripts either sourced (R) or imported (Python) at various points in the jupyter notebooks.

Please note that you do not need to (and should not) unzip any of the gzipped files (which can be identified by the extension ".gz" at the end of the file name)! They can be loaded into R or Python directly as is and the notebooks are configured to access them using the gzipped forms.

Suggested prerequisites

Recommended for students with some prior knowledge of either R or python. Participants are expected to provide their own laptops with recent versions of R and/or python installed. Students will be instructed to download several free software packages (including R packages and/or python libraries such as including pandas and sklearn).

R packages

from CRAN

The command below can be run within an R session to install most of the required packages from CRAN; some of these may take a while to install, recommend installation prior to class if you intend to run the R scripts.

install.packages(c("caret", "clue", "ggplot2", "glmnet", "HiDimDA",
                   "kernlab", "pheatmap", "pROC", "randomForest",
                   "rpart", "tidyr"))

from Bioconductor

The package genefilter can be installed from Bioconductor using the following code again run within an R session.

install.packages("BiocManager")
BiocManager::install("genefilter")

Python modules

The following Python modules are used in the included scripts; again I would recommend installing prior to class if you intend to run the Python scripts:

matplotlib
mlxtend
numpy
pandas
plotnine
scikit-learn (a.k.a. sklearn)
scipy
seaborn

Name		Name	Last commit message	Last commit date
Latest commit History 101 Commits
Rmd		Rmd
data		data
1-algorithms-that-learn-Python.ipynb		1-algorithms-that-learn-Python.ipynb
1-algorithms-that-learn-R.ipynb		1-algorithms-that-learn-R.ipynb
2-unsupervised-Python.ipynb		2-unsupervised-Python.ipynb
2-unsupervised-R.ipynb		2-unsupervised-R.ipynb
3-knn-and-cross-validation-Python.ipynb		3-knn-and-cross-validation-Python.ipynb
3-knn-and-cross-validation-R.ipynb		3-knn-and-cross-validation-R.ipynb
4-feature-selection-extraction-Python.ipynb		4-feature-selection-extraction-Python.ipynb
4-feature-selection-extraction-R.ipynb		4-feature-selection-extraction-R.ipynb
5-regression-models-Python.ipynb		5-regression-models-Python.ipynb
5-regression-models-R.ipynb		5-regression-models-R.ipynb
6-svm-bootstrap-random-forests-auc-Python.ipynb		6-svm-bootstrap-random-forests-auc-Python.ipynb
6-svm-bootstrap-random-forests-auc-R.ipynb		6-svm-bootstrap-random-forests-auc-R.ipynb
README.md		README.md
ggfuntile.R		ggfuntile.R
ggfuntile.py		ggfuntile.py
load_hess.R		load_hess.R
load_hess.py		load_hess.py
load_neves.R		load_neves.R
load_neves.py		load_neves.py
maclearn-2020.pdf		maclearn-2020.pdf
maclearn_utils_2020.R		maclearn_utils_2020.R
maclearn_utils_2020.py		maclearn_utils_2020.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Principles of Machine Learning for Bioinformatics

Course materials

pdf document (R version)

Jupyter notebook (R version)

Jupyter notebook (python version)

data

Suggested prerequisites

R packages

from CRAN

from Bioconductor

Python modules

About

Releases

Packages

Languages

denniscwylie/maclearn

Folders and files

Latest commit

History

Repository files navigation

Principles of Machine Learning for Bioinformatics

Course materials

pdf document (R version)

Jupyter notebook (R version)

Jupyter notebook (python version)

data

Suggested prerequisites

R packages

from CRAN

from Bioconductor

Python modules

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages