In this learning project I will share my progress, as well as all the resources I use to learn about the fascinating intersection of AI and Biology. I believe that AI applied to Biotechnology is a relatively new field, so many groundbreaking discoveries are yet to be made. With the increasing lifespan of the human population, solutions to pressing challenges like diseases (e.g. cancer, Alzheimer's) or global warming are extremely important. AI is already helping to tackle these problems by aiding drug discovery, enzyme engineering to break down man-made molecules, and other challenging procedures. The opportunity to contribute to these challenges, together with my lifelong passion for the subject, motivated me to start this journey to learn about the applications of AI to Biology.
I realise it is not worthwhile to plan the entire project down to the finest details in advance, so the structure of the repository might change with time. At the moment, the repository is divided into Chapters, one folder each. Within each Chapter, I will do my best to follow a logical learning sequence, making sure to build chronologically on previous knowledge. Each Chapter folder is divided into subfolders, and each subfolder contains the material (notes, resources like links or papers, code, insights gained, etc.) from about one week of learning.
I already have intermediate knowledge of both AI and Biology, so the learning project is meant to cover advanced topics. However, I will start with a review of the basics of both AI and Biology so that I can also share insights and resources for beginners.
This is a preliminary (very large) list of broad topics divided into three main pillars: Biology, AI, and AI for Biology. I won't cover them sequentially (meaning that I won't finish the Biology pillar before moving to the AI pillar), but I will follow a logical path, covering (or reviewing) the fundamentals first. Beyond the fundamentals, I don't have a fixed order for the other macro-topics. If you have preferences, feel free to let me know (see below for how to contribute to the project or get in touch with me).
Biology:
- Biochemistry
- Genetics
- Molecular Biology
- Computational Biology
- Biotechnologies

AI:
- Fundamentals (basic algorithms, gradient descent, computational graphs and deep neural networks, recurrent and convolutional networks)
- NLP, Transformers and LLMs
- Geometric Deep Learning
- Knowledge Graph Embedding
- Reinforcement Learning

AI for Biology:
- Protein Language Models (PLMs)
- ML-guided directed evolution for protein engineering
- Multi-omics data analysis
- Knowledge Graph Embedding models for drug discovery
I would love for this project to be collaborative. Feel free to open an issue here on GitHub to discuss any topic or to highlight mistakes or things that should be done differently. Also, feel free to open a pull request to add useful resources or contribute to the learning project.
If you would like to stay updated about further developments (for example, I'm planning to write blog posts after finishing macro-topics, make YouTube videos explaining LLM architectures, etc.) or to get in touch with me for any other reason, feel free to email me at sicurellaemanuele[at]gmail.com.
Thank you and happy learning!
Emanuele
Images generated with Midjourney