Classification-and-Analysis-of-GAP-Files-on-GitHub-using-AI

This repository contains code for identifying GitHub repositories that use the GAP programming language, using a combination of machine learning, NLP, and deep learning techniques. It also provides tooling for analyzing the collected data.


Functionalities Implemented

  • Retrieve raw file links for repositories on GitHub
  • Preprocess the data so the techniques below can be applied to it
  • Develop an ML/DL approach to determine whether a file is written in the GAP programming language
  • Experiment with different models using ML, NLP, and deep learning techniques
  • Compare the performance of the models
  • Analyze the filtered repositories
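
As a rough illustration of the first three items, the sketch below converts a GitHub "blob" page URL into its raw-file link and trains a tiny bag-of-words naive Bayes classifier on toy snippets. It is not taken from this repository's scripts or notebooks: the names `to_raw_url`, `tokenize`, and `NaiveBayes`, the toy training data, and the choice of naive Bayes are all assumptions made for the example, and the actual models here may differ.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse


def to_raw_url(blob_url: str) -> str:
    """Turn a github.com 'blob' page URL into its raw.githubusercontent.com form."""
    parts = urlparse(blob_url).path.strip("/").split("/")
    # Expected layout: owner / repo / "blob" / ref / path...
    # Assumption: the ref (branch or tag) contains no "/" characters.
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner, repo, _, ref, *path = parts
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{'/'.join(path)}"


def tokenize(source: str) -> list[str]:
    """Extract identifier-like tokens plus ':=' (the GAP assignment operator)."""
    return re.findall(r"[A-Za-z_]\w*|:=", source)


class NaiveBayes:
    """Multinomial naive Bayes over token counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.token_counts = {label: Counter() for label in set(labels)}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.token_counts.values())
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.token_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training data (hypothetical; a real run would use the collected files).
texts = [
    "InstallMethod( Size, [ IsGroup ], function( G ) return Length( Elements( G ) ); end );",
    "G := SymmetricGroup( 4 );; Print( Size( G ) );",
    "def size(g):\n    return len(g)\nprint(size([1, 2]))",
    "import numpy as np\nx = np.zeros(3)\nprint(x)",
]
labels = ["gap", "gap", "other", "other"]
model = NaiveBayes().fit(texts, labels)

print(to_raw_url("https://github.com/gap-system/gap/blob/master/lib/group.gi"))
print(model.predict("H := Group( (1,2) );; Size( H );"))
```

A baseline like this is mainly useful as a reference point when comparing the more capable ML, NLP, and deep learning models.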

Getting Started

The scripts are kept separate from the notebooks, and each has its own requirements.txt.

Running Python Scripts

Install requirements.txt

pip install -r requirements.txt

cd into the scripts/ directory from the root directory

cd scripts/

Run the Python script you want (replace script_name.py with the script's file name)

python script_name.py

Running Jupyter Notebooks (ML/DL Models and Analysis)

cd into the notebooks/ directory from the root directory

cd notebooks/

Install requirements.txt

pip install -r requirements.txt

Start Jupyter Notebook

jupyter notebook

Select the notebook you want to run

Run the notebook

Best Practices

  • Check the issues section to find something to work on
  • If you have a new idea, add it to the issues section as an enhancement
  • If you find a bug, raise an issue
  • When working on something, create a new branch with an appropriate name and open a PR once finished