The Tagman Python scripts automate downloading repositories and splitting their code.
The scripts use the GitHub GraphQL API and the QScored API to download and rate the repositories. The following are therefore needed to run them:
- GitHub Personal Access Token
- QScored API key
- DesigniteJava
- CodeSplitJava
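One way to supply the two credentials is through environment variables, so they stay out of the source files. The sketch below is illustrative only: the variable names `GITHUB_TOKEN` and `QSCORED_API_KEY` and the helper `load_credentials` are assumptions, not names taken from the scripts, which may read these values differently.

```python
import os

# Hypothetical environment variable names; the actual scripts may differ.
GITHUB_TOKEN_VAR = "GITHUB_TOKEN"
QSCORED_KEY_VAR = "QSCORED_API_KEY"


def load_credentials(env=None):
    """Return both API credentials, raising if either one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in (GITHUB_TOKEN_VAR, QSCORED_KEY_VAR)
               if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "github_token": env[GITHUB_TOKEN_VAR],
        "qscored_key": env[QSCORED_KEY_VAR],
    }
```

Failing early with a clear message avoids starting a long download run that would only fail at the first API call.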
The scripts can be configured to download repositories that match certain criteria. We have used the following criteria:
- Lines of code: 10000 or more
- QScored quality score threshold: 10
- Language: Java
- Number of stars: 40,000 or more
Each criterion can be changed by editing the corresponding constant in the relevant script file.
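As a rough illustration, the criteria above could be expressed as module-level constants like the following. The constant and function names here are hypothetical and may not match those in the scripts. The `language:` and `stars:` qualifiers are standard GitHub search syntax; the lines-of-code check is assumed to run after download, and the direction of the QScored comparison is not stated here, so only the constant is shown.

```python
# Hypothetical constant names mirroring the criteria listed above.
MIN_LINES_OF_CODE = 10_000
QSCORED_THRESHOLD = 10  # comparison direction depends on the scripts
LANGUAGE = "Java"
MIN_STARS = 40_000


def github_search_query(language=LANGUAGE, min_stars=MIN_STARS):
    """Build a GitHub search string from the language and star criteria."""
    return f"language:{language} stars:>={min_stars}"


def meets_size_criterion(lines_of_code, min_loc=MIN_LINES_OF_CODE):
    """Return True if a repository has at least the required lines of code."""
    return lines_of_code >= min_loc
```

Changing a criterion then means editing a single value, for example lowering `MIN_STARS` to widen the set of candidate repositories.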
- Clone the repository
git clone https://github.com/SMART-Dal/Tagman-python-scripts.git
- Run the scripts
python download.py
python download-repo.py
python data_curation_main.py
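The commands above can also be chained from a small driver, sketched below. It assumes the scripts are meant to run in the order listed and that the repository was cloned into the default `Tagman-python-scripts` directory; `build_commands` and `run_all` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys

# Script names as listed above, in the order they appear.
SCRIPTS = ["download.py", "download-repo.py", "data_curation_main.py"]


def build_commands(python=sys.executable, scripts=SCRIPTS):
    """Return one argument list per script, ready for subprocess.run."""
    return [[python, script] for script in scripts]


def run_all(cwd="Tagman-python-scripts"):
    """Run each script in turn, stopping at the first failure."""
    for command in build_commands():
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, cwd=cwd, check=True)
```

Calling `run_all()` from the directory that contains the clone runs the three steps back to back and stops if any of them fails.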