Software Metadata Classification Project
Team members: Xihao Zhou, Ruohan Gao, Gan Xin, Hao Yang, Yifan Li, Dongsheng Yang
https://alvinzhou66.github.io/ToolFinder/
All datasets used in this project are in the /dataset folder.
To run the scripts in this project, set up the environment in one of three ways: create a virtual environment from requirements.txt, install the same packages directly on your local machine, or build a Docker container from our Dockerfile.
- Virtual environment (requires Python 3.7.x).
First, create a virtual environment and activate it:
python3 -m venv your_venv_name
. ./your_venv_name/bin/activate
Then use pip to install the packages
pip install -r requirements.txt
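Since the environment needs Python 3.7.x, it may help to confirm the interpreter version before creating it. The following is a minimal sketch; `is_python_37` is a hypothetical helper name that is not part of this repository.

```shell
# Hypothetical helper: succeed only if the given version string is 3.7 or 3.7.x.
is_python_37() {
    case "$1" in
        3.7|3.7.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Ask the interpreter for its own version (empty if python3 is not installed).
current_version="$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null || true)"

if is_python_37 "$current_version"; then
    echo "Python $current_version is supported"
else
    echo "Python 3.7.x is required, found: ${current_version:-none}"
fi
```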
- Local (make sure you have Python 3.7.x).
Download the ZIP archive or clone our repository:
git@github.com:alvinzhou66/ToolFinder.git
Move into the repository and install the packages from the requirements.txt file:
pip install -r requirements.txt
- Docker.
Install Docker first.
In the directory that contains our Dockerfile, build the Docker image:
docker build -t coss .
Then run a container from it:
docker run -p 5006:5006 -it coss
- Possible error while using Docker.
You may see the error "failed to solve with frontend dockerfile.v0" (it happened on two machines in our team).
If so, check that your Docker server version is up to date, or purge your current Docker server installation and try again.
The "requirements.txt" file is present in that directory, so the error is most likely caused by the Docker server rather than by a missing file.
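To see which Docker server version is installed, something like the following can be used. It is a minimal sketch: `report_docker_versions` is a hypothetical helper name, and the source does not state a minimum required version, so none is checked here.

```shell
# Hypothetical helper: report the Docker client and server versions.
report_docker_versions() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found on PATH"
        return 1
    fi
    # This call fails if the Docker daemon is not running or not reachable.
    docker version --format 'client={{.Client.Version}} server={{.Server.Version}}'
}

report_docker_versions || echo "Could not query Docker; is the daemon running?"
```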
For the binary classifiers, run the four .ipynb notebooks in the "/binary_classifier" folder.
For the functional classifier, move to "/functional_classifier" and run des_fuc.ipynb first, then func_class.ipynb.
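If you prefer the command line to the notebook interface, the ordering above could be scripted roughly as follows. This is a sketch under assumptions: `run_notebooks` is a hypothetical helper, and it assumes the `jupyter` command with nbconvert is available in the active environment, which this README does not confirm.

```shell
# Hypothetical helper: execute notebooks in the given order, stopping at the first failure.
run_notebooks() {
    dir="$1"; shift
    for nb in "$@"; do
        if [ ! -f "$dir/$nb" ]; then
            echo "missing notebook: $dir/$nb" >&2
            return 1
        fi
        # --execute runs every cell; --inplace writes the outputs back into the same file.
        (cd "$dir" && jupyter nbconvert --to notebook --execute --inplace "$nb") || return 1
    done
}

# Only run when invoked from the repository root.
if [ -d functional_classifier ]; then
    run_notebooks functional_classifier des_fuc.ipynb func_class.ipynb
fi
```

Running des_fuc.ipynb before func_class.ipynb follows the order given above; the helper stops early so the second notebook never runs if the first one fails.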
- Functional classifier. An interactive Bokeh visualization that accepts a URL as input (any URL whose page contains a description; it does not need to be a .md file) and returns the predicted function. It also visualizes our training results and compares them with SOMEF. After setting up the virtual environment or Docker container (as shown above), activate the virtual environment to run the visualization:
. ./your_venv_name/bin/activate
Go to your local copy of the repository, cd into the visualization directory, and start the Bokeh server application:
cd visualization
bokeh serve --show interactive_ui.py
Then open localhost:5006 in your browser to see the visualization.
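If the page does not open automatically, note that `bokeh serve` normally exposes an app under a path named after the script. The sketch below builds that URL and probes it; `bokeh_app_url` is a hypothetical helper, and the probe assumes `curl` is installed.

```shell
# Hypothetical helper: build the URL at which `bokeh serve <script>` exposes an app.
bokeh_app_url() {
    script="$1"
    port="${2:-5006}"   # 5006 is Bokeh's default port
    echo "http://localhost:${port}/$(basename "$script" .py)"
}

url="$(bokeh_app_url interactive_ui.py)"
echo "$url"   # prints http://localhost:5006/interactive_ui

# Probe the server if curl is available; prints the HTTP status code.
if command -v curl >/dev/null 2>&1; then
    curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 "$url" || echo "server not reachable at $url"
fi
```

If port 5006 is already in use, `bokeh serve` accepts a `--port` option, and the second argument of the helper can be set to match.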
To use the functional classifier, enter the URL in the input box and click the predict button. The result is shown as a pie chart of the probabilities that the input project belongs to each type of scientific software. The result may take several seconds to appear because the website has to be crawled and the model has to run inference.
To use the binary classifiers, you can use the Binder badge above, or first run:
cd binary_classifier
Then open "SOMEF_BIN_classifier.ipynb" in Jupyter Notebook and run it to see the result.