The project requires git, Python >= 3.6 with pip, and virtualenv (optionally virtualenvwrapper).
- Install Python 3.6
- Install pip
- Install virtualenv (and, optionally, virtualenvwrapper)
- Install the system libraries required by the Python dependencies
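Before continuing, a small script can check most of these prerequisites. The helper below is hypothetical and not part of the repository. It assumes git and virtualenv are reachable on PATH (virtualenv is also accepted as an importable module). It cannot check the system libraries, because this README does not name them.

```python
import importlib.util
import shutil
import sys


def missing_prerequisites():
    """Return the prerequisites that could not be found (hypothetical helper)."""
    missing = []
    if sys.version_info < (3, 6):
        missing.append("python>=3.6")
    if shutil.which("git") is None:
        missing.append("git")
    if importlib.util.find_spec("pip") is None:
        missing.append("pip")
    # virtualenv may be installed as a command, as a module, or both
    if shutil.which("virtualenv") is None and importlib.util.find_spec("virtualenv") is None:
        missing.append("virtualenv")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("All prerequisites found" if not problems else "Missing: " + ", ".join(problems))
```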
Clone the repository:
git clone https://github.com/ivanazeljkovic/question_search_engine.git
cd question_search_engine/
Create a virtual environment with:
virtualenv -p python3.6 venv
or, if you are using virtualenvwrapper instead of virtualenv:
mkvirtualenv -p python3.6 venv
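With the virtual environment activated (for example with source venv/bin/activate; mkvirtualenv activates the new environment automatically), install the requirements: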
pip install -r requirements.txt
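To confirm that pip installs into the virtual environment rather than into the system interpreter, the hypothetical helper below compares interpreter prefixes. It relies on the standard behaviour that the stdlib venv and virtualenv 20+ make sys.prefix differ from sys.base_prefix, while older virtualenv releases set sys.real_prefix instead.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv (< 20) sets sys.real_prefix; venv and virtualenv >= 20
    # make sys.prefix differ from sys.base_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active" if in_virtualenv() else "no virtual environment active")
```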
Inside the root directory, create a directory named data with a nested directory named raw. Store the questions corpus in data/raw/questions.json. The file should contain one JSON object per line, with the same structure as in the example below:
{"id": 1, "question": "what is TF-IDF?", "tags": "<nlp>"}
{"id": 2, "question": "should I ignore poetry.lock?", "tags": "<python>"}
{"id": 3, "question": "How to use pytest?", "tags": "<python><pytest>"}
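The corpus setup above can be scripted. The sketch below uses hypothetical helpers that are not part of the repository. It creates data/raw, writes records one JSON object per line, and validates each line when reading it back. It assumes that the file is read as JSON Lines, as the example suggests, and that id, question and tags are the required keys. parse_tags is an illustrative guess at how a string such as <python><pytest> can be split.

```python
import json
import re
from pathlib import Path

# Location expected by the project, relative to the repository root
CORPUS_PATH = Path("data") / "raw" / "questions.json"
REQUIRED_KEYS = {"id", "question", "tags"}  # assumed required fields


def write_corpus(records, path=CORPUS_PATH):
    """Create data/raw if needed and write one JSON object per line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def parse_record(line):
    """Parse one corpus line and check that the assumed required keys are present."""
    record = json.loads(line)
    missing = REQUIRED_KEYS.difference(record)
    if missing:
        raise ValueError("missing keys: " + ", ".join(sorted(missing)))
    return record


def read_corpus(path=CORPUS_PATH):
    """Read and validate every non-blank line of the corpus file."""
    with path.open(encoding="utf-8") as f:
        return [parse_record(line) for line in f if line.strip()]


def parse_tags(tags):
    """Split a tag string such as "<python><pytest>" into ["python", "pytest"]."""
    return re.findall(r"<([^<>]+)>", tags)
```

Running read_corpus() from the root directory before python run.py catches malformed lines early.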
From the root directory, run:
python run.py
Wait until the corpus has been loaded and the TF-IDF vectorizer has been fitted. When the interactive prompt opens, enter a question of interest:
>>> Error handling in Java?
Each output line contains a similarity score, a question id and the question text, ordered from the best match down, as in the example below:
0.8318 43953635 How do I use Error handling in Java
0.7683 38835571 Error Handling in Swift 3
0.6029 47684377 Java BufferedReader error
0.5649 38936305 If block error handling in bash
0.5519 52513360 java ATM program simulation with exception handling - no error neither full output.
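The scores above come from comparing the query with every corpus question. The repository's own vectorizer and similarity measure are not described here, so the following is only a self-contained sketch that assumes a common setup. It uses lowercase word tokens, smoothed IDF weights (the formula scikit-learn uses by default), L2-normalised vectors and cosine similarity. It prints the best matches in the same score, id, question layout. TinyTfIdf and search are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep word characters only (assumed preprocessing)."""
    return re.findall(r"\w+", text.lower())


class TinyTfIdf:
    """Minimal TF-IDF model: raw term counts weighted by smoothed IDF, L2-normalised."""

    def fit(self, documents):
        n_docs = len(documents)
        doc_freq = Counter()
        for doc in documents:
            doc_freq.update(set(tokenize(doc)))
        # Smoothed IDF: ln((1 + N) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
        self.doc_vectors = [self.transform(doc) for doc in documents]
        return self

    def transform(self, text):
        # Terms never seen during fit are ignored
        counts = Counter(t for t in tokenize(text) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}


def search(model, corpus, query, top_k=5):
    """Return (score, id, question) tuples for the top_k most similar questions."""
    q = model.transform(query)
    scored = []
    for record, d in zip(corpus, model.doc_vectors):
        # Dot product of two unit vectors is their cosine similarity
        score = sum(w * d.get(t, 0.0) for t, w in q.items())
        scored.append((score, record["id"], record["question"]))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


corpus = [
    {"id": 1, "question": "what is TF-IDF?", "tags": "<nlp>"},
    {"id": 2, "question": "should I ignore poetry.lock?", "tags": "<python>"},
    {"id": 3, "question": "How to use pytest?", "tags": "<python><pytest>"},
]
model = TinyTfIdf().fit([record["question"] for record in corpus])
results = search(model, corpus, "How do I use pytest?")
for score, qid, question in results:
    print(f"{score:.4f} {qid} {question}")
```

Normalising every vector once during fit keeps each query to a sparse dot product per document. A real corpus of this size would more likely use a sparse-matrix library.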
To run a particular group of unit tests, run the corresponding command from the root directory:
python -m tests.test_preprocessor
python -m tests.test_question_search_engine
python -m tests.test_tf_idf_vectorizer
python -m tests.test_utils
To run all unit tests with the shell script, run the following commands from the root directory:
chmod +x tests/run_all_tests.sh
tests/run_all_tests.sh
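The contents of tests/run_all_tests.sh are not shown here. Where a POSIX shell is not available, a rough Python alternative can run the four test modules listed above one after another. It assumes each module exits with a nonzero status when a test fails, as unittest.main() does. run_test_modules is a hypothetical helper, not part of the repository.

```python
import subprocess
import sys

# Test groups listed above, each runnable with "python -m"
TEST_MODULES = [
    "tests.test_preprocessor",
    "tests.test_question_search_engine",
    "tests.test_tf_idf_vectorizer",
    "tests.test_utils",
]


def run_test_modules(modules):
    """Run each module with `python -m` and return the names of those that failed."""
    failed = []
    for module in modules:
        # sys.executable is the interpreter of the currently active environment
        result = subprocess.run([sys.executable, "-m", module])
        if result.returncode != 0:
            failed.append(module)
    return failed
```

Called from the root directory, run_test_modules(TEST_MODULES) returns an empty list when every group passes.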