Tugas Besar Algeo II
This project is a simple search engine. Using cosine similarity, it finds the documents most similar to your query. You can also add new documents from local files, from a link (web scraping), or by manual input.
A Search Engine using TF and IDF with Cosine Similarity.
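To make the ranking idea concrete, here is a minimal sketch of TF-IDF weighting followed by cosine similarity, using only the Python standard library. It is not the project's actual code: the function names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine_similarity`, `rank`) and the smoothed IDF formula are assumptions chosen for illustration, and stemming and stopword removal are left out.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace. Stemming and stopword removal
    # (done in the project with NLTK / Sastrawi) are omitted in this sketch.
    return text.lower().split()


def build_idf(tokenized_docs):
    # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1.
    n_docs = len(tokenized_docs)
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    # Raw term frequency times IDF; terms unseen in the corpus are ignored.
    tf = Counter(tokens)
    return {t: f * idf[t] for t, f in tf.items() if t in idf}


def cosine_similarity(a, b):
    # a and b are sparse vectors stored as {term: weight} dictionaries.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    # Return (score, document index) pairs, best match first.
    tokenized = [tokenize(d) for d in docs]
    idf = build_idf(tokenized)
    q_vec = tfidf_vector(tokenize(query), idf)
    scores = [
        (cosine_similarity(q_vec, tfidf_vector(tokens, idf)), i)
        for i, tokens in enumerate(tokenized)
    ]
    return sorted(scores, key=lambda s: (-s[0], s[1]))
```

Calling `rank("cat mat", docs)` on a small list of strings returns the document indices ordered by similarity to the query. Documents that share no term with the query score 0.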
To run this on your PC, you need:
- Python 3.7.X (Recommended 3.7.9)
- NPM and Node.js
- Flask
- Textract - Extracts text content from many document formats
- NLTK - Stemming & Stopword for English
- Sastrawi - Stemming & Stopword for Bahasa Indonesia
- React
- @material-ui
First, you need to install pip:
- Download the "get-pip.py" file from pypa.io
- Run cmd as administrator
- Type the following command:
$ python get-pip.py
Second, you need to install the Python dependencies. They are listed in "requirements.txt" in the server folder. Install them from inside that folder using $ pip install -r requirements.txt
Third, you need NPM and Node.js to start the development server for the client side (front end).
Fourth, you need to install the dependencies required to start the client. Go to the client directory in cmd run as administrator, then install the following:
- @material-ui/core, which can be installed by typing
$ npm install @material-ui/core
- @material-ui/icons, which can be installed from the same directory (cmd run as administrator) by typing
$ npm install @material-ui/icons
- axios, which can be installed from the same directory (cmd run as administrator) by typing
$ npm install axios
- react-router-dom, which can be installed from the same directory (cmd run as administrator) by typing
$ npm install --save react-router-dom
- Start the client development server with
$ npm run start
- Go to the server directory and run the backend server using
$ py app.py
List of ready features and TODOs for future development
- Ready: Web scraping by posting a URL to the server
- Ready: Search documents (pptx, docx, doc, pdf, txt, html, json) by sending a keyword to the server with a GET request
- Ready: Bahasa Indonesia and English supported
- TODO: Replace the CSV database with an SQL database such as PostgreSQL or MySQL
- TODO: Improve code efficiency
- TODO: Improve frontend UI/UX
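For the web scraping feature, a fetched page has to be reduced to plain text before it can be indexed. The following is a minimal sketch of that step using only the standard library's `html.parser`. The `TextExtractor` class and `html_to_text` helper are hypothetical names for illustration, and the project itself may rely on Textract or another approach instead.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    # Hypothetical helper: collects visible text, skipping script/style bodies.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    # Parse an HTML string and return its visible text as a single line.
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting string can then go through the same tokenizing and weighting steps as any other document.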
Project inspired by Google and based on cosine similarity.