Please come prepared with:
- basic knowledge of Python
- experience in using Jupyter notebooks
During the course we will use a little bit of pandas (10-minute intro) and scikit-learn to build simple machine learning models.
Get the Docker image: docker pull oroszgy/hungarian-text-mining-workshop
Start Jupyter Notebook: make start
- Make sure you have Python 3.5+ installed (preferably a conda distribution)
- Clone this repository:
git clone http://github.com/oroszgy/hungarian-text-mining-workshop && cd hungarian-text-mining-workshop
- Install the necessary packages:
pip install -r requirements.txt
- Download the English and the Hungarian NLP models for spaCy:
python -m spacy download en
pip install https://github.com/oroszgy/spacy-hungarian-models/releases/download/hu_tagger_web_md-0.1.0/hu_tagger_web_md-0.1.0.tar.gz
- Install HuNlpy:
pip install https://github.com/oroszgy/hunlp/releases/download/0.2/hunlp-0.2.0.tar.gz
Start Jupyter Notebook: jupyter notebook
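Before starting the notebook, it may help to confirm that the packages from the steps above are visible to your Python interpreter. The following is a minimal sketch using only the standard library. The import names `hu_tagger_web_md` and `hunlp` are assumptions derived from the archive file names above, and `missing_modules` is a hypothetical helper, not part of the workshop repository.

```python
from importlib.util import find_spec

# Import names assumed from the install steps; adjust them if they differ.
REQUIRED_MODULES = [
    "pandas",
    "sklearn",           # import name of scikit-learn
    "spacy",
    "textacy",
    "hu_tagger_web_md",  # assumed import name of the Hungarian spaCy model
    "hunlp",             # assumed import name of the hunlp-0.2.0 package
]


def missing_modules(names):
    """Return the names that cannot be located by the current interpreter."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-initialised modules.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All workshop modules found.")
```

The check only looks the modules up without importing them, so it runs quickly, but it cannot tell whether an installed version is compatible with the notebooks.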
- Practical NLP in Python: spaCy and textacy, describing documents with words
- Document categorization, sentiment analysis
- Extracting named entities and concepts
(c) Gyorgy Orosz, 2017