Festos is an infrastructure for scanning texts, running OCR on them, and making them searchable.
Install all the packages (the following command has only been tested on Ubuntu 12.04 and 12.10, 64-bit):
sudo apt-get install rabbitmq-server rubygems graphicsmagick poppler-utils pdftk ghostscript tesseract-ocr tesseract-ocr-eng tesseract-ocr-spa-old tesseract-ocr-spa yui-compressor git python-pip python-dev build-essential npm openjdk-7-jre -y
Note: To install rubygems on Ubuntu 14.04:
sudo apt-get install rubygems-integration
You also need to install docsplit:
Install:
sudo gem install docsplit
Try it:
docsplit
This is part of the django-docviewer configuration:
sudo ln -s /usr/local/bin/docsplit /usr/bin/docsplit
sudo ln -s /usr/bin/yui-compressor /usr/local/bin/yuicompressor
Install yuglify:
npm config set registry http://registry.npmjs.org/
npm -g install yuglify
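Before moving on, it can help to confirm that the tools installed so far are actually on the PATH. The following is a minimal sketch, assuming Python 3 (for `shutil.which`); the list of executable names is an assumption based on what these packages normally provide (`gm` from graphicsmagick, `pdftotext` from poppler-utils, `gs` from ghostscript), and `missing_tools` is a hypothetical helper, not part of Festos.

```python
import shutil

# Executables that the packages installed above are expected to provide.
REQUIRED_TOOLS = ["docsplit", "gm", "pdftotext", "pdftk", "gs", "tesseract", "yuglify"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing executables: " + ", ".join(missing))
    else:
        print("All required executables were found.")
```

If anything is reported as missing, re-run the corresponding install step before continuing.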
Install elasticsearch:
cd ~
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.20.5.deb
sudo dpkg -i elasticsearch-0.20.5.deb
Install the virtual environment:
Install the packages:
sudo pip install --upgrade pip
sudo pip install --upgrade virtualenv
sudo pip install virtualenvwrapper
Create your .venvs directory:
mkdir -p ~/.venvs
You need to configure the environment:
export WORKON_HOME=~/.venvs
source /usr/local/bin/virtualenvwrapper.sh
Add these lines to your .bashrc file so that your environment is ready the next time you open a shell:
Open the .bashrc:
pico .bashrc
Copy and paste the following lines at the end of the .bashrc file:
export WORKON_HOME=~/.venvs
source /usr/local/bin/virtualenvwrapper.sh
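If you prefer not to edit the file by hand, the same result can be scripted. This is only a sketch under the assumption that Python 3 is available; `ensure_lines` is a hypothetical helper, not something shipped with Festos or virtualenvwrapper. It appends only the lines that are not already present, so running it more than once does not duplicate them.

```python
import os

# The two lines this guide asks you to add to .bashrc.
VENV_LINES = [
    "export WORKON_HOME=~/.venvs",
    "source /usr/local/bin/virtualenvwrapper.sh",
]


def ensure_lines(path, lines):
    """Append every line missing from the file at `path`; return the lines added."""
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    existing = content.splitlines()
    to_add = [line for line in lines if line not in existing]
    if to_add:
        with open(path, "a") as handle:
            # Start on a fresh line if the file does not end with a newline.
            if content and not content.endswith("\n"):
                handle.write("\n")
            handle.write("\n".join(to_add) + "\n")
    return to_add
```

A call such as `ensure_lines(os.path.expanduser("~/.bashrc"), VENV_LINES)` would apply it to your own .bashrc.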
Create a virtualenv for the project:
mkvirtualenv festos --no-site-packages
Try it:
workon festos
deactivate
Install the project:
Download it:
cd $HOME
git clone https://github.com/CulturePlex/festos.git git/festos
Enter the new location and update the previously created virtual environment:
cd git/festos/
Set the .gitignore_global to ignore unnecessary files and extensions:
git config --global core.excludesfile .gitignore_global
Install the requirements of the project:
workon festos
pip install -U -r requirements.txt
Create database and launch:
You must create a database and a user, and configure the site. If you are in the development stage, you can use the start_all.sh script:
./start-all.sh
If you want to launch your site again, just use the following one:
python manage.py runserver localhost:8000
Access the site at the URL http://localhost:8000/
Go to the following address (log in with user "festos" and password "festos" or, if you didn't use ./start-all.sh, with the credentials you created):
localhost:8000/admin/sites/site/1/
Check that the domain name is correct ("localhost:8000" if you are developing). Change it to whatever you need. You will need to restart the server for the changes to take effect:
python manage.py runserver localhost:8000
In another terminal run the celery service:
python manage.py celery worker
Add a scanned PDF document in the admin interface (for convenience, there is one in ~/git/festos/test.pdf):
localhost:8000/admin/document/
You will need to wait a few seconds while docsplit splits the document and elasticsearch indexes it. You can see the status in the admin interface. When the status is 'ready', you can search at the following URL (make sure you search for a term that actually appears in your PDF):
localhost:8000/search/
You can also try accessing the document directly:
http://localhost:8000/viewer/1/demo.html
Open the elasticsearch.yml:
sudo nano /etc/elasticsearch/elasticsearch.yml
Add the following to the configuration file (in the Index section):
index:
  analysis:
    analyzer:
      # set standard analyzer with no stop words as the default for both indexing and searching
      default:
        type: standard
        stopwords: _none_
Delete the haystack index (warning: this deletes the entire index):
curl -XDELETE 'http://localhost:9200/haystack/'
Restart the elasticsearch service:
sudo service elasticsearch restart
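Elasticsearch can take a few seconds to accept connections after a restart. The sketch below polls the HTTP port until it answers. It assumes Python 3 and that elasticsearch listens on http://localhost:9200/, the same address used by the curl command above; `wait_for_http` is a hypothetical helper, not part of Festos.

```python
import time
import urllib.request


def wait_for_http(url, attempts=10, delay=1.0,
                  opener=urllib.request.urlopen, sleep=time.sleep):
    """Return True once `url` answers with HTTP 200, False after `attempts` tries."""
    for attempt in range(attempts):
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers refused connections and urllib errors: not ready yet.
            pass
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    if wait_for_http("http://localhost:9200/"):
        print("elasticsearch is answering")
    else:
        print("elasticsearch did not answer in time")
```

The same helper could be pointed at http://localhost:8000/ to check the development server.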
Install and configure Postgresql Database:
Install Postgresql:
sudo apt-get install postgresql
Set the password:
sudo passwd postgres
Create a django user named "festos":
sudo -u postgres createuser -P festos
Switch user:
su postgres
Enter the Postgres shell:
psql template1
Create db and owner:
CREATE DATABASE festos_db OWNER festos ENCODING 'UTF8';
Quit the shell:
\q
Edit the Postgres permissions:
nano /etc/postgresql/9.1/main/pg_hba.conf
Add the following line:
local festos_db festos md5
Leave the postgres user and go back to your own user account:
exit
Restart the server:
sudo service postgresql restart
Configure the environment:
Install the system libraries:
sudo apt-get build-dep python-psycopg2
Activate your virtual environment:
workon festos
Install the python library inside the virtual environment:
pip install psycopg2
Open the production settings:
nano festos/prod_settings.py
Add the configuration:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'festos_db',
        'USER': 'festos',
        'PASSWORD': 'FESTOS_PASSWORD',
        'HOST': '',
        'PORT': '',
    }
}
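If you would rather not write the password into the settings file, one option is to read it from the environment. This is only a sketch of that idea: `FESTOS_DB_PASSWORD` is a hypothetical variable name chosen for illustration, `build_databases` is a hypothetical helper, and the database name assumes the festos_db database created in the PostgreSQL steps above.

```python
import os


def build_databases(environ=os.environ):
    """Build the Django DATABASES setting, taking the password from the environment."""
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql_psycopg2",
            "NAME": "festos_db",  # assumed: the database created with CREATE DATABASE
            "USER": "festos",
            # FESTOS_DB_PASSWORD is a hypothetical name, not defined by Festos.
            "PASSWORD": environ.get("FESTOS_DB_PASSWORD", ""),
            "HOST": "",  # empty string: connect on the local machine
            "PORT": "",  # empty string: use the default port
        }
    }


DATABASES = build_databases()
```

With this variant you would export FESTOS_DB_PASSWORD in the same shell before starting the servers.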
Set the variable:
export DJANGO_SETTINGS_MODULE=festos.prod_settings
Run the start_all.sh script:
./start_all.sh
Restart your servers