Release 1.1 · openpaperwork/paperwork

IMPORTANT NOTE FOR WINDOWS USERS:
'paperwork_x.y.z_win64.zip' contains ONLY Paperwork itself, NOT Tesseract. Tesseract and its data files are required to use Paperwork. Which Tesseract data files you need depends on the languages you intend to use.
So please do not use this .zip. Use the installer (.exe) instead.

Hello,

I'm pleased to announce the release of Paperwork 1.1. This new release is mostly focused on optimisations.

Main changes are:

Paperwork-gui 1.1:
- Windows: Activation mechanism has been disabled for now
- Workarounds for Gtk-3.20.x / GLib 2.50 (Ubuntu):
  - Work around unexpected behavior of GLib.idle_add (multiple calls)
  - Work around the document list not being refreshed
- Import: Display how many image files, PDFs, documents and pages have been
  imported.
- Automatic Color Equalization: Reduce the 'circle side-effect' by increasing
  the number of samples used.
- paperwork-shell scan: Quit after scanning
- Settings window: "Source" becomes "Default source" (cosmetic)
- Export: Don't lock the UI, and display the export progress
- Improve keyword highlighting: Highlight words identical to search keywords
  (as before) and also words close enough to them (example: 'flesh' when
  'flesch' is being searched)
- Optim: Document list: Only display the first 100 elements of the list, and
  extend it only when required. Reduces GTK latency and CPU usage
  (GtkListBox doesn't scale very well above 100 elements).
- Optim: Improve PDF rendering speed: Let libpoppler take care of the
  rendering size (see backend:page.get_image())
- Optim: Reduce the number of useless calls to Canvas.redraw()
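
The idea behind the document list optimisation can be illustrated with a minimal Python sketch: hand the widget only the first 100 documents and reveal further batches on demand. The class and method names (LazyDocList, extend(), has_more()) are hypothetical and are not Paperwork's actual code; when extend() gets called (for instance when the user scrolls near the bottom of the list) is also an assumption.

```python
# Hypothetical sketch, not Paperwork's implementation: expose only the first
# PAGE_SIZE documents to the list widget and grow the list on demand.

PAGE_SIZE = 100  # the release notes mention 100 elements


class LazyDocList:
    def __init__(self, all_docs, page_size=PAGE_SIZE):
        self._all_docs = list(all_docs)
        self._page_size = page_size
        # Number of documents currently handed to the widget
        self._shown = min(page_size, len(self._all_docs))

    @property
    def visible(self):
        """Documents currently displayed."""
        return self._all_docs[:self._shown]

    def has_more(self):
        """True if some documents are not displayed yet."""
        return self._shown < len(self._all_docs)

    def extend(self):
        """Reveal the next batch and return only the newly added documents,
        so the caller appends a few rows instead of rebuilding the list."""
        start = self._shown
        self._shown = min(self._shown + self._page_size, len(self._all_docs))
        return self._all_docs[start:self._shown]
```

With 250 documents, only 100 rows exist at first; two calls to extend() add 100 and then 50 more, after which has_more() returns False.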
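
The "close enough" keyword matching can be approximated as below. This sketch relies on difflib.SequenceMatcher from the Python standard library and an arbitrary similarity threshold of 0.8; both are assumptions, since the notes don't say which metric Paperwork uses. The helper names are hypothetical.

```python
import difflib

# Assumed threshold: how "close enough" is measured in Paperwork is not
# specified in these notes.
SIMILARITY_THRESHOLD = 0.8


def is_close_match(word, keyword, threshold=SIMILARITY_THRESHOLD):
    """True if word is identical (case-insensitively) or similar to keyword."""
    word, keyword = word.lower(), keyword.lower()
    if word == keyword:
        return True
    # ratio() is 2*M/T: M = matching characters, T = total characters
    ratio = difflib.SequenceMatcher(None, word, keyword).ratio()
    return ratio >= threshold


def words_to_highlight(words, keywords):
    """Return the words of a page that should be highlighted."""
    return [w for w in words if any(is_close_match(w, k) for k in keywords)]
```

For example, 'flesh' and 'flesch' share 5 characters out of 11, giving a ratio of about 0.91, so 'flesh' is highlighted when searching for 'flesch', while an unrelated word like 'paper' is not.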

Paperwork-backend 1.1:
- paperwork-shell: Add commands 'search', 'dump', 'switch_workdir', 'rescan',
  'show', 'import', 'delete_doc', 'guess_labels', 'add_label', 'remove_label',
  'rename'
- Add methods doc.has_ocr() and page.has_ocr() indicating whether OCR has
  already been run on a given doc/page.
  Used in the GUI for the option "Redo OCR on all documents", as it must act
  only on documents where OCR has already been done in the past (i.e. not on
  PDFs that already include text)
- Optim: Provides a method page.get_image() returning an already resized
  Pillow image (PDF rendering optimisation)
- Export: Report progression
- Optim: PDF thumbnail rendering: Keep a cached version of the first page only.
  The other pages can be rendered on the fly
- Fix: Label directory names use base64 encoding, and this encoding can result
  in strings containing '/'. Those characters must be replaced (by '_')
- Fix: util/find_language(): If the system locale is not set properly, pycountry
  may raise UnicodeDecodeError.
- Import: When importing a single PDF, don't import it if it was already
  previously imported
- Import: Provides detailed information and statistics regarding what has been
  imported (return value of Importer.import_doc() has changed)
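
The label directory name fix boils down to the following Python sketch. label_dir_name() and label_from_dir_name() are hypothetical helper names, not Paperwork's API; the substitution of '/' by '_' is the one described above.

```python
import base64


def label_dir_name(label_name):
    """Encode a label name into a string usable as a directory name.

    Standard base64 can emit '/', which is a path separator, so it is
    replaced by '_'.
    """
    encoded = base64.b64encode(label_name.encode("utf-8")).decode("ascii")
    return encoded.replace("/", "_")


def label_from_dir_name(dir_name):
    """Reverse label_dir_name(): standard base64 never emits '_', so every
    '_' found in the directory name must have been a '/'."""
    return base64.b64decode(dir_name.replace("_", "/")).decode("utf-8")
```

For instance, the label '???' encodes to 'Pz8/' in standard base64 and therefore to the directory name 'Pz8_'. The standard library's base64.urlsafe_b64encode() also replaces '+' with '-'; since '+' is harmless in directory names, replacing only '/' is enough here.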

As usual, information regarding Paperwork installation and updates can be found at
https://github.com/jflesch/paperwork#readme .
Detailed ChangeLog for paperwork-gui is available here:
https://github.com/jflesch/paperwork/blob/stable/ChangeLog
Detailed ChangeLog for paperwork-backend is available here:
https://github.com/jflesch/paperwork-backend/blob/stable/ChangeLog

Best regards,
