General PDF text extraction and cleanup #34

swotai · 2021-08-01T01:46:05Z

This ticket starts work on an organized way to read files from Google Drive (GDrive), extract text from PDFs, and clean the extracted text before it is fed to the keyword extraction/summarization pipelines.

The notebook has code for keyword extraction.
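The notebook's actual method isn't shown here. For a quick baseline to compare against, a plain term-frequency ranking over cleaned text can be sketched as below. This is a swapped-in technique, not the notebook's code. The function name `top_keywords` and the small `STOPWORDS` set are hypothetical and kept short for illustration.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = frozenset({
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is",
    "are", "be", "with", "by", "at", "this", "that", "from", "as", "it",
})


def top_keywords(text: str, k: int = 5) -> List[Tuple[str, int]]:
    """Rank words by raw frequency (a simple baseline, not the notebook's method)."""
    # Lowercase and pull out word-like tokens (letters, apostrophes, inner hyphens).
    tokens = re.findall(r"[a-z][a-z'-]*", text.lower())
    # Skip stopwords and very short tokens, then count what remains.
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(k)
```

For example, `top_keywords("Budget hearing. The budget hearing covers the budget for parks.", 2)` ranks `budget` (3) ahead of `hearing` (2). Raw counts are crude, but they give the summarization work a sanity check before anything heavier is added.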

TODOs:

- Refactor the code that reads Google Drive files into a separate script/function
- PDF text extraction
- Clean up and improve the text extraction
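For the cleanup step, one possible starting point is a stdlib-only normalizer that runs on whatever string the PDF extractor returns. It is a sketch under assumptions: the function name `clean_pdf_text` is hypothetical, and the page-number pattern (lines like `12` or `Page 3 of 10`) is a guess about how these PDFs are laid out.

```python
import re
import unicodedata

# Assumed page-number layout: a line holding only "12" or "Page 3 of 10".
_PAGE_NUMBER_LINE = re.compile(r"^\s*(?:page\s+)?\d+(?:\s+of\s+\d+)?\s*$", re.IGNORECASE)


def clean_pdf_text(raw: str) -> str:
    """Normalize PDF-extracted text before keyword extraction/summarization (hypothetical helper)."""
    # NFKC folds ligatures such as "\ufb01" into "fi" and non-breaking spaces into plain spaces.
    text = unicodedata.normalize("NFKC", raw)
    # Unify Windows/old-Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop lines that contain nothing but a page number.
    text = "\n".join(ln for ln in text.split("\n") if not _PAGE_NUMBER_LINE.match(ln))
    # Rejoin words hyphenated across a line break: "extrac-\ntion" -> "extraction".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Keep blank-line paragraph breaks; collapse other whitespace inside each paragraph.
    paragraphs = (re.sub(r"\s+", " ", p).strip() for p in re.split(r"\n\s*\n", text))
    return "\n\n".join(p for p in paragraphs if p)
```

One tradeoff: the de-hyphenation step also merges genuine compounds that happen to break at a line end (`well-\nknown` becomes `wellknown`). Whether that matters depends on the downstream keyword extractor.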

swotai added the data science label (Agender Scraper data science and nlp related issues) Aug 1, 2021

swotai mentioned this issue Aug 1, 2021 in Agenda item and attachment link extraction from pdf text #35 (Closed, 2 tasks)

xconnieex assigned xconnieex and unassigned xconnieex Aug 10, 2021

swotai changed the title ~~Agenda PDF text extraction and cleanup~~ General PDF text extraction and cleanup Aug 20, 2021
