General PDF text extraction and cleanup #34

Open
3 tasks
swotai opened this issue Aug 1, 2021 · 0 comments
Labels
data science Agender Scraper data science and nlp related issues

Comments

@swotai
Collaborator

swotai commented Aug 1, 2021

This ticket is to start working on an organized way to read files from Google Drive, extract text from PDFs, and clean the extracted text so it can be fed to the keyword extraction and summarization pipelines.

The notebook has code for keyword extraction.

TODOs:

  • Refactor the code that reads Google Drive files into a separate script/function
  • PDF text extraction
  • Clean up and improve the text extraction
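
As a starting point for the last two TODO items, here is a minimal sketch of what extraction plus cleanup could look like. It is not the notebook's code. It assumes the `pypdf` library for extraction, and the function names `extract_pdf_text` and `clean_extracted_text`, as well as the specific cleanup rules (dehyphenation, page-number removal, whitespace normalization), are hypothetical choices for illustration.

```python
import re


def extract_pdf_text(path: str) -> str:
    """Return the raw text of every page in a PDF, joined by form feeds.

    Assumption: pypdf is installed (pip install pypdf). It is imported
    lazily so the cleanup function below works without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() can come back empty for scanned or image-only pages.
    return "\f".join((page.extract_text() or "") for page in reader.pages)


# Hypothetical rule: a line holding only "3" or "Page 3 of 10" is a page number.
_PAGE_NUMBER = re.compile(r"(page\s+)?\d+(\s+of\s+\d+)?", re.IGNORECASE)


def clean_extracted_text(raw: str) -> str:
    """Normalize raw PDF text into paragraphs separated by one blank line."""
    # Unify line endings and treat form feeds (page breaks) as newlines.
    text = raw.replace("\r\n", "\n").replace("\r", "\n").replace("\f", "\n")

    # Rejoin words split across a line break: "extrac-\ntion" -> "extraction".
    # Caveat: this also merges genuine hyphenated compounds that happen to
    # break at the end of a line.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    lines = []
    for line in text.split("\n"):
        # Collapse runs of spaces and tabs, then trim the line.
        line = re.sub(r"[ \t]+", " ", line).strip()
        if _PAGE_NUMBER.fullmatch(line):
            continue  # drop page-number-only lines
        lines.append(line)
    text = "\n".join(lines)

    # A single newline inside a paragraph becomes a space;
    # blank lines are kept as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    text = re.sub(r"\n{2,}", "\n\n", text)
    return text.strip()
```

Usage would be along the lines of `clean_extracted_text(extract_pdf_text("agenda.pdf"))`, where `agenda.pdf` is a placeholder path. Keeping the cleanup as a pure string-to-string function means it can be tested without any PDF or Google Drive access.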
@swotai swotai added the data science Agender Scraper data science and nlp related issues label Aug 1, 2021
@xconnieex xconnieex assigned xconnieex and unassigned xconnieex Aug 10, 2021
@swotai swotai changed the title Agenda PDF text extraction and cleanup General PDF text extraction and cleanup Aug 20, 2021