Further build out data analysis steps #37

Open
xconnieex opened this issue Aug 10, 2021 · 1 comment
Labels
data science (Agender Scraper data science and nlp related issues), enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@xconnieex
Collaborator

xconnieex commented Aug 10, 2021

Currently doing tf-idf.
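For reference, a minimal tf-idf sketch in plain Python, assuming one string per document and a simple lowercase alphabetic tokenizer. The names `tokenize` and `tf_idf` are hypothetical and not from our existing code. It uses tf = count / document length and idf = log(N / df), so a term that appears in every document gets weight 0. scikit-learn's `TfidfVectorizer` uses a smoothed idf by default, so its numbers will differ from this sketch.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters; a deliberately simple tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: tf-idf weight} dict per document.

    tf  = raw count of the term in the doc / number of tokens in the doc
    idf = log(N / df), where df = number of docs containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: each term counts at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

Sorting each returned dict by weight gives candidate keywords per document; stopwords such as "the" only drop out automatically when they appear in every document, so a stopword list may still be needed.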

I have earlier code in the Text-analysis folder on GitHub, as well as some code based on Anju's Colab notebook that does text summarization and text modeling, but it still needs refinement. One dependency is how we read and clean up the initial text from the PDF.
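On the cleanup dependency, here is a sketch of post-extraction normalization, assuming the raw text has already been pulled out of the PDF (for example with pdfminer.six's `pdfminer.high_level.extract_text`; which extractor we actually use is still open). The helper name `clean_pdf_text` and the specific rules (dropping page-number lines, rejoining hyphenated line breaks, collapsing whitespace) are assumptions about what our PDFs need, not a settled pipeline.

```python
import re

# Hypothetical rule: a line containing only "3" or "Page 3 of 10" is a page number.
PAGE_NUMBER_LINE = re.compile(r"\s*(page\s+)?\d+(\s+of\s+\d+)?\s*", re.IGNORECASE)


def clean_pdf_text(raw: str) -> str:
    """Normalize text extracted from a PDF (hypothetical helper)."""
    # Drop lines that consist only of a page number.
    lines = [line for line in raw.splitlines() if not PAGE_NUMBER_LINE.fullmatch(line)]
    text = "\n".join(lines)
    # Rejoin words hyphenated across a line break: "afford-\nable" -> "affordable".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse remaining line breaks and runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

One trade-off: the hyphen rule also merges genuine compounds that happen to break at a line end (e.g. "low-\nincome" becomes "lowincome"), so it may need a dictionary check if that shows up often in the memos.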

@xconnieex added the enhancement, help wanted, and data science labels Aug 10, 2021
@swotai
Collaborator

swotai commented Aug 20, 2021

If we try to clarify what we are aiming to do (Anju's term: what's the "ask"):

Given a PDF of a memorandum, addendum, or decision, we want to summarize it into the following fields (think of these as additional data columns we can add to the Legistar table for each agenda item #):

  • Filename (given, no need to extract from PDF)
  • Keywords (comma-separated)
  • Other items:
    • e.g., referenced addresses, related organizations/government departments, others?
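A sketch of how one PDF could become one row of these extra columns, assuming per-term weights already come from a tf-idf step and that addresses follow a simple US street pattern. The column names, `top_keywords`, `extract_addresses`, `build_row`, and the address regex are all hypothetical. For organizations and government departments, a named-entity recognizer (e.g. spaCy's) would likely be more robust than any regex, so that part is left out here.

```python
import re

# Hypothetical pattern: house number, 1-3 capitalized words, then a street suffix.
ADDRESS_RE = re.compile(
    r"\b\d{1,5}(?:\s+[A-Z][a-z]+){1,3}\s+"
    r"(?:Street|St|Avenue|Ave|Boulevard|Blvd|Road|Rd|Way|Drive|Dr)\b"
)


def top_keywords(weights: dict[str, float], k: int = 5) -> str:
    """Return the k highest-weighted terms as a comma-separated string."""
    # Sort by descending weight; ties break alphabetically so output is deterministic.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    # Skip zero-weight terms (e.g. words present in every document).
    terms = [term for term, weight in ranked if weight > 0]
    return ", ".join(terms[:k])


def extract_addresses(text: str) -> list[str]:
    # All groups are non-capturing, so findall returns the full matches.
    return ADDRESS_RE.findall(text)


def build_row(filename: str, text: str, weights: dict[str, float], k: int = 5) -> dict[str, str]:
    """Assemble the proposed extra columns for one agenda-item PDF."""
    return {
        "filename": filename,  # given, not extracted from the PDF
        "keywords": top_keywords(weights, k),
        "addresses": "; ".join(extract_addresses(text)),
    }
```

Joining addresses with "; " keeps them distinguishable from the comma-separated keyword column if both end up in the same CSV export.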
