Word Summary #5

jbranchaud opened this issue Apr 14, 2014 · 0 comments

  • Project Name: Word Summary
  • Base Description: Generate a summary of the words that appear in a given string, including the total word count and the number of occurrences of each word.
  • Extra Description: The summary should give the total word count as well as a word occurrence summary that maps each word to the number of times it occurs. The summary should be written to a file in a standardized format such as JSON or YAML. You will have to decide how to handle punctuation and other special characters: a hyphenated word is a single word, but a dash (-) used in another context may need to be removed before computing the summary. Contractions and quoted passages raise similar questions. You will also need to choose whether the summary is case-sensitive or case-insensitive.
  • Sample Input/Output:
    // input
    "This string is the summarized string."
    // output
    {
      "summary": {
        "total": 6,
        "words": {
          "This": 1,
          "string": 2,
          "is": 1,
          "the": 1,
          "summarized": 1
        }
      }
    }
  • Extensions:
    • Produce a word summary of a file.
    • Enhance your program so that it can give a proper word count for a markdown file.
    • Programmatically compare your output with the output of a word count feature of an NLP library.
  • Categories: Text, Words, NLP
  • Resources:
  • Sources:
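The issue leaves the tokenization rules open. Below is a minimal Python sketch under one possible set of assumptions: a word is a run of letters or digits that may contain internal hyphens or apostrophes (so "well-known" and "don't" each count once), a dash or quote not sitting between two word characters is dropped as punctuation, and case is preserved by default to match the sample output. The names `WORD_RE`, `word_summary`, and `write_summary` are hypothetical, and JSON is just one of the formats the issue suggests.

```python
import json
import re
from collections import Counter

# Assumed word rule: letters/digits, optionally joined by internal hyphens or
# apostrophes (ASCII ' or U+2019). Underscores are excluded via [^\W_].
# A dash or quote with no word character on both sides is not matched,
# so it is effectively discarded as punctuation.
WORD_RE = re.compile(r"[^\W_]+(?:['\u2019-][^\W_]+)*")


def word_summary(text: str, case_sensitive: bool = True) -> dict:
    """Return {"summary": {"total": N, "words": {word: count, ...}}}."""
    words = WORD_RE.findall(text)
    if not case_sensitive:
        # casefold() is a more aggressive lower() intended for caseless matching
        words = [w.casefold() for w in words]
    counts = Counter(words)  # dict subclass, so words keep first-seen order
    return {"summary": {"total": len(words), "words": dict(counts)}}


def write_summary(text: str, path: str, case_sensitive: bool = True) -> None:
    """Write the summary of `text` to `path` as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(word_summary(text, case_sensitive), f, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(json.dumps(word_summary("This string is the summarized string."), indent=2))
```

Running the module prints the sample summary from the issue: a total of 6, with "string" counted twice. Passing `case_sensitive=False` would instead merge "This" and "this" into a single entry.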
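For the markdown extension, a naive word counter would also count URL fragments, code, and list numbers. One option is to strip that syntax before counting. The sketch below is a rough regex-based cleanup, not a real markdown parser: it only recognizes backtick fences, ignores nested brackets in links, and its choices to drop code and image alt text while keeping link text are assumptions. All names in it (`strip_markdown`, `markdown_word_summary`, and the regex constants) are hypothetical.

```python
import re
from collections import Counter

# Rough markdown cleanup (assumption: regexes, not a CommonMark parser).
FENCED_CODE = re.compile(r"^`{3}.*?^`{3}[^\n]*$", re.MULTILINE | re.DOTALL)
INLINE_CODE = re.compile(r"`[^`\n]*`")
IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")
LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")
HTML_TAG = re.compile(r"<[^>\n]+>")
ORDERED_MARKER = re.compile(r"^[ \t]*\d+[.)][ \t]+", re.MULTILINE)
# Same assumed word rule as a plain-text counter: letters/digits joined by
# internal hyphens or apostrophes. Markers such as #, *, _, > and "-" bullets
# never match it, so they need no separate stripping.
WORD_RE = re.compile(r"[^\W_]+(?:['\u2019-][^\W_]+)*")


def strip_markdown(markdown: str) -> str:
    """Remove markdown constructs that should not count as prose words."""
    text = FENCED_CODE.sub(" ", markdown)  # drop fenced code blocks entirely
    text = INLINE_CODE.sub(" ", text)      # drop `inline code` spans
    text = IMAGE.sub(" ", text)            # drop images, alt text included
    text = LINK.sub(r"\1", text)           # keep link text, drop the URL
    text = HTML_TAG.sub(" ", text)         # drop raw HTML tags
    return ORDERED_MARKER.sub("", text)    # drop "1." / "2)" list numbers


def markdown_word_summary(markdown: str) -> dict:
    """Summarize the prose words of a markdown document."""
    words = WORD_RE.findall(strip_markdown(markdown))
    return {"summary": {"total": len(words), "words": dict(Counter(words))}}
```

Comparing this count against a plain-text count of the same file is a quick way to see how much of the raw word total comes from markup rather than prose.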