Remove stopwords from the generated word clouds #1655

maxdavidson91 · 2024-10-10T18:37:02Z

Missing functionality

Word clouds show the most common words, and for free-text fields these are often stopwords such as 'and', 'to', 'the' and 'from', which provide no meaningful insight into the data.

Proposed feature

Include an option to remove stopwords when generating word clouds, perhaps by using the nltk package to supply the list of stopwords.

Example:

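import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')  # one-time download of the stopwords corpus
stop = stopwords.words('english')  # list of English stopwords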
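To make the request concrete, here is a minimal sketch of filtering stopwords at the point where word frequencies are counted. It is not ydata-profiling's actual implementation: `word_counts`, its `stop_words` parameter, and the abbreviated `STOPWORDS` set are hypothetical names used only for illustration, and the hard-coded set stands in for the nltk list so the snippet runs without a corpus download.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stopword list for illustration only.
# In practice this could be nltk's stopwords.words('english').
STOPWORDS = {"and", "to", "the", "from", "a", "of", "in"}


def word_counts(texts, stop_words=None):
    """Count lowercase word tokens across texts, skipping stop_words if given."""
    stop_words = set(stop_words or ())
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token not in stop_words:
                counts[token] += 1
    return counts


texts = [
    "The parcel went from the depot to the customer",
    "Customer returned the parcel to the depot",
]

# Without filtering, 'the' dominates the counts.
print(word_counts(texts).most_common(3))
# With filtering, only content words remain.
print(word_counts(texts, STOPWORDS).most_common(3))
```

Making the stopword list an optional argument keeps the current behaviour as the default and lets users pass their own list.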

Alternatives considered

Removing stopwords from the pandas DataFrame before generating the report wouldn't suffice, as it would also alter the 'samples' shown in the report.
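The difference between the two approaches can be sketched with plain Python lists standing in for a text column. The names `strip_stopwords`, `column`, and `STOPWORDS` are hypothetical, and the stopword set is abbreviated; this only illustrates the reasoning above and does not reflect how the report is built internally.

```python
STOPWORDS = {"and", "to", "the", "from"}  # hypothetical, abbreviated list


def strip_stopwords(text, stop_words):
    """Return text with stop_words removed (case-insensitive match)."""
    return " ".join(w for w in text.split() if w.lower() not in stop_words)


column = ["Sent the invoice to the client", "Refund from the supplier"]

# Workaround: clean the data before profiling.
# Any sample rows shown later would come from this altered text.
precleaned = [strip_stopwords(t, STOPWORDS) for t in column]

# Requested behaviour: leave the column untouched and filter only the
# tokens that feed the word cloud.
cloud_tokens = [
    w.lower() for t in column for w in t.split() if w.lower() not in STOPWORDS
]

print(precleaned[0])  # altered sample text
print(column[0])      # original sample text, still intact
print(cloud_tokens)   # tokens for the word cloud
```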

Additional context

No response

fabclmnt · 2024-10-15T10:01:44Z

Hi @maxdavidson91,

Thank you for the suggestion. Let me know if you would be interested in contributing to ydata-profiling by developing this feature!

maxdavidson91 · 2024-10-16T10:57:35Z

I'd be happy to contribute.

azory-ydata added the needs-triage label Oct 10, 2024

maxdavidson91 changed the title ~~Feature Request~~ Remove stopwords from the generated word clouds Oct 10, 2024

fabclmnt added feature request 💬 Requests for new features and removed needs-triage labels Oct 15, 2024

Ahmed-Jabrane linked a pull request Nov 16, 2024 that will close this issue

feat: Add option to remove default stopwords from word summary #1676

Open
