Recommendation to edit for calculating context diversity? #11

Open
cpdavis opened this issue Sep 11, 2023 · 1 comment

Comments


cpdavis commented Sep 11, 2023

Hi there – I'm interested in modifying the script to calculate the number of different documents in which each word appears (e.g., how many Wikipedia articles the word "DOG" appears in). Have you considered this, or do you have a recommended approach to modifying the script for this purpose? I wanted to check in before attempting the changes myself. I appreciate your consideration.

@IlyaSemenov
Owner

I have not considered this, and honestly I don't see much value in these numbers (other than that it's curious to see). However, I realize it could be valuable to some, and it's a good fit for this mini project.

I would recommend changing the file format to tab-separated values, now with 3 columns: word, number of uses, number of different articles. (I am not sure why I didn't do that originally.)
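For anyone attempting this, here is a minimal sketch of the idea, not the project's actual code. It assumes the articles are already available as lists of tokens; the function names `count_uses_and_articles` and `write_tsv` and the toy input are hypothetical. The key step is to update the article counter with the set of tokens, so each article contributes at most once per word.

```python
import csv
import io
from collections import Counter


def count_uses_and_articles(articles):
    """Return (uses, article_counts) for an iterable of token lists.

    Hypothetical helper: assumes each article is already tokenized.
    """
    uses = Counter()            # total occurrences of each word
    article_counts = Counter()  # number of distinct articles containing each word
    for tokens in articles:
        uses.update(tokens)
        article_counts.update(set(tokens))  # each article counts once per word
    return uses, article_counts


def write_tsv(uses, article_counts, out):
    """Write rows of: word, number of uses, number of different articles."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for word, n_uses in uses.most_common():
        writer.writerow([word, n_uses, article_counts[word]])


# Toy input standing in for tokenized Wikipedia articles (an assumption).
articles = [
    ["dog", "cat", "dog"],
    ["dog", "bird"],
    ["cat"],
]
uses, article_counts = count_uses_and_articles(articles)
buffer = io.StringIO()
write_tsv(uses, article_counts, buffer)
print(buffer.getvalue(), end="")
```

Keeping the two counters separate means the existing "number of uses" column stays unchanged, and the article count is simply appended as a third column.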
