Show related publications #69
Does anybody think the algorithm should be something more complex (e.g., also counting authors, or assigning different weights to different tags)?
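One possible way to give different tags different values is an IDF-style weight, where a tag carried by fewer publications counts for more. This is a minimal sketch that assumes each publication's tags are available as a set of IDs. IDF weighting is a technique swapped in here for illustration, not something the thread settled on, and `tag_weights` and `weighted_overlap` are hypothetical names:

```python
import math
from collections import Counter


def tag_weights(tag_sets):
    """Weight each tag by log(N / df), where df is the number of publications
    carrying the tag. A tag on every publication gets weight 0.

    `tag_sets` is a list of sets of tag IDs, one per publication (assumed input).
    """
    n = len(tag_sets)
    doc_freq = Counter(tag for tags in tag_sets for tag in tags)
    return {tag: math.log(n / df) for tag, df in doc_freq.items()}


def weighted_overlap(tags_a, tags_b, weights):
    """Sum the weights of the tags two publications share."""
    return sum(weights.get(tag, 0.0) for tag in tags_a & tags_b)


corpus = [{1, 2}, {1, 3}, {1, 2, 4}, {5}]
weights = tag_weights(corpus)
score = weighted_overlap({1, 2}, {1, 2, 4}, weights)  # log(4/3) + log(2), about 0.981
```

Here tag 1 (on three of four publications) contributes less than tag 2 (on two), which is one reading of "a tag that appears twice might have a higher value than one that appears 10 times".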
We could put the authors and tags in a set and compute the Jaccard index (http://en.wikipedia.org/wiki/Jaccard_index) between the papers. It's also easy to implement.
If I understand it, the difference is that instead of doing:

```python
same_tags = set_of_tags.intersection(current_tag_ids)
# This could be something more advanced:
# one tag that appears twice might (or might not) have a higher value
# than one that appears 10 times
value = len(same_tags)
```

we do:

```python
intersection = set_of_tags.intersection(current_tag_ids)
union = set_of_tags.union(current_tag_ids)
value = len(intersection) / len(union)
```

Is that right?
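A self-contained version of the Jaccard comparison, extended to rank the other publications, might look like the sketch below. It assumes the tags of every publication are already loaded into a dict from publication ID to a set of tag IDs; `most_related` and that dict layout are hypothetical. Note that the sizes of the sets are divided, since Python sets themselves cannot be divided:

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|, taken as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def most_related(current_id, features, k=5):
    """Return the IDs of up to k publications most similar to current_id.

    `features` maps publication ID -> set of tag IDs (assumed structure).
    Publications with a score of 0 are skipped; ties are broken by ID.
    """
    current = features[current_id]
    scored = []
    for pub_id, other in features.items():
        if pub_id == current_id:
            continue
        score = jaccard(current, other)
        if score > 0:
            scored.append((score, pub_id))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pub_id for _, pub_id in scored[:k]]


features = {"a": {1, 2, 3}, "b": {1, 2}, "c": {3, 4}, "d": {5}}
print(most_related("a", features))  # ['b', 'c']
```

Unlike the raw intersection count, Jaccard penalizes publications that carry many unrelated tags: "c" shares one tag with "a" but scores only 1/4 because the union has four tags.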
Yes, but to take the authors into account as well, we can add their IDs to the same set (`used_set = set(author_ids + tag_ids)`).
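If author IDs and tag IDs are both plain integers, putting them in one set can merge unrelated items (author 7 and tag 7 would become the same element). A minimal sketch, assuming integer IDs, that keeps them apart by prefixing each ID with its kind; `feature_set` is a hypothetical helper:

```python
def feature_set(author_ids, tag_ids):
    """Combine author and tag IDs into one set, tagging each ID with its kind
    so that author 7 and tag 7 remain distinct elements."""
    return {("author", a) for a in author_ids} | {("tag", t) for t in tag_ids}


def jaccard(a, b):
    """Jaccard index; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


paper_x = feature_set(author_ids=[7, 12], tag_ids=[3, 7])
paper_y = feature_set(author_ids=[7], tag_ids=[3, 9])
print(jaccard(paper_x, paper_y))  # 0.4

# Without the prefixes, tag 7 and author 7 collapse into one element:
print(jaccard(set([7, 12] + [3, 7]), set([7] + [3, 9])))  # 0.5
```

Authors and tags still count equally here; weighting one kind more heavily would need something beyond a plain Jaccard index.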
I'm thinking of implementing several methods and exposing them as query options (e.g., publications/<publication_slug>/?related=withauthors&related_method=jaccard), and perhaps not showing related papers at all unless one of those options is requested. Then we can evaluate all of them on real publications and see which options work best, to tune the feature.
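The query-option idea could be sketched as a small dispatcher over the two parameters from the example URL. This is framework-agnostic and hypothetical: `params` stands in for whatever dict-like object holds the query string, the method name `"shared"` and its use as the default are assumptions, and any `related` value other than `withauthors` is taken to mean tags only:

```python
def shared_count(a, b):
    """Number of features two publications have in common."""
    return len(a & b)


def jaccard(a, b):
    """Jaccard index; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical values accepted by ?related_method=...
METHODS = {"shared": shared_count, "jaccard": jaccard}


def related_publications(params, current_slug, pubs, k=5):
    """Pick related publications according to the query parameters.

    params: dict-like query string, e.g. {"related": "withauthors", "related_method": "jaccard"}
    pubs:   slug -> {"authors": set of IDs, "tags": set of IDs} (assumed structure)
    """
    mode = params.get("related")
    if mode is None:
        return []  # no ?related= option: show no related publications
    method = params.get("related_method", "shared")
    if method not in METHODS:
        raise ValueError(f"unknown related_method: {method}")
    score = METHODS[method]

    def features(slug):
        pub = pubs[slug]
        feats = {("tag", t) for t in pub["tags"]}
        if mode == "withauthors":
            feats |= {("author", a) for a in pub["authors"]}
        return feats

    current = features(current_slug)
    scored = []
    for slug in pubs:
        if slug == current_slug:
            continue
        value = score(current, features(slug))
        if value > 0:
            scored.append((value, slug))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [slug for _, slug in scored[:k]]


pubs = {
    "p1": {"authors": {1}, "tags": {10, 11}},
    "p2": {"authors": {1}, "tags": {10}},
    "p3": {"authors": {2}, "tags": {10, 11}},
}
print(related_publications({"related": "tags", "related_method": "jaccard"}, "p1", pubs))  # ['p3', 'p2']
print(related_publications({"related": "withauthors", "related_method": "jaccard"}, "p1", pubs))  # ['p2', 'p3']
```

Switching `related` to `withauthors` reverses the order here, because "p2" shares an author with "p1". Comparing rankings like this across options is the kind of evaluation the comment proposes.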
We could do something similar to what we did for related persons in #78.
It would be interesting to list 3-5 related publications, based on the tags.
A simple implementation might not be too difficult or inefficient. Basically, in 2 queries (assuming that we already have the list of tag_ids of the current publication):
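The two-step idea (find publications sharing a tag, then rank them by how many tags they share) can be mirrored in memory. This sketch does not reproduce the database queries themselves. It assumes an inverted index from tag ID to publication IDs; `build_tag_index` and `related_by_tags` are hypothetical names:

```python
from collections import Counter, defaultdict


def build_tag_index(pub_tags):
    """Map each tag ID to the set of publication IDs carrying it."""
    index = defaultdict(set)
    for pub_id, tags in pub_tags.items():
        for tag in tags:
            index[tag].add(pub_id)
    return index


def related_by_tags(current_id, current_tag_ids, tag_index, k=5):
    """Return up to k publication IDs ranked by number of shared tags."""
    # Step 1: collect every other publication that shares at least one tag.
    # Step 2: count how many of the current tags each of them shares.
    counts = Counter()
    for tag in current_tag_ids:
        for pub_id in tag_index.get(tag, ()):
            if pub_id != current_id:
                counts[pub_id] += 1
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [pub_id for pub_id, _ in ranked[:k]]


pub_tags = {1: {"ml", "nlp"}, 2: {"ml"}, 3: {"ml", "nlp", "cv"}, 4: {"db"}}
index = build_tag_index(pub_tags)
print(related_by_tags(1, pub_tags[1], index))  # [3, 2]
```

Only publications reachable through the current publication's tags are ever scored, so the cost depends on how popular those tags are rather than on the total number of publications.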