Bug of Gerapy Auto Extractor about similarity2 #24
Comments
I ran into this problem too.
from gerapy_auto_extractor.extractors.base import BaseExtractor
class TitleExtractor(BaseExtractor):
title_extractor = TitleExtractor()
def extract_title(html):
# Replacing the title extractor with this version fixed it for me.
def similarity2(s1, s2):
    """
    get similarity of two strings
    :param s1:
    :param s2:
    :return:
    """
    if not s1 or not s2:
        return 0
    s1_set = set(list(s1))
    s2_set = set(list(s2))
    intersection = s1_set.intersection(s2_set)
    union = s1_set.intersection(s2_set)  # bug: this is the intersection again, not the union
    return len(intersection) / len(union)
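Because `union` is computed with `intersection`, the function as posted returns 1.0 whenever the two strings share at least one character, and divides by zero when they share none. A possible fix, as a minimal sketch: it assumes the intent was a Jaccard similarity over the two character sets, and the name `similarity2_fixed` is hypothetical, not part of the library.

```python
def similarity2_fixed(s1, s2):
    """Jaccard similarity over the character sets of two strings.

    Hypothetical corrected version; assumes the original intent was
    |intersection| / |union| of the characters in s1 and s2.
    """
    if not s1 or not s2:
        return 0
    s1_set = set(s1)
    s2_set = set(s2)
    intersection = s1_set & s2_set
    # Use the real union here; both inputs are non-empty, so it is never empty.
    union = s1_set | s2_set
    return len(intersection) / len(union)


print(similarity2_fixed("abc", "abd"))  # 0.5 (2 shared characters out of 4 distinct)
print(similarity2_fixed("abc", "xyz"))  # 0.0 (no shared characters)
```

With this change, strings with no characters in common score 0.0 instead of raising `ZeroDivisionError`, and partially overlapping strings score strictly between 0 and 1.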