Faster language detection #2
Just to give you a reference: as a test, I detected the language of about 4000 docs, averaging 100 words each:
which took 55 minutes. That makes this package unusable for my use case, which is a shame, because spaCy is an awesome lib!
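For scale, and assuming the 55 minutes covers all ~4000 documents, the per-document cost works out as:

$$
55\ \text{min} \times 60\ \tfrac{\text{s}}{\text{min}} = 3300\ \text{s},
\qquad
\frac{3300\ \text{s}}{4000\ \text{docs}} \approx 0.83\ \text{s per doc}
$$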
So today I dug into your lib and found the `detector_factory` and `detect` functions, which can be imported with:
These can then be called directly:
New time: 86 s.
PS: No need for spaCy at all. This works because your spacy_langdetect.py does `from langdetect import detect_langs`, which can be imported directly as shown. That raises the question: why bother loading spaCy and running all those unnecessary steps for a simple language detection like this?
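For scale: 86 s against the original 3300 s (55 min) is roughly a 38x speedup, or about 21.5 ms per document. Below is a minimal sketch of the same workaround, calling langdetect directly on raw strings so that no spaCy `Doc` is ever built. It assumes the `langdetect` package is installed; `detect_languages` and its `fallback` parameter are hypothetical names for illustration, not part of langdetect or spacy_langdetect. Setting `DetectorFactory.seed = 0` pins langdetect's otherwise randomized results.

```python
from typing import Callable, Iterable, List, Optional


def _default_detector() -> Callable[[str], str]:
    # Imported lazily so the helper can also run with any other detector;
    # the default path requires `pip install langdetect`.
    from langdetect import DetectorFactory, detect

    # langdetect is non-deterministic by default; a fixed seed makes runs agree.
    DetectorFactory.seed = 0
    return detect


def detect_languages(
    texts: Iterable[str],
    detector: Optional[Callable[[str], str]] = None,
    fallback: str = "unknown",
) -> List[str]:
    """Hypothetical helper: one language code per text, no spaCy pipeline."""
    detect_fn = detector or _default_detector()
    results: List[str] = []
    for text in texts:
        try:
            results.append(detect_fn(text))
        except Exception:
            # langdetect raises LangDetectException for text with no usable
            # features (e.g. empty strings); a broad catch keeps this sketch
            # independent of that import.
            results.append(fallback)
    return results
```

Calling `detect_languages(["This is English.", "Das ist Deutsch."])` should typically yield `en` and `de`. Because the detector is a plain callable, another backend can be swapped in without changing the loop.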
Thanks for the approach, it really improves the performance.
Thanks for that great answer!
We are currently using fastText, which is performing quite well. However, since we already use spaCy models (one for English and one for German) in other parts of the app, I figured it would be interesting to use them for language detection as well. But I am also a bit confused about how it works, since it seems to use only one language model at a time. And now it seems this solution is just an integration of langdetect into spaCy, not a spaCy-based language detector. We used langdetect in the past and found it less accurate than fastText.
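For comparison, here is a minimal sketch of wrapping a loaded fastText language-ID model so it returns a plain language code. It assumes a model loaded with `fasttext.load_model(...)` (for example the pretrained `lid.176.bin`), whose `predict` returns labels such as `__label__de` together with probabilities; `fasttext_language` and `LanguageIdModel` are hypothetical names. The `Protocol` only describes the one method used, so the helper can be exercised without the model file.

```python
from typing import Protocol, Sequence, Tuple

LABEL_PREFIX = "__label__"


class LanguageIdModel(Protocol):
    """Assumed subset of a loaded fastText model's interface."""

    def predict(self, text: str, k: int = 1) -> Tuple[Sequence[str], Sequence[float]]:
        ...


def fasttext_language(model: LanguageIdModel, text: str) -> Tuple[str, float]:
    """Hypothetical helper: (language code, probability) of the top prediction."""
    # fastText's predict() handles one line at a time and rejects newlines,
    # so collapse all whitespace (including newlines) to single spaces.
    single_line = " ".join(text.split())
    labels, probs = model.predict(single_line, k=1)
    label = labels[0]
    # Strip the "__label__" prefix fastText puts on every class name.
    if label.startswith(LABEL_PREFIX):
        label = label[len(LABEL_PREFIX):]
    return label, float(probs[0])
```

The wrapper adds no per-call overhead beyond string cleanup, so throughput stays that of fastText itself.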
Hi there,
Currently, spaCy language detection takes quite a while, because it's doing tokenisation, sentence splitting and whatnot in the background.
I just want the language of the doc. Can I somehow improve the speed of spacy-langdetect?
Regards!