ConceptNet 5.7 (Russian part) extraction scripts and a fast API object for accessing the relations. Note: a simple modification of the preprocessing script makes it possible to build a queryable graph of any other subset of ConceptNet.
pip install ruconceptnet
>>> from ruconceptnet import ConceptNet
>>> cn = ConceptNet()
>>> cn.get_targets("алкоголь")
[('этиловый_спирт', {'Synonym'}), ('спиртной_напиток', {'Synonym'}), ('алкогольный', {'RelatedTo'}),
('алкоголик', {'RelatedTo'}), ('спирт', {'Synonym'}), ('алкоголизация', {'RelatedTo'})]
>>> cn.get_sources("йога")
[('йоги', {'FormOf'}), ('йогу', {'FormOf'}), ('йогический', {'RelatedTo'}), ('йогою', {'FormOf'}),
('йогой', {'FormOf'}), ('йог', {'RelatedTo'}), ('йоге', {'FormOf'})]
>>> cn.check_pair("человек", "зверь")
(['DistinctFrom'], [])
>>> cn.check_pair("зверь", "человек")
([], ['DistinctFrom'])
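The lists returned by get_targets and get_sources are easy to post-process. As a minimal sketch, the helper below groups such a result by relation name. Note that `group_by_relation` is a hypothetical helper, not part of the ruconceptnet package, and the input is simply the get_targets output copied from the session above, so the snippet runs without the package installed.

```python
from collections import defaultdict


def group_by_relation(pairs):
    """Group (word, {relations}) pairs into {relation: sorted list of words}.

    Hypothetical helper for illustration; not part of ruconceptnet.
    """
    grouped = defaultdict(list)
    for word, relations in pairs:
        for relation in relations:
            grouped[relation].append(word)
    return {relation: sorted(words) for relation, words in grouped.items()}


# Output of cn.get_targets("алкоголь"), copied from the session above.
targets = [
    ('этиловый_спирт', {'Synonym'}), ('спиртной_напиток', {'Synonym'}),
    ('алкогольный', {'RelatedTo'}), ('алкоголик', {'RelatedTo'}),
    ('спирт', {'Synonym'}), ('алкоголизация', {'RelatedTo'}),
]

by_relation = group_by_relation(targets)
print(by_relation["Synonym"])
print(by_relation["RelatedTo"])
```

With a live object, the same helper could presumably be called as group_by_relation(cn.get_targets("алкоголь")), assuming the return shape shown in the session.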
Please see the prepare_data.sh script. We extract the Russian-Russian pairs of nodes with a simple grep and build a 3-dimensional array (source, target, relation), stored as a single sparse SciPy matrix.
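The idea can be sketched in pure Python. This is not the code from prepare_data.sh or the package: the filtering is done here with string checks instead of grep, a plain dict stands in for the sparse SciPy matrix, and packing the relation dimension into a bitmask per (source, target) cell is only one assumed way of fitting three dimensions into a single 2-D matrix. All function names are hypothetical. The sample lines follow the tab-separated layout of the ConceptNet assertions dump (edge URI, relation, start node, end node, JSON metadata) and are illustrative rather than copied from the data.

```python
def parse_edge(line, lang="ru"):
    """Return (source, relation, target) if both nodes belong to `lang`, else None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        return None
    _, relation, start, end = fields[:4]
    prefix = f"/c/{lang}/"
    if not (start.startswith(prefix) and end.startswith(prefix)):
        return None
    # "/c/ru/алкоголь/n" -> "алкоголь"; "/r/Synonym" -> "Synonym"
    return start.split("/")[3], relation[len("/r/"):], end.split("/")[3]


def build_index(lines, lang="ru"):
    """Map words and relations to integer ids; store a relation bitmask per cell."""
    word_ids, relation_bits, cells = {}, {}, {}
    for line in lines:
        edge = parse_edge(line, lang)
        if edge is None:
            continue
        source, relation, target = edge
        row = word_ids.setdefault(source, len(word_ids))
        col = word_ids.setdefault(target, len(word_ids))
        bit = relation_bits.setdefault(relation, len(relation_bits))
        # The dict plays the role of a sparse matrix: absent keys are zeros.
        cells[(row, col)] = cells.get((row, col), 0) | (1 << bit)
    return word_ids, relation_bits, cells


def relations_between(index, source, target):
    """Decode the bitmask stored for the directed pair (source, target)."""
    word_ids, relation_bits, cells = index
    if source not in word_ids or target not in word_ids:
        return set()
    mask = cells.get((word_ids[source], word_ids[target]), 0)
    return {name for name, bit in relation_bits.items() if mask & (1 << bit)}


# Illustrative lines in the assertions-dump layout (metadata column left empty).
sample = [
    "/a/[/r/DistinctFrom/,/c/ru/человек/,/c/ru/зверь/]\t/r/DistinctFrom\t/c/ru/человек\t/c/ru/зверь\t{}",
    "/a/[/r/Synonym/,/c/ru/алкоголь/,/c/ru/спирт/]\t/r/Synonym\t/c/ru/алкоголь\t/c/ru/спирт\t{}",
    "/a/[/r/Synonym/,/c/en/alcohol/,/c/ru/алкоголь/]\t/r/Synonym\t/c/en/alcohol\t/c/ru/алкоголь\t{}",
]

index = build_index(sample)
print(relations_between(index, "человек", "зверь"))
print(relations_between(index, "зверь", "человек"))
```

Changing the `lang` argument (or the prefix test in `parse_edge`) is the kind of small modification that would select a different subset of ConceptNet.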
Please do not forget to cite the ConceptNet5 paper.
@inproceedings{10.5555/3298023.3298212,
author = {Speer, Robyn and Chin, Joshua and Havasi, Catherine},
title = {ConceptNet 5.5: An Open Multilingual Graph of General Knowledge},
year = {2017},
publisher = {AAAI Press},
booktitle = {Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence},
pages = {4444–4451},
numpages = {8},
location = {San Francisco, California, USA},
series = {AAAI'17}
}
Citing this repository is not required, but is greatly appreciated if you use this work.
@misc{ruconceptnet2020alekseev,
title = {{alexeyev/RuConceptNet: /ru/ConceptNet5.7 Python wrapper }},
year = {2020},
url = {https://github.com/alexeyev/RuConceptNet},
language = {english}
}
The code is released under the MIT license (please see the LICENSE file).
This work includes a subset of data from ConceptNet 5, which was compiled by the Commonsense Computing Initiative. ConceptNet 5 is freely available under the Creative Commons Attribution-ShareAlike license (CC BY SA 3.0) from http://conceptnet.io.
The included data was created by contributors to Commonsense Computing projects, contributors to Wikimedia projects, DBPedia, OpenCyc, Games with a Purpose, Princeton University's WordNet, Francis Bond's Open Multilingual WordNet, and Jim Breen's JMDict.
The complete data in ConceptNet is available under the Creative Commons Attribution-ShareAlike 4.0 license.
For more details, please see "Copying and sharing ConceptNet".