AlertEmbeddings

Abuse detection in online conversations with text and graph embeddings

Copyright 2021-24 Noé Cécillon

AlertEmbeddings is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. For source availability and license information, see the LICENSE file.

Lab site: http://lia.univ-avignon.fr/
GitHub repo: https://github.com/CompNet/Alert
Contact: Noé Cécillon [email protected]

Description

This set of scripts learns various embeddings from online conversations in order to detect online abuse. Two main approaches are implemented, a content-based one and a graph-based one, and they can also be used jointly. The software leverages our library SWGE, described in [C'24, CLDA'24], as well as methods from the literature, following the experimental protocol from [C'24, CLDA'24]. The Alert repository implements similar functionalities, but relies on feature engineering instead of learned embeddings. This software is used in [CLDL'20a, C'24, CLD'24] (see these publications for more details).

Data

This software was applied to a corpus of chat messages from the French MMORPG SpaceOrigin, already used for Alert, and presented in [PLDL'17, PLDL'17a, PLDL'17b, PLDL'18, PLDL'19, C'19, CLDL'19]. It also requires some signed graphs extracted from this textual corpus, which are available on Zenodo.

These conversational networks are also included as a zip file in this repository: unzip the SpaceOrigin_graphs.zip archive into the in/graphs folder. Conversations should be added to the in/text_conversations folder, as one separate file per conversation, with each line corresponding to a message. An example is available in this repository.
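The data layout described above can be prepared and read with a few lines of Python. The following is only a minimal sketch, not code from this repository: the function names extract_graphs and load_conversations are hypothetical, every file in the folder is assumed to be a conversation, files are assumed to be UTF-8 encoded, and blank lines are assumed not to be messages.

```python
import zipfile
from pathlib import Path
from typing import Dict, List


def extract_graphs(archive: Path, target: Path) -> None:
    """Unzip the graph archive into the target folder (e.g. in/graphs)."""
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)


def load_conversations(folder: Path) -> Dict[str, List[str]]:
    """Read one conversation per file, one message per line.

    Hypothetical helper: keys are file names without extension,
    values are the lists of messages in file order.
    """
    conversations: Dict[str, List[str]] = {}
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")  # encoding is an assumption
        # Assumption: blank lines are separators, not messages.
        conversations[path.stem] = [
            line for line in text.splitlines() if line.strip()
        ]
    return conversations
```

Under these assumptions, calling extract_graphs(Path("SpaceOrigin_graphs.zip"), Path("in/graphs")) followed by load_conversations(Path("in/text_conversations")) would return a dictionary mapping each conversation file to its list of messages.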

Organization

Here are the folders composing the project:

Folder in: input data, including the textual conversations and graphs.
Folder SGCN: set of scripts to learn embeddings using the SGCN method [CLD'24].
Folder signed_graph2vec: set of scripts to learn embeddings using the SG2V method [CLD'24].
Folder emb: contains all the learned embeddings.
Folder output: output files generated by the methods, such as the weights.
Folder src: set of scripts to apply the standard unsigned graph embedding models and the text embedding methods.
main.py: main script used to launch all the experiments.
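Before launching the experiments, it can help to check that a working copy matches the layout listed above. The sketch below is an illustration only: the helper missing_entries and the constant EXPECTED_ENTRIES are hypothetical names, and the check treats all listed entries as required, although folders such as emb and output may only be created once the scripts have run.

```python
from pathlib import Path
from typing import List

# Entries taken from the folder list above; treating all of them
# as required is an assumption of this sketch.
EXPECTED_ENTRIES = [
    "in",
    "SGCN",
    "signed_graph2vec",
    "emb",
    "output",
    "src",
    "main.py",
]


def missing_entries(root: Path) -> List[str]:
    """Return the expected folders/files that do not exist under root."""
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

For instance, missing_entries(Path(".")) run from the project root returns an empty list when every listed entry is present.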

Installation

This library requires Python 3.8+. Dependencies can be installed with pip install -r requirements.txt

The Graphormer library requires a separate installation and Python 3.9. It can be installed with:

git clone --recursive https://github.com/microsoft/Graphormer.git
cd Graphormer
bash install.sh

The documentation of Graphormer can be found here.

Use

The main script is the entry point to launch all the experiments. Use python main.py to run it.

References

[PLDL'17] É. Papegnies, V. Labatut, R. Dufour, and G. Linarès. Detection of abusive messages in an on-line community, 14ème Conférence en Recherche d'Information et Applications (CORIA), Marseille, FR, p.153–168, 2017. doi: 10.24348/coria.2017.16 - ⟨hal-01505017⟩
[PLDL'17a] É. Papegnies, V. Labatut, R. Dufour, and G. Linarès. Graph-based Features for Automatic Online Abuse Detection, 5th International Conference on Statistical Language and Speech Processing (SLSP), Le Mans, FR, Lecture Notes in Artificial Intelligence, 10583:70-81, 2017. doi: 10.1007/978-3-319-68456-7_6 - ⟨hal-01571639⟩
[PLDL'17b] É. Papegnies, V. Labatut, R. Dufour, and G. Linarès. Détection de messages abusifs au moyen de réseaux conversationnels, 8ème Conférence sur les modèles et l'analyse de réseaux : approches mathématiques et informatiques (MARAMI), La Rochelle, FR, 2017. ⟨hal-01614279⟩
[PLDL'18] É. Papegnies, V. Labatut, R. Dufour, and G. Linarès. Impact Of Content Features For Automatic Online Abuse Detection, 18th International Conference on Computational Linguistics and Intelligent Text Processing (CICling 2017), Budapest, HU, Lecture Notes in Computer Science, 10762:153–168, 2018. doi: 10.1007/978-3-319-77116-8_30 - ⟨hal-01505502⟩
[PLDL'19] É. Papegnies, V. Labatut, R. Dufour, and G. Linarès. Conversational Networks for Automatic Online Moderation, IEEE Transactions on Computational Social Systems, 6(1):38–55, 2019. doi: 10.1109/TCSS.2018.2887240 - ⟨hal-01999546⟩
[C'19] N. Cécillon. Exploration de caractéristiques d’embeddings de graphes pour la détection de messages abusifs, MSc Thesis, Avignon Université, Laboratoire Informatique d'Avignon (LIA), Avignon, FR, 2019. ⟨dumas-04073337⟩
[CLDL'19] N. Cécillon, V. Labatut, R. Dufour & G. Linarès. Abusive Language Detection in Online Conversations by Combining Content- and Graph-Based Features, AAAI ICWSM International Workshop on Modeling and Mining Social-Media Driven Complex Networks (Soc2Net), Munich, DE, Frontiers in Big Data 2:8, 2019. doi: 10.3389/fdata.2019.00008 - ⟨hal-02130205⟩
[CLDL'20a] N. Cécillon, V. Labatut, R. Dufour & G. Linarès. Tuning Graph2vec with Node Labels for Abuse Detection in Online Conversations, 11ème Conférence sur les modèles et l'analyse de réseaux : approches mathématiques et informatiques (MARAMI), Montpellier, FR, 2020. Conference version - ⟨hal-02993571⟩
[CLDL'20b] N. Cécillon, V. Labatut, R. Dufour & G. Linarès. Graph Embeddings for Abusive Language Detection, Springer Nature Computer Science 2:37, 2020. doi: 10.1007/s42979-020-00413-7 - ⟨hal-03042171⟩
[CDL'21] N. Cécillon, R. Dufour & V. Labatut. Approche multimodale par plongements de texte et de graphes pour la détection de messages abusifs, Traitement Automatique des Langues 62(2):13-38, 2021. Journal version - ⟨hal-03527016⟩
[C'24] N. Cécillon. Combining Graph and Text to Model Conversations: An Application to Online Abuse Detection, PhD Thesis, Avignon Université, Laboratoire Informatique d'Avignon (LIA), Avignon, FR, 2024. ⟨tel-04441308⟩
[CLDA'24] N. Cécillon, V. Labatut, R. Dufour & N. Arınık. Whole-Graph Representation Learning For the Classification of Signed Networks, IEEE Access (in press), 2024. doi: 10.1109/ACCESS.2024.3472474 - ⟨hal-04712854⟩
[CLD'24] N. Cécillon, R. Dufour & V. Labatut. Conversation-Based Multimodal Abuse Detection Through Text and Graph Embeddings, submitted, 2024.
