Create a new conda environment (replace myenv with the name you want to give your environment):
conda create -n myenv python=3.8
When conda asks you to proceed, type y. This creates the environment in /envs/. Apart from Python 3.8 and its dependencies, no packages are installed in this environment yet.
To install the required packages, activate the environment:
conda activate myenv
Then install the packages listed in the requirements.txt file (if you use pip3, replace pip with pip3):
pip install -r requirements.txt
Update the CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET and BEARER_TOKEN in the file, if needed. The file also describes how to generate these credentials.
To fetch a new dataset:
- first fetch cue tweets using the
- then filter the cue tweets using the
- then fetch the corresponding sarcastic, eliciting and oblivious tweets using the file in which the filtered cue tweets are saved and the
- then fetch random non-sarcastic tweets using
- to fetch the eliciting and oblivious tweets corresponding to the non-sarcastic tweets, use
- to fetch the user history of the sarcastic and non-sarcastic users, use, once for the sarcastic and once for the non-sarcastic users. Afterwards, use to create a dictionary for the sarcastic and for the non-sarcastic users. To create a dictionary with the combined user history and a sample of the users, use To limit the maximum number of tweets per user, use the corresponding code in (inserted as a comment).
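The user-history dictionaries from the last step can be pictured roughly as follows. This is a minimal sketch, not the repository's code: it assumes the fetched history is available as (user_id, tweet_text) pairs, and the function names build_user_history, combine_histories and sample_users are hypothetical. The optional cap mirrors the "maximum number of tweets per user" setting.

```python
import random
from collections import defaultdict


def build_user_history(records, max_tweets_per_user=None):
    """Group (user_id, tweet_text) pairs into {user_id: [tweets]}.

    If max_tweets_per_user is set, only the first N tweets seen per user are kept.
    """
    history = defaultdict(list)
    for user_id, text in records:
        if max_tweets_per_user is not None and len(history[user_id]) >= max_tweets_per_user:
            continue  # cap reached for this user: ignore further tweets
        history[user_id].append(text)
    return dict(history)


def combine_histories(sarcastic, non_sarcastic):
    """Merge the sarcastic and non-sarcastic user dictionaries into one."""
    combined = {}
    for source in (sarcastic, non_sarcastic):
        for user_id, tweets in source.items():
            combined.setdefault(user_id, []).extend(tweets)
    return combined


def sample_users(history, k, seed=0):
    """Return a reproducible random sample of k users with their histories."""
    rng = random.Random(seed)
    users = sorted(history)
    chosen = rng.sample(users, min(k, len(users)))
    return {user: history[user] for user in chosen}


if __name__ == "__main__":
    records = [("alice", "t1"), ("alice", "t2"), ("alice", "t3"), ("bob", "t4")]
    print(build_user_history(records, max_tweets_per_user=2))
```

Sorting the user ids before sampling keeps the sample reproducible for a fixed seed, independent of dictionary insertion order.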
You can use the BERT sentence transformer either with textual features only (using ) or with additional conversational features, i.e. with the eliciting and oblivious tweets added (using ).
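The difference between the two input variants can be sketched as below. This is an illustration under stated assumptions, not the repository's implementation: tweets are assumed to be plain strings, the sarcastic tweet comes first followed by the eliciting and then the oblivious tweets, and both the function name build_input_text and the " [SEP] " separator are hypothetical.

```python
def build_input_text(tweet, eliciting_tweets=(), oblivious_tweets=(), sep=" [SEP] "):
    """Build the text fed to the sentence transformer.

    Text-only variant: just the tweet.
    Conversational variant: the tweet followed by the eliciting and oblivious
    tweets, joined with a (hypothetical) separator string.
    """
    parts = [tweet]
    parts.extend(t for t in eliciting_tweets if t)  # skip empty/missing tweets
    parts.extend(t for t in oblivious_tweets if t)
    return sep.join(parts)


if __name__ == "__main__":
    print(build_input_text("great, another monday"))
    print(build_input_text("great, another monday", ["how was your weekend?"], ["lol"]))
```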
To run the models for sarcastic vs. non-sarcastic classification, use the csv containing both the sarcastic and the non-sarcastic tweets and specify class=all. To run them for perceived vs. intended sarcasm, use the csv containing only the sarcastic tweets and specify class=sarcastic.
We use the following models, which take user context as input in addition to the textual information:
1. Model using priming: 200 tokens from the history of each user are added as a prefix to the tweet text.
2. Model using average user embeddings: a user embedding is added, computed with sentence transformers as the average embedding of the user's historical tweets.
3. Model using user attribution: a user attribution is added, computed from the user's historical tweets with a linear model.
4. GNN: models the social relations between users and the relations between tweets and users. For this purpose, a heterogeneous graph G = (V, E) is built, where V = U ∪ T consists of two types of nodes: users (U) and tweets (T).
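The priming and average-user-embedding models can be sketched as below. This is a simplified illustration, not the repository's code: it assumes whitespace tokenization (the real models most likely count tokens with the transformer's own tokenizer), assumes the history tokens are taken in the order the tweets are given, and represents embeddings as plain lists of floats. The names prime_tweet and average_embedding and the " [SEP] " separator are hypothetical.

```python
def prime_tweet(tweet, history, max_history_tokens=200, sep=" [SEP] "):
    """Priming: prefix the tweet with up to max_history_tokens history tokens.

    Tokens are whitespace-split words here (an assumption), taken from the
    user's historical tweets in the order given.
    """
    tokens = []
    for past_tweet in history:
        tokens.extend(past_tweet.split())
        if len(tokens) >= max_history_tokens:
            break  # enough tokens collected
    prefix = " ".join(tokens[:max_history_tokens])
    return prefix + sep + tweet if prefix else tweet


def average_embedding(vectors):
    """Average user embedding: element-wise mean of the historical tweet embeddings."""
    if not vectors:
        raise ValueError("user has no historical tweet embeddings")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


if __name__ == "__main__":
    print(prime_tweet("sure, I love traffic", ["stuck again", "road works everywhere"], 4))
    print(average_embedding([[1.0, 2.0], [3.0, 4.0]]))
```

In practice the historical tweet vectors would come from a sentence transformer; the averaging step itself is independent of how they were produced.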
To use the models described in 1.-3., run and specify the run configuration accordingly (see the example calls in the file). Usage of the model described in 4.: TBD.
The models described in 2. and 3. can additionally be enhanced with conversational features, as described in "Run text-only models", by appending the eliciting and oblivious tweets to the sarcastic tweet.
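The heterogeneous graph G = (V, E) of the GNN model, with V = U ∪ T, can be illustrated with a small data structure. This is a hypothetical sketch only, since the GNN is still TBD: the edge types "writes" (user to tweet) and "interacts" (user to user, standing in for whatever social relation is used) and the names HeteroGraph and build_graph are assumptions. Nodes are tagged with their type so that a user id and a tweet id can never collide.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    users: set = field(default_factory=set)
    tweets: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (edge_type, source_node, target_node)

    @property
    def nodes(self):
        """V = U ∪ T, each node tagged with its type."""
        return {("user", u) for u in self.users} | {("tweet", t) for t in self.tweets}


def build_graph(authorship, social):
    """Build the graph from (user, tweet) authorship pairs and (user, user) relations."""
    g = HeteroGraph()
    for user, tweet in authorship:
        g.users.add(user)
        g.tweets.add(tweet)
        g.edges.add(("writes", ("user", user), ("tweet", tweet)))
    for u, v in social:
        g.users.update((u, v))
        g.edges.add(("interacts", ("user", u), ("user", v)))
    return g


if __name__ == "__main__":
    g = build_graph([("u1", "t1"), ("u2", "t2")], [("u1", "u2")])
    print(len(g.nodes), len(g.edges))
```

A GNN library would typically store each edge type as a separate index array; the set-of-triples form here only shows the graph's structure.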
- Create a user vocabulary using
- Create user embeddings from the historical tweets of each author in the sarcastic and non-sarcastic datasets with
- Create text embeddings for the tweets in the sarcastic and non-sarcastic datasets and for the user history using
- Train the linear model for user attribution using with the text embeddings of the sarcastic and non-sarcastic datasets (training set only)
- Extract the user embeddings using with the text embeddings of each user's historical tweets
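The user vocabulary from the first step is, in essence, a mapping from user ids to contiguous integer indices, which a linear attribution model can use as class labels. The sketch below assumes exactly that; the names build_user_vocab and encode_authors are hypothetical and the repository's actual vocabulary format may differ.

```python
def build_user_vocab(user_ids):
    """Map each distinct user id to an integer index (sorted for reproducibility)."""
    return {user: idx for idx, user in enumerate(sorted(set(user_ids)))}


def encode_authors(authors, vocab, unk=None):
    """Turn a list of author ids into class indices; unknown users map to `unk`."""
    return [vocab.get(author, unk) for author in authors]


if __name__ == "__main__":
    vocab = build_user_vocab(["bob", "alice", "bob"])
    print(vocab)
    print(encode_authors(["bob", "carol"], vocab, unk=-1))
```

Sorting the ids makes the index assignment deterministic across runs, so a trained model's output dimensions stay aligned with the same users.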
To run these models for sarcastic vs. non-sarcastic classification, use the csv containing both the sarcastic and the non-sarcastic tweets and specify class=all. To run them for perceived vs. intended sarcasm, use only the sarcastic csv and specify class=sarcastic.