# Claim Verification Using Generated Data by ChatGPT and Wikipedia

Claim verification is the task of predicting whether a text is truthful. Verification models are often trained on manually created datasets, which are expensive and time-consuming to build. In this project, a verification model was instead trained on generated data.

The data was generated by prompting ChatGPT (the `gpt-3.5-turbo` model, via the OpenAI API) with scraped Wikipedia articles together with an instruction to produce either a true or a false statement. Each article served as evidence, providing a source of truth, and the model's response served as the claim: a true claim aligns with the evidence, while a false claim contradicts it. A hedged sketch of this generation step is shown below.

The evidence and claim were then embedded using BERT-small, and a neural network containing bidirectional LSTM layers was trained to distinguish true from false claims (see the second sketch below). The model classified the validation samples with a macro-averaged F1-score of $0.80$. Evaluating the model on custom inputs showed that it is sensitive to the word "not", possibly because "not" is overrepresented in the false claims.
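The following is a minimal sketch of the claim-generation step. The exact prompt wording, the model name, and the `generate_claim` helper are illustrative assumptions, not the project's exact setup:

```python
# Sketch of generating a true or false claim from a scraped Wikipedia article.
# Prompt wording and helper names are assumptions; only the general recipe
# (article + instruction -> claim) comes from the README.
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_claim(article_text: str) -> tuple[str, int]:
    """Ask the chat model for a claim that either aligns with or
    contradicts the article; return (claim, label)."""
    label = random.randint(0, 1)  # 1 = true claim, 0 = false claim
    instruction = (
        "Write one statement that is TRUE according to the article below."
        if label
        else "Write one statement that is FALSE, i.e. contradicts the article below."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": f"{instruction}\n\nArticle:\n{article_text}"}
        ],
    )
    return response.choices[0].message.content.strip(), label
```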
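A second sketch shows how a claim/evidence pair might be embedded with BERT-small and classified by a bidirectional LSTM. The checkpoint name (`prajjwal1/bert-small`), the layer sizes, and the decision to encode claim and evidence as a single token sequence are assumptions:

```python
# Sketch: BERT-small token embeddings feeding a BiLSTM classifier.
# Checkpoint, dimensions, and input layout are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-small")
encoder = AutoModel.from_pretrained("prajjwal1/bert-small")  # hidden size 512


class BiLSTMVerifier(nn.Module):
    def __init__(self, embed_dim: int = 512, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, 2)  # true vs. false

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        _, (h_n, _) = self.lstm(embeddings)
        # Concatenate the final forward and backward hidden states.
        final = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        return self.classifier(final)


def embed(claim: str, evidence: str) -> torch.Tensor:
    """Encode the claim/evidence pair as a sequence of token embeddings."""
    inputs = tokenizer(claim, evidence, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        return encoder(**inputs).last_hidden_state  # (1, seq_len, 512)


model = BiLSTMVerifier()
logits = model(embed("Paris is the capital of France.",
                     "Paris is the capital and largest city of France."))
```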
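For reference, the reported metric (macro-averaged F1) can be computed with scikit-learn; the labels below are purely illustrative:

```python
# Macro F1 averages the per-class F1 scores, weighting both classes equally.
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0]  # illustrative validation labels
y_pred = [1, 0, 0, 1, 0]  # illustrative model predictions
print(f1_score(y_true, y_pred, average="macro"))
```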