Improving Arabic cognitive distortion classification in Twitter using BERTopic

Alhaj, Fatima, Al-Haj, Ali, Sharieh, Ahmad and Jabri, Riad (2022) Improving Arabic cognitive distortion classification in Twitter using BERTopic. International Journal of Advanced Computer Science and Applications, 13 (1). pp. 854-860. ISSN 2158-107X

Paper_99-Improving_Arabic_Cognitive_Distortion_Classification_in_Twitter.pdf - Published Version
Available under License Creative Commons Attribution.


Abstract

Social media platforms allow users to share thoughts, experiences, and beliefs. These platforms are a rich resource for natural language processing techniques that make inferences in the context of cognitive psychology. Certain inaccurate and biased thinking patterns are defined as cognitive distortions (CDs). Detecting these distortions helps users restructure how they perceive their thoughts in a healthier way. This paper proposes a machine learning-based approach to improve the classification of cognitive distortions in Arabic content on Twitter. One challenge in this task is the shortness of the text, which results in sparse co-occurrence patterns and a lack of context information (semantic features). The proposed approach enriches the text representation by identifying the latent topics within tweets. Although classification is a supervised learning task, the enrichment step uses unsupervised learning. The proposed algorithm uses transformer-based topic modeling (BERTopic). It employs two types of document representations and performs averaging and concatenation to produce contextual topic embeddings. A comparative analysis of F1-score, precision, recall, and accuracy is presented. The experimental results demonstrate that the enriched representation outperformed the baseline models by varying margins. These encouraging results suggest that using the latent topic distribution obtained from BERTopic can improve the classifier’s ability to distinguish between different CD categories.
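
The enrichment idea in the abstract (averaging, then concatenation) can be illustrated with a minimal sketch. This is only one possible reading of the abstract and not the authors' implementation: the function names `weighted_average` and `enrich`, the toy vectors, and the choice to weight topic vectors by the tweet's topic probabilities are all assumptions made for illustration. In practice the topic probabilities and vectors would come from a fitted topic model such as BERTopic; here they are hard-coded.

```python
from typing import List, Sequence

Vector = List[float]


def weighted_average(vectors: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> Vector:
    """Average the vectors, weighting each one by its non-negative weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


def enrich(doc_vec: Sequence[float],
           topic_probs: Sequence[float],
           topic_vecs: Sequence[Sequence[float]]) -> Vector:
    """Concatenate a document vector with a topic-context vector.

    Assumption: the topic context is the probability-weighted average
    of the topic vectors. The paper may combine them differently.
    """
    topic_context = weighted_average(topic_vecs, topic_probs)
    return list(doc_vec) + topic_context


# Toy example: a 2-dimensional tweet embedding and two latent topics.
doc_vec = [0.2, 0.4]                    # hypothetical tweet embedding
topic_probs = [0.75, 0.25]              # hypothetical topic distribution
topic_vecs = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical topic embeddings

enriched = enrich(doc_vec, topic_probs, topic_vecs)
print(enriched)  # document vector followed by the topic-context vector
```

The enriched vector would then be passed to any supervised classifier in place of the plain document embedding, which is where the short-text sparsity problem is meant to be eased.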

Item Type: Article
Uncontrolled Keywords: Arabic tweets, cognitive distortions’ classification, machine learning, social media, supervised learning, unsupervised learning, transformers, BERTopic, topic modeling
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Health & Science > Department of Science & Technology
Depositing User: Ali Al-Haj
Date Deposited: 02 Feb 2022 10:51
Last Modified: 02 Feb 2022 12:22
URI: http://oars.uos.ac.uk/id/eprint/2327
