Improving Arabic cognitive distortion classification in Twitter using BERTopic
Alhaj, Fatima, Al-Haj, Ali, Sharieh, Ahmad and Jabri, Riad (2022) Improving Arabic cognitive distortion classification in Twitter using BERTopic. International Journal of Advanced Computer Science and Applications, 13 (1). pp. 854-860. ISSN 2158-107X
Paper_99-Improving_Arabic_Cognitive_Distortion_Classification_in_Twitter.pdf - Published Version
Available under License Creative Commons Attribution.
Abstract
Social media platforms allow users to share thoughts, experiences, and beliefs. These platforms are a rich resource for natural language processing techniques that make inferences in the context of cognitive psychology. Some inaccurate and biased thinking patterns are defined as cognitive distortions (CDs). Detecting these distortions helps users restructure how they perceive their thoughts in a healthier way. This paper proposes a machine learning-based approach to improve the classification of cognitive distortions in Arabic content on Twitter. One challenge of this task is the shortness of the text, which results in sparse co-occurrence patterns and a lack of context information (semantic features). The proposed approach enriches the text representation by identifying the latent topics within tweets. Although classification is a supervised learning task, the enrichment step uses unsupervised learning. The proposed algorithm uses transformer-based topic modeling (BERTopic). It employs two types of document representations and performs averaging and concatenation to produce contextual topic embeddings. A comparative analysis of F1-score, precision, recall, and accuracy is presented. The experimental results show that the enriched representation outperformed the baseline models by varying margins. These encouraging results suggest that the latent topic distribution obtained from BERTopic can improve the classifier's ability to distinguish between different CD categories.
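The abstract does not specify how the averaging and concatenation are carried out, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each tweet already has a sentence embedding and a topic probability vector (for example, as produced by a topic model such as BERTopic), and that each topic has a representative vector. The function name `enrich_with_topics` and all three inputs are hypothetical.

```python
import numpy as np


def enrich_with_topics(doc_embeddings, topic_probs, topic_vectors):
    """Append a topic-context vector to each document embedding.

    Assumed inputs (hypothetical, not taken from the paper):
      doc_embeddings: (n_docs, d)     one sentence embedding per tweet
      topic_probs:    (n_docs, k)     per-tweet topic probabilities
      topic_vectors:  (k, d_topic)    one representative vector per topic
    """
    doc_embeddings = np.asarray(doc_embeddings, dtype=float)
    topic_probs = np.asarray(topic_probs, dtype=float)
    topic_vectors = np.asarray(topic_vectors, dtype=float)

    # Normalise each row so the topic weights sum to 1
    # (the clip guards against division by zero for all-zero rows).
    row_sums = np.clip(topic_probs.sum(axis=1, keepdims=True), 1e-12, None)
    weights = topic_probs / row_sums

    # Averaging step: probability-weighted average of the topic vectors.
    topic_context = weights @ topic_vectors

    # Concatenation step: original embedding followed by topic context.
    return np.concatenate([doc_embeddings, topic_context], axis=1)


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.0, 1.0]]
    probs = [[1.0, 0.0], [0.5, 0.5]]
    topics = [[2.0, 0.0], [0.0, 2.0]]
    print(enrich_with_topics(docs, probs, topics))
```

The enriched vectors could then be passed to any supervised classifier in place of the plain sentence embeddings; whether this matches the representation evaluated in the paper would need to be checked against the full text.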
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Arabic tweets, cognitive distortions’ classification, machine learning, social media, supervised learning, unsupervised learning, transformers, BERTopic, topic modeling |
Subjects: | Q Science > Q Science (General); Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Health & Science > Department of Science & Technology |
Depositing User: | Ali Alhaj |
Date Deposited: | 02 Feb 2022 10:51 |
Last Modified: | 02 Feb 2022 12:22 |
URI: | https://oars.uos.ac.uk/id/eprint/2327 |