TY - JOUR
T1 - A comparison of classification models to detect cyberbullying in the Peruvian Spanish language on Twitter
AU - Cuzcano, Ximena M.
AU - Ayma, Victor H.
N1 - Publisher Copyright:
© 2020 Science and Information Organization. All rights reserved.
PY - 2020/10/1
Y1 - 2020/10/1
N2 - Cyberbullying is a social problem in which bullies’ actions are more harmful than in traditional forms of bullying, because bullies can repeatedly humiliate the victim in front of an entire community through social media. Many works aim to detect acts of cyberbullying by analyzing the text of social media posts written in one or more languages; however, few investigations target cyberbullying detection in the Spanish language. In this work, we compare the performance of four traditional supervised machine learning methods in detecting cyberbullying via the identification of four cyberbullying-related categories in Twitter posts written in Peruvian Spanish. Specifically, we trained and tested Naive Bayes, Multinomial Logistic Regression, Support Vector Machine, and Random Forest classifiers on a dataset manually annotated with the help of human participants. The results indicate that the best-performing classifier for the cyberbullying detection task was the Support Vector Machine classifier.
AB - Cyberbullying is a social problem in which bullies’ actions are more harmful than in traditional forms of bullying, because bullies can repeatedly humiliate the victim in front of an entire community through social media. Many works aim to detect acts of cyberbullying by analyzing the text of social media posts written in one or more languages; however, few investigations target cyberbullying detection in the Spanish language. In this work, we compare the performance of four traditional supervised machine learning methods in detecting cyberbullying via the identification of four cyberbullying-related categories in Twitter posts written in Peruvian Spanish. Specifically, we trained and tested Naive Bayes, Multinomial Logistic Regression, Support Vector Machine, and Random Forest classifiers on a dataset manually annotated with the help of human participants. The results indicate that the best-performing classifier for the cyberbullying detection task was the Support Vector Machine classifier.
KW - Feature extraction
KW - Machine learning
KW - Natural language processing
KW - Cyberbullying detection
UR - http://www.scopus.com/inward/record.url?scp=85101642625&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/d031ef0a-1fb2-3db2-83d8-508db9b6154b/
U2 - 10.14569/IJACSA.2020.0111018
DO - 10.14569/IJACSA.2020.0111018
M3 - Article in a journal
AN - SCOPUS:85101642625
SN - 2158-107X
VL - 11
SP - 132
EP - 138
JO - International Journal of Advanced Computer Science and Applications
JF - International Journal of Advanced Computer Science and Applications
IS - 10
ER -