Digital Vernaculars in Play: A Machine Learning-Based Linguistic Analysis of Slang and Code-Switching in Indonesian Youth Online Gaming Chats 

Abstract

Online multiplayer games have emerged as critical "third places" for youth socialization, fostering unique digital vernaculars. In Indonesia, this linguistic landscape is a complex blend of formal Indonesian, regional dialects, internet slang, and English code-switching. However, the prevalence of toxic communication (including violence, racism, and harassment) presents a significant barrier to inclusive interaction. Standard automated moderation systems often fail in this context because they lack cultural and linguistic nuance. This study addresses that gap with a machine learning-based linguistic analysis that systematically identifies and categorizes the features of toxic and non-toxic communication in Indonesian gaming chats. Using a manually labeled corpus of 10,702 chat messages, we implemented a supervised classification pipeline. A linear Support Vector Machine (SVM) model, using Term Frequency-Inverse Document Frequency (TF-IDF) weighting over n-gram (unigram and bigram) features, was trained to classify messages into four categories: violence, racism, harassment, and neutral. The model achieved an overall accuracy of 82% across these four categories. The central finding of this research is empirical evidence that different forms of toxicity possess distinct and computationally identifiable vernaculars. The language of violence is characterized by general, impersonal insults; racism by specific, identity-based slurs; and harassment by targeted, often sexualized and gendered, terminology. This data-driven comparative analysis provides a nuanced linguistic framework that moves beyond simple keyword flagging. The findings have direct implications for the design of more sophisticated, culturally aware automated moderation systems capable of recognizing the specific nature of toxic behavior in complex digital environments.

Keywords: Code-Switching, Computational Linguistics, Hate Speech Detection, Indonesian Language, Online Gaming

How to Cite:

Dzulqarnain, M. & Mirela, N. (2025) “Digital Vernaculars in Play: A Machine Learning-Based Linguistic Analysis of Slang and Code-Switching in Indonesian Youth Online Gaming Chats”, Journal of Digital Society 1(3), 201-215. doi: https://doi.org/10.63913/jds.v1i3.49

Published on
2025-09-24