Profiling Reader Sentiment and Identifying Key Linguistic Markers in Digital Book Reviews: A TF-IDF and Logistic Regression Approach
Samah Anwar Abbas
Abstract
The proliferation of user-generated content on digital platforms has made sentiment analysis a crucial tool for understanding public opinion. This study focuses on the literary domain, applying natural language processing and machine learning techniques to classify sentiment in book reviews. The primary objectives were to implement a robust pipeline for categorizing reviews as negative, neutral, or positive, and to identify the linguistic markers most predictive of each sentiment class. To achieve this, a dataset of pre-labeled book reviews was preprocessed through cleaning, tokenization, stop-word removal, and lemmatization. The cleaned text was then converted into a numerical feature matrix using Term Frequency-Inverse Document Frequency (TF-IDF), configured to capture both unigrams and bigrams. A multi-class (One-vs-Rest) Logistic Regression model was trained on this feature matrix. On a held-out test set, the model scored 1.00 on accuracy, precision, recall, and F1-score. Further analysis attributes this perfect score to the dataset's syntactic simplicity and to the presence of unambiguous, high-polarity keywords (e.g., "disappointing," "masterpiece"). The results validate the pipeline as a proof of concept under idealized conditions, but the dataset's lack of linguistic complexity limits how far the model's performance can be expected to generalize. Future research should apply this methodology to larger, more nuanced real-world datasets to test its robustness, and explore more advanced techniques such as aspect-based sentiment analysis.
Keywords: Book Reviews, Logistic Regression, Natural Language Processing, Sentiment Analysis, TF-IDF
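The pipeline summarized in the abstract (cleaning, tokenization, stop-word removal, unigram and bigram TF-IDF, One-vs-Rest logistic regression, then reading off per-class markers) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the author's implementation. The stop-word list, the toy reviews, the helper names (`preprocess`, `ngrams`, `fit_idf`, `tfidf`, `train_binary`, `train_ovr`, `predict`, `top_markers`) and the hyperparameters (`epochs`, `lr`, `l2`) are all hypothetical. Lemmatization is omitted to avoid external data downloads; a fuller version might use a lemmatizer such as NLTK's `WordNetLemmatizer`. The smoothed IDF, ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, mirrors scikit-learn's `TfidfVectorizer` defaults, which is an assumption about the study's configuration. Each binary classifier is fit by plain batch gradient descent rather than a library solver.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the study's actual list is not specified.
STOP_WORDS = {"a", "an", "the", "i", "it", "was", "and", "with"}

def preprocess(text):
    # Cleaning + whitespace tokenization + stop-word removal (no lemmatization here).
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())  # keep letters and whitespace only
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]

def ngrams(tokens):
    """Unigrams plus space-joined bigrams (an n-gram range of 1 to 2)."""
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def fit_idf(docs):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1, learned from training docs only."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}

def tfidf(doc, idf):
    """Raw counts weighted by IDF, then L2-normalized; unseen terms are dropped."""
    counts = Counter(term for term in doc if term in idf)
    vec = {term: c * idf[term] for term, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {term: v / norm for term, v in vec.items()}

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def train_binary(X, y, epochs=300, lr=0.5, l2=0.01):
    """Binary logistic regression on sparse dict vectors via batch gradient descent."""
    w, b = Counter(), 0.0  # Counter returns 0.0-like 0 for missing weights
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for x, target in zip(X, y):
            err = sigmoid(b + sum(w[t] * v for t, v in x.items())) - target
            for t, v in x.items():
                grad_w[t] += err * v
            grad_b += err
        for t in set(w) | set(grad_w):
            w[t] -= lr * (grad_w[t] / len(X) + l2 * w[t])  # L2-penalized update
        b -= lr * grad_b / len(X)  # bias is not penalized
    return w, b

def train_ovr(X, labels):
    """One-vs-Rest: one binary classifier per class (class vs. all others)."""
    return {c: train_binary(X, [1 if lab == c else 0 for lab in labels])
            for c in sorted(set(labels))}

def predict(models, x):
    """Pick the class whose binary model gives the highest decision score."""
    def score(c):
        w, b = models[c]
        return b + sum(w[t] * v for t, v in x.items())
    return max(models, key=score)

def top_markers(models, k=3):
    """Features with the largest positive coefficients for each class."""
    return {c: [t for t, _ in w.most_common(k)] for c, (w, _) in models.items()}

# Hypothetical toy reviews for illustration only (not the study's dataset).
train = [
    ("A disappointing, boring plot.", "negative"),
    ("Truly disappointing ending; I expected more.", "negative"),
    ("An absolute masterpiece with brilliant characters.", "positive"),
    ("A masterpiece. Brilliant and moving.", "positive"),
    ("An average read, neither good nor bad.", "neutral"),
    ("It was okay, an average story.", "neutral"),
]
test = [
    ("Such a disappointing book.", "negative"),
    ("A brilliant masterpiece!", "positive"),
    ("Just an average novel.", "neutral"),
]

train_docs = [ngrams(preprocess(text)) for text, _ in train]
idf = fit_idf(train_docs)
models = train_ovr([tfidf(d, idf) for d in train_docs], [lab for _, lab in train])

predictions = [predict(models, tfidf(ngrams(preprocess(t)), idf)) for t, _ in test]
accuracy = sum(p == lab for p, (_, lab) in zip(predictions, test)) / len(test)
print("accuracy:", accuracy)
print("top markers:", top_markers(models, k=3))
```

Because each One-vs-Rest classifier is linear, its largest positive coefficients directly show which terms push a review toward that class, which is one plausible way to surface the "key linguistic markers" the abstract mentions. In this toy run, terms such as "disappointing" and "average" should dominate their classes, but on data this small and this polarized, perfect test accuracy says little about generalization, in line with the abstract's own caveat.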
How to Cite:
Abbas, S. (2025). "Profiling Reader Sentiment and Identifying Key Linguistic Markers in Digital Book Reviews: A TF-IDF and Logistic Regression Approach", Journal of Digital Society 1(3), 230-243. doi: https://doi.org/10.63913/jds.v1i3.52