Sarcasm Detection on Reddit Using Classical Machine Learning and Feature Engineering

Subrata Karmaker (1)
(1) Department of Mathematics, Technische Universität Chemnitz, Germany
How to cite (IJASEIT):
S. Karmaker, “Sarcasm Detection on Reddit Using Classical Machine Learning and Feature Engineering,” Int. J. Data Sci., vol. 6, no. 2, pp. 85–93, Dec. 2025.

Sarcasm is common on social media, yet difficult for machines to interpret. Its meaning often depends on conversational tone, speaker intent, or situational contrast, signals that are not directly visible in plain text. This study investigates how far sarcasm detection can go using only classical machine learning and hand-crafted feature engineering, without relying on neural architectures or contextual information. Using a stratified subsample of 100,000 comments from the Self-Annotated Reddit Corpus (SARC 2.0), I combine word-level and character-level TF–IDF representations with simple stylistic features such as comment length, punctuation use, and uppercase ratio. Four classical classifiers are evaluated: logistic regression, linear support vector machines, multinomial Naive Bayes, and random forests. Despite the context-free design, logistic regression and Naive Bayes reach F1-scores of approximately 0.57 on the sarcastic class, showing that classical approaches capture part of the underlying signal. The full code is provided for reproducibility.
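The feature design summarized above can be illustrated with a minimal, dependency-free Python sketch. It is not the author's released code: the helper names (`stylistic_features`, `word_ngrams`, `char_ngrams`, `fit_idf`, `tfidf`, `featurize`, `f1_positive`), the n-gram ranges (word 1 to 2, character 3 to 5), the specific stylistic cues, and the toy comments are all assumptions chosen for illustration. The IDF uses the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1, the default weighting in scikit-learn's `TfidfVectorizer`, and the word and character blocks are each L2-normalized separately before being merged with the stylistic features.

```python
import math
import re
from collections import Counter

# Hypothetical helpers illustrating the feature design; not the paper's code.


def stylistic_features(text):
    """Surface cues: length, punctuation use, and uppercase ratio."""
    letters = [ch for ch in text if ch.isalpha()]
    upper = sum(1 for ch in letters if ch.isupper())
    return {
        "sty:n_chars": float(len(text)),
        "sty:n_tokens": float(len(text.split())),
        "sty:n_exclaim": float(text.count("!")),
        "sty:n_question": float(text.count("?")),
        "sty:n_ellipsis": float(text.count("...")),
        "sty:upper_ratio": upper / len(letters) if letters else 0.0,
    }


def word_ngrams(text, n_max=2):
    """Lowercased word 1..n_max-grams (tokens of 2+ word characters)."""
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    return [
        "w:" + " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def char_ngrams(text, n_min=3, n_max=5):
    """Lowercased character n_min..n_max-grams, spaces and punctuation kept."""
    t = text.lower()
    return [
        "c:" + t[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(t) - n + 1)
    ]


def fit_idf(docs, analyzer):
    """Smoothed IDF over a training corpus: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(analyzer(doc)))
    return {term: math.log((1 + n) / (1 + c)) + 1.0 for term, c in df.items()}


def tfidf(text, idf, analyzer):
    """Raw term count times IDF, L2-normalized; unseen terms are dropped."""
    counts = Counter(t for t in analyzer(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def featurize(text, idf_word, idf_char):
    """Sparse feature dict: word TF-IDF + char TF-IDF + stylistic cues."""
    feats = {}
    feats.update(tfidf(text, idf_word, word_ngrams))
    feats.update(tfidf(text, idf_char, char_ngrams))
    feats.update(stylistic_features(text))
    return feats


def f1_positive(y_true, y_pred, positive=1):
    """F1-score of the positive (sarcastic) class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage on made-up comments (not SARC data).
train = ["Oh great, another Monday!", "I love waiting in line...", "The meeting is at noon."]
idf_w = fit_idf(train, word_ngrams)
idf_c = fit_idf(train, char_ngrams)
x = featurize("Oh GREAT, another Monday!", idf_w, idf_c)
print(x["sty:upper_ratio"])  # 0.35
```

In a full pipeline, the merged feature vectors would be passed to one of the four classifiers named above. The raw stylistic counts are left unscaled in this sketch, so a real implementation would likely rescale them before stacking; every feature here is non-negative, which multinomial Naive Bayes requires. `f1_positive` scores only the sarcastic class, which corresponds to the per-class F1 the abstract reports.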


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

https://creativecommons.org/licenses/by-sa/4.0/