
ABSTRACT The rapid growth of digital communication has brought a sharp increase in spam email, which threatens information security and user productivity. Spam messages often carry phishing links, malware attachments, and fraudulent advertisements. Traditional rule-based filters are increasingly ineffective against evolving spam patterns. Natural Language Processing (NLP) combined with machine learning offers a way to automate spam detection. This study proposes a machine learning-based framework for identifying spam emails using NLP techniques. The system preprocesses email text with tokenization and stop-word removal, then applies term frequency–inverse document frequency (TF–IDF) feature extraction to transform the text into numerical features. Four classifiers are implemented: Naïve Bayes, Logistic Regression, Support Vector Machine (SVM), and Random Forest. Model performance is evaluated using Accuracy, Precision, Recall, and F1-score. Experimental results indicate that machine learning-based models can effectively identify spam emails and significantly improve email filtering accuracy.

Key words: Spam Detection, Natural Language Processing, Machine Learning, Email Classification, Text Mining.
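
The pipeline summarized in the abstract (tokenization, stop-word removal, TF–IDF features, a classifier, then Precision, Recall, and F1-score) can be illustrated with a minimal, standard-library-only Python sketch. It is not the study's implementation: the toy corpus, the stop-word list, the smoothed IDF variant, and every function name are assumptions made for illustration, and only Naïve Bayes, one of the four classifiers named, is shown.

```python
import math
import re
from collections import defaultdict

# Hypothetical toy corpus (not data from the study): 1 = spam, 0 = legitimate.
TRAIN = [
    ("win a free prize now", 1),
    ("free money click here now", 1),
    ("claim your free prize today", 1),
    ("meeting agenda for monday", 0),
    ("please review the project report", 0),
    ("lunch at noon on monday", 0),
]
# Hypothetical stop-word list, kept tiny for the example.
STOP_WORDS = {"a", "an", "the", "for", "at", "on", "your", "to", "of", "and", "is"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = defaultdict(int)
    for doc in docs:
        for term in set(doc):
            df[term] += 1
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(doc, idf):
    """Map a token list to {term: tf * idf}; unseen terms are ignored."""
    vec = defaultdict(float)
    for term in doc:
        if term in idf:
            vec[term] += idf[term]
    return dict(vec)


def train_nb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with additive smoothing."""
    model = {}
    for cls in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == cls]
        totals = defaultdict(float)
        for v in rows:
            for term, weight in v.items():
                totals[term] += weight
        denom = sum(totals.values()) + alpha * len(vocab)
        log_prior = math.log(len(rows) / len(labels))
        log_lik = {t: math.log((totals[t] + alpha) / denom) for t in vocab}
        model[cls] = (log_prior, log_lik)
    return model


def predict(model, vector):
    """Return the class with the highest log-posterior score."""
    def score(cls):
        log_prior, log_lik = model[cls]
        return log_prior + sum(w * log_lik[t] for t, w in vector.items())
    return max(model, key=score)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


docs = [tokenize(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(docs)
model = train_nb([tfidf(d, idf) for d in docs], labels, vocab=set(idf))


def classify(text):
    """Run the full pipeline on one raw message."""
    return predict(model, tfidf(tokenize(text), idf))


test_texts = ["Free prize, click now!", "Project meeting on Monday"]
test_labels = [1, 0]
predictions = [classify(t) for t in test_texts]
print(predictions, precision_recall_f1(test_labels, predictions))
```

A corpus this small only shows how the stages connect; it says nothing about the accuracy reported in the study. In practice the same stages would more likely be built from a library's vectorizer and classifier implementations than written by hand.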
