
Links and websites are a daily necessity, and malicious URLs are highly prevalent, so Internet users and organizations face many security threats. These threats can lead to data breaches, identity theft, and even complete system failure. Traditional methods of detecting malicious URLs are often insufficient, which motivates more advanced techniques. This study improves the accuracy and speed of malicious URL detection through ensemble learning, specifically Bagging (bootstrap aggregating) and Stacking. Extensive experiments on a large, balanced dataset of 491,530 URLs, equally distributed between benign and malicious, showed that ensemble learning models significantly outperform other algorithms. The Bagging classifier, which uses decision trees as the base classifier, achieved an accuracy of 99.01%, a training time of 23.84 seconds, and a prediction time of 0.86 seconds. The Stacking classifier, which uses AdaBoost, Random Forest, and XGBoost as base classifiers, achieved similar results, although its training time increased to 199.6944 seconds because of the model's complexity. Beyond these results, which demonstrate the strength of the bagging and stacking models, we conducted a comprehensive comparison with other popular models, ranging from individual machine learning models such as k-Nearest Neighbors, to deep learning models such as feedforward neural networks, to ensemble learning models based on other techniques such as boosting. These results highlight the promising potential of ensemble learning for strengthening cybersecurity measures and protecting users and businesses from malicious URL attacks.
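To make the bagging idea concrete, the following is a minimal sketch rather than the study's implementation. It draws bootstrap samples, fits one weak learner per sample, and combines them by majority vote. Several parts are assumptions made only for illustration: the lexical features in `url_features`, the eight toy URLs on reserved example domains, and the use of one-split decision stumps in place of the full decision trees named in the abstract. Stacking is not shown.

```python
import random
from collections import Counter

def url_features(url):
    """Hypothetical lexical features; the abstract does not list the real ones."""
    return [len(url), url.count("."), sum(c.isdigit() for c in url), url.count("-")]

def fit_stump(X, y):
    """Fit a one-split decision stump (a stand-in for a full decision tree)."""
    if len(set(y)) == 1:
        # Single-class bootstrap sample: always predict that class.
        return 0, float("-inf"), y[0]
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for above in (0, 1):  # label predicted when feature > threshold
                errors = sum((above if row[f] > t else 1 - above) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    _, f, t, above = best
    return f, t, above

def stump_predict(stump, row):
    f, t, above = stump
    return above if row[f] > t else 1 - above

def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train n_estimators stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, row):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(stump_predict(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Toy, made-up URLs for illustration only (0 = benign, 1 = malicious).
urls = [
    "https://example.com/",
    "https://example.org/docs",
    "https://example.net/about",
    "https://example.com/blog",
    "http://login-secure-update.example.test/verify/account-928374",
    "http://198.51.100.23/bank1ng-confirm/session-0042/index.php",
    "http://free-prizes-4u.example.test/claim-now/id-7781235",
    "http://update-account-info.example.test/a1b2c3/d4e5f6",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

X = [url_features(u) for u in urls]
models = fit_bagging(X, labels)
predictions = [bagging_predict(models, row) for row in X]
```

An odd number of estimators avoids tied votes in the binary case. For real experiments, a library implementation such as scikit-learn's `BaggingClassifier` with decision tree base estimators would be the more natural choice than this hand-rolled version.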
cybersecurity, classification algorithms, benign URLs, uniform resource locators (URLs), deep learning, ensemble learning, supervised machine learning
