
As fake news spreads rapidly on social media, technologies that automatically detect it are being actively developed. However, most of them focus only on the linguistic and compositional characteristics of fake news (e.g., indication of source or author, message length, frequency of negative words). In contrast, this study proposes a machine-learning-based fake news detection model that reflects the characteristics of users, news content, and social networks, grounded in social capital. To comprehensively capture the characteristics related to the spread of fake news, this study applied the XGBoost model to estimate the feature importance of each variable and thereby derive the factors that most strongly affect fake news detection. Based on the derived variables, we built SVM, RF, LR, CART, and NNET models, which are representative machine learning classifiers, and compared their fake news detection performance. To generalize the models (i.e., to avoid overfitting or underfitting), this study performed a cross-validation step and used it to compare their predictive accuracy. As a result, the RF model achieved the highest prediction rate at about 94%, while the NNET model had the lowest at about 92.1%. The results of this study are expected to contribute to improving fake news detection systems in preparation for the more sophisticated generation and spread of fake news.
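The cross-validation comparison described above can be illustrated with a minimal sketch. It is not the study's implementation: the synthetic two-feature data, the helper names (`k_fold_indices`, `cross_val_accuracy`, `make_toy_data`), and the two stand-in classifiers (a majority-class baseline and a nearest-centroid rule) are all assumptions chosen so the example runs with the Python standard library alone. The study itself compares SVM, RF, LR, CART, and NNET on variables selected by XGBoost feature importance.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_fit(X, y):
    """Stand-in baseline: always predict the most frequent training label."""
    label = max(set(y), key=y.count)
    return lambda row: label


def centroid_fit(X, y):
    """Stand-in classifier: predict the label of the nearest class centroid."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(row):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )

    return predict


def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds; each fold is the test set once."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores)


def make_toy_data(n=200, seed=1):
    """Hypothetical data: label 1 shifts the first feature by 3 units."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(3.0 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
models = {"majority": majority_fit, "centroid": centroid_fit}
results = {name: cross_val_accuracy(fit, X, y) for name, fit in models.items()}
best = max(results, key=results.get)
print(results, "best:", best)
```

Because every model is scored on the same folds, differences in mean accuracy reflect the models rather than the split. Swapping the stand-ins for real classifiers only requires a `fit` function that returns a per-row `predict` callable.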
fake news, Classification algorithms, feature selection, Electrical engineering. Electronics. Nuclear engineering, predictive models, fake news detection, prediction algorithms, TK1-9971
