Downloads provided by UsageCounts
arXiv: 2405.18335
handle: 11093/6643, 10400.22/25145, 11328/5292
Wiki articles are created and maintained by a crowd of editors, producing a continuous stream of reviews. Reviews can take the form of additions, reverts, or both. This crowdsourcing model is exposed to manipulation, since neither reviews nor editors are automatically screened and purged. To protect articles against vandalism or damage, the stream of reviews can be mined to classify reviews and profile editors in real time. The goal of this work is to anticipate and explain which reviews to revert, so that editors are informed why their edits will be reverted. The proposed method employs stream-based processing, updating the profiling and classification models on each incoming event. The profiling uses side-based and content-based features obtained with Natural Language Processing, and editor profiles are incrementally updated based on their reviews. Since the proposed method relies on self-explainable classification algorithms, it is possible to understand why a review has been classified as a revert or a non-revert. In addition, this work contributes an algorithm for generating synthetic data for class balancing, making the final classification fairer. The proposed online method was tested with a real data set from Wikivoyage, which was balanced through the aforementioned synthetic data generation. The results reached values near 90% for all evaluation metrics (accuracy, precision, recall, and F-measure).
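The stream-based loop described in the abstract (classify each incoming review, then update the editor profile with it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `EditorProfile` fields, the `prequential_run` helper, the 0.5 threshold, and the rule that flags a review when the editor's past revert rate is high are hypothetical stand-ins, not the features or classifiers used in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EditorProfile:
    """Hypothetical per-editor profile, updated one review at a time."""
    reviews: int = 0
    reverted: int = 0
    mean_size: float = 0.0  # running mean of review size in characters

    def update(self, size: int, was_reverted: bool) -> None:
        self.reviews += 1
        self.reverted += int(was_reverted)
        # Incremental mean: no need to store past reviews.
        self.mean_size += (size - self.mean_size) / self.reviews

    @property
    def revert_rate(self) -> float:
        return self.reverted / self.reviews if self.reviews else 0.0


def prequential_run(stream, threshold=0.5):
    """Test-then-train loop over (editor, size, was_reverted) events.

    Each review is first classified with the profile built from earlier
    events only, and the profile is updated afterwards. The threshold
    rule is an illustrative stand-in for a self-explainable classifier.
    """
    profiles = defaultdict(EditorProfile)
    correct = 0
    total = 0
    explanations = []
    for editor, size, was_reverted in stream:
        profile = profiles[editor]
        predicted = profile.revert_rate >= threshold
        # The decision is readable directly from the rule that fired.
        explanations.append(
            f"{editor}: past revert rate {profile.revert_rate:.2f} "
            f"{'>=' if predicted else '<'} {threshold}"
        )
        correct += int(predicted == was_reverted)
        total += 1
        profile.update(size, was_reverted)
    accuracy = correct / total if total else 0.0
    return accuracy, dict(profiles), explanations
```

Evaluating before updating keeps the accuracy estimate honest, because no review is scored by a profile that has already seen its label.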
FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG), Computer Science - Artificial Intelligence, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computation and Language (cs.CL), Data-stream processing and classification, Data reliability and fairness, Synthetic data, Transparency, Vandalism, Wikis, TK1-9971, Electrical engineering. Electronics. Nuclear engineering, 1203.17 Computer Science, 3325 Telecommunications Technology, 6308 Social Communications
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| Popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| Influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| Impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| Views | | 10 |
| Downloads | | 3 |

Views provided by UsageCounts