
Annotations and replication materials for "SoK: Machine Learning for Misinformation Detection"

The files in this artifact are described below.

annotations_aec.tsv: Annotations for our full paper corpus of 248 published works. Each paper is annotated for target, dataset curation, model choice, feature selection, and evaluation.

paper_selection_criteria.txt: The criteria we used to assemble the full and focus coding sets, adapted from pages 3, 5 ("Paper selection"), and 6 of the manuscript.

replications.zip: Three subfolders, one for each of the three replication analyses on pages 11-13 of the manuscript. The manuscript subsection in which each dataset and codebase is discussed is given in parentheses.
  - articles (5.1): Original and modified Reuters and NYTimes texts with accompanying labels (new datasets we introduced for robustness testing), plus the FA-KES and ISOT datasets and the classifier (new_RNN_CNN.py) used by the original study's authors.
  - users (5.2): Per-account summary statistics for troll and non-troll accounts, with accompanying labels, plus the classifier used by the original study's author.
  - sources (5.3): The splits, classifier, and datasets used by the original author.

Notes on open-source availability for each codebase: The source-scoped replication code is freely available online. We received permission from the authors of the article-scoped study to open-source their code. We previously contacted the author of the user-scoped work (TrollMagnifier) and have not received a response; we are sharing their code here for the purpose of artifact evaluation, and its open-source availability is pending an affirmative response from the author.
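To take a first look at the annotation file described above, a minimal Python sketch using only the standard library might look like the following. It assumes that annotations_aec.tsv is tab-separated with a single header row and one row per paper; the column names are not documented here, so the sketch only prints whatever headers it finds. The function name load_annotations is hypothetical.

```python
import csv
from pathlib import Path


def load_annotations(path):
    """Read a tab-separated annotation file into a list of dicts.

    Assumption: the first row is a header and each following row
    describes one annotated paper.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return list(reader)


if __name__ == "__main__":
    path = Path("annotations_aec.tsv")
    if path.exists():
        rows = load_annotations(path)
        # Under the one-row-per-paper assumption, this should match the corpus size.
        print(f"{len(rows)} annotated papers")
        if rows:
            print("columns:", list(rows[0].keys()))
    else:
        print(f"{path} not found; run this from the unpacked artifact directory")
```

If the file follows the assumed layout, the printed row count can be compared against the 248 papers stated above as a quick sanity check.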
