
arXiv:1208.1087
Inter-coder agreement measures, like Cohen's kappa, correct the relative frequency of agreement between coders to account for agreement that occurs simply by chance. In some situations, however, these measures exhibit behavior that makes their values difficult to interpret. These properties, e.g. "annotator bias" or the "problem of prevalence", refer to a tendency of some of these measures to indicate counterintuitively high or low values of reliability depending on conditions that many researchers consider unrelated to inter-coder reliability. Not all researchers share this view, however, and since there is no commonly accepted formal definition of inter-coder reliability, it is hard to decide whether this behavior reflects a different concept of reliability or simply flaws in the measuring algorithms. In this note we therefore take an axiomatic approach: we introduce a model for the rating of items by several coders according to a nominal scale. Based upon this model, we define inter-coder reliability as the probability of assigning a category to an item with certainty. We then discuss under which conditions this notion of inter-coder reliability is uniquely determined by typical experimental results, i.e. the relative frequencies of category assignments by different coders. In addition, we provide an algorithm and conduct numerical simulations that demonstrate its accuracy under different model parameter settings.
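As background for the chance correction and the "problem of prevalence" the abstract refers to, here is a minimal sketch that computes Cohen's kappa, κ = (p_o − p_e)/(1 − p_e), for two coders from a confusion matrix. The function name and the example matrices are illustrative assumptions, not material from the paper.

```python
# Sketch: chance-corrected agreement (Cohen's kappa) for two coders
# rating the same items on a nominal scale.
import numpy as np

def cohens_kappa(confusion):
    """Cohen's kappa from a confusion matrix where confusion[i][j] is the
    number of items coder 1 put in category i and coder 2 in category j."""
    c = np.asarray(confusion, dtype=float)
    n = c.sum()
    p_observed = np.trace(c) / n        # raw agreement rate p_o
    marg1 = c.sum(axis=1) / n           # coder 1 category frequencies
    marg2 = c.sum(axis=0) / n           # coder 2 category frequencies
    p_chance = np.dot(marg1, marg2)     # agreement expected by chance p_e
    return (p_observed - p_chance) / (1.0 - p_chance)

# Illustration of the prevalence problem: raw agreement is 90% in both
# tables, yet kappa differs sharply because the expected chance agreement
# depends on the marginal category frequencies.
balanced = [[45, 5], [5, 45]]   # both categories equally prevalent
skewed   = [[85, 5], [5, 5]]    # one category dominates

print(cohens_kappa(balanced))   # ~0.80
print(cohens_kappa(skewed))     # ~0.44
```

This is exactly the kind of counterintuitive dependence on conditions beyond coder behavior that motivates the paper's axiomatic definition.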
21 pages, 6 figures
Methodology (stat.ME), Applications (stat.AP), FOS: Computer and information sciences, 62H20 (Primary), 62F10 (Secondary)
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | Citations derived from selected sources (an alternative to the "influence" indicator). | 0 |
| popularity | "Current" impact/attention (the "hype") of the article in the research community at large, based on the underlying citation network. | Average |
| influence | Overall/total impact of the article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | Initial momentum of the article directly after its publication, based on the underlying citation network. | Average |
