
This case study explores repurposing existing annotation guidelines to instruct a large language model (LLM) annotator in text annotation tasks. Traditional annotation projects invest significant resources, both time and cost, in developing comprehensive annotation guidelines. These guidelines are primarily designed for human annotators, who undergo training sessions to check and correct their understanding of them. While the results of such training are internalized by human annotators, LLMs require the training content to be made explicit. We therefore introduce moderation-oriented guideline repurposing, a method that adapts annotation guidelines into clear and explicit instructions through a process we call LLM moderation. Using the NCBI Disease Corpus and its detailed guidelines, our experimental results demonstrate that, despite several remaining challenges, repurposed guidelines can effectively guide LLM annotators. Our findings highlight both the promise and the limitations of the proposed workflow in automated settings, pointing to a new direction for scalable and cost-effective refinement of annotation guidelines and the subsequent annotation process.
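The workflow described above can be sketched minimally: human-oriented guideline text is rewritten into an explicit instruction for an LLM annotator, which then tags mentions in the input text. All function names, the guideline snippet, and the tagging scheme below are illustrative assumptions, not the paper's actual implementation; the LLM call is stubbed so the sketch runs offline.

```python
def repurpose_guidelines(guidelines: str) -> str:
    """Turn human-oriented guideline text into an explicit LLM instruction.
    (Hypothetical helper; the paper's repurposing process is more involved.)"""
    return (
        "You are an annotator for disease mentions.\n"
        "Follow these rules exactly:\n"
        f"{guidelines}\n"
        "Wrap each disease mention in <disease>...</disease> tags."
    )

def annotate(text: str, instruction: str, llm=None) -> str:
    """Send the instruction plus text to an LLM annotator.
    With no LLM attached, the stub simply echoes the input text."""
    prompt = f"{instruction}\n\nText to annotate:\n{text}"
    if llm is None:  # offline stub in place of a real model call
        return text
    return llm(prompt)

guidelines = "Annotate specific disease names; skip general terms like 'illness'."
instruction = repurpose_guidelines(guidelines)
result = annotate("The patient was diagnosed with breast cancer.", instruction)
print(result)
```

In a real setting, `llm` would be a callable wrapping a model API, and the repurposed instruction would be iteratively refined during the moderation step rather than generated by a single template.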
[INFO.INFO-TT] Computer Science [cs]/Document and Text Processing
