Quantitative Evaluation of Native Microsoft Copilot Studio on the PROTEX Behavioural Homicide Corpus: A 200-Question Benchmark

Barciok, Karol



Data sources: ZENODO

Publication type: Report; Under curation; English; Publisher: Zenodo

Authors: Barciok, Karol

doi: 10.5281/zenodo.20490517


Abstract

This study presents a quantitative evaluation of native Microsoft Copilot Studio operating within the PROTEX behavioural homicide corpus, a structured repository of 285 homicide case files developed for behavioural and criminological research. A benchmark consisting of 200 manually generated questions was constructed to assess factual retrieval, comparative behavioural reasoning, false-premise rejection, uncertainty preservation, and semantic contamination resistance. Responses were evaluated manually against corpus documentation using predefined scoring criteria. Across 200 benchmark questions, Microsoft Copilot Studio achieved an accuracy rate of 97.5%, rising to 98.75% when partially correct responses were weighted proportionally. No confirmed hallucinations were observed. False-premise rejection, uncertainty preservation, and semantic contamination resistance each achieved perfect performance within the evaluated benchmark. The findings are presented as a quantitative extension of an earlier PROTEX migration study examining retrieval stability, epistemic corpus design, and uncertainty preservation in enterprise AI environments. Together, the two studies suggest that corpus design and explicit representation of evidentiary uncertainty may play a significant role in improving retrieval reliability within specialized knowledge systems.
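The relationship between the two reported figures can be checked with a short calculation. The sketch below assumes that strict accuracy is the share of fully correct responses and that the weighted figure adds fractional credit for partially correct ones; the abstract does not state the scoring formula, the number of partially correct responses, or the weight given to each, so the function names and the half-credit example are hypothetical illustrations only.

```python
from fractions import Fraction

TOTAL_QUESTIONS = 200  # benchmark size stated in the abstract


def strict_accuracy(n_correct, total=TOTAL_QUESTIONS):
    """Share of responses scored as fully correct."""
    return Fraction(n_correct, total)


def weighted_accuracy(n_correct, partial_credits, total=TOTAL_QUESTIONS):
    """Fully correct responses plus fractional credit for partial ones.

    partial_credits holds one credit value in [0, 1] per partially
    correct response. This scoring rule is an assumption, not the
    report's documented method.
    """
    return (Fraction(n_correct) + sum(Fraction(c) for c in partial_credits)) / total


# Figures reported in the abstract.
reported_strict = Fraction("0.975")
reported_weighted = Fraction("0.9875")

# Quantities implied by those figures under the assumed scoring rule.
implied_correct = reported_strict * TOTAL_QUESTIONS
implied_partial_credit = (reported_weighted - reported_strict) * TOTAL_QUESTIONS

print(f"Implied fully correct responses: {implied_correct}")
print(f"Implied total partial credit: {float(implied_partial_credit)}")

# HYPOTHETICAL breakdown: five partial responses at half credit each.
# Other breakdowns summing to the same credit are equally possible.
example = weighted_accuracy(195, [Fraction(1, 2)] * 5)
print(f"Example weighted accuracy: {float(example):.4%}")
```

Under this reading, the gap between 97.5% and 98.75% corresponds to 2.5 question-equivalents of partial credit across the 200 questions. How that credit is distributed over individual responses cannot be recovered from the abstract alone.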
