Powered by OpenAIRE graph
Found an issue? Give us feedback
ZENODO
Report
Data sources: ZENODO

Quantitative Evaluation of Native Microsoft Copilot Studio on the PROTEX Behavioural Homicide Corpus: A 200-Question Benchmark

Authors: Barciok, Karol

Abstract

This study presents a quantitative evaluation of native Microsoft Copilot Studio operating on the PROTEX behavioural homicide corpus, a structured repository of 285 homicide case files developed for behavioural and criminological research. A benchmark of 200 manually authored questions was constructed to assess five capabilities: factual retrieval, comparative behavioural reasoning, false-premise rejection, uncertainty preservation, and resistance to semantic contamination. Each response was scored manually against the corpus documentation using predefined criteria. Across the 200 questions, Microsoft Copilot Studio achieved an accuracy of 97.5%, rising to 98.75% when partially correct responses received proportional credit. No confirmed hallucinations were observed. False-premise rejection, uncertainty preservation, and semantic contamination resistance each achieved a perfect score within the benchmark. The findings extend an earlier PROTEX migration study, which examined retrieval stability, epistemic corpus design, and uncertainty preservation in enterprise AI environments, with a quantitative assessment. Together, the two studies suggest that corpus design and the explicit representation of evidentiary uncertainty may play a significant role in improving retrieval reliability within specialized knowledge systems.
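The abstract reports two accuracy figures but not the counts behind them. As a hedged illustration, the Python sketch below shows one way the two figures could relate: a strict score that counts only fully correct answers, and a weighted score that grants partial credit. The `ScoreTally` structure, the function names, the 0.5 partial weight, and the breakdown of 195 correct, 5 partial, and 0 incorrect are all assumptions chosen because they reproduce 97.5% and 98.75% exactly; none of them is taken from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreTally:
    """Hypothetical counts of manually graded benchmark responses."""
    correct: int
    partial: int
    incorrect: int

    @property
    def total(self) -> int:
        return self.correct + self.partial + self.incorrect


def strict_accuracy(t: ScoreTally) -> float:
    """Share of responses graded fully correct; partial answers earn no credit."""
    if t.total == 0:
        raise ValueError("no graded responses")
    return t.correct / t.total


def weighted_accuracy(t: ScoreTally, partial_weight: float = 0.5) -> float:
    """Share of credit earned when each partial answer earns `partial_weight`.

    The default of 0.5 is an assumption, not the report's stated rule.
    """
    if t.total == 0:
        raise ValueError("no graded responses")
    return (t.correct + partial_weight * t.partial) / t.total


# Assumed breakdown (not stated in the abstract): 195 correct, 5 partial, 0 incorrect.
tally = ScoreTally(correct=195, partial=5, incorrect=0)
print(f"strict:   {strict_accuracy(tally):.2%}")
print(f"weighted: {weighted_accuracy(tally):.2%}")
```

Under these assumptions, 195/200 = 97.5% and (195 + 5 × 0.5)/200 = 197.5/200 = 98.75%. Other combinations of partial weight and breakdown could yield the same two figures, so this is only one consistent reading of the reported numbers.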
