Education is increasingly taking place in technology-mediated learning environments. This transition has made it easier to collect student-generated data, including comments in discussion forums and chats. Although this data is extremely valuable to researchers, it often contains sensitive information such as names, locations, social media links, and other personally identifiable information (PII) that must be carefully redacted before the data is used for research, in order to protect students' privacy. Historically, PII redaction has been painstakingly conducted by humans; more recently, some researchers have attempted to use regular expressions and supervised machine-learning methods. Given the strong performance that Large Language Models (LLMs) have recently shown across a wide range of tasks, they offer another alternative worth exploring for de-identifying educational data. In this work, we assess GPT-4's performance in de-identifying data from the discussion forums of nine Massive Open Online Courses (MOOCs). Our results show an average recall of 0.958 for identifying PII that needs to be redacted, suggesting that GPT-4 is an appropriate tool for this purpose. Our tool also successfully identified cases that human redactors had missed. These findings indicate that GPT-4 can not only make the redaction process more efficient but also improve its quality. However, GPT-4's redaction precision is considerably lower (0.526), as it over-redacts names and locations that do not constitute PII, indicating a need for further improvement.
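To make the recall/precision trade-off concrete, the following is a minimal Python sketch of exact-match span scoring and redaction. The span representation, the `precision_recall` and `redact` helpers, and the sample forum post are hypothetical illustrations, not the study's evaluation code or data; the paper's actual matching criteria may differ.

```python
from typing import Set, Tuple

# Hypothetical representation: a PII span is a (start, end) character-offset pair.
Span = Tuple[int, int]


def precision_recall(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float]:
    """Exact-match span precision and recall (an assumed protocol, not necessarily the paper's)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def redact(text: str, spans: Set[Span], placeholder: str = "[REDACTED]") -> str:
    """Replace each span with a placeholder; assumes spans do not overlap."""
    # Work right to left so that earlier offsets remain valid after each replacement.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + placeholder + text[end:]
    return text


# Hypothetical forum post: "Maria" and "Lima" are genuine PII,
# while "Hamlet" and "Shakespeare" are names that do not identify the student.
text = "Hi, I'm Maria from Lima. Read Hamlet by Shakespeare."
gold = {(8, 13), (19, 23)}                           # "Maria", "Lima"
predicted = {(8, 13), (19, 23), (30, 36), (40, 51)}  # over-redacts "Hamlet", "Shakespeare"

print(precision_recall(gold, predicted))  # (0.5, 1.0)
print(redact(text, gold))  # Hi, I'm [REDACTED] from [REDACTED]. Read Hamlet by Shakespeare.
```

In this toy case, flagging the book title and its author alongside the genuine name and city keeps recall at 1.0 but halves precision, the same kind of over-redaction of non-PII names that the abstract reports.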
| Indicator | Description | Value |
|---|---|---|
| Citations | An alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 1 |
| Popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| Influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| Impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |