
A detailed account of the First Question Generation Shared Task Evaluation challenge

Rus, Vasile; Wyse, Brendan; Piwek, Paul; Lintean, Mihai; Stoyanchev, Svetlana; Moldovan, Cristian
Open Access · English
  • Published: 01 Mar 2012
Abstract
The paper provides a detailed account of the First Shared Task Evaluation Challenge on Question Generation, which took place in 2010. The campaign comprised two tasks, each taking text as input and producing text (questions) as output: Task A – Question Generation from Paragraphs, and Task B – Question Generation from Sentences. Motivation, data sets, evaluation criteria, guidelines for judges, and results are presented for both tasks. Lessons learned and advice for future Question Generation Shared Task Evaluation Challenges (QG-STEC) are also offered.
Funded by
RCUK | CODA: COherent Dialogue Automatically generated from text
Project
  • Funder: Research Councils UK (RCUK)
  • Project Code: EP/G020981/1
  • Funding stream: EPSRC

NSF | The 2nd Workshop on Question Generation
Project
  • Funder: National Science Foundation (NSF)
  • Project Code: 0938239
  • Funding stream: Directorate for Computer & Information Science & Engineering | Division of Information and Intelligent Systems

NSF | Workshop on The Question Generation Shared Task and Evaluation Challenge
Project
  • Funder: National Science Foundation (NSF)
  • Project Code: 0836259
  • Funding stream: Directorate for Computer & Information Science & Engineering | Division of Information and Intelligent Systems