ZENODO
Other literature type . Conference object . 2018 . Peer-reviewed
License: CC BY NC ND
https://easychair.org/publicat...
Conference object
Data sources: UnpayWall

A Data-Driven Metric of Hardness for WSC Sentences

Authors: Nicos Isaak; Loizos Michael;

Abstract

The Winograd Schema Challenge (WSC), the task of resolving pronouns in certain sentences where shallow parsing techniques seem not to be directly applicable, has been proposed as an alternative to the Turing Test. According to Levesque, having access to a large corpus of text would likely not help much in the WSC. Among a number of attempts to tackle this challenge, one particular approach has demonstrated the plausibility of using commonsense knowledge automatically acquired from raw text in English Wikipedia. Here, we present the results of a large-scale experiment that shows how the performance of that particular automated approach varies with the availability of training material. We compare the results of this experiment with two studies: one from the literature that investigates how adult native speakers tackle the WSC, and one that we design and undertake to investigate how teenage non-native speakers tackle the WSC. We find that the performance of the automated approach correlates positively with the performance of humans, suggesting that the performance of the particular automated approach could be used as a metric of hardness for WSC instances.
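The correlation claim in the abstract can be illustrated with a minimal sketch. It assumes each WSC sentence has one system score and one human accuracy value, and it uses the Pearson coefficient; the abstract does not say which correlation measure the authors used, so that choice, the function name `pearson`, and all numbers below are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-sentence values, invented purely for illustration:
# one entry per WSC sentence, in the same order in both lists.
system_score = [0.9, 0.7, 0.4, 0.2]      # assumed automated-approach score
human_accuracy = [0.95, 0.85, 0.6, 0.5]  # assumed fraction of humans correct

r = pearson(system_score, human_accuracy)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would mean that sentences the system handles well are also the ones humans resolve reliably, which is the sense in which a system score could serve as a proxy for hardness.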

Subjects by Vocabulary

Microsoft Academic Graph classification: Theoretical computer science; Computer science; Metric (mathematics); Winograd Schema Challenge; Data-driven

Keywords

Winograd Schema Challenge

References (10 of 18 shown)

[1] Evan Ackerman. Winograd Schema Challenge Results: AI Common Sense Still a Problem, for Now. Spectrum, 2016.

[2] Dan Bailey, Amelia Harrison, Yuliya Lierler, Vladimir Lifschitz, and Julian Michael. The Winograd Schema Challenge and Reasoning about Correlation. In Working Notes of the Symposium on Logical Formalizations of Commonsense Reasoning, 2015.

[3] David Bender. Establishing a Human Baseline for the Winograd Schema Challenge. In MAICS, pages 39-45, 2015.

[4] Eric Bengtson and Dan Roth. Understanding the Value of Features for Coreference Resolution. In EMNLP, October 2008.

[5] Tejas Ulhas Budukh. An intelligent co-reference resolver for Winograd schema sentences containing resolved semantic entities, 2013.

[6] Nicos Isaak and Loizos Michael. Tackling the Winograd Schema Challenge Through Machine Logical Inferences. In David Pearce and Helena Sofia Pinto, editors, STAIRS, volume 284 of Frontiers in Artificial Intelligence and Applications, pages 75-86. IOS Press, 2016.

[7] Nicos Isaak and Loizos Michael. Using the Winograd Schema Challenge as a CAPTCHA. In Proceedings of the 4th Global Conference on Artificial Intelligence (GCAI 2018). EasyChair, 2018.

[8] Hector J. Levesque. The Winograd Schema Challenge. In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, number SS-11-06. American Association for Artificial Intelligence, 2011.

[9] Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55-60, 2014.

[10] Loizos Michael. Reading Between the Lines. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI 2009), pages 1525-1530, July 2009.

  • Impact by BIP!
    citations: 1
    popularity: Average
    influence: Average
    impulse: Average
  • Usage by OpenAIRE UsageCounts
    views: 74
    downloads: 62