ZENODO
Dataset . 2022
License: CC BY
Data sources: Datacite


The English Headline Treebank corpus

Authors: Benton, Adrian; Shi, Tianze; Irsoy, Ozan; Malioutov, Igor

Abstract

This repository contains the evaluation sets used in A. Benton, T. Shi, O. İrsoy, and I. Malioutov, "Weakly Supervised Headline Dependency Parsing", Findings of EMNLP, 2022.

This dataset contains parse annotations for English news headlines and a script that produces conllu files joined with the original headline text. Parse annotations are joined to the corresponding text by running:

    LDC_NYT_DIR="/PATH/TO/UNTARRED/LDC2008T19/"  # path to untarred LDC2008T19
    python build_eht.py --nyt_dir ${LDC_NYT_DIR} --num_proc 4

This downloads the Google sentence compression (GSC) dataset and builds conllu files for the GSC examples. If you have the New York Times Annotated Corpus (LDC2008T19) untarred locally, it also joins annotations to the NYT examples (location passed via --nyt_dir). Increase the number of processes (--num_proc in the command above) to process more shards from the NYT corpus in parallel and reduce build time. The above was tested with Python 3.9.7.

The EHT evaluation sets, with gold-annotated POS tags and dependency relations, are built as EHT/gsc.test.conllu and EHT/nyt.test.conllu.

The silver projected trees that we used to train and validate our models are built under GSC_projected. These are not gold parse trees (they are predictions projected from the article lead sentence) and are shared purely for reproducibility.
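Once the conllu files are built, they can be inspected with a few lines of Python. The following is a minimal sketch, not part of the released build_eht.py: it assumes the output follows the standard ten-column, tab-separated CoNLL-U layout, and the names read_conllu, heads_and_relations, and SAMPLE are hypothetical. The sample headline is invented for illustration and is not taken from the dataset.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# The ten standard CoNLL-U columns, in order (assumed layout of the built files).
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dicts."""
    tokens: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                yield tokens
                tokens = []
            continue
        if line.startswith("#"):
            # Sentence-level comment or metadata line.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        if not cols[0].isdigit():
            # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
            continue
        tokens.append(dict(zip(CONLLU_FIELDS, cols)))
    if tokens:
        # Final sentence when the input does not end with a blank line.
        yield tokens


def heads_and_relations(sentence: List[Dict[str, str]]) -> List[Tuple[str, int, str]]:
    """Return (form, head index, dependency relation) for each token."""
    return [(tok["form"], int(tok["head"]), tok["deprel"]) for tok in sentence]


# Invented two-token headline, used only to exercise the reader.
SAMPLE = "\n".join([
    "# text = Stocks rally",
    "\t".join(["1", "Stocks", "stock", "NOUN", "NNS", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "rally", "rally", "VERB", "VBP", "_", "0", "root", "_", "_"]),
    "",
])

sentences = list(read_conllu(SAMPLE.splitlines()))
print(heads_and_relations(sentences[0]))
```

To read a built evaluation set instead of the sample, pass an open file handle, for example open("EHT/gsc.test.conllu", encoding="utf-8"), to read_conllu.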

References

Adrian Benton, Tianze Shi, Ozan Irsoy, and Igor Malioutov. Weakly Supervised Headline Dependency Parsing. Findings of EMNLP 2022.

Keywords

NLP, Headline, Dependency, Parse, Syntax, UD, Natural Language Processing
