1 Research products, page 1 of 1

  • Publication . Conference object . Part of book or chapter of book . 2014
    Open Access English
    Authors: 
    Graus, D.; Tsagkias, M.; Buitinck, L.; de Rijke, M.; de Rijke, M.; Kenter, T.; de Vries, A.P.; Zhai, C.X.; de Jong, F.; Radinsky, K.; +1 more
    Country: Netherlands
    Project: NWO | Modeling and Learning fro... (2300171779), NWO | SPuDisc: Searching Public... (2300176811), EC | LIMOSINE (288024), NWO | Building Rich Links to En... (2300153702)

    The manual curation of knowledge bases is a bottleneck in fast-paced domains where new concepts constantly emerge. Identifying nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base, in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show that our method significantly outperforms a lexical-matching baseline by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and the textual quality of input documents.
