publication . Conference object . 2020

An Evaluation Method for Diachronic Word Sense Induction

Ashjan Alsulaimani; Erwan Moreau; Carl Vogel;
Open Access English
  • Published: 01 Nov 2020
  • Publisher: HAL CCSD
  • Country: France
Abstract
The task of Diachronic Word Sense Induction (DWSI) aims to identify the meaning of words from their context, taking the temporal dimension into account. In this paper we propose an evaluation method based on large-scale time-stamped annotated biomedical data and a range of evaluation measures suited to the task. The approach is applied to two recent DWSI systems, thus demonstrating its relevance and providing an in-depth analysis of the models.
Subjects
free text keywords: [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, Word-sense induction, Artificial intelligence, Biomedical data, Computer science, Natural language processing, Evaluation methods
Funded by
SFI| ADAPT: Centre for Digital Content Platform Research
Project
  • Funder: Science Foundation Ireland (SFI)
  • Project Code: 13/RC/2106
  • Funding stream: SFI Research Centres
Communities
Digital Humanities and Cultural Heritage