publication . Conference object . Preprint . 2018

Effective Unsupervised Author Disambiguation with Relative Frequencies

Tobias Backes
Open Access English
  • Published: 10 Aug 2018
Abstract
This work addresses the problem of author name homonymy in the Web of Science. Aiming for an efficient, simple and straightforward solution, we introduce a novel probabilistic similarity measure for author name disambiguation based on feature overlap. Using the researcher-ID available for a subset of the Web of Science, we evaluate the application of this measure in the context of agglomeratively clustering author mentions. We focus on a concise evaluation that shows clearly for which problem setups and at which time during the clustering process our approach works best. In contrast to most other works in this field, we are sceptical towards the performance of a...
Subjects
free text keywords: Author Disambiguation, Probabilities, Agglomerative Clustering, Computer Science - Information Retrieval, Computer Science - Computation and Language, Computer Science - Machine Learning, Statistics - Machine Learning, Author name, Convergence (routing), Homonym, Information retrieval, Discriminative model, Probabilistic logic, Computer science, Similarity measure, Cluster analysis, Hierarchical clustering
Funded by
EC | MOVING
Project
MOVING
Training towards a society of data-savvy information professionals to enable open leadership innovation
  • Funder: European Commission (EC)
  • Project Code: 693092
  • Funding stream: H2020 | RIA