Publication types: Conference object; Part of book or chapter of book; Preprint (2021)

Prioritization of COVID-19-Related Literature via Unsupervised Keyphrase Extraction and Document Representation Learning

Blaž Škrlj; Marko Jukič; Nika Eržen; Senja Pollak; Nada Lavrač;
Open Access  
Published: 17 Oct 2021
Publisher: Springer International Publishing
Abstract
The COVID-19 pandemic triggered a wave of new scientific literature that is impossible to inspect and study manually in a reasonable time frame. Current machine learning methods can project such a body of literature into a vector space in which similar documents lie close to each other, enabling insightful exploration of scientific papers and other knowledge sources associated with COVID-19. However, before such texts can be searched, they need to be appropriately annotated, which is seldom the case due to a lack of human resources. In our system, the current body of COVID-19-related literature is annotated using unsupervised keyphrase extraction, which facilitates the initial queries to the latent space containing the learned document embeddings (low-dimensional representations). The solution is accessible through a web server capable of interactive search, term ranking, and exploration of potentially interesting literature. We demonstrate the usefulness of the approach via case studies from the medicinal chemistry domain.
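The workflow in the abstract (annotate documents with unsupervised keyphrases, then use those keyphrases as initial queries against a vector space of documents) can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation: the frequency-based keyword extractor is a naive stand-in for a proper keyphrase extractor, sparse TF-IDF vectors stand in for learned low-dimensional document embeddings, and all function names (`extract_keyphrases`, `search`, etc.) and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use much larger lists).
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "on", "with", "by", "as", "an"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric/hyphenated tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9\-]+", text.lower()) if t not in STOPWORDS]


def extract_keyphrases(text: str, k: int = 3) -> list[str]:
    """Naive unsupervised keyword extraction: the k most frequent non-stopword tokens.
    Hypothetical stand-in for a real keyphrase extractor."""
    counts = Counter(tokenize(text))
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def build_idf(tokenized: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector; terms unseen in the corpus are ignored."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def search(query: str, docs: list[str], top_k: int = 3) -> list[tuple[int, float]]:
    """Rank documents by cosine similarity to the query; returns (doc index, score) pairs."""
    tokenized = [tokenize(d) for d in docs]
    idf = build_idf(tokenized)
    doc_vecs = [vectorize(toks, idf) for toks in tokenized]
    q = vectorize(tokenize(query), idf)
    scored = [(i, cosine(q, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda p: (-p[1], p[0]))
    return scored[:top_k]


# Hypothetical toy corpus standing in for COVID-19-related abstracts.
DOCS = [
    "Protease inhibitors targeting the SARS-CoV-2 main protease. Protease docking results.",
    "Vaccine trials and vaccine efficacy in elderly patients.",
    "Lockdown effects on mobility and air quality.",
]

if __name__ == "__main__":
    # Annotate each document with keyphrases, then reuse one document's keyphrases as a query.
    for i, doc in enumerate(DOCS):
        print(i, extract_keyphrases(doc, k=2))
    print(search(" ".join(extract_keyphrases(DOCS[0], k=2)), DOCS))
```

In a real system the sparse TF-IDF vectors would be replaced by dense learned embeddings and the linear scan by an approximate nearest-neighbour index; the keyphrase-as-query step stays the same.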
Subjects by Vocabulary

Microsoft Academic Graph classification: Feature learning; Scientific literature; Literature-based discovery; Space (commercial competition); Domain (software engineering); Ranking (information retrieval); Information retrieval; Term (time); Computer science; Web server

Subjects

COVID-19, literature-based discovery, representation learning, Computer Science - Information Retrieval, Computer Science - Computation and Language, Computer Science - Digital Libraries

Funded by
EC| EMBEDDIA
Project
EMBEDDIA
Cross-Lingual Embeddings for Less-Represented Languages in European News Media
  • Funder: European Commission (EC)
  • Project Code: 825153
  • Funding stream: H2020 | RIA
Validated by funder