TopicTracker: a Python pipeline to search, download and explore PubMed entries

TopicTracker is a Python pipeline that streamlines the retrieval and exploration of large numbers of PubMed entries. The software is divided into three Jupyter notebooks: 1. Search and download; 2. Content analyser; 3. Interactive data exploration.

The first notebook lets you build PubMed queries, download the matching entries, parse them, and save them to a .csv file. It takes a PubMed query as input and outputs a dataset, i.e. a folder containing a PubMed export, its metadata saved in a log file, and the Medline file, which can later be used to import the references under analysis into Zotero or similar software. The functions for searching, downloading and parsing are kept in a separate module so that they can be adapted for other projects if need be. The output of the first notebook can be explored with the second and third notebooks of this collection.

The second notebook analyses the trends of entities over time. It takes as input a dataset (i.e. a folder containing a PubMed export generated with the first notebook of this collection, its metadata, and the Medline file) and outputs a set of .csv files and .svg plots with the trends of keywords, MeSH terms, authors, journals, lemmas in Title/Abstract, the number of conflict-of-interest (COI) statements, and lemma trends in COI statements. The .csv files can then be explored further with the third notebook of this collection.

The third notebook allows fully interactive exploration of the datasets preprocessed with the second notebook. You can select a dataset to work with, choose a set of entities to explore, and plot any entity or combination of entities.

Dependencies (and versions) are listed in every notebook. A couple of toy datasets are provided.
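To make the parse-then-count idea concrete, here is a minimal sketch of how a Medline file could be turned into per-year MeSH term counts. It is not TopicTracker's own code: the function names `parse_medline` and `yearly_counts` and the two sample records are hypothetical, and the sketch assumes the standard MEDLINE tagged layout (a tag of up to four characters, then "- ", with wrapped values indented by six spaces). PMID, DP, TI and MH are the usual MEDLINE tags for the PubMed ID, publication date, title and MeSH heading.

```python
from collections import Counter, defaultdict

# Hypothetical two-record sample in MEDLINE tagged format (not real PubMed data).
SAMPLE = """\
PMID- 1
DP  - 2019 Mar
TI  - A first made-up title that wraps
      onto a second line
MH  - Humans
MH  - Neoplasms

PMID- 2
DP  - 2020
TI  - A second made-up title
MH  - Humans
"""


def parse_medline(text):
    """Split MEDLINE text into records: dicts mapping tag -> list of values."""
    records, current, tag = [], defaultdict(list), None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if current:
                records.append(dict(current))
            current, tag = defaultdict(list), None
        elif line.startswith("      ") and tag:
            # Continuation line: append to the last value of the current tag.
            current[tag][-1] += " " + line.strip()
        elif line[4:6] == "- ":
            # New field: the tag sits in the first four columns.
            tag = line[:4].strip()
            current[tag].append(line[6:].strip())
    if current:
        records.append(dict(current))
    return records


def yearly_counts(records, tag="MH"):
    """Count the values of `tag` per publication year (first 4 chars of DP)."""
    trends = defaultdict(Counter)
    for rec in records:
        year = rec.get("DP", [""])[0][:4]
        if year.isdigit():
            trends[int(year)].update(rec.get(tag, []))
    return {year: dict(counts) for year, counts in trends.items()}


records = parse_medline(SAMPLE)
trends = yearly_counts(records)
```

Passing a different tag to `yearly_counts` gives the same kind of per-year table for another field, which is the general shape of the trend data the second notebook is described as producing. The notebooks themselves may well parse and aggregate differently.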
New in v1.3:
- Managed some more exceptions
- Updated some libraries
- Optimized the creation of the Medline file in notebook 1

To do in v1.4:
- Understand why the PubMed APIs work so strangely with the PDAT tag
- Manage exceptions (empty files -> empty dfs) in the tabs of notebook 3

References: 10.1016/j.heliyon.2020.e04426

Related Organizations

University of Zurich
Switzerland

Keywords

natural language processing, scientometrics, topic tracking, information extraction
