ZENODO
Software, 2026
License: CC BY
Data sources: ZENODO

ProfOlaf: Semi-Automated Tool for Systematic Literature Reviews

Authors: Afonso, Martim; Saavedra, Nuno; Ferreira, Joao

Abstract

Setup with Dockerfile

Build the image:
    docker build -t profolaf .
Run the container:
    docker run -it profolaf

ProfOlaf Walkthrough

This appendix provides a walkthrough of ProfOlaf, demonstrating how the tool supports automated and semi-automated snowballing for literature reviews. The tool is available both as a web application and as a command-line interface. Here, we describe the typical usage of the command-line version, which exposes the full pipeline.

Prerequisites and Input

Before running ProfOlaf, the user must prepare an initial seed file: a plain-text (.txt) file containing the titles of the seed articles. These articles represent the starting point of the snowballing process.

Main Snowballing Pipeline

The snowballing workflow consists of the following steps, executed sequentially:

- generate_search_conf.py: Generates the search configuration used to query the supported scholarly databases. It also records metadata filtering criteria and important file paths.
- 0_generate_snowball_start.py: Initializes the snowballing process from the provided seed file and stores the initial set of articles in the database.
- 1_start_iteration.py: Starts a new snowballing iteration, collecting forward and backward citations from the articles of the previous iteration (or from the initial set when starting a new search). Currently, only Semantic Scholar supports both backward and forward snowballing; from Google Scholar, only citations can be fetched.
- 2_remove_duplicates.py: Identifies and removes duplicate entries across databases.
- 3_get_bibtex.py: Retrieves BibTeX metadata for the collected articles. Without a web-scraping proxy, Semantic Scholar is the recommended search method; too many requests to Google Scholar may result in a block.
- (Optional) 4_generate_conf_rank.py: Filters articles based on venue ranking, if the user wishes to restrict the corpus to specific publication venues.
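To make the deduplication step concrete, the following is a minimal sketch of the kind of title normalization a step like 2_remove_duplicates.py might perform when matching entries across databases. The function names and matching rule are illustrative assumptions, not ProfOlaf's actual implementation:

```python
import re
import string

def normalize_title(title: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so that
    near-identical titles from different databases compare equal."""
    title = title.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", title).strip()

def dedup_by_title(articles: list[dict]) -> list[dict]:
    """Keep only the first article seen for each normalized title."""
    seen, unique = set(), []
    for article in articles:
        key = normalize_title(article["title"])
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique

articles = [
    {"title": "ProfOlaf: Semi-Automated Tool for Systematic Literature Reviews"},
    {"title": "profolaf  semi-automated tool for systematic literature reviews."},
    {"title": "Snowballing in Systematic Reviews"},
]
print(len(dedup_by_title(articles)))  # 2
```

Exact matching after normalization is the simplest policy; a real deduplication pass may additionally compare DOIs or use fuzzy string similarity.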
- 5_filter_by_metadata.py: Filters articles according to metadata attributes (e.g., year, venue, online availability, and language). The user can configure which metadata fields are considered.
- 6_filter_by_title.py: Performs title-based screening. The user is interactively prompted to decide whether to keep or discard each article, along with a brief justification. At this stage, users are encouraged to be conservative and discard only articles that are clearly irrelevant. Optionally, LLM-based screening can be enabled to assist this process.
- 7_solve_title_disagreements.py: Resolves disagreements between multiple raters. The script presents the articles on which raters disagreed, along with their reasoning from the previous step, and prompts them to reach a consensus decision.
- 8_filter_by_content.py: Performs content-based screening using the full text of the articles, following the same interaction model as the title-based filtering.
- 9_solve_content_disagreements.py: Resolves rater disagreements arising during content-based screening, similarly to the previous step. The resulting article set marks the end of an iteration.
- Iteration: Steps 3 through 9 are repeated until no new articles are discovered.
- 10_generate_csv.py: Produces the final CSV file containing the selected articles and their associated metadata.

Additional Analysis Scripts

In addition to the main snowballing pipeline, ProfOlaf provides auxiliary scripts for post hoc analysis of the final article set. For the additional setup, the user must run:

- generate_analysis_conf.py: Stores important paths and filenames in a JSON file used by the following steps.
- 11_download_pdfs.py: Downloads all article PDFs to a folder for subsequent analysis.

Topic Modeling

Topic modeling is run through five scripts:

- 11_topic_modeling_lvl1.py: Reads the set of articles and generates a set of general topics.
- 11_topic_modeling_lvl2.py: Generates more specific sub-topics from the initial set of topics.
- 11_topic_modeling_refine.py: Merges similar topics and removes overly specific or redundant topics that occur in less than 1% of the articles.
- 11_topic_modeling_assign.py: Assigns the generated topics to each article.
- 11_topic_modeling_correct.py: Corrects hallucinated topic assignments and other errors.

Task Assistant

The task assistant module is run using 11_task_assistant.py. The user can add new prompts as text files under the folder specified in the analysis configuration and run the script to have an LLM execute the tasks for each article.
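The pruning rule described for the refine step (dropping topics that occur in less than 1% of the articles) can be sketched as follows. This is a hedged illustration of the cutoff only; the function name and data shapes are assumptions, not ProfOlaf's code:

```python
def prune_rare_topics(topic_counts: dict[str, int], n_articles: int,
                      min_fraction: float = 0.01) -> dict[str, int]:
    """Drop topics assigned to fewer than min_fraction of all articles,
    mirroring the <1% cutoff described for 11_topic_modeling_refine.py."""
    threshold = min_fraction * n_articles
    return {t: c for t, c in topic_counts.items() if c >= threshold}

# With 400 articles the cutoff is 4, so a topic seen only twice is dropped.
counts = {"testing": 120, "llm agents": 45, "quantum snowballing": 2}
print(prune_rare_topics(counts, n_articles=400))  # {'testing': 120, 'llm agents': 45}
```

A fixed relative threshold like this keeps the topic set stable as the corpus grows, whereas an absolute count cutoff would not.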
