ZENODO
Other literature type . 2025
License: CC BY
Data sources: ZENODO
ZENODO
Presentation . 2025
License: CC BY
Data sources: Datacite

Perplexity-inspired metasearch-based alternatives to FAIR GPT: Open-source AI consultants for RDM

Authors: Schmidt, Thomas; Shigapov, Renat; Schumm, Irene; Kamlah, Jan

Abstract

FAIR GPT was recently proposed as a virtual consultant for research data management (RDM), designed to help researchers and institutions make their data FAIR (Findable, Accessible, Interoperable, Reusable). To reduce hallucinations and improve accuracy for certain tasks, FAIR GPT uses external APIs (FAIR-Checker, FAIR Enough, TIB Terminology, and re3data) and uploaded RDM resources (Horizon 2020 guidelines and the awesome-RDM GitHub repository). Its functionalities include metadata enhancement, dataset organization, repository selection, FAIRness assessment, license recommendations, and generation of documentation such as data management plans, README files, and codebooks.

However, FAIR GPT has limitations. It does not provide sources for its answers, which reduces transparency and trust in its outputs. As one of OpenAI's "Custom GPTs", FAIR GPT is not open source, which limits customization, and it lacks an API for integration into existing RDM workflows. Its reliance on external cloud-based services raises privacy concerns when dealing with sensitive (meta)data.

These issues led us to explore open-source alternatives. We specifically searched for open-source alternatives to Perplexity AI, a system known for providing citations for the information it retrieves. We identified three candidates available on GitHub: Perplexica, sensei, and farfalle. These tools use local instances of SearXNG to perform internet searches and pass the results as contextual input to large language models (LLMs). We modified each of these tools to focus specifically on RDM tasks and released the new versions openly on GitHub under the names FAIR-Perplexica, FAIR-sensei, and FAIR-farfalle.

We conducted a comparative analysis of these open-source candidates against each other and against FAIR GPT, including (but not limited to) the following criteria:

1. Provenance. Unlike FAIR GPT, all three tools provide clear links to the sources of their search results, which improves transparency and trust in their outputs.
2. Privacy. While these tools are designed to run locally, they also send requests to the internet for information retrieval, which raises privacy concerns.
3. Up-to-dateness. FAIR GPT partly relies on a static knowledge base, which may become outdated. The other tools rely on internet searches, which can return more up-to-date, RDM-specific information.
4. Customizability. The open-source nature of the new tools allows users to customize them according to their specific RDM needs, which is not possible with FAIR GPT.
5. Ease of installation and use. All tools are straightforward to install using Docker Compose, and they offer intuitive, user-friendly graphical interfaces.
6. Community support. Open-source tools benefit from upstream development and a community of contributors.
7. Accuracy and completeness. Each tool's responses were evaluated for missing information and potential errors.
8. Performance. Due to the varying pre- and post-processing steps involved in each tool, their overall performance differs.

In this work, we introduce and compare the open-source solutions FAIR-Perplexica, FAIR-sensei, and FAIR-farfalle as alternatives to FAIR GPT. These tools are designed for users who prioritize transparency, customization, and control over their (meta)data workflows. However, they send requests to the internet via the metasearch engine SearXNG, which may lead to privacy challenges.
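
The pattern shared by the three tools, in which metasearch results are used as cited context for an LLM, can be illustrated with a minimal Python sketch. It is not taken from FAIR-Perplexica, FAIR-sensei, or FAIR-farfalle. The instance address, the function names `build_search_url` and `build_prompt`, and the prompt wording are assumptions made for illustration. The sketch further assumes that the local SearXNG instance has its JSON output format enabled, and it performs no network request itself.

```python
from urllib.parse import urlencode

# Assumption: a local SearXNG instance with JSON output enabled in its settings.
SEARXNG_URL = "http://localhost:8080/search"  # hypothetical address


def build_search_url(query: str, base_url: str = SEARXNG_URL) -> str:
    """Return a SearXNG search URL that asks for JSON results."""
    return f"{base_url}?{urlencode({'q': query, 'format': 'json'})}"


def build_prompt(question: str, results: list[dict], max_sources: int = 5) -> str:
    """Number the search results and embed them as context for an LLM.

    Each result is expected to carry 'title', 'url', and 'content' keys;
    missing keys are replaced by empty strings.
    """
    entries = []
    for i, r in enumerate(results[:max_sources], start=1):
        entries.append(
            f"[{i}] {r.get('title', '')} ({r.get('url', '')})\n{r.get('content', '')}"
        )
    context = "\n\n".join(entries)
    return (
        "Answer the research data management question using only the sources "
        "below. Cite sources as [n].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```

Because every source is numbered in the prompt, the model's answer can point back to the URLs it drew on, which is the provenance property listed as the first criterion above. The query itself still leaves the local machine, which is the privacy caveat noted under the second criterion.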

Lightning Talk presentation held on 14 March 2025 during the E-Science-Tage conference in Heidelberg, Germany.

Country
Germany
Keywords

LLM, 020, FAIR data, AI assistant, FAIR GPT, RDM, research data management, chat bots
