Perplexity-inspired metasearch-based alternatives to FAIR GPT: Open-source AI consultants for RDM

Publication | Presentation, Other literature type, Conference object | 01 Jan 2025 | Germany | English | Publisher: Zenodo

Authors: Schmidt, Thomas; Shigapov, Renat; Schumm, Irene; Kamlah, Jan

DOI: 10.5281/zenodo.15038486, 10.5281/zenodo.15038485



Abstract

FAIR GPT was recently proposed as a virtual consultant for research data management (RDM), designed to help researchers and institutions make their data FAIR (Findable, Accessible, Interoperable, Reusable). To reduce hallucinations and improve accuracy for certain tasks, FAIR GPT uses external APIs (FAIR-Checker, FAIR Enough, TIB Terminology, and re3data) and uploaded RDM resources (the Horizon 2020 guidelines and the awesome-RDM GitHub repository). Its functionalities include metadata enhancement, dataset organization, repository selection, FAIRness assessment, license recommendations, and generating documentation such as data management plans, README files, and codebooks. However, FAIR GPT has limitations. It does not cite sources for its answers, which reduces transparency and trust in its outputs. As one of OpenAI's "Custom GPTs", FAIR GPT is not open source, which limits customization, and it lacks an API for integration into existing RDM workflows. Its reliance on external cloud-based services raises privacy concerns when dealing with sensitive (meta)data. These issues led us to explore open-source alternatives. We specifically searched for open-source alternatives to Perplexity AI, a system known for providing citations for the information it retrieves. We identified three candidates on GitHub: Perplexica, sensei, and farfalle. These tools use local instances of the metasearch engine SearXNG to search the internet and pass the results as context to large language models (LLMs). We modified each tool to focus specifically on RDM tasks and openly released the new versions on GitHub as FAIR-Perplexica, FAIR-sensei, and FAIR-farfalle. We compared these open-source candidates with each other and with FAIR GPT using criteria that include (but are not limited to) the following:

1. Provenance. Unlike FAIR GPT, all three tools provide clear links to the sources of their search results, which improves transparency and trust in their outputs.
2. Privacy. Although these tools are designed to run locally, they send requests to the internet for information retrieval, which raises privacy concerns.
3. Up-to-dateness. FAIR GPT partly relies on a static knowledge base, which may become outdated. The new tools instead rely on internet searches, which can return more up-to-date RDM-specific information.
4. Customizability. Unlike FAIR GPT, the new tools are open source, so users can customize them to their specific RDM needs.
5. Ease of installation and use. All tools are straightforward to install with Docker Compose and offer intuitive, user-friendly graphical interfaces.
6. Community support. Open-source tools benefit from upstream development and a community of contributors.
7. Accuracy and completeness. Each tool's responses were evaluated for missing information and potential errors.
8. Performance. Because each tool applies different pre- and post-processing steps, their overall performance differs.

In this work, we introduce FAIR-Perplexica, FAIR-sensei, and FAIR-farfalle and compare them as open-source alternatives to FAIR GPT. These tools are designed for users who prioritize transparency, customization, and control over their (meta)data workflows. However, because they send requests to the internet via the metasearch engine SearXNG, they may raise privacy challenges.
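The general retrieval pattern described in the abstract (a SearXNG metasearch, then the search results passed as numbered context to an LLM, with the answer linked back to its sources) can be sketched as follows. This is a minimal illustration under stated assumptions, not code taken from FAIR-Perplexica, FAIR-sensei, or FAIR-farfalle: it assumes a local SearXNG instance at http://localhost:8080 with the JSON output format enabled, and the function names search_searxng, build_prompt, and cited_urls are hypothetical. The LLM call itself is omitted because each tool uses its own model backend.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumption: a local SearXNG instance at this address with the JSON output
# format enabled (search.formats in its settings.yml must include "json").
SEARXNG_URL = "http://localhost:8080/search"


def search_searxng(query: str, max_results: int = 5) -> list[dict]:
    """Query SearXNG and return up to max_results hits (title, url, content)."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    with urllib.request.urlopen(f"{SEARXNG_URL}?{params}", timeout=10) as resp:
        data = json.load(resp)
    return [
        {
            "title": r.get("title", ""),
            "url": r.get("url", ""),
            "content": r.get("content", ""),
        }
        for r in data.get("results", [])[:max_results]
    ]


def build_prompt(question: str, results: list[dict]) -> str:
    """Number each search hit so the LLM can cite it as [n]."""
    sources = "\n".join(
        f"[{i}] {r['title']} ({r['url']}): {r['content']}"
        for i, r in enumerate(results, start=1)
    )
    return (
        "You are a research data management (RDM) consultant. "
        "Answer using only the sources below and cite them as [n].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )


def cited_urls(answer: str, results: list[dict]) -> list[str]:
    """Map [n] markers in an LLM answer back to source URLs, in first-cited order."""
    urls: list[str] = []
    for n in re.findall(r"\[(\d+)\]", answer):
        idx = int(n) - 1
        # Ignore markers that point outside the retrieved result list.
        if 0 <= idx < len(results) and results[idx]["url"] not in urls:
            urls.append(results[idx]["url"])
    return urls
```

For example, hits = search_searxng(question) followed by build_prompt(question, hits) yields a prompt whose [n] markers the LLM can reuse, and cited_urls(answer, hits) turns those markers back into links. Keeping the numbering explicit is what makes the provenance criterion checkable: each citation in the answer resolves to a concrete search result, while the privacy trade-off remains, since search_searxng still sends the query to the internet through SearXNG.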

Lightning Talk presentation held on 14 March 2025 during the E-Science-Tage conference in Heidelberg, Germany.

Country

Germany

Related Organizations

University of Mannheim
Germany

Keywords

LLM, 020, FAIR data, AI assistant, FAIR GPT, RDM, research data management, chat bots

Impact by BIP!

Selected citations: 0
Popularity: Average
Influence: Average
Impulse: Average
