ZENODO

[Supplementary material] Meta-Fair: AI-Assisted Fairness Testing of Large Language Models

Authors: Romero-Arjona, Miguel; Parejo, José A.; Alonso, Juan C.; Sánchez, Ana B.; Arrieta, Aitor; Segura, Sergio

Abstract

This repository contains supplementary material for the paper Meta-Fair: AI-Assisted Fairness Testing of Large Language Models. It includes data, scripts, and resources to reproduce and analyse the experiments described in the paper. The contents are organised into four main folders:

  • data/: Contains data for metamorphic tests, structured by research question:
    • attributes_catalogue.csv: Catalogue of attributes used in metamorphic test generation.
    • Subfolders for each research question (rq1/, rq2/, rq3/) include:
      • generation/: CSV files of generated data for each metamorphic relation.
      • execution/: CSV files representing the execution results of the metamorphic tests.
      • evaluation/: Evaluation-related data, such as experiments and judgements.
      • manual_revision/: Manually revised data provided by human judges.
  • experimental_setup/: Provides the setup required to generate, execute, and evaluate the metamorphic tests:
    • configuration/: JSON files (metamorphic_relations.json, rq1.json, rq2.json, rq3.json) defining configurations for each research question.
    • jobs/: Subfolders (rq1/, rq2/, rq3/) containing job configurations for task execution.
    • scripts/: Python scripts (evaluation.py, execution.py, generation.py, experiment.py) to automate generation, execution, and evaluation processes.
    • tools/: Source code of the three developed tools for LLM-assisted generation (MUSE), execution (GENIE), and evaluation (GUARD-ME).
    • requirements.txt: Lists dependencies needed to run the experiments.
  • prompt_templates/: Includes the prompt templates used in the generation and evaluation of the metamorphic tests:
    • base_generation.txt: The base prompt template for generation tasks.
    • base_evaluation.txt: The base prompt template for evaluation tasks.
    • generation_derivates/: Prompt templates (i.e., dual_attributes.txt, hypothetical_scenario.txt, metal.txt, multiple_choice.txt, prioritisation.txt, proper_nouns.txt, ranked_list.txt, score.txt, sentence_completion.txt, single_attribute.txt, yes_no_question.txt), derived from base_generation.txt, employed to generate metamorphic tests using different strategies.
    • evaluation_derivates/: Prompt templates (i.e., attribute_comparison.txt, inverted_consistency.txt, proper_nouns_comparison.txt), derived from base_evaluation.txt, used to guide the judge model across different evaluation methods.
  • analysis/: Contains resources for analysing and visualising the results of the experiments:
    • results_analysis.ipynb: A Jupyter notebook used for data analysis, visualisation, and supplementary experiments. It enables interactive result analysis and supports reproducibility.
    • requirements.txt: Lists the dependencies required to run the notebook environment.
    • outputs/: Figures (figures/), tables (tables/) and statistical tests (statistical_tests/) that summarise the experimental findings.
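Before running any of the scripts, it may help to confirm that a local copy of the archive matches the layout described above. The following is a minimal sketch, not part of the published material: the helper names expected_paths and missing_paths are hypothetical, and the exact nesting (for example, that every research question folder contains all three stage subfolders, and that the archive unpacks with these four folders at its root) is an assumption read off the description rather than verified against the files.

```python
from pathlib import Path

# Names taken from the repository description; the nesting is an assumption.
RESEARCH_QUESTIONS = ("rq1", "rq2", "rq3")
STAGES = ("generation", "execution", "evaluation")


def expected_paths():
    """Return relative paths that the description suggests should exist."""
    paths = ["data/attributes_catalogue.csv"]
    for rq in RESEARCH_QUESTIONS:
        # Assumed: each research question folder holds one subfolder per stage.
        for stage in STAGES:
            paths.append(f"data/{rq}/{stage}")
        paths.append(f"experimental_setup/configuration/{rq}.json")
        paths.append(f"experimental_setup/jobs/{rq}")
    paths += [
        "experimental_setup/configuration/metamorphic_relations.json",
        "prompt_templates/base_generation.txt",
        "prompt_templates/base_evaluation.txt",
        "analysis/results_analysis.ipynb",
    ]
    return paths


def missing_paths(root):
    """List the expected relative paths that are absent under `root`."""
    root = Path(root)
    return [p for p in expected_paths() if not (root / p).exists()]
```

Calling missing_paths(".") from the folder where the archive was unpacked returns an empty list when every expected entry is present; any path it reports is either missing or laid out differently from what this sketch assumes.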
