Finding the right XAI Method --- Dataset

This dataset provides the complementary preprocessed data for training the neural networks used in Bommer et al. and the accompanying source code (https://github.com/philine-bommer/Climate_X_Quantus). In the publication, we introduce XAI evaluation in the context of climate research and assess different desired explanation properties, namely robustness, faithfulness, randomization, complexity, and localization. To this end, we build upon previous work (Labe and Barnes 2021) and train a multi-layer perceptron (MLP) and a convolutional neural network (CNN) to predict the decade based on annual-mean temperature maps.
Following Labe and Barnes 2021, we use data simulated by the general climate model CESM1 (Hurrell et al. 2013), specifically the global 2-m air temperature (T2m) maps from 1920 to 2080. The data consist of 40 ensemble members. Each member is generated by varying the atmospheric initial conditions under fixed external forcing: historical forcings are imposed from 1920 to 2005, and Representative Concentration Pathway 8.5 forcing for the following years (Kay et al. 2015). Again following Labe and Barnes 2021, we compute annual averages and apply a bilinear interpolation. This results in T=161 temperature maps for each member, with v=144 longitude grid cells and h=95 latitude grid cells, given the 1.9° sampling in latitude and 2.5° sampling in longitude. Finally, the temperature maps are standardized by removing the multi-year (1920 to 2080) mean and dividing by the corresponding standard deviation.
For the MLP, the temperature maps are flattened into a vector, whereas the CNN input maintains the longitude-latitude grid of the maps. As in Labe and Barnes 2021, we use the model data described above for training, validation, and testing. For both the MLP and the CNN, 20% of the data form the test set, and the remaining 80% are split into a training set (64% of the data) and a validation set (16% of the data).
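The preprocessing described above can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the array layout (member, year, latitude, longitude), the choice to standardize per ensemble member and grid cell along the time axis, the random sample-wise split, and the helper names `standardize` and `split_indices` are all assumptions made for illustration. The synthetic array uses 2 members instead of 40 to keep memory use small.

```python
import numpy as np


def standardize(maps: np.ndarray, time_axis: int = 1) -> np.ndarray:
    """Remove the multi-year mean and divide by the standard deviation.

    Assumption: statistics are computed per ensemble member and grid cell
    along the time axis; the description does not spell this out.
    """
    mean = maps.mean(axis=time_axis, keepdims=True)
    std = maps.std(axis=time_axis, keepdims=True)
    return (maps - mean) / std


def split_indices(n: int, seed: int = 0):
    """Random split into 64% training, 16% validation and 20% test indices.

    Assumption: the split is drawn over individual samples.
    """
    order = np.random.default_rng(seed).permutation(n)
    n_test = int(round(0.20 * n))
    n_val = int(round(0.16 * n))
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return train, val, test


# Synthetic stand-in for the T2m maps: (members, T=161, h=95, v=144).
rng = np.random.default_rng(42)
maps = 280.0 + 5.0 * rng.standard_normal((2, 161, 95, 144), dtype=np.float32)

z = standardize(maps)

# CNN input keeps the latitude-longitude grid; MLP input is flattened.
samples = z.reshape(-1, 95, 144)
flat = samples.reshape(len(samples), -1)

train_idx, val_idx, test_idx = split_indices(len(samples))
print(samples.shape, flat.shape)
print(len(train_idx), len(val_idx), len(test_idx))
```

Whether the split should instead be drawn over whole ensemble members is a design choice the description leaves open; the Readme file and the source code repository are the authoritative references.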
We train both networks to solve a fuzzy classification problem, which combines classification and regression. In the classification setting, the network assigns each map to one of 20 classes, where each class corresponds to one decade between 1900 and 2100 (a class division required for the later regression, as done by Labe and Barnes 2021). The network output is thus a probability vector containing one probability per class. To assess the network performance, we use the monthly 2-m air temperature of the 20th Century Reanalysis (V3) data (Slivinski et al. 2019) from 1920 to 2015. The dataset includes two compressed .npz files and a Readme.md. A full description of the data contained in this dataset and instructions on data usage are provided in the Readme file.
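As a rough illustration of the class layout, the sketch below maps a year to one of the 20 decade classes and decodes a probability vector back into a year. The decoding rule (a probability-weighted average of decade midpoints, with the midpoint taken as the decade start plus 5 years) is an assumption for illustration, as are the names `decade_class` and `expected_year`; the exact fuzzy encoding is defined in the publication and source code.

```python
import numpy as np

FIRST_YEAR, LAST_YEAR, DECADE = 1900, 2100, 10
N_CLASSES = (LAST_YEAR - FIRST_YEAR) // DECADE  # 20 decade classes


def decade_class(year: int) -> int:
    """Index of the decade class containing `year` (0 for 1900 to 1909)."""
    if not FIRST_YEAR <= year < LAST_YEAR:
        raise ValueError(f"year {year} is outside {FIRST_YEAR} to {LAST_YEAR - 1}")
    return (year - FIRST_YEAR) // DECADE


def expected_year(probs) -> float:
    """Decode a class-probability vector into a year.

    Assumption: the regression step is a probability-weighted average of
    decade midpoints; this is a hypothetical decoding, not a confirmed one.
    """
    probs = np.asarray(probs, dtype=float)
    if probs.shape != (N_CLASSES,):
        raise ValueError(f"expected {N_CLASSES} probabilities")
    midpoints = FIRST_YEAR + DECADE * np.arange(N_CLASSES) + DECADE / 2
    return float(probs @ midpoints)


# The simulated maps span 1920 to 2080, i.e. class indices 2 through 18.
print(decade_class(1920), decade_class(2080))

# Probability mass shared equally between the 1920s and the 1930s.
probs = np.zeros(N_CLASSES)
probs[[2, 3]] = 0.5
print(expected_year(probs))
```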

This work was funded by the German Ministry for Education and Research through project Explaining 4.0 (ref. 01IS200551). The authors also acknowledge the CESM Large Ensemble Community Project (Kay et al. 2015) for making the data publicly available. Support for the Twentieth Century Reanalysis Project version 3 dataset is provided by the U.S. Department of Energy, Office of Science Biological and Environmental Research (BER), by the National Oceanic and Atmospheric Administration Climate Program Office, and by the NOAA Earth System Research Laboratory Physical Sciences Laboratory.

Related Organizations

University of Potsdam
Germany
Leipzig University
Germany
Technical University of Berlin
Germany
Berlin Institute for the Foundations of Learning and Data
Germany
University of Reading
United Kingdom

Keywords

Deep Neural Networks, Deep Learning, XAI Evaluation, Explainability, Climate Science
