Training data for the shared task Ideology and Power Identification in Parliamentary Debates (2025)

Research data · Dataset · 04 Jan 2025 · Publisher: Zenodo

Authors: Çöltekin, Çağrı; Kopp, Matyáš; Morkevičius, Vaidas; Ljubešić, Nikola; Meden, Katja; Erjavec, Tomaž

doi: 10.5281/zenodo.15064421, 10.5281/zenodo.14600018, 10.5281/zenodo.14600017

Abstract

This dataset contains a selection of speeches from the ParlaMint corpora (version 4.1), provided as the training set for the shared task on "Ideology and Power Identification in Parliamentary Debates" at CLEF 2025. All files are tab-separated text files with the following fields:

- "id" is a unique (arbitrary) ID for each text.
- "speaker" is a unique (arbitrary) ID for each speaker. There may be multiple speeches from the same speaker.
- "sex" is the (binary/biological) sex of the speaker. This information is collected from varying sources (typically data published by the respective parliament), and in some cases it may be unspecified or unknown.
- "text" is the transcribed text of the parliamentary speech. Real examples may include line breaks and other special sequences, escaped or quoted.
- "text_en" is an automatic English translation of the corresponding text. This field may be empty for speeches in English, and translations may also be missing for a small number of non-English speeches.
- "orientation" is the binary/numeric label (0 is left, 1 is right). Orientation labels are based on Wikipedia.
- "power" is the binary label for power role (0 is opposition, 1 is coalition). It is based on the information provided by the ParlaMint contributors. This value is not always present, either due to parliamentary systems with no defined coalition/opposition, or due to unknown orientation information for some speakers (e.g., PMs with no party affiliation). Missing values are indicated as 'NA'.
- "populism" is a populism index based on multiple expert surveys (to increase the coverage). This task focuses on one particular dimension of populism: the position of the speaker's party on the populist-pluralist spectrum. It is measured on a 4-point ordinal scale (1: Strongly Pluralist, 2: Moderately Pluralist, 3: Moderately Populist, 4: Strongly Populist). Not all values are present in all parliaments. Many parties/speakers are not covered by the data, and some values are missing due to failure to match the survey identifiers/names with ParlaMint identifiers. Missing values are indicated as 'NA'.

Small samples of the data files are provided in the shared task GitHub repository at https://github.com/coltekin/ideology-power-st-baseline.

File names include a code for the parliament. We provide data from the following national and regional parliaments: Austria (at), Bosnia and Herzegovina (ba), Belgium (be), Bulgaria (bg), Czechia (cz), Denmark (dk), Estonia (ee), Spain (es), Catalonia (es-ct), Galicia (es-ga), Basque Country (es-pv), Finland (fi), France (fr), Great Britain (gb), Greece (gr), Croatia (hr), Hungary (hu), Iceland (is), Italy (it), Latvia (lv), The Netherlands (nl), Norway (no), Poland (pl), Portugal (pt), Serbia (rs), Sweden (se), Slovenia (si), Turkey (tr), Ukraine (ua).

The number of training instances and the class imbalance differ for each training set. We do not provide a fixed validation split. Please see the shared task website and the GitHub repository for further description of the data set and the sampling process.
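
The field layout described in the abstract could be read along the following lines. This is only a minimal sketch under stated assumptions, not the organizers' loader: it assumes each file starts with a header line naming the fields, that quoted fields use the double-quote convention of Python's `csv` module, and that the label columns hold integers or 'NA'. The helper names `parse_label` and `read_speeches` and the two sample rows are hypothetical and made up for illustration; they are not taken from the dataset.

```python
import csv
import io

# Label columns named in the abstract; a given file may contain only some of them.
LABEL_COLUMNS = ("orientation", "power", "populism")


def parse_label(value):
    """Return an int label, or None for a missing value ('NA' or empty)."""
    if value is None or value.strip() in ("", "NA"):
        return None
    return int(value)


def read_speeches(handle):
    """Read tab-separated rows (header line assumed) into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        for column in LABEL_COLUMNS:
            if column in row:
                row[column] = parse_label(row[column])
        rows.append(row)
    return rows


# Made-up two-row sample, for illustration only (not real data).
SAMPLE = (
    "id\tspeaker\tsex\ttext\ttext_en\torientation\tpower\tpopulism\n"
    "x01\ts01\tF\tBeispieltext\tExample text\t0\t1\tNA\n"
    "x02\ts02\tM\tNoch ein Text\t\t1\tNA\t3\n"
)

rows = read_speeches(io.StringIO(SAMPLE))
for row in rows:
    print(row["id"], row["orientation"], row["power"], row["populism"])
```

For a real file, the same function can be given a handle opened with `encoding="utf-8"` and `newline=""`, which lets the `csv` module handle line breaks inside quoted fields. Mapping 'NA' to `None` keeps missing labels distinct from the valid label 0.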

Related Organizations

University of Ljubljana
Slovenia
University of Tübingen
Germany
Charles University
Czech Republic
Research Centre of the Slovenian Academy of Sciences and Arts (ZRC SAZU)
Slovenia
Institute of Contemporary History
Slovenia

Impact by BIP!

citations: 0
popularity: Average
influence: Average
impulse: Average

Related to Research communities

EUTOPIA Open Research Portal