Powered by OpenAIRE graph
ZENODO
Dataset . 2026
License: CC BY
Data sources: Datacite
Versions: 2

EULingDiv

Authors: Essfors, Hannes
Abstract

Although the European Union is committed to linguistic diversity, recognizing 24 official EU languages and actively promoting multilingualism, little effort has been invested in quantitatively assessing the linguistic diversity of the Union. However, two Eurobarometer surveys tangentially serve this purpose: one conducted in 2012 and one in 2024. The surveys ask about respondents' native language and their first to third other languages, which we interpret as corresponding to L1 through L4. This potentially allows for more accurate models of linguistic diversity that account for multilingualism.

To make the data easier to analyze, we have merged and structured the spoken-language data from both surveys into the dataframe EU_country_speakers_2012_2024.csv. The dataframe is in long format, with one country-language-year-group-speakers quintuple per row, where the group is the speaker type (L1 to L4). We have not converted the counts into proportions, nor derived any formal measures, because many analysis-specific choices must be made, e.g. the weighting given to L1 versus L2, L3, etc. Furthermore, the survey is sample-based: summing the L1 speakers of a country yields that country's sample size. Consequently, the uncertainty of any diversity index derived from the data needs to be taken into account.

While we have added ISO codes to make the dataset more interoperable, users should be wary of macrolanguages. For example, the surveys record speakers of "Arabic" and "Albanian" but do not specify varieties such as Tosk or Gheg Albanian. We have retained this granularity and assigned the macrolanguage codes "ara" and "sqi" in such cases. Finally, the sets of languages included in the 2012 and 2024 surveys do not necessarily fully overlap, owing to design choices made by the survey constructors.
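The abstract leaves the choice of index to the analyst and stresses that the counts come from a sample. As one illustration, the Python sketch below uses only the standard library. It sums L1 counts per country and year, which the abstract says gives the sample size. It then computes the Gini-Simpson index (Greenberg's diversity index), both as a plug-in estimate and with Simpson's small-sample correction, and attaches a naive percentile bootstrap interval. The choice of index, the restriction to L1, the function names, and the assumption that Speaker_type holds the literal labels "L1" to "L4" are ours, not the dataset author's. The bootstrap also treats each country-year as a simple random sample of whole respondents and ignores any Eurobarometer weighting.

```python
import csv
import random
from collections import Counter, defaultdict


def l1_tables(rows):
    """Sum L1 speaker counts per (country_code, year), keyed by ISO 639-3 code.

    Assumption: Speaker_type uses the literal labels "L1".."L4". Rows sharing a
    macrolanguage code (e.g. "ara", "sqi") are merged into one category.
    """
    tables = defaultdict(lambda: defaultdict(float))
    for row in rows:
        if row["Speaker_type"] == "L1":
            key = (row["country_code"], row["year"])
            tables[key][row["ISO6393"]] += float(row["number_of_speakers"])
    return tables


def load_l1_tables(path):
    """Read the CSV (UTF-8 encoding is an assumption) and build the L1 tables."""
    with open(path, newline="", encoding="utf-8") as f:
        return l1_tables(csv.DictReader(f))


def gini_simpson(counts):
    """Return (sample size, plug-in index, bias-corrected index) for one country-year."""
    n = sum(counts.values())  # summed L1 speakers = sample size
    if n < 2:
        raise ValueError("need at least two respondents")
    plug_in = 1.0 - sum((c / n) ** 2 for c in counts.values())
    # Simpson's unbiased estimate of sum(p_i^2) uses c(c-1) / (n(n-1)).
    corrected = 1.0 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    return n, plug_in, corrected


def bootstrap_ci(counts, reps=1000, alpha=0.05, seed=0):
    """Naive percentile bootstrap interval for the plug-in index.

    Assumes counts are whole respondents drawn by simple random sampling;
    survey design weights are ignored.
    """
    rng = random.Random(seed)
    langs = list(counts)
    weights = [counts[lang] for lang in langs]
    n = int(round(sum(weights)))
    stats = []
    for _ in range(reps):
        # Redraw n respondents from the observed language shares.
        resample = Counter(rng.choices(langs, weights=weights, k=n))
        stats.append(1.0 - sum((c / n) ** 2 for c in resample.values()))
    stats.sort()
    lower = stats[int(alpha / 2 * reps)]
    upper = stats[int((1 - alpha / 2) * reps) - 1]
    return lower, upper
```

For example, `tables = load_l1_tables("EU_country_speakers_2012_2024.csv")` followed by `gini_simpson(tables[(country_code, year)])` gives one country-year's figures. The keys are the raw strings from the CSV, so the year must be written the way the file stores it. Because aggregation is by ISO6393, the macrolanguage caveat above carries straight into the index.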
The dataset contains the following columns:
• country_code: 2-letter ISO 3166 code denoting the surveyed country. (character)
• country_name: Commonly used country name corresponding to the country code; does not follow any particular standard. (character)
• ISO6393: 3-letter ISO 639-3 code denoting the surveyed language. (character)
• Language name: Language name used by the original surveys. (character)
• Speaker_type: Categorical variable denoting whether the language is the first (L1), second (L2), third (L3), or fourth (L4) language of the speakers.
• number_of_speakers: Number of speakers of a variety in a given country for a given speaker type. (double)
• year: The year, and hence the survey, from which the data are sourced.
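The column list maps onto a small typed record. The sketch below is again Python with only the standard library. It parses each csv.DictReader row into a hypothetical Record type and rejects Speaker_type values other than L1 to L4. It then pivots the long format so that each country, year, and language carries its four counts side by side. The header strings are copied from the description above and may not match the file exactly (for instance, whether the language-name column is literally "Language name"). The class and function names are our own. Missing values such as "NA" in number_of_speakers would raise a ValueError as written.

```python
import csv
from collections import defaultdict
from dataclasses import dataclass

SPEAKER_TYPES = ("L1", "L2", "L3", "L4")  # assumed literal labels


@dataclass(frozen=True)
class Record:
    """One row of the long-format table (hypothetical type, not part of the dataset)."""
    country_code: str
    country_name: str
    iso6393: str
    language_name: str
    speaker_type: str
    number_of_speakers: float
    year: str


def parse_row(row):
    """Convert one csv.DictReader row into a Record, rejecting unexpected labels."""
    speaker_type = row["Speaker_type"].strip()
    if speaker_type not in SPEAKER_TYPES:
        raise ValueError(f"unexpected Speaker_type: {speaker_type!r}")
    return Record(
        country_code=row["country_code"].strip(),
        country_name=row["country_name"].strip(),
        iso6393=row["ISO6393"].strip(),
        language_name=row["Language name"].strip(),  # header name assumed
        speaker_type=speaker_type,
        number_of_speakers=float(row["number_of_speakers"]),
        year=row["year"].strip(),
    )


def load(path):
    """Read and validate every row of the CSV (UTF-8 encoding is an assumption)."""
    with open(path, newline="", encoding="utf-8") as f:
        return [parse_row(row) for row in csv.DictReader(f)]


def pivot_wide(records):
    """Map (country_code, year, iso6393) -> [L1, L2, L3, L4] speaker counts."""
    wide = defaultdict(lambda: [0.0] * len(SPEAKER_TYPES))
    for r in records:
        slot = SPEAKER_TYPES.index(r.speaker_type)
        wide[(r.country_code, r.year, r.iso6393)][slot] += r.number_of_speakers
    return dict(wide)
```

For example, `pivot_wide(load("EU_country_speakers_2012_2024.csv"))` returns a dictionary keyed by country, year, and ISO 639-3 code. Parsing and pivoting are kept separate from any weighting. That leaves the L1-versus-L2 weighting, which the abstract flags as analysis-specific, to a later step.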

Keywords

Linguistics/statistics & numerical data, Linguistic Diversity, European Union/statistics & numerical data, FOS: Languages and literature, Linguistics, Survey

Impact indicators (BIP!): selected citations: 0; popularity: Average; influence: Average; impulse: Average