Powered by OpenAIRE graph
ZENODO
Dataset, 2026
License: CC BY
Data sources: ZENODO
ZENODO
Dataset, 2026
License: CC BY
Data sources: Datacite
Versions: 5

Intuitive datasets: five-level data abstraction transformations

Authors: Sarazin, Arthur; Mourey, Mathis

Abstract

Overview
This dataset collection demonstrates systematic transformations of open datasets across five levels of abstraction (L4→L0→L3), enabling users with different levels of data literacy to access and understand complex data. The transformations implement a meta-design framework for creating "intuitive datasets" that adapt their complexity to user needs.

Citation: if you use these datasets, please cite 10.5281/zenodo.18174814.
License: CC-BY-4.0 (Creative Commons Attribution 4.0 International)
Related code: https://github.com/ArthurSrz/intuitiveness

Datasets included
This collection contains three complete dataset transformation cycles built from the French open data platform (data.gouv.fr):
- test0_schools: French middle school (collège) performance indicators and student enrollment data
- test1_ademe: funding allocations from ADEME (the French environmental agency)
- test2_energy: energy price data for gas tariffs in France

Each dataset includes:
- raw/: original L4 files (unlinkable multi-level datasets from data.gouv.fr)
- descent/: files transformed through L3 (linkable datasets), L2 (categorized table), L1 (feature vector), and L0 (atomic datum)
- ascent/: datasets reconstructed from L0 back to L3 with added analytic dimensions
- metadata/: transformation metadata, session exports, and join specifications

Five-level abstraction framework
The framework defines five levels of data abstraction:
- Level 4 (L4): Unlinkable multi-level datasets. Multiple disconnected CSV files with no apparent structure.
- Level 3 (L3): Linkable multi-level datasets. Files connected through relationships, forming knowledge graphs.
- Level 2 (L2): Single dataset with multiple entities and attributes. Categorized or filtered tables.
- Level 1 (L1): Single entity or single attribute. Feature vectors or entity profiles.
- Level 0 (L0): Atomic datum. A single entity-attribute-value triplet (e.g., "average school score: 12.5").

Descent phase (L4→L0)
The descent progressively reduces complexity:
- L4→L3: entity discovery and relationship detection to link disconnected files
- L3→L2: domain isolation through semantic categorization
- L2→L1: feature extraction to create vectors
- L1→L0: aggregation to derive atomic metrics

Ascent phase (L0→L3)
The ascent intentionally reconstructs complexity:
- L0→L1: expand the datum into a feature vector with related attributes
- L1→L2: add categorical dimensions (e.g., high/low performance)
- L2→L3: add analytic dimensions to create multi-level structures

File naming convention
All files follow the pattern `{dataset}_{level}_{description}.{ext}`; files produced during the ascent add an `ascent` marker before the level, as in the last example below. Examples:
- test0_schools_L4_fr-en-college-effectifs-niveau-sexe-lv.csv: original L4 raw file
- test0_schools_L3_joined_table.csv: joined table at L3
- test0_schools_L0_datum.json: atomic datum at L0
- test0_schools_ascent_L3_table.csv: L3 table reconstructed during the ascent

Data sources
All datasets originate from data.gouv.fr, France's national open data platform.
test0_schools:
- Middle school enrollment by grade level, gender, and language: https://www.data.gouv.fr/datasets/effectifs-deleves-par-niveau-sexe-langues-vivantes-1-et-2-les-plus-frequentes-par-college-date-dobservation-au-debut-du-mois-doctobre-chaque-annee
- Middle school performance indicators: https://www.data.gouv.fr/datasets/indicateurs-de-valeur-ajoutee-des-colleges
test1_ademe:
- ADEME financial aid allocations: https://www.data.gouv.fr/datasets/les-aides-financieres-de-lademe-1
- ADEME list of funded projects: https://www.data.gouv.fr/datasets/couts-des-travaux-de-renovation-ecs
test2_energy:
- Regulated gas tariff price levels: https://www.data.gouv.fr/datasets/niveaux-de-prix-par-commune-pour-les-tarifs-reglementes-de-vente-de-gaz-naturel-dengie
- French energy imports/exports: https://www.data.gouv.fr/datasets/imports-et-exports-commerciaux-2005-a-2021

Transformation methodology
Transformations were performed using the `intuitiveness` Python package (v0.1.0) with the following environment and dependencies:
- Python 3.11
- pandas 2.x
- networkx 3.x
- sentence-transformers (multilingual-e5-small model)
For detailed transformation logic, see the session export files in each dataset's `metadata/` folder.

Reuse examples
For data scientists:
- Test data transformation algorithms across different complexity levels
- Benchmark complexity reduction metrics
- Validate semantic domain matching techniques
- Train machine learning models on multi-level data structures
For open data platforms:
- Implement multi-level data access features
- Design adaptive interfaces for users with varying levels of data literacy
- Test complexity-aware search and navigation
For educators:
- Teach data literacy concepts through concrete examples
- Demonstrate descent-ascent transformation cycles
- Illustrate complexity management principles
For researchers:
- Study how data structure affects user comprehension
- Analyze relationship discovery patterns in open datasets
- Investigate how effective semantic categorization is across domains

Contact
For questions, issues, or suggestions: arthur.sarazin@etu-iepg.fr
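The descent and ascent steps described in the abstract above can be sketched with pandas on a toy example. This is a minimal illustration under stated assumptions, not the `intuitiveness` package's implementation or API: the two in-memory tables, their column names (`school_id`, `sector`, `score`, `students`) and all values are hypothetical, entity discovery is reduced to an exact key join (no embedding-based matching of the kind sentence-transformers would enable), and semantic categorization is reduced to a simple filter.

```python
import pandas as pd

# Hypothetical toy stand-ins for two L4 raw files. These are NOT the real
# data.gouv.fr files; column names and values are invented for illustration.
enrollment = pd.DataFrame({
    "school_id": ["A1", "B2", "C3", "D4"],
    "students": [400, 250, 610, 320],
})
performance = pd.DataFrame({
    "school_id": ["A1", "B2", "C3", "D4"],
    "sector": ["public", "private", "public", "public"],
    "score": [12.0, 14.0, 11.0, 13.0],
})

# --- Descent (L4 -> L0) ---
# L4 -> L3: link the disconnected files. Entity discovery is simplified to an
# exact join on a shared key (assumption: both files carry "school_id").
l3 = enrollment.merge(performance, on="school_id", how="inner")

# L3 -> L2: isolate one domain (public schools), standing in for semantic
# categorization.
l2 = l3[l3["sector"] == "public"]

# L2 -> L1: extract a single attribute as a feature vector indexed by entity.
l1 = l2.set_index("school_id")["score"]

# L1 -> L0: aggregate the vector into one entity-attribute-value datum.
l0 = {"entity": "public schools", "attribute": "average score",
      "value": float(l1.mean())}

# --- Ascent (L0 -> L3) ---
# L0 -> L1: expand the datum back into the feature vector it summarises.
a1 = l1.copy()

# L1 -> L2: add a categorical dimension (high/low relative to the L0 value).
a2 = a1.to_frame("score")
a2["performance"] = (a2["score"] >= l0["value"]).map({True: "high", False: "low"})

# L2 -> L3: add an analytic dimension by re-attaching enrollment data
# that the descent had set aside.
a3 = a2.join(l3.set_index("school_id")["students"])

print(l0)
print(a3)
```

Keeping the L3 table around during the descent is what lets the ascent re-attach attributes (here `students`) that were dropped on the way down; a real pipeline might instead persist that context in the join specifications stored in `metadata/`.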

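The file naming convention in the abstract can also be checked programmatically. The helper below is hypothetical (`parse_filename` and `ParsedName` are not part of the published package) and assumes, from the `test0_schools_ascent_L3_table.csv` example, that ascent files carry an `ascent` marker between the dataset name and the level; it also allows the dataset name itself to contain underscores, as in `test0_schools`.

```python
import re
from typing import NamedTuple, Optional

# Assumed reading of `{dataset}_{level}_{description}.{ext}`, extended with an
# optional "ascent" marker observed in the example file names.
FILENAME_RE = re.compile(
    r"^(?P<dataset>.+?)_"          # shortest dataset name, e.g. test0_schools
    r"(?:(?P<phase>ascent)_)?"     # optional ascent marker
    r"(?P<level>L[0-4])_"          # abstraction level L0..L4
    r"(?P<description>.+)"         # free-text description
    r"\.(?P<ext>[A-Za-z0-9]+)$"    # file extension
)


class ParsedName(NamedTuple):
    dataset: str
    ascent: bool      # True if the file was produced during the ascent
    level: int        # 0..4
    description: str
    ext: str


def parse_filename(name: str) -> Optional[ParsedName]:
    """Split a file name into its convention fields, or return None."""
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    return ParsedName(
        dataset=m["dataset"],
        ascent=m["phase"] is not None,
        level=int(m["level"][1]),   # "L3" -> 3
        description=m["description"],
        ext=m["ext"],
    )


for example in [
    "test0_schools_L4_fr-en-college-effectifs-niveau-sexe-lv.csv",
    "test0_schools_L3_joined_table.csv",
    "test0_schools_L0_datum.json",
    "test0_schools_ascent_L3_table.csv",
]:
    print(parse_filename(example))
```

The non-greedy dataset group picks the shortest prefix that is followed by an optional `ascent_` and a level token, so underscores in the dataset name do not break the split.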
Keywords

Information Literacy

Impact by BIP!
- Selected citations: 0. These citations are derived from selected sources; this indicator is an alternative to "Influence", which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.