
# Intuitive Datasets: Five-Level Data Abstraction Transformations

## Overview

This dataset collection demonstrates systematic transformations of open datasets across five levels of abstraction (L4→L0→L3), enabling users with different data literacy levels to access and understand complex data. The transformations implement a meta-design framework for creating "intuitive datasets" that adapt their complexity to user needs.

Citation: If you use these datasets, please cite: 10.5281/zenodo.18174814

License: CC-BY-4.0 (Creative Commons Attribution 4.0 International)

Related code: https://github.com/ArthurSrz/intuitiveness

## Datasets Included

This collection contains three complete dataset transformation cycles from the French open data platform (data.gouv.fr):

- test0_schools: French middle school performance indicators and student enrollment data
- test1_ademe: ADEME (French environmental agency) funding allocations
- test2_energy: Energy price data for gas tariffs in France

Each dataset includes:

- `raw/`: original L4 files (unlinkable multi-level datasets from data.gouv.fr)
- `descent/`: files transformed through L3 (linkable datasets), L2 (categorized table), L1 (feature vector), and L0 (atomic datum)
- `ascent/`: datasets reconstructed from L0 back to L3 with added analytic dimensions
- `metadata/`: transformation metadata, session exports, and join specifications

## Five-level abstraction framework

The framework defines five levels of data abstraction:

- Level 4 (L4): Unlinkable multi-level datasets - Multiple disconnected CSV files with no apparent structure
- Level 3 (L3): Linkable multi-level datasets - Files connected through relationships, forming knowledge graphs
- Level 2 (L2): Single dataset with multiple entities and attributes - Categorized or filtered tables
- Level 1 (L1): Single entity or single attribute - Feature vectors or entity profiles
- Level 0 (L0): Atomic datum - Single entity-attribute-value triplet (e.g., "average school score: 12.5")

### Descent Phase (L4→L0)

The descent progressively reduces complexity:

- L4→L3: Entity discovery and relationship detection to link disconnected files
- L3→L2: Domain isolation through semantic categorization
- L2→L1: Feature extraction to create vectors
- L1→L0: Aggregation to derive atomic metrics

### Ascent Phase (L0→L3)

The ascent intentionally reconstructs complexity:

- L0→L1: Expand the datum into a feature vector with related attributes
- L1→L2: Add categorical dimensions (e.g., high/low performance)
- L2→L3: Add analytic dimensions to create multi-level structures

## File naming convention

All files follow the pattern: `{dataset}_{level}_{description}.{ext}`

Examples:

- `test0_schools_L4_fr-en-college-effectifs-niveau-sexe-lv.csv` - Original L4 raw file
- `test0_schools_L3_joined_table.csv` - Joined table at L3
- `test0_schools_L0_datum.json` - Atomic datum at L0
- `test0_schools_ascent_L3_table.csv` - Reconstructed L3 table during ascent

## Data Sources

All datasets originate from data.gouv.fr, France's national open data platform:

test0_schools:

- Middle school (collège) enrollment by level, gender, and language: https://www.data.gouv.fr/datasets/effectifs-deleves-par-niveau-sexe-langues-vivantes-1-et-2-les-plus-frequentes-par-college-date-dobservation-au-debut-du-mois-doctobre-chaque-annee
- Middle school performance indicators: https://www.data.gouv.fr/datasets/indicateurs-de-valeur-ajoutee-des-colleges

test1_ademe:

- ADEME financial aid allocations: https://www.data.gouv.fr/datasets/les-aides-financieres-de-lademe-1
- ADEME list of funded projects: https://www.data.gouv.fr/datasets/couts-des-travaux-de-renovation-ecs

test2_energy:

- Regulated gas tariff price levels: https://www.data.gouv.fr/datasets/niveaux-de-prix-par-commune-pour-les-tarifs-reglementes-de-vente-de-gaz-naturel-dengie
- French energy import/export: https://www.data.gouv.fr/datasets/imports-et-exports-commerciaux-2005-a-2021

## Transformation methodology

Transformations were performed using the `intuitiveness` Python package (v0.1.0) with the following dependencies:

- Python 3.11
- pandas 2.x
- networkx 3.x
- sentence-transformers (multilingual-e5-small model)

For detailed transformation logic, see the session export files in each dataset's `metadata/` folder.

## Reuse examples

### For data scientists

- Test data transformation algorithms across different complexity levels
- Benchmark complexity reduction metrics
- Validate semantic domain matching techniques
- Train machine learning models on multi-level data structures

### For open data platforms

- Implement multi-level data access features
- Design adaptive interfaces for users with varying data literacy
- Test complexity-aware search and navigation

### For educators

- Teach data literacy concepts through concrete examples
- Demonstrate descent-ascent transformation cycles
- Illustrate complexity management principles

### For researchers

- Study how data structure affects user comprehension
- Analyze relationship discovery patterns in open datasets
- Investigate semantic categorization effectiveness across domains

## Contact

For questions, issues, or suggestions: arthur.sarazin@etu-iepg.fr
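
As an illustration of the file naming convention described above, the following is a minimal sketch of how the listed file names could be split into their parts when indexing the collection. It is not part of the `intuitiveness` package: the names `parse_filename` and `ParsedName`, the regular expression, and the `raw`/`descent`/`ascent` phase labels are assumptions inferred from the example file names and the folder descriptions, including the optional `ascent_` prefix that appears before the level in ascent files.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern inferred from the example file names:
# {dataset}_[ascent_]{level}_{description}.{ext}
# The dataset part may itself contain underscores (e.g. "test0_schools"),
# so it is matched non-greedily up to the first "_L<digit>_" marker.
_NAME_PATTERN = re.compile(
    r"(?P<dataset>.+?)_(?:(?P<ascent>ascent)_)?"
    r"L(?P<level>[0-4])_(?P<description>.+)\.(?P<ext>\w+)"
)


class ParsedName(NamedTuple):
    dataset: str      # e.g. "test0_schools"
    phase: str        # "raw", "descent" or "ascent" (labels are assumptions)
    level: int        # abstraction level, 0 to 4
    description: str  # free-text part of the name
    ext: str          # file extension without the dot


def parse_filename(name: str) -> Optional[ParsedName]:
    """Split a file name into its parts, or return None if it does not match."""
    match = _NAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    level = int(match.group("level"))
    if match.group("ascent"):
        phase = "ascent"
    elif level == 4:
        phase = "raw"  # assumption: L4 files without a prefix are raw inputs
    else:
        phase = "descent"
    return ParsedName(
        dataset=match.group("dataset"),
        phase=phase,
        level=level,
        description=match.group("description"),
        ext=match.group("ext"),
    )


if __name__ == "__main__":
    for example in (
        "test0_schools_L4_fr-en-college-effectifs-niveau-sexe-lv.csv",
        "test0_schools_L0_datum.json",
        "test0_schools_ascent_L3_table.csv",
    ):
        print(parse_filename(example))
```

Returning `None` for names that do not match keeps the helper usable on folders that also contain other files, such as session exports in `metadata/`, whose naming is not described here.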
Information Literacy
