
This dataset contains metrics extracted from approximately 2 million Scratch projects sourced from the Scratch online community. It was compiled for the study "Code Cloning and Procedural Abstraction in Scratch Projects" (Hidalgo-Aragon & Robles, 2026).

Each row corresponds to one Scratch project and includes:

- Project metadata (title, author, creation date, remix lineage)
- Computational Thinking (CT) mastery scores across 7 dimensions (Abstraction, Parallelization, Logic, Synchronization, Flow Control, User Interactivity, Data Representation), plus Math and Motion operators
- Overall CT competence level (Basic / Developing / Advanced / Master)
- Clone-related metrics: duplicate script count, dead code count
- Code quality indicators: sprite naming, backdrop naming, number of sprites, presence of blocks
- Remix parent and root identifiers for lineage analysis

The data enables large-scale analysis of code cloning practices, procedural abstraction through custom blocks, and their relationship with CT skill development in block-based programming environments.

File format: CSV (UTF-8), 2,041,168 rows, 28 columns, ~340 MB.
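
Because the file is large (~340 MB), it may be convenient to stream it row by row rather than load it into memory at once. The following is a minimal sketch using only the Python standard library. The function name `summarize_csv` and its `level_column` parameter are illustrative assumptions, and the actual column names are not specified in this description, so any column name passed in must be checked against the file's header row.

```python
import csv
from collections import Counter


def summarize_csv(path, level_column=None):
    """Stream a UTF-8 CSV and return (row_count, column_names, value_counts).

    `level_column` is a hypothetical column name (for example, the one
    holding the overall CT competence level). If it is None, no values
    are tallied and the returned Counter is empty.
    """
    value_counts = Counter()
    row_count = 0
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Reading fieldnames consumes only the header row.
        columns = list(reader.fieldnames or [])
        for row in reader:
            row_count += 1
            if level_column is not None:
                # Rows are processed one at a time, so memory use stays small.
                value_counts[row.get(level_column, "")] += 1
    return row_count, columns, value_counts
```

Calling `summarize_csv(path)` on the downloaded file should report the row and column counts stated above. Passing the name of the competence-level column, once identified from the header, would additionally tally projects per level.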
