A Predictive Model to Identify Effective Metrics for the Comprehension of Computational Notebooks

ZENODO

Article . 2023

License: CC BY

Data sources: Datacite

ZENODO

Conference object . 2023

License: CC BY

Data sources: ZENODO

Publication: Article, Conference object. 08 Jul 2023. Publisher: Zenodo

doi: 10.5281/zenodo.8126338


Abstract

We enhanced the measurement of how understandable notebook code is by leveraging user comments within a software repository. As a case study, we started with 248,761 Kaggle Jupyter notebooks introduced in previous studies, together with their metadata. To identify user comments related to code comprehension within the notebooks, we used a fine-tuned DistilBERT transformer. We established a social-based criterion for measuring code understandability that considers the number of comments, their upvotes, the total views, and the total upvotes of the notebooks. This criterion proved more effective than alternative methods, so we adopted it as the ground truth for evaluating the code comprehension of our notebook set. In addition, we collected 34 metrics for the notebooks, categorized as script-based and notebook-based, and used them as features in our dataset. Using a Random Forest classifier, our predictive model achieved 85% accuracy in predicting code comprehension levels in computational notebooks and identified developer expertise and markdown facility utilization as key factors.
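The abstract names the four inputs to the social-based criterion (comment count, comment upvotes, total views, total upvotes) but does not give the formula. The following is only a minimal sketch of how such inputs could be combined into a single score and a coarse label. The field names, the log damping, the ratio form, and the threshold are all assumptions made for illustration and are not the authors' method.

```python
from dataclasses import dataclass
from math import log1p


@dataclass
class NotebookStats:
    """Hypothetical per-notebook engagement counts (field names are assumptions)."""
    comprehension_comments: int  # comments flagged as comprehension-related
    comment_upvotes: int         # upvotes received by those comments
    total_views: int             # views of the notebook
    total_upvotes: int           # upvotes of the notebook


def social_score(nb: NotebookStats) -> float:
    """Illustrative score: comment signal relative to audience exposure.

    Both the log damping and the ratio are assumptions, not the paper's formula.
    """
    signal = log1p(nb.comprehension_comments + nb.comment_upvotes)
    exposure = log1p(nb.total_views + nb.total_upvotes)
    # A notebook with no views and no upvotes has no exposure; score it as 0.
    return signal / exposure if exposure > 0 else 0.0


def comprehension_label(nb: NotebookStats, threshold: float = 0.5) -> str:
    """Map the score to a coarse class; the threshold is an arbitrary placeholder."""
    return "high" if social_score(nb) >= threshold else "low"


if __name__ == "__main__":
    example = NotebookStats(comprehension_comments=12, comment_upvotes=30,
                            total_views=5000, total_upvotes=80)
    print(round(social_score(example), 3), comprehension_label(example))
```

Labels produced this way could serve as the target column for a classifier trained on the 34 notebook metrics, which is the role the abstract assigns to the ground-truth criterion.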

Impact by BIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average