Python Code and Dataset for An Empirical Study of ChatGPT-4o Use in Engineering Education: Prompting and Performance

Python Code Overview (By File Order)

This section describes the functionality of each script in the order it appears in the GitHub repository:

SC - compute_structural_complexity.txt – Calculates the grammatical and syntactic complexity of student writing.
RU - compute_semantic_novelty.txt – Measures how semantically novel student responses are compared to typical AI outputs.
RU - compute_response_utility.txt – Assesses the helpfulness and task alignment of AI responses.
RU - compute_contribution_final.txt – Combines multiple utility-related dimensions to assess final AI contribution to the assignment.
RU - compute_conceptual_transformation.txt – Evaluates how well students transformed AI responses conceptually rather than copying them directly.
README.md – This document: repository description and usage instructions.
QE - compute_query_efficiency.txt – Calculates how efficiently students obtain useful answers relative to prompt count.
QD - compute_query_depth.txt – Aggregates lexical, structural, and logical depth of student prompts.
QD - compute_multistep_depth.txt – Scores prompts based on presence of multi-step reasoning or layered structure.
QD - compute_lexical_structure.txt – Evaluates lexical variety and formal characteristics of student prompts.
QD - compute_focus_clarity.txt – Measures how focused, goal-oriented, and unambiguous the prompts are.
PS - compute_stepwise_alignment.txt – Assesses whether student work reflects logical integration of AI-generated insights.
PS - compute_problem_solving_score.txt – Final score summarizing how well students used AI for analytical or problem-solving tasks.
PS - compute_independent_expansion.txt – Checks how much the student expanded upon or added new ideas beyond AI responses.
PS - compute_conceptual_application.txt – Measures how students applied AI suggestions within a relevant engineering context.
PRD - compute_prompt_refinement_depth.txt – Tracks iterative prompt modifications and semantic improvement.
Interface Python Code.txt – The local interface tool students used to interact with ChatGPT. Logs prompts/responses and emails data to the researcher.
Final Dataset - final_export_to_excel.txt – Exports all computed metrics and metadata into a final .xlsx file for analysis.
Data merging - convert_grades.txt – Converts raw Excel grade sheets into structured data.
Data Merging - parse_ai_logs.txt – Extracts prompts and responses from raw AI logs, removes duplicates, and formats them.
Data Merging - merge all data.txt – Merges all student data: logs, assignments, grades, and computed metrics.
Data Merging - convert_assignments.txt – Converts student .docx assignments to JSON with clean, tokenized text.
CR - compute_content_richness.txt – Assesses conceptual density and information content in student submissions.
ARR - compute_text_similarity.txt – Calculates similarity between student work and AI responses at the lexical level.
ARR - compute_structural_similarity.txt – Evaluates structural overlaps between student output and AI output.
ARR - compute_query_submission_link.txt – Links submitted work back to the queries that most influenced it.
ARR - compute_prompt_response_consistency.txt – Measures how logically aligned the student’s prompt is with the AI’s response.
ARR - compute_copy_paste_score.txt – Detects copied or lightly modified content from AI responses.
ARR - compute_ai_response_reliance.txt – Aggregates all ARR metrics into a single AI Reliance Score.
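
To give a sense of what a lexical-level similarity measure such as the one described for ARR - compute_text_similarity.txt might look like, here is a minimal sketch. It is not the repository's implementation: this description does not state which measure the script uses, so the sketch assumes Jaccard similarity over word tokens, and the function names tokenize and jaccard_similarity are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard_similarity(student_text, ai_text):
    """Fraction of distinct words shared by both texts, from 0.0 to 1.0.

    Assumed stand-in for a lexical similarity metric; the study's own
    script may use a different measure.
    """
    student_tokens = tokenize(student_text)
    ai_tokens = tokenize(ai_text)
    if not student_tokens and not ai_tokens:
        return 0.0  # two empty texts: define similarity as zero
    shared = student_tokens & ai_tokens
    combined = student_tokens | ai_tokens
    return len(shared) / len(combined)
```

A call such as jaccard_similarity(student_submission, ai_response) returns a value near 1.0 when the two texts use almost the same vocabulary and near 0.0 when they share few words. A set-based measure ignores word order and repetition, which is why the repository lists structural similarity and copy-paste detection as separate scripts.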

Dataset Notes

The file An Empirical Study of ChatGPT-4o Use in Engineering Education Prompting and Performance.xlsx contains the full dataset used for the paper "An Empirical Study of ChatGPT-4o Use in Engineering Education: Prompting and Performance." Each row represents a student's session in a weekly engineering class, capturing both behavioral and performance data.

Columns in the dataset:

Student ID – An anonymized unique identifier for each student (e.g., Systems Engineering_1).
Course – The name of the engineering course the student was enrolled in (e.g., Systems Engineering, Environmental Engineering).
Week – Indicates which week of the 16-week semester the session corresponds to.
Attendance – Binary indicator of whether the student was present (1) or absent (0) for the session.
AI Access – Binary indicator of whether the student had access to ChatGPT-4o during the session (1 for access, 0 for no access).
Activity Type – The type of task completed during the session: Case Study Analysis, Engineering Design Report, Multi-Step Engineering Problem-Solving, or Experimental Data Analysis.
AI Query Count – The number of prompts the student submitted to ChatGPT during the session.
AI Response Reliance Score – A percentage score representing how much of the student’s submitted work is related to AI output.
Assignment Score – The grade (out of 100) the student received for their work that week.
AI Query Depth & Structure Score – A custom metric evaluating how complex, well-structured, and thoughtful the student’s prompts were.
AI Query Efficiency Score – A metric indicating how efficiently the student got useful responses using fewer and more focused queries.
Prompt Refinement Depth Score – Captures how much the student iteratively refined or improved their prompts to get better responses.
AI Response Complexity Score – Measures the linguistic and syntactic sophistication of ChatGPT’s responses.
AI Response Utility Score – Evaluates the usefulness and relevance of the AI responses to the assigned task.
AI-Driven Problem-Solving Score – A composite metric assessing how well the student integrated ChatGPT-generated content into their actual solution.
Structural Complexity Score – Measures the structural sophistication of the student’s written work (e.g., use of transitions, logical flow).
Content Richness Score – Assesses how information-dense and conceptually rich the student’s final submission was.

Each row is a unique session, so a single student appears multiple times (once per week, assuming attendance), with different metrics depending on whether they had AI access and how they used it.
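
Because each row is one session, a common first step is to group sessions by the AI Access flag and compare Assignment Score. The sketch below illustrates that idea under stated assumptions: it takes rows as plain dictionaries (for example, as they might be loaded from the .xlsx file with a spreadsheet library), it assumes the spreadsheet headers match the column names listed above exactly, and it assumes absent sessions should be skipped. The function name and the sample rows are invented for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_ai_access(rows):
    """Return {AI Access flag: mean Assignment Score} over attended sessions."""
    scores = defaultdict(list)
    for row in rows:
        # Assumption: rows marked absent (Attendance == 0) are excluded.
        if row["Attendance"] != 1:
            continue
        scores[row["AI Access"]].append(row["Assignment Score"])
    return {flag: mean(values) for flag, values in scores.items()}


# Invented rows for illustration only; these values are NOT from the dataset.
sample_rows = [
    {"Student ID": "Systems Engineering_1", "Week": 1,
     "Attendance": 1, "AI Access": 1, "Assignment Score": 80},
    {"Student ID": "Systems Engineering_1", "Week": 2,
     "Attendance": 1, "AI Access": 0, "Assignment Score": 70},
    {"Student ID": "Systems Engineering_2", "Week": 1,
     "Attendance": 1, "AI Access": 1, "Assignment Score": 90},
    {"Student ID": "Systems Engineering_2", "Week": 2,
     "Attendance": 0, "AI Access": 0, "Assignment Score": 0},
]

summary = mean_score_by_ai_access(sample_rows)
```

The same grouping pattern applies to any other numeric column, such as AI Query Count or Content Richness Score. Since one student contributes several rows, any statistical comparison built on this should account for repeated measures per student.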

Repository Overview

This repository contains the complete Python codebase and dataset used in the study:

An Empirical Study of ChatGPT-4o Use in Engineering Education: Prompting and Performance

The project investigates the relationship between AI prompting behaviors and academic performance among engineering students using ChatGPT-4o.

The GitHub repository includes:

Data preprocessing scripts
Metric calculation modules
Machine learning models and analysis pipelines
Code for figure generation and statistical tests

The full codebase is available at: https://github.com/lisaza88/An-Empirical-Research-Study-of-ChatGPT-4o-Use-in-Engineering-Education

Dataset

This Zenodo record also includes the original dataset:

An Empirical Study of ChatGPT-4o Use in Engineering Education Prompting and Performance.xlsx

This file contains anonymized data used in the study, including:

AI interaction logs (prompts, responses, timestamps)
Written student submissions
Assignment grades
Computed metrics (e.g., structural complexity, content richness, query efficiency)

The dataset is shared in accordance with ethical and privacy standards and is intended for reproducibility and academic reuse.

Related Organizations

National University of Distance Education
Spain
