ZENODO · Data Paper · 2025 · License: CC BY · Data sources: Datacite
LLM4DS-Benchmark: A Dataset for Assessing LLM Performance in Data Science Coding Tasks

Authors: Boominathan, Santhosh Anitha; Chintakunta, Sai Sanjna; Nascimento, Nathalia; Guimaraes, Everton

Abstract

LLM4DS-Benchmark Dataset Description

The LLM4DS-Benchmark dataset is a resource designed to evaluate the performance of Large Language Models (LLMs) on data science coding tasks. It was developed as part of the research presented in the paper "Empirical Benchmarking of Large Language Models for Data Science Coding: Accuracy, Efficiency, and Limitations."

This new version of the dataset includes:
• Prompt templates for the various problem types.
• Problem IDs with associated metadata and reference links.
• Official solution code extracted from StrataScratch demo solutions, along with the corresponding generated code for successful LLM outputs.
• LLM4DS-Execution-Results.xlsx: a comprehensive spreadsheet listing the selected problems and their execution results.
• Similarity scores comparing the platform-provided official solutions with the generated code.

Dataset Contents

1. Prompt Templates (prompt-templates/)
• This folder contains the prompt templates used for the three problem types: algorithm, analytical, and visualization. These templates were used to automatically convert the problems listed in the .json files into the prompt format.

2. Problem Metadata (problems-id/)
• The easy.json, medium.json, and hard.json files organize the selected problems by difficulty and contain metadata for each problem, including:
  • ID: unique identifier for the problem.
  • Link: direct URL to the problem on the StrataScratch platform.
  • Type: problem category (algorithm, analytical, or visualization).
  • Topics: main topics associated with the problem.
  (A minimal loading sketch appears after the Dataset Contents list below.)
• Public problem descriptions: while the problems are publicly available on the StrataScratch platform, we have omitted the full problem descriptions from our repository. Instead, we provide the problem IDs and direct links to the StrataScratch website, ensuring compliance with their terms of service.

3. Official and Generated Code Solutions (official-and-generated-code/)
• This folder contains the official solution code extracted from StrataScratch demo solutions, along with the corresponding generated code for successful LLM outputs. It is organized as follows:
  • Categories: subfolders for algorithm, analytical, and visualization problems.
  • Difficulty levels: each category contains subfolders for easy, medium, and hard problems.
  • Problem IDs: solutions for individual problems are stored in subfolders named after their problem IDs.
  • File format: solutions are saved as .py files.

4. Similarity Computation (similarity-computation/)
• Compares the official solution code extracted from StrataScratch with the code generated by the LLMs, using similarity metrics (an illustrative stand-in is sketched after this list).

5. Execution Results (LLM4DS-Execution-Results.xlsx)
• This Excel file provides a detailed summary of the dataset and the evaluation results. It includes the following sheets:
  - Selected Problems: metadata for the 100 selected problems, including:
    • Topics: main topics covered by each question.
    • Reasoning: why the problem was selected.
    • Company: the company that originally used the problem.
  - Copilot-Results, ChatGPT-Results, Perplexity-Results, Claude-Results, and GitHub Copilot: performance results for each LLM on the 100 data science problems, including the number of trials and similarity scores.
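For orientation, here is a minimal sketch of how the per-difficulty metadata in problems-id/ might be read. This is a sketch under stated assumptions: the key casing ("ID", "Link", "Type", "Topics") follows the field list above, each file is assumed to hold a list of problem records, and the helper name load_problems is ours, not part of the repository.

```python
import json
from pathlib import Path

def load_problems(problems_dir="problems-id"):
    """Read easy/medium/hard problem metadata into one dict.

    Assumes each .json file holds a list of records keyed as in the
    description above ("ID", "Link", "Type", "Topics"); adjust to the
    actual files.
    """
    problems = {}
    for difficulty in ("easy", "medium", "hard"):
        with open(Path(problems_dir) / f"{difficulty}.json", encoding="utf-8") as f:
            problems[difficulty] = json.load(f)
    return problems

if __name__ == "__main__":
    for entry in load_problems()["easy"]:
        print(entry["ID"], entry["Type"], entry["Link"])
```

The description does not name the specific metric applied in similarity-computation/, so the difflib-based ratio below is only an illustrative stand-in for comparing an official .py solution with a generated one, not the dataset's actual computation:

```python
import difflib

def code_similarity(official_path, generated_path):
    """Return a text-similarity score in [0, 1] between two solution files.

    Illustrative stand-in only; the dataset's actual similarity metric is
    not specified in the description above.
    """
    with open(official_path, encoding="utf-8") as f:
        official = f.read()
    with open(generated_path, encoding="utf-8") as f:
        generated = f.read()
    return difflib.SequenceMatcher(None, official, generated).ratio()
```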
EXTRA: By uploading this spreadsheet to Google Colab, you can reproduce all of the analytics results reported in the paper: https://colab.research.google.com/drive/1zmu2DUYkEj5oD5CHIHRT6UOQtZhZsgQW?usp=sharing
Code for converting StrataScratch problems to prompts using our prompt templates: https://github.com/ABSanthosh/RA-Week-3-work
For further details, refer to the linked paper.
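To inspect the spreadsheet locally rather than in Colab, a minimal pandas sketch follows; the sheet names are taken from the sheet list above, and their exact spelling in the file is an assumption worth verifying first:

```python
import pandas as pd  # reading .xlsx files also requires the openpyxl package

# Sheet names follow the description above; exact spellings are assumptions.
xlsx = pd.ExcelFile("LLM4DS-Execution-Results.xlsx")
print(xlsx.sheet_names)  # verify the actual sheet names first

selected = xlsx.parse("Selected Problems")  # metadata for the 100 problems
chatgpt = xlsx.parse("ChatGPT-Results")     # per-problem trials and similarity scores
print(selected.head())
```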

Keywords: LLM4DS
