Multimodal Dataset for LD50 Toxicity Prediction of Pesticides Using Deep Learning

This dataset supports the study "Saving Mice: ChenseNet121, a New Deep Learning Architecture for LD50 Toxicity Estimation" and was specifically designed for training and evaluating multimodal deep learning models for acute oral toxicity (LD50) prediction in pesticides. It integrates multiple data representations for each compound:

2D images of molecular structures (folder: images/, PNG format), downloaded from PubChem and identified by compound CID.

3D voxelized volumes derived from molecular docking simulations against human acetylcholinesterase (hAChE, PDB: 7E3H), formatted as tensors and stored as .npy files (not shown in screenshot).

Physicochemical descriptors, extracted from SMILES using RDKit, including molecular weight, logP, TPSA, number of rotatable bonds, and docking binding affinities. These are stored in plain text files:
    dataset_descriptores_bool.txt
    dataset_descriptores_float.txt
    dataset_descriptores_2x2x2_bool.txt
    dataset_descriptores_2x2x2_float.txt

CSV files containing the integrated dataset (combined_dataset.csv) and a balanced test subset for classification tasks (balanced_test.csv).

The dataset is aligned with EFSA guidelines and enables the training of machine learning models using image-based, structural, and biochemical features. It was used to develop and evaluate the ChenseNet121 architecture, which outperforms ResNet, Inception, and EfficientNet variants in LD50 regression and WHO-aligned toxicity classification.

Suggested Citation: Junquera, E., Remeseiro, B., Febbraio, F., & Díaz, I. (2025). Multimodal Dataset for LD50 Toxicity Prediction of Pesticides Using Deep Learning. Zenodo.

Related Publication: Junquera et al. (2025). Saving Mice: ChenseNet121, a New Deep Learning Architecture for LD50 Toxicity Estimation.
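Because the 2D images are identified by compound CID, a first step when working with the dataset is usually to build a lookup from CID to image file. The following is a minimal sketch of that step, not the authors' loading code. It assumes each PNG in images/ is named after its CID alone (for example 2244.png); the description does not state the exact naming pattern, so the helper index_images_by_cid is hypothetical and should be adapted to the actual file names.

```python
from pathlib import Path


def index_images_by_cid(images_dir):
    """Return a dict mapping PubChem CID (int) to the PNG path for that compound.

    ASSUMPTION: files are named '<CID>.png'. The dataset description only says
    images are "identified by compound CID", so verify this against the files.
    """
    index = {}
    for path in sorted(Path(images_dir).glob("*.png")):
        stem = path.stem
        # Skip any PNG whose name is not a bare integer CID.
        if stem.isdigit():
            index[int(stem)] = path
    return index
```

A call such as index_images_by_cid("images") would then give a dictionary that can be joined against the rows of combined_dataset.csv, once the name of its CID column has been checked in the file header.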

Related Organizations

National Research Council, Italy
University of Oviedo, Spain
