
This record is Part 2 of the data associated with the publication: Janež, Škrlj, Osojnik et al. MicroICS: Extracting predictive information from structural features of Listeria monocytogenes biofilms for strain identification and biological context associations.

This part provides data supporting strain characterisation, classifier performance, and inference. It includes custom protein databases used for biomarker identification, metadata on the strains, visual inspection results that served as the baseline for algorithm performance comparison, and feature files and images used for inference on biofilms perturbed by food residues.

datafile_17_11_2023_3D_z_21_with_exp_ctrl_results – Feature calculation results for the training dataset 17_11_2023_3D_z_21_with_exp_ctrl (dataset 17_11_2023_3D_z_21 with experimental controls substituted). Used to train the random forest classifier and to perform inference on biofilm images with altered structure in E8_images_for_prediction.

E8_images_for_prediction – Images used for inference testing, comprising biofilms treated with food extracts and their corresponding controls.

metadata_on_Listeria_strains_used – Epidemiological, sequencing, and genomic analysis data for the strains used in this study.

biofilm_associated_proteins.fasta – Custom database of biofilm-associated proteins used with BLAST to identify homologues in the genomes of the selected strains.

wall_teichoic_acid_synthesis_associated_proteins.fasta – Custom database of wall teichoic acid synthesis-associated proteins used with BLAST to identify homologues in the genomes of the selected strains.

Prediction_set_shuffled, training_set – Selected images used for visual classification.

visual_classification_results – Results of the visual classification of biofilm images conducted by three laboratory members.
