[LS2N_IPI] Concrete patch classification

concrete_patch_classification

This dataset is based on the original I3DCP dataset introduced in:

Rill-García, R., Dokladalova, E., Dokládal, P., Caron, J.-F., Mesnil, R., Margerit, P., & Charrier, M. (2022). Inline monitoring of 3D concrete printing using computer vision. Additive Manufacturing, 60, 103175. https://doi.org/10.1016/j.addma.2022.103175

The original dataset includes raw images of cement-based material deposition, segmentation masks of interstitial lines, and texture classification patches. Our work focuses on the texture classification patches. This dataset provides three complementary resources:

- A reorganized version of the original 111 patches with 5-fold splits.
- An extended set of 426 expert-annotated patches with an additional geometric defect class (Crushed in English, Écrasé in French).
- A collection of synthetic patches generated with StyleGAN3, covering all five classes.

Sub-dataset 1: Original annotated texture windows

Content: 111 labeled gray-level texture windows with a fixed width of 200, extracted from 24 raw images.

5-fold cross-validation

Original classes:
- Fluid (24 images, 21.62%)
- Good (27 images, 24.32%)
- Dry (24 images, 21.62%)
- Tearing (36 images, 32.43%)

Labels: texture_windows-labels.csv

Model weights fine-tuned on Sub-dataset 1 with synthetic images from Sub-dataset 3: the baseline model introduced by Rill-García et al. (2022), the EfficientFormer model introduced by Li et al. (2022), and the proposed Multimodal Dual-Branch model. (*.pth: model weights; *.txt: normalization parameters for the images; *.npy: normalization parameters for the texture descriptor vector.)

Sub-dataset 2: Extended expert-annotated texture windows

Content: 426 labeled gray-level texture windows with a fixed width of 200, extracted from 24 raw images.
5-fold cross-validation

Classes:
- Fluid (84 images, 19.72%)
- Good (127 images, 29.81%)
- Dry (68 images, 15.96%)
- Tearing (61 images, 14.32%)
- Geometric defect, Écrasé (French) / Crushed (English) (86 images, 20.19%)

Labels: patch_labels(426extension).csv

Model weights fine-tuned on Sub-dataset 2 with synthetic images from Sub-dataset 3: the baseline model introduced by Rill-García et al. (2022), the EfficientFormer model introduced by Li et al. (2022), and the proposed Multimodal Dual-Branch model. (*.pth: model weights; *.txt: normalization parameters for the images; *.npy: normalization parameters for the texture descriptor vector.)

Sub-dataset 3: Synthetic images (StyleGAN3 generated)

Content: Synthetic gray-level texture windows generated by five separate pretrained generative models.

Classes:
- Fluid (1200 images)
- Good (1200 images)
- Dry (1200 images)
- Tearing (1200 images)
- Geometric defect, Écrasé (French) / Crushed (English) (1200 images)

Labels:
- ./images_generees(d1)/patch_labels(426extension+stylegan3).csv for Sub-dataset 2.
- ./images_generees(d2)/texture_windows-labels(stylegan3_d2).csv for Sub-dataset 1.

Model weights trained for generation:
- Four class-specific StyleGAN3 models (fluid, good, dry, tearing); each model generates a single class.
- One joint StyleGAN3 model that generates all five classes (fluid, good, dry, tearing, ecrase/crushed).

For details on dataset usage, please refer to the GitHub repository.

Updates (compared to Version 1.0.0)

The models were re-trained under an updated training configuration, which reduces overfitting compared to Version 1.0.0. In addition, inference has been upgraded from a single-model setup to a 5-model ensemble based on logits averaging.

Synthetic Data Extension (Sub-dataset 3)

The synthetic image dataset has been expanded. In Version 1.0.0, synthetic images were generated exclusively by a generator trained on the Original dataset.
In the current version, additional synthetic images produced by a generator trained on the Re-annotated dataset have been included. The corresponding label CSV files are also provided to facilitate data augmentation during training.

To avoid data leakage between datasets, a cross-dataset generation strategy is adopted:

- Synthetic images produced by the generator trained on the Original Dataset (Dataset1) are used exclusively to augment the Re-annotated Dataset (Dataset2).
- Conversely, synthetic images produced by the generator trained on the Re-annotated Dataset (Dataset2) are used exclusively to augment the Original Dataset (Dataset1).

License

This dataset is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). https://creativecommons.org/licenses/by-nc-sa/4.0/

It is derived from the I3DCP dataset, which is released under the same license (CC BY-NC-SA 4.0). The additional annotations and processing were created by us and are released under the same CC BY-NC-SA 4.0 license.
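
As an illustration of the cross-dataset augmentation rule described under Synthetic Data Extension, the sketch below maps each real sub-dataset to the synthetic label file listed for it in the Sub-dataset 3 description. The function names `synthetic_labels_for` and `load_label_rows` are hypothetical helpers, not part of the released code, and the CSV column layout is not documented here, so rows are returned unparsed.

```python
import csv

# Paths copied from the Sub-dataset 3 "Labels" entry.
# Key = the real sub-dataset being augmented (1 or 2).
SYNTHETIC_LABELS = {
    1: "./images_generees(d2)/texture_windows-labels(stylegan3_d2).csv",
    2: "./images_generees(d1)/patch_labels(426extension+stylegan3).csv",
}


def synthetic_labels_for(target_subdataset: int) -> str:
    """Return the synthetic label CSV listed for the given real sub-dataset."""
    if target_subdataset not in SYNTHETIC_LABELS:
        raise ValueError("target_subdataset must be 1 or 2")
    return SYNTHETIC_LABELS[target_subdataset]


def load_label_rows(path: str) -> list:
    """Read a label CSV as raw rows (hypothetical helper; columns are not assumed)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```

Keying the lookup by the dataset being augmented, rather than by the generator, makes it harder to pair a sub-dataset with synthetic images derived from its own training data by accident.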

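The 5-model ensemble by logits averaging mentioned in the Updates section can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the released inference code: the class order in `CLASSES` is assumed, and each model is represented only by the list of logits it outputs for one patch.

```python
import math

# Assumed class order; check the repository for the actual label encoding.
CLASSES = ["Fluid", "Good", "Dry", "Tearing", "Crushed"]


def average_logits(per_model_logits):
    """Average the logit vectors of several models, class by class."""
    n_models = len(per_model_logits)
    n_classes = len(per_model_logits[0])
    return [
        sum(logits[c] for logits in per_model_logits) / n_models
        for c in range(n_classes)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_logits, classes=CLASSES):
    """Average logits across models, then return (predicted class, probabilities)."""
    probs = softmax(average_logits(per_model_logits))
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs
```

For example, passing the five logit vectors produced by the five fold models for one patch to `ensemble_predict` returns a single label and one probability per class. Averaging logits before the softmax differs from averaging probabilities; the text specifies the former.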