
A LiDAR-based Machine Vision Dataset for Online Volume Measurement of Sweetpotatoes

Authors: Zhang, Jiaming; Lu, Yuzhen; Kong, Zhikun; Xu, Jiajun



## Dataset Structure

```
SP_3D_Dataset.zip
├─ raw_PointClouds
│  ├─ SampleID_001
│  │  ├─ ...
│  │  ├─ SampleID_001_FrameID_0xx.laz
│  │  └─ ...
│  ├─ ...
│  └─ SampleID_200
│
├─ segmented_PointClouds
│  ├─ SampleID_001
│  │  ├─ ...
│  │  ├─ SampleID_001_FrameID_0xx.laz
│  │  └─ ...
│  ├─ ...
│  └─ SampleID_200
│
├─ selected_Images
│  ├─ SampleID_001
│  │  ├─ ...
│  │  ├─ SampleID_001_FrameID_0xx.png
│  │  └─ ...
│  ├─ ...
│  └─ SampleID_200
│
└─ sweetpotato_volume_ground-truth.xlsx
```

---

## Dataset Organization

When the archive is extracted, the dataset is organized into three primary data folders and one ground-truth file:

- **raw_PointClouds**: raw 3D point cloud data (.laz format) acquired directly from the LiDAR sensor.
- **segmented_PointClouds**: cleaned point cloud data after background removal and statistical denoising.
- **selected_Images**: synchronized 2D RGB images (.png).
- **sweetpotato_volume_ground-truth.xlsx**: a spreadsheet containing physical reference measurements for all 200 samples.

### Directory Hierarchy

The dataset follows a consistent hierarchical structure across all directories:

- Each primary folder contains 200 subfolders (labeled `SampleID_001` to `SampleID_200`), one per sweetpotato sample.
- Image Resolution: RGB images and depth maps are 1280 × 720 pixels.
- Temporal Consistency: the original frame indices from the raw recording are preserved to support multi-view fusion and tracking research.

### Dataset Summary

- Total Samples: 200 "Beauregard" sweetpotatoes.
- Imaging System: custom LiDAR-based roller conveyor (Intel RealSense L515).
- Reference Method: standard water displacement method (average of two replicates).
- Storage Space: approximately 15.5 GB (uncompressed).
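The directory hierarchy and the `SampleID_xxx_FrameID_xxx` naming scheme make programmatic traversal straightforward. A minimal sketch (stdlib only; `parse_frame_name` and `iter_frames` are illustrative helpers, not part of the dataset's own `data_loader.py`):

```python
import re
from pathlib import Path

# Pattern mirroring the naming convention SampleID_[ID]_FrameID_[ID].[ext]
FRAME_RE = re.compile(r"SampleID_(\d+)_FrameID_(\d+)\.(laz|png)$")

def parse_frame_name(name):
    """Return (sample_id, frame_id) parsed from a dataset file name."""
    m = FRAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name}")
    return int(m.group(1)), int(m.group(2))

def iter_frames(root, subset="selected_Images"):
    """Yield (sample_id, frame_id, path) for every frame in one subset
    (raw_PointClouds, segmented_PointClouds, or selected_Images)."""
    for sample_dir in sorted(Path(root, subset).glob("SampleID_*")):
        for f in sorted(sample_dir.iterdir()):
            sid, fid = parse_frame_name(f.name)
            yield sid, fid, f
```

Keeping sample and frame IDs in the file name (rather than only in the folder path) means any single file remains traceable after being copied out of the hierarchy.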
## Camera & Imaging Specifications

The Intel RealSense L515 LiDAR was operated with the following fixed settings to ensure data consistency:

| Parameter | Configuration / Value |
|-----------------------|-----------------------------------------------------------|
| **Sensor Model** | Intel RealSense™ L515 LiDAR |
| **RGB Resolution** | 1280 × 720 pixels (.png) |
| **Depth Resolution** | 1280 × 720 pixels (.ply) |
| **Frame Rate** | 30 FPS |
| **Laser Wavelength** | 860 nm |
| **Mounting Height** | 0.43 m above the conveyor |
| **Imaging Lighting** | Ambient indoor light (no controlled lighting) |
| **Conveyor Speed** | 10 mm/s |
| **Exposure Time** | 1250 µs |
| **Gain** | 10 |
| **Brightness** | 1 |
| **Contrast** | 50 |
| **Backlight Compensation** | 98 |
| **Saturation** | 50 |
| **Sharpness** | 80 |
| **White Balance** | 4600 K |

## File Naming Convention

The naming convention ensures traceability and temporal alignment:

`SampleID_[ID]_FrameID_[ID].[ext]`

1. **SampleID**: unique identifier for each physical sweetpotato root.
2. **FrameID**: sequential order of the frame extracted from the continuous recording.

Example: `SampleID_001_FrameID_015.png` is the 15th frame of the 1st sample.

## Data Processing & Feature Extraction

- Segmentation: binary masks were generated using HSV thresholding ($T_{min}=[7,40,120]$, $T_{max}=[20,165,219]$), flood-filling, and morphological opening.
- Denoising: Statistical Outlier Removal (SOR) was applied using 50 nearest neighbors and a 0.02 standard deviation ratio.
- Potential Usage: the high-density point clouds support the extraction of 2D features (area, perimeter, radial distance) and 3D features (projected volume, surface area) as described in the related research.

## Demo

A minimal workflow is provided for readers to validate the data; see the `Demo` folder for details.
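The SOR denoising step can be illustrated in a few lines. Below is a NumPy re-implementation of the filter's logic — not the authors' processing code — using the dataset's parameters (50 neighbors, 0.02 standard deviation ratio) on a synthetic cloud. The brute-force distance matrix is only practical for small clouds; production pipelines use a k-d tree (e.g. Open3D's `remove_statistical_outlier`):

```python
import numpy as np

def sor_filter(points, nb_neighbors=50, std_ratio=0.02):
    """Statistical Outlier Removal: drop points whose mean distance to
    their k nearest neighbors exceeds mean + std_ratio * std (computed
    over all points). Returns (filtered_points, keep_mask)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # columns 1..k of each sorted row: skip the zero self-distance
    knn = np.sort(d, axis=1)[:, 1:nb_neighbors + 1]
    mean_d = knn.mean(axis=1)
    thresh = mean_d.mean() + std_ratio * mean_d.std()
    keep = mean_d <= thresh
    return points[keep], keep

# Synthetic example: a dense cluster plus one far-away outlier
rng = np.random.default_rng(0)
cloud = rng.normal(0.0, 0.01, size=(200, 3))      # tight cluster (metres)
cloud = np.vstack([cloud, [[1.0, 1.0, 1.0]]])     # stray background point
filtered, keep = sor_filter(cloud, nb_neighbors=50, std_ratio=0.02)
```

The small 0.02 ratio makes the filter aggressive: any point whose neighborhood is even slightly sparser than average is discarded, which suits a conveyor scene where the sweetpotato surface is densely sampled and stray returns are isolated.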
In the unzipped Demo folder, the spreadsheet "Ft.csv" contains a feature set for the 200 samples, each comprising 6 extracted frames with 8 features per RGB-D frame, i.e., a 1200 × 8 feature matrix; the spreadsheet "y.csv" contains the corresponding volume ground truth. The same feature and ground-truth data are also saved as Matlab files ("Ft.mat" and "y.mat", respectively) in the "Demo" folder. To demonstrate volume prediction, features for samples 1–134 were used for model development and samples 135–200 for testing, as described in Xu et al. (2024). The Python script `data_loader.py` is included for traversing the synchronized 2D/3D dataset, and the `SW_VOL_PRED_MLR_PCA.py` script performs volume modeling and prediction with simple methods, i.e., multiple linear regression (MLR) and principal component regression (principal component analysis + MLR). The models achieved accuracy comparable to that reported in the related paper (Xu et al., 2024). The model results are reported in Zhang et al. (2026b).

## Citations

If you use this dataset in your research, please cite the following:

**Related Research Article:**
Xu, J., Lu, Y., Olaniyi, E., & Harvey, L. (2024). Online volume measurement of sweetpotatoes by a LiDAR-based machine vision system. *Journal of Food Engineering*, 361, 111725. https://doi.org/10.1016/j.jfoodeng.2023.111725

**Dataset:**
Zhang, J., Lu, Y., Kong, Z., & Xu, J. (2026a). A LiDAR-based Machine Vision Dataset for Online Volume Measurement of Sweetpotatoes [Data set]. *Zenodo*. https://doi.org/10.5281/zenodo.18378019

Zhang, J., Lu, Y., Kong, Z., & Xu, J. (2026b). A LiDAR-based Machine Vision Dataset for Online Volume Measurement of Sweetpotatoes. *Data in Brief* (pending).

---

*For questions regarding the data collection or system configuration, please contact luyuzhen@msu.edu.*
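The MLR/PCR workflow and the 134/66-sample split can be sketched with plain NumPy on synthetic stand-in data of the same shape as "Ft.csv" (1200 × 8, 6 frames per sample, so the first 804 rows cover samples 1–134). The helpers below are illustrative, not the code in `SW_VOL_PRED_MLR_PCA.py`:

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.hstack([np.ones((len(X), 1)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return np.hstack([np.ones((len(X), 1)), X]) @ coef

def fit_pcr(X, y, n_components=3):
    """Principal component regression: PCA on centered X, then MLR on scores."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:n_components].T            # projection onto top components
    return mu, W, fit_mlr((X - mu) @ W, y)

def predict_pcr(model, X):
    mu, W, coef = model
    return predict_mlr(coef, (X - mu) @ W)

# Synthetic stand-in for Ft.csv / y.csv (a noisy linear relation)
rng = np.random.default_rng(1)
X = rng.normal(size=(1200, 8))
y = X @ rng.normal(size=8) + 5.0 + rng.normal(scale=0.01, size=1200)

# Samples 1-134 (rows 0-803) for training, 135-200 (rows 804-1199) for testing
Xtr, Xte, ytr, yte = X[:804], X[804:], y[:804], y[804:]
pred = predict_mlr(fit_mlr(Xtr, ytr), Xte)
pred_pcr = predict_pcr(fit_pcr(Xtr, ytr, n_components=8), Xte)
```

Splitting by sample (all 6 frames of a sample land on the same side of the split) avoids leaking near-duplicate frames of one root between training and test sets.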
