Data for a publication - Bearing Fault Datasets

This dataset consists of acceleration signals obtained either from a simplified virtual prototype of the CWRU test rig, modelled in the multibody simulation software Adams, or from measurements published online by Case Western Reserve University (CWRU). Only a subset of the CWRU measurements was selected to form the experimental dataset, as specified in [1].

The signals were pre-processed by removing the linear trend, extracting the signal envelope, resampling to 6 kHz, and normalising each signal to the [0, 1] interval. The signals were then segmented into 0.1 s windows (600-sample vectors), with variable overlap used to mitigate class imbalance. The resulting segments were batched in groups of 8. The dataset is already partitioned into training, validation, and test subsets.

The datasets were prepared using the TensorFlow framework and are provided as instances of the tf.data.Dataset class. The tensors are stored as dtype=tf.float64; users may convert them to tf.float32 to improve computational performance.

The dataset consists of two parts: a simulated part and an experimental part.

The simulated part is structured for the domain adaptation network described in [1] and contains two types of labels: class labels and domain labels. Each data segment is associated with a 5-element label vector. The first three elements are one-hot encoded class labels in the order {"Healthy", "IR", "OR"}, and the last two elements are one-hot encoded domain labels in the order {"Simulation", "Experiment"}. Since all segments in this subset belong to the simulation (source) domain, the domain label is always [1, 0].

The experimental part consists of a labelled dataset and an unlabelled dataset. Both contain the same data segments but differ in the class-label portion of the label vectors.
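
The 5-element label vectors described above can be split into their class and domain parts with a few lines of NumPy. The following is only a minimal sketch, not code from the dataset authors: the helper names split_labels and decode are hypothetical, the example batch is made up for illustration, and an all-zero class part is treated as a masked label, which is how the description characterises the unlabelled experimental dataset.

```python
import numpy as np

CLASS_NAMES = ("Healthy", "IR", "OR")        # order given in the description
DOMAIN_NAMES = ("Simulation", "Experiment")  # order given in the description


def split_labels(labels):
    """Split a (batch, 5) label array into class (batch, 3) and domain (batch, 2) parts."""
    labels = np.asarray(labels)
    if labels.ndim != 2 or labels.shape[1] != 5:
        raise ValueError("expected an array of shape (batch, 5)")
    return labels[:, :3], labels[:, 3:]


def decode(labels):
    """Return (class_name_or_None, domain_name) per row; None marks a masked class label."""
    class_part, domain_part = split_labels(labels)
    decoded = []
    for c, d in zip(class_part, domain_part):
        name = CLASS_NAMES[int(np.argmax(c))] if c.any() else None
        decoded.append((name, DOMAIN_NAMES[int(np.argmax(d))]))
    return decoded


# Made-up label rows, used only to illustrate the layout.
batch = np.array([
    [0, 1, 0, 1, 0],  # "IR" class, simulation (source) domain
    [0, 0, 1, 0, 1],  # "OR" class, experimental (target) domain
    [0, 0, 0, 0, 1],  # masked class label, experimental (target) domain
], dtype=np.float64)

print(decode(batch))
# [('IR', 'Simulation'), ('OR', 'Experiment'), (None, 'Experiment')]
```

In a TensorFlow pipeline the same slicing, and the conversion to tf.float32 mentioned above, could presumably be applied through Dataset.map together with tf.cast. That assumes each dataset element is a (signal, label) pair, which is not confirmed here.
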
As in the simulated part, each segment is associated with a 5-element label vector, where the first three elements correspond to the class labels {"Healthy", "IR", "OR"} and the last two correspond to the domain labels {"Simulation", "Experiment"}. Since all segments in this part belong to the experimental (target) domain, the domain label is always [0, 1]. In the unlabelled dataset, the first three elements are set to zero, meaning that the class labels are masked. The labelled dataset has the same structure, but the class labels are retained.

The methodology used to build the multibody simulation model, the exact selection of CWRU measurements, and the procedure used to split the CWRU data into training, validation, and test sets are described in the doctoral thesis [1]. Researchers using the CWRU dataset are encouraged to consult the benchmark analysis in [2], which discusses important characteristics and limitations of the dataset. In particular, care should be taken to avoid data leakage caused by incorrect partitioning, as discussed in [1], which also cites additional studies addressing this issue.

References

[1] J. Rekem, “Data-driven fault diagnosis method for gearbox components,” Brno University of Technology, Faculty of Mechanical Engineering, Brno, 2025. Accessed: Mar. 25, 2026. [Online]. Available: https://www.vut.cz/en/students/final-thesis/detail/172180
[2] W. A. Smith and R. B. Randall, “Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study,” Mechanical Systems and Signal Processing, vol. 64-65, pp. 100-131, Dec. 2015, doi: 10.1016/j.ymssp.2015.04.021.
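
The per-signal normalisation and windowing steps listed in the description can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pre-processing code: detrending, envelope extraction and resampling are omitted, the function names normalise_unit_interval and segment are hypothetical, the input is a random placeholder signal, and the overlap values are arbitrary because the actual per-class overlaps are not given here.

```python
import numpy as np

WINDOW = 600  # 0.1 s at 6 kHz, as stated in the description


def normalise_unit_interval(signal):
    """Scale one signal to [0, 1] using its own minimum and maximum."""
    signal = np.asarray(signal, dtype=np.float64)
    lo, hi = signal.min(), signal.max()
    if hi == lo:  # constant signal: avoid division by zero
        return np.zeros_like(signal)
    return (signal - lo) / (hi - lo)


def segment(signal, overlap=0.0, window=WINDOW):
    """Cut a 1-D signal into fixed-length windows with fractional overlap in [0, 1)."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    signal = np.asarray(signal)
    step = max(1, int(round(window * (1.0 - overlap))))
    starts = range(0, len(signal) - window + 1, step)
    if len(starts) == 0:  # signal shorter than one window
        return np.empty((0, window), dtype=signal.dtype)
    return np.stack([signal[s:s + window] for s in starts])


# Placeholder input: 1 s of random data standing in for an envelope at 6 kHz.
rng = np.random.default_rng(0)
placeholder = normalise_unit_interval(rng.normal(size=6000))

print(segment(placeholder, overlap=0.0).shape)  # (10, 600)
print(segment(placeholder, overlap=0.5).shape)  # (19, 600)
```

A larger overlap yields more windows from the same recording, which is one plausible way in which variable overlap can even out the number of segments per class.
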

Related Organizations

Brno University of Technology
Czech Republic

Keywords

domain adaptation, bearing fault diagnosis, deep learning, envelope analysis, fault detection
