Enhancing Industrial Data Analysis through Machine Learning-based Classification of Petrochemical Datasets

Keywords: Machine Learning, Analytics, Alkylation Process, Data Classification, Industry 4.0

Fáber, Rastislav; L'ubušký, Karol; Mojto, Martin; Paulen, Radoslav

ZENODO

Conference object . 2023

License: CC BY

Data sources: Datacite

http://dx.doi.org/10.5281/zeno...

Other literature type . 2023

Data sources: European Union Open Data Portal

Type: Conference object, Other literature type
Date: 01 Jan 2023
Language: English
Publisher: Zenodo
Journal: 49th International Conference of the Slovak Society of Chemical Engineering SSCHE 2023
Funded by: EC | FrontSeat

doi: 10.5281/zenodo.8284096 , 10.5281/zenodo.8284097

Abstract

Incorporating data analytics and machine learning (ML) algorithms into industrial decision making has proven to be a promising way to boost production efficiency. By using ML algorithms to classify historical measurements from online sensors and laboratory analyses, it is possible to provide an operation guideline that was previously unavailable. We apply rigorous data treatment to prepare the raw data for ML-based classifier design. This treatment comprises data cleaning, data standardization, data averaging, variable removal (based on linear dependency analysis), and distant outlier detection, and it ensures the quality and reliability of the available data. The selection of a suitable classifier model depends on the complexity of the industrial process, the level of its automation (implementation effort), and the ability to handle data outliers. We employ Density-Based Spatial Clustering of Applications with Noise (DBSCAN) for initial ground-truth labeling, after which we use well-understood ML algorithms, namely k-Means, k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), and SVM with time differences, to engineer a framework for real-time classification. Accurate categorization of measurements is crucial for identifying slight deviations from real values that could impact the quality of the final product. Moreover, the complexity of the data plays a significant role in the performance of ML algorithms. With precise categorization of real-time data, the need for human intervention in process control can be minimized. To evaluate the performance of the designed classifiers, we compare their classification accuracy against the aforementioned synthetic ground-truth labels. This comparison is carried out on a testing dataset that was not used during the framework design. Overall, our results demonstrate that the ML-based classifiers achieve comparable results in real-time classification. The most accurate classifier was the SVM model that uses not only the absolute data but also their time differences; it achieved the highest anomaly detection rate, 82 %.
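The workflow in the abstract (standardize, label with DBSCAN, then train a classifier on absolute values with and without time differences, and score it on held-out data) can be sketched on toy data as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the signal values, the fault positions, `eps`, `min_pts`, and the train/test split are all hypothetical. DBSCAN noise points are assumed to mean "anomaly", and a 1-nearest-neighbour rule stands in for the SVM so that only the Python standard library is needed.

```python
import math
from statistics import fmean, pstdev

def standardize(rows):
    """Z-score every column (for brevity, statistics come from the whole series)."""
    cols = list(zip(*rows))
    mu = [fmean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mu, sd)] for row in rows]

def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; returns one cluster id per point, -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1                 # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # former noise becomes a border point
            if labels[j] is not None:
                continue                   # already assigned, do not expand again
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep growing the cluster
                queue.extend(reachable)
    return labels

def add_time_differences(rows):
    """Append x[t] - x[t-1] to each sample; the first sample gets zeros."""
    out = [list(rows[0]) + [0.0] * len(rows[0])]
    for prev, cur in zip(rows, rows[1:]):
        out.append(list(cur) + [c - p for c, p in zip(cur, prev)])
    return out

def nearest_neighbour(train_x, train_y, x):
    """1-NN prediction (stand-in for the SVM used in the paper)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[best]

# Toy two-variable "sensor" series; hypothetical values, not plant data.
signal = [[1.0 + 0.01 * (t % 5), 2.0 - 0.01 * (t % 3)] for t in range(40)]
faults = {7: [3.0, 0.5], 18: [-1.0, 4.0], 29: [3.5, 0.0], 36: [-1.5, 4.5]}
for t, value in faults.items():
    signal[t] = value

scaled = standardize(signal)
clusters = dbscan(scaled, eps=0.5, min_pts=5)
labels = [1 if c == -1 else 0 for c in clusters]  # assumption: noise = anomaly

SPLIT = 28  # chronological split: earlier samples train, later samples test

def accuracy(features):
    """Share of held-out samples whose prediction matches the DBSCAN label."""
    train_x, train_y = features[:SPLIT], labels[:SPLIT]
    test = zip(features[SPLIT:], labels[SPLIT:])
    hits = sum(nearest_neighbour(train_x, train_y, x) == y for x, y in test)
    return hits / (len(features) - SPLIT)

acc_abs = accuracy(scaled)
acc_diff = accuracy(add_time_differences(scaled))
print(f"absolute values only:   {acc_abs:.2f}")
print(f"with time differences:  {acc_diff:.2f}")
```

On data this clean both feature sets separate the classes equally well, so the sketch shows only the mechanics of the comparison; it says nothing about the 82 % figure reported for the real alkylation dataset.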

Related Organizations

Slovak University of Technology in Bratislava, Slovakia

Impact by BIP!

Selected citations: 0
Popularity: Average
Influence: Average
Impulse: Average