Powered by OpenAIRE graph
Geomechanics and Geoengineering
Article . 2025 . Peer-reviewed
License: CC BY NC ND
Data sources: Crossref
https://dx.doi.org/10.48550/ar...
Article . 2025
License: CC BY
Data sources: Datacite

Machine learning approaches for automatic cleaning of investigative drilling data

Authors: Huang, Fei; Qin, Hongyu; Manafi, Masoud; Juett, Ben; Evans, Ben

Abstract

Investigative drilling (ID) is an innovative measurement-while-drilling (MWD) technique that has been used in site investigation projects across Australia. Although the automated drilling feature of ID substantially reduces noise in the drilling data streams, data cleaning is still needed to remove anomalies before strata can be classified and soil and rock properties predicted accurately. This study applied three machine learning algorithms, Isolation Forest (IsoForest), one-class support vector machine (SVM), and DBSCAN, to automate the cleaning of ID data from rock drilling. Two cleaning tasks were examined: (1) removing anomalies from rock drilling data, and (2) removing both anomalies and soil drilling data from mixed soil and rock drilling data. In both tasks, all three algorithms outperformed traditional statistical methods (the 3-sigma rule and the interquartile range (IQR) method), achieving a good balance between true positive rate and false positive rate, although one-class SVM and DBSCAN required hyperparameter tuning. IsoForest performed best, removing anomalies effectively without any hyperparameter adjustment. Furthermore, IsoForest combined with two-cluster K-means eliminated both soil drilling data and anomalies while preserving almost all of the normal data. The proposed automatic cleaning strategy could reduce laborious manual data cleaning and thereby facilitate the development of large, high-quality datasets for machine learning studies of the complex relationships between drilling data and rock properties.

20 pages, 17 figures, 4 tables
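The abstract compares IsoForest with the 3-sigma rule and the IQR method, and pairs IsoForest with two-cluster K-means, but gives no implementation details. The sketch below is a minimal, dependency-free illustration of those techniques under stated assumptions, not the authors' code. Assumptions: the Isolation Forest is a simplified from-scratch version (100 trees, subsample size 256, score cut-off 0.5, which matches the threshold scikit-learn applies when `contamination="auto"`); IsoForest is assumed to run before K-means; and `clean_mixed`, `hardness_index`, and the rule that the cluster with the larger centroid on that feature is rock are hypothetical, standing in for whichever drilling parameter actually separates rock from soil. One-class SVM and DBSCAN are not sketched.

```python
import math
import random
import statistics


# Baselines named in the abstract: the 3-sigma rule and the IQR (Tukey) method.
def three_sigma_mask(values):
    """True where a value lies more than 3 population std devs from the mean."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [abs(v - mu) > 3 * sd for v in values]


def iqr_mask(values, k=1.5):
    """True where a value falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v < q1 - k * iqr or v > q3 + k * iqr for v in values]


def _c(n):
    """Average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def _build(points, depth, max_depth, rng):
    """Grow one isolation tree: random feature, random split inside its range."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop when the chosen feature is constant
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            _build(left, depth + 1, max_depth, rng),
            _build(right, depth + 1, max_depth, rng))


def _path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c(leaf size) for unsplit points."""
    if node[0] == "leaf":
        return depth + _c(node[1])
    _, dim, split, left, right = node
    return _path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Isolation Forest score s = 2^(-E[h(x)] / c(psi)); needs len(rows) >= 2."""
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(psi))
    trees = [_build(rng.sample(rows, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = _c(psi)
    return [2.0 ** (-sum(_path_length(x, t) for t in trees) / n_trees / norm)
            for x in rows]


def two_means_1d(values, iters=100):
    """Lloyd's k-means with k=2 on one feature; returns labels and centroids."""
    c0, c1 = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        if not g0 or not g1:
            break
        new = (statistics.fmean(g0), statistics.fmean(g1))
        if new == (c0, c1):
            break
        c0, c1 = new
    return labels, (c0, c1)


def clean_mixed(rows, hardness_index=0, threshold=0.5):
    """Hypothetical two-stage cleaning: drop rows with s > threshold (assumed
    cut-off), then keep the 2-means cluster with the larger centroid on the
    hypothetical `hardness_index` feature, assumed to be higher for rock."""
    scores = isolation_scores(rows)
    kept = [r for r, s in zip(rows, scores) if s <= threshold]
    labels, (c0, c1) = two_means_1d([r[hardness_index] for r in kept])
    rock = 0 if c0 >= c1 else 1
    return [r for r, lab in zip(kept, labels) if lab == rock]
```

For example, `clean_mixed(rows)` on a list of hypothetical `(feature_a, feature_b)` tuples returns the rows kept as normal rock drilling data. Running IsoForest first keeps extreme anomalies from dragging a K-means centroid away from the rock/soil split; for real datasets, scikit-learn's `IsolationForest` and `KMeans` would normally replace the hand-written versions.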

Keywords

Geophysics (physics.geo-ph); Data Analysis, Statistics and Probability (physics.data-an); FOS: Physical sciences
