Powered by OpenAIRE graph
https://doi.org/10.14264/uql.2...
Doctoral thesis . 2015 . Peer-reviewed
Data sources: Crossref

Large scale material science data analysis

Authors: Belisle, Eve

Abstract

Material science, the study of materials and their properties, involves many activities, including performing experiments to determine physical properties. Scientists want to use the collected experimental data to make predictions at new points where the studied property is unknown. Making these predictions with a computer model, whether through machine learning or a mathematical approach, is desirable because physical experiments have proven to be very costly and time consuming. We therefore aim to use the large quantity of data already collected in the literature to build models for future predictions. The Gaussian process regression interpolation technique is already known to give accurate predictions for some physical properties. However, it is also the slowest of the machine learning algorithms and is not suitable for on-line applications, where making quick and accurate predictions is essential. In this research we propose a novel strategy, including batch query processing and co-clustering, to achieve scalable and efficient Gaussian process regression. This new approach, called the scalable Gaussian process (SGP), allows the use of large databases and is suitable for on-line applications. The proposed strategy is applied to a real application involving the prediction of material properties. Results demonstrate the high accuracy and efficiency of our approach. We test and compare SGP with five different machine learning models on material property databases and make recommendations accordingly, also demonstrating that prior knowledge of the problem is essential when choosing a machine learning model. As one would expect, databases of experimental data are noisy, both because they rely on human measurements and because they amalgamate various independent sources (research papers).
As a result, conflicting information can be found across the various sources. In our research we also introduce a novel truth discovery approach to reduce the amount of noise and filter out the incorrect conflicting information hidden in scientific databases. Our method ranks the data sources by considering the relationships between them, i.e., the amount of conflicting information and the amount of agreement, and also eliminates the conflicting information. SGP, introduced above, is then applied to the cleaned dataset to make predictions. We compare the prediction accuracy before and after pruning the databases. With this approach, we substantially improve the accuracy of SGP predictions and provide a more reliable database. Our results also demonstrate the strong robustness of SGP: a relatively high amount of noise is handled very well by this technique.
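
The abstract does not spell out how SGP combines batch query processing with co-clustering, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It assumes one-dimensional inputs, a squared-exponential kernel, and a crude stand-in for co-clustering (sorting the training points and cutting them into fixed-size chunks). All function names and parameters (`make_clusters`, `fit_local_gp`, `predict_batch`, `noise`, `length`) are hypothetical. Queries in a batch are grouped by nearest cluster so that each small local model is factorised once per batch instead of once per query.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel for scalar inputs (an assumed kernel choice)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Return lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def make_clusters(xs, ys, size):
    """Stand-in for co-clustering: sort by input and cut into fixed-size chunks."""
    pairs = sorted(zip(xs, ys))
    clusters = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        cx = [p[0] for p in chunk]
        cy = [p[1] for p in chunk]
        clusters.append({"centre": sum(cx) / len(cx), "x": cx, "y": cy})
    return clusters

def fit_local_gp(xs, ys, noise, length):
    """Compute alpha = (K + noise * I)^-1 y for one cluster."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j], length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return chol_solve(cholesky(K), ys)

def predict_batch(clusters, queries, noise=1e-3, length=1.0):
    """Group queries by nearest cluster centre; fit each needed cluster once."""
    groups = {}
    for qi, q in enumerate(queries):
        c = min(range(len(clusters)),
                key=lambda i: abs(clusters[i]["centre"] - q))
        groups.setdefault(c, []).append(qi)
    out = [0.0] * len(queries)
    for c, idxs in groups.items():
        xs = clusters[c]["x"]
        alpha = fit_local_gp(xs, clusters[c]["y"], noise, length)
        for qi in idxs:
            # GP posterior mean: k(q, X) . alpha
            out[qi] = sum(rbf(queries[qi], x, length) * a
                          for x, a in zip(xs, alpha))
    return out

# Toy data (illustrative only): a smooth synthetic curve, not material data.
train_x = [0.25 * i for i in range(25)]
train_y = [math.sin(x) for x in train_x]
clusters = make_clusters(train_x, train_y, size=9)
queries = [1.1, 4.9, 3.0, 1.3]
predictions = predict_batch(clusters, queries)
```

Exact Gaussian process regression factorises an n x n kernel matrix, which costs on the order of n^3 operations. Splitting the data into clusters of size m replaces this with several m x m factorisations, and batching means a cluster shared by many queries is factorised only once. The trade-off in this sketch is that each prediction sees only the data of its own cluster.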
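
The source-ranking idea (scoring sources by how much they agree or conflict with one another, then removing conflicting values) can be illustrated with a small sketch. This is not the thesis's truth discovery method, whose details the abstract does not give. It assumes each record is a `(source, item, value)` triple, that two values agree when they differ by at most a relative tolerance `tol`, and that a source's score is simply its share of agreeing pairwise comparisons. The function names and the toy records are hypothetical.

```python
from collections import defaultdict

def values_agree(v1, v2, tol):
    """Two reported values agree if their relative difference is within tol."""
    return abs(v1 - v2) <= tol * max(abs(v1), abs(v2), 1e-12)

def score_sources(records, tol=0.05):
    """Score each source by the fraction of its pairwise comparisons that agree."""
    by_item = defaultdict(list)
    for source, item, value in records:
        by_item[item].append((source, value))
    agree = defaultdict(int)
    total = defaultdict(int)
    for reports in by_item.values():
        for i, (s1, v1) in enumerate(reports):
            for s2, v2 in reports[i + 1:]:
                if s1 == s2:
                    continue  # a source is not compared with itself
                ok = values_agree(v1, v2, tol)
                for s in (s1, s2):
                    total[s] += 1
                    agree[s] += int(ok)
    sources = {s for s, _, _ in records}
    # Sources with no shared items get a neutral score (an arbitrary choice).
    return {s: (agree[s] / total[s] if total[s] else 0.5) for s in sources}

def prune_conflicts(records, tol=0.05):
    """Per item, keep only values that agree with the best-scored source."""
    scores = score_sources(records, tol)
    by_item = defaultdict(list)
    for rec in records:
        by_item[rec[1]].append(rec)
    kept = []
    for reports in by_item.values():
        reference = max(reports, key=lambda r: scores[r[0]])[2]
        kept.extend(r for r in reports if values_agree(r[2], reference, tol))
    return scores, kept

# Toy records (illustrative only): source "C" disagrees with "A" and "B".
records = [
    ("A", "m1", 100.0), ("B", "m1", 101.0), ("C", "m1", 150.0),
    ("A", "m2", 200.0), ("B", "m2", 199.0), ("C", "m2", 260.0),
    ("A", "m3", 50.0), ("C", "m3", 80.0),
]
scores, cleaned = prune_conflicts(records)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy example the pruned list `cleaned` is what a regression model would then be trained on. A single pass of pairwise agreement is the simplest possible scoring rule; iterative schemes that re-estimate source scores and trusted values in turn are a common refinement.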

Country
Australia
Related Organizations
Keywords

Truth discovery, 0806 Information Systems, Machine learning, Scientific databases, Gaussian Process Regression, Data mining

  • BIP!
    Impact by BIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average