Large scale material science data analysis

Publication: Doctoral thesis, Other literature type, Thesis. 02 Jul 2015. Australia. Publisher: University of Queensland Library

Authors: Belisle, Eve

doi: 10.14264/uql.2015.673

Abstract

Materials science, the study of materials and their properties, involves many activities, including performing experiments to determine physical properties. Scientists want to use the collected experimental data to make predictions at new points where the studied property is unknown. Using a computer model to make these predictions, whether through machine learning or a mathematical approach, is the desirable option, since physical experiments have proven to be very costly and time consuming. We therefore aim to use the vast quantity of data already collected in the literature to build models for making future predictions. The Gaussian process regression interpolation technique is already known to give accurate predictions for some physical properties. However, it is also among the slowest machine learning algorithms and is not suitable for on-line applications, where making quick and accurate predictions is essential. In this research we propose a novel strategy, including batch query processing and co-clustering, to achieve a scalable and efficient Gaussian process regression. This new approach, called the scalable Gaussian process (SGP), allows the use of large databases and makes Gaussian process regression suitable for on-line applications. The proposed strategy is applied to a real application involving the prediction of materials properties. Results demonstrate the high accuracy and efficiency of our approach. We test and compare SGP with five different machine learning models on materials property databases and make recommendations accordingly, also demonstrating that prior knowledge of the problem is essential when choosing a machine learning model.

As one would expect, databases of experimental data are noisy, both because they rely on human measurements and because they amalgamate various independent sources (research papers). Conflicting information can therefore be found between the sources. In our research we also introduce a novel truth discovery approach to reduce the amount of noise and filter out the incorrect conflicting information hidden in scientific databases. Our method ranks the data sources by considering the relationships between them, i.e., the amount of conflicting information and the amount of agreement, and also eliminates the conflicting information. SGP is then applied to the cleaned dataset to make predictions. We compare the prediction accuracy before and after pruning the databases. With our new approach, we substantially improve the accuracy of SGP predictions and provide a more reliable database. Our results also show that SGP is highly robust, as we demonstrate that a relatively high amount of noise is handled very well by this technique.
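The source-ranking idea described in the abstract can be illustrated with a small sketch. This is not the thesis's algorithm, whose details the abstract does not give. It is a minimal illustration under stated assumptions: each record is a hypothetical `(source, item, value)` tuple, two sources "agree" on an item when their values differ by at most a relative tolerance `tol`, a source's score is its share of pairwise agreements, and conflicts are resolved in favour of the highest-scoring source. The names `rank_sources` and `prune_conflicts` and the sample data are invented for the example.

```python
from collections import defaultdict
from itertools import combinations


def _close(v1, v2, tol):
    """Assumed agreement test: values within a relative tolerance."""
    return abs(v1 - v2) <= tol * max(abs(v1), abs(v2), 1e-12)


def rank_sources(records, tol=0.05):
    """Score each source by its share of pairwise agreements with other sources."""
    by_item = defaultdict(list)
    for source, item, value in records:
        by_item[item].append((source, value))

    agree = defaultdict(int)
    total = defaultdict(int)
    for reports in by_item.values():
        for (s1, v1), (s2, v2) in combinations(reports, 2):
            if s1 == s2:
                continue  # only compare different sources
            ok = _close(v1, v2, tol)
            for s in (s1, s2):
                total[s] += 1
                agree[s] += ok

    sources = {s for s, _, _ in records}
    # Sources never compared with another source get a neutral score (assumption).
    return {s: (agree[s] / total[s] if total[s] else 0.5) for s in sources}


def prune_conflicts(records, scores, tol=0.05):
    """Per item, keep only values consistent with the top-ranked reporting source."""
    by_item = defaultdict(list)
    for rec in records:
        by_item[rec[1]].append(rec)

    kept = []
    for reports in by_item.values():
        best = max(reports, key=lambda r: scores[r[0]])
        kept.extend(r for r in reports if _close(r[2], best[2], tol))
    return kept


# Hypothetical measurements of one property from three papers.
records = [
    ("paper_A", "alloy_1", 100.0),
    ("paper_B", "alloy_1", 101.0),
    ("paper_C", "alloy_1", 150.0),
    ("paper_A", "alloy_2", 200.0),
    ("paper_B", "alloy_2", 198.0),
    ("paper_C", "alloy_2", 120.0),
    ("paper_C", "alloy_3", 75.0),
]

scores = rank_sources(records)
cleaned = prune_conflicts(records, scores)
```

In this toy data, `paper_C` disagrees with both other sources wherever they overlap, so it receives the lowest score and its conflicting values are dropped, while its uncontested `alloy_3` value is kept. A regression model would then be trained on `cleaned` rather than on `records`.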

Country

Australia

Related Organizations

University of Queensland
Australia

Keywords

Truth discovery, 0806 Information Systems, Machine learning, Scientific databases, Gaussian Process Regression, Data mining

Impact by BIP!

Selected citations: 0
Popularity: Average
Influence: Average
Impulse: Average

Related to Research communities

Knowmad Institut