
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=undefined&type=result"></script>');
-->
</script>pmid: 29227671
We have developed a simple text mining algorithm that allows us to identify surface area and pore volumes of metal-organic frameworks (MOFs) using manuscript html files as inputs. The algorithm searches for common units (e.g., m2/g, cm3/g) associated with these two quantities to facilitate the search. From the sample set data of over 200 MOFs, the algorithm managed to identify 90% and 88.8% of the correct surface area and pore volume values. Further application to a test set of randomly chosen MOF html files yielded 73.2% and 85.1% accuracies for the two respective quantities. Most of the errors stem from unorthodox sentence structures that made it difficult to identify the correct data as well as bolded notations of MOFs (e.g., 1a) that made it difficult identify its real name. These types of tools will become useful when it comes to discovering structure-property relationships among MOFs as well as collecting a large set of data for references.
Surface Properties, Data Mining, Porosity, Metal-Organic Frameworks, Natural Language Processing
Surface Properties, Data Mining, Porosity, Metal-Organic Frameworks, Natural Language Processing
| citations This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 54 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
