
Purpose: Using a popular supervised learning model such as BERT to extract relationships between concepts (triple relationship extraction) requires the relationship types to be labeled manually. If some relation words are not labeled in the training stage, they are unlikely to be recognized in the test stage, and the corresponding entities cannot be recognized either. This paper proposes a new unsupervised algorithm that extracts as many relation words as possible between two entities, especially those that are easily overlooked. Methods: The disease-cause relationship was taken as an example, and 10,204 valid sentences containing diseases and their corresponding causes were collected with a web crawler. Relation words were extracted in an unsupervised manner under syntactic, semantic, and lexical constraints, and the automatically extracted results were summarized. Results: Some specific relation words that are ignored in the manual labeling stage were found; conjoined relation words that often appear together in the texts were recognized; and some types and features of relation words were obtained. These types and features can support relation labeling in the supervised learning stage, help expand the relevant knowledge graphs, and improve the accuracy of information retrieval.
