
To provide a knowledge source for patent-based, computer-aided innovative product design, a topical web crawler for patents was designed and developed, targeting the patent information published by the US Patent and Trademark Office (USPTO). In this paper, we describe the overall design and workflow of the patent topical crawler, including its basic functional architecture and key system technologies. We also propose a Doc2Vec-based similarity calculation method for short patent texts, which is used to judge the topical relevance of patents and can effectively screen the required patent data. The experimental results show that the patent topical web crawler achieves high acquisition efficiency and applicability.
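The relevance screening step can be illustrated with a minimal sketch. It is not the implementation described in the paper: it assumes each short patent text has already been mapped to a fixed-length vector (for example with a trained Doc2Vec model, such as gensim's `Doc2Vec.infer_vector`), and it assumes that relevance is decided by thresholding the cosine similarity against a topic vector. The function names `cosine_similarity` and `screen_patents`, the toy vectors, and the threshold value are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 when either vector has zero norm, so empty texts never pass.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def screen_patents(
    topic_vec: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off; would need tuning on real data
) -> List[Tuple[str, float]]:
    """Keep candidates whose similarity to the topic reaches the threshold.

    `candidates` maps a patent identifier to its (assumed precomputed)
    document vector. The result is sorted by descending similarity.
    """
    scored = [(pid, cosine_similarity(topic_vec, vec)) for pid, vec in candidates.items()]
    kept = [(pid, score) for pid, score in scored if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for Doc2Vec embeddings.
    topic = [1.0, 0.0, 1.0]
    docs = {
        "A": [0.9, 0.1, 0.8],
        "B": [0.0, 1.0, 0.0],
        "C": [1.0, 1.0, 0.0],
    }
    for pid, score in screen_patents(topic, docs, threshold=0.6):
        print(pid, round(score, 3))
```

In a crawler, a filter of this kind would typically run on each fetched title or abstract before the page is stored or its links are followed, so that off-topic patents are discarded early.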
