
The Biodiversity Heritage Library (BHL) has over 63 million pages of content, which presents a major challenge: how do we discover content in these millions of pages? To date, discovery has relied on indexing BHL text for taxonomic names (e.g., Global Names) and segmenting scanned volumes into articles (e.g., BioStor). Name indexing makes it possible to find pages that mention a taxonomic name, and segmenting BHL makes it easier to find relevant articles within journals. But these searches give little clue as to the relative importance of the (sometimes thousands of) results. One approach to ranking results is to count the number of incoming links to individual BHL pages, akin to Google’s PageRank method. The main sources of these links are likely to be Wikipedia, Wikispecies, and Wikidata. Whereas most work on Wikimedia and BHL has focused on getting BHL content into these wikis (with the notable exception of BHL adopting Wikidata Q numbers for authors), here I flip that relationship and count the number of links coming from the wikis to individual BHL pages. These links can be supplemented by links from taxonomic databases, such as those being aggregated by the Catalogue of Life. The number of these links could be used to rank the importance of a search result (the more links, the more likely the page is relevant). This talk will discuss this approach and show examples based on the “BHL-Light” test bed for exploring new interfaces to BHL.
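To make the link-counting idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method used in BHL-Light: the source labels, the sample links, the function names, and the URL pattern used to recognise a link to a BHL page are all hypothetical choices made for this example. It counts each distinct (source, URL) pair once and then orders a list of search hits by that count.

```python
import re
from collections import Counter

# Assumed shape of a link to an individual BHL page (an assumption for this sketch).
BHL_PAGE_RE = re.compile(r"biodiversitylibrary\.org/page/(\d+)")


def count_incoming_links(links):
    """Count incoming links per BHL page ID.

    `links` is an iterable of (source, url) pairs. Duplicate pairs are
    counted once; URLs that do not match the assumed pattern are ignored.
    """
    counts = Counter()
    for _source, url in set(links):
        match = BHL_PAGE_RE.search(url)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def rank_results(page_ids, counts):
    """Order search hits by incoming-link count, most linked first.

    Python's sort is stable, so hits with equal counts keep the order
    in which the search returned them.
    """
    return sorted(page_ids, key=lambda pid: counts.get(pid, 0), reverse=True)


# Hypothetical sample data: (source of the link, linked URL).
links = [
    ("wikipedia:Example_A", "https://www.biodiversitylibrary.org/page/100"),
    ("wikispecies:Example_B", "https://www.biodiversitylibrary.org/page/100"),
    ("wikidata:Q0", "https://www.biodiversitylibrary.org/page/200"),
    ("col:record-1", "https://www.biodiversitylibrary.org/page/100"),
    ("wikipedia:Example_A", "https://example.org/not-bhl"),
]

counts = count_incoming_links(links)
ranked = rank_results([300, 200, 100], counts)
print(ranked)
```

A plain count like this treats every link equally. Full PageRank would also weight each link by the importance of its source, which this sketch does not attempt.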
