Readily available, trustworthy, and usable medical information is vital to promoting global health. Cochrane is a non-profit medical organization that conducts and publishes systematic reviews of medical research findings. More than 3,000 Cochrane Reviews are presently cited as evidence in Wikipedia articles. Currently, Cochrane's researchers manually search medicine-related Wikipedia pages to identify articles that could be improved with Cochrane evidence. Our aim is to streamline this process by applying existing document similarity and information retrieval methods to automatically link Wikipedia articles and Cochrane Reviews. Two challenges distinguish this problem from ordinary document representation and retrieval problems: the length of the documents and the specificity of the corpora. We worked with data from 7,400 Cochrane Reviews, ranging from one to several pages in length, and 33,000 Wikipedia articles categorized as medical. We explored several methods of document vectorization, including TF-IDF, LDA, LSA, word2vec, and doc2vec. For every document in each corpus, we calculated its similarity to each document in the other corpus using established measures such as cosine similarity and KL divergence. No labeled data was available for this unsupervised task, so models were evaluated by comparing their results to two standards: (1) Cochrane Reviews currently cited in Wikipedia articles and (2) a data set provided by a medical expert that indicates which Cochrane Reviews could be considered for specific Wikipedia articles. Our system performs best using TF-IDF document representation and cosine similarity.
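As a rough illustration of the best-performing combination named in the abstract (TF-IDF vectors compared by cosine similarity), the following minimal Python sketch ranks candidate reviews for each article. It is not the authors' implementation: the tokenizer, the smoothed IDF formula, the function names (`tokenize`, `fit_idf`, `tfidf_vector`, `cosine`, `rank_reviews`), and the toy texts are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(token_lists):
    """Compute a smoothed inverse document frequency for every term in the combined corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Build an L2-normalized sparse TF-IDF vector, stored as a term -> weight dict."""
    counts = Counter(t for t in tokens if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors, which reduces to their dot product."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def rank_reviews(articles, reviews, top_k=3):
    """For each article, return the top_k (review id, score) pairs by cosine similarity."""
    article_tokens = {title: tokenize(text) for title, text in articles.items()}
    review_tokens = {rid: tokenize(text) for rid, text in reviews.items()}
    # IDF is fitted on both corpora together so the vectors share one vocabulary.
    idf = fit_idf(list(article_tokens.values()) + list(review_tokens.values()))
    review_vecs = {rid: tfidf_vector(toks, idf) for rid, toks in review_tokens.items()}
    rankings = {}
    for title, toks in article_tokens.items():
        article_vec = tfidf_vector(toks, idf)
        scored = sorted(((cosine(article_vec, rv), rid) for rid, rv in review_vecs.items()), reverse=True)
        rankings[title] = [(rid, score) for score, rid in scored[:top_k]]
    return rankings


# Hypothetical toy data, invented for illustration only (not real article or review text).
articles = {
    "Asthma": "Asthma is a chronic inflammatory disease of the airways treated with inhaled corticosteroids.",
    "Migraine": "Migraine is a headache disorder; triptans are used for acute migraine attacks.",
}
reviews = {
    "R1": "Inhaled corticosteroids versus placebo for chronic asthma in adults.",
    "R2": "Triptans for acute migraine headache attacks in adults.",
}

rankings = rank_reviews(articles, reviews, top_k=1)
for title, matches in rankings.items():
    print(title, matches)
```

At the scale described in the abstract (7,400 reviews against 33,000 articles), a sparse-matrix library would likely be a more practical choice than per-document dictionaries; the dictionary form is used here only to keep each step visible.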
Wikipedia, machine learning, Cochrane Reviews, tf–idf
| selected citations | 3 |
| popularity | Average |
| influence | Average |
| impulse | Average |
| views | 7 |
| downloads | 8 |

Views provided by UsageCounts
Downloads provided by UsageCounts