
handle: 10919/96310
We are building an Information Retrieval System that works as a search engine to support searching, ranking, browsing, and recommendations for two large collections of data. The first collection is part of Virginia Tech's collection of Electronic Theses and Dissertations (ETDs), which is held by the Virginia Tech Library. An effort is currently under way to digitize the pre-1997 theses and dissertations and load them into VTechWorks. Our data set contains over 30K ETDs. The second collection consists of tobacco settlement documents; there are 14 million documents in this data set. We are using a Ceph container to store and retrieve information. To achieve its goals, the project has six teams: Collection Management ETDs, Collection Management Tobacco Settlement Documents, Elasticsearch, Front-end and Kibana, Integration and Implementation, and Text Analytics and Machine Learning. This report addresses the work performed by the Elasticsearch team. The Elasticsearch team helps to enable searching and browsing, which are supported by facets associated with information extracted from documents, as well as by analysis, classification, clustering, summarization, and other processing. The report describes the goals, an overview, and the process of implementation with Elasticsearch. The Elasticsearch team works closely with the Front-end and Kibana team and the Text Analytics and Machine Learning team. The data ingested into Elasticsearch is provided to the Front-end and Kibana team for further visualization. The report therefore also describes the connections established with the other teams, as a high-level overview of the course project. User manuals are provided for reference by the other teams.
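As a rough illustration of the facet-based searching described above, the following minimal sketch builds an Elasticsearch Query DSL request body that combines a full-text query with one terms aggregation per facet. It is not taken from the project's source code: the function name `build_faceted_query`, the searched fields `title` and `abstract`, and the facet fields `department` and `year` are all hypothetical, and the sketch only constructs the JSON body rather than contacting a cluster.

```python
import json


def build_faceted_query(text, facet_fields, page_size=10):
    """Return a Query DSL body: full-text match plus one terms aggregation per facet.

    All field names used here are hypothetical placeholders, not the
    project's actual index mapping.
    """
    return {
        "size": page_size,
        "query": {
            "multi_match": {
                "query": text,
                # Assumed text fields to search over.
                "fields": ["title", "abstract"],
            }
        },
        # Each terms aggregation returns bucket counts that a front end
        # could display as a browsable facet.
        "aggs": {
            f"by_{field}": {"terms": {"field": field, "size": 10}}
            for field in facet_fields
        },
    }


body = build_faceted_query("tobacco advertising", ["department", "year"])
print(json.dumps(body, indent=2))
```

A body like this would typically be sent to an index's `_search` endpoint; terms aggregations generally require the facet fields to be mapped as keyword or numeric types.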
ELSFinalReport.pdf - PDF file of the final report
ELSFinalReport.zip - LaTeX source of the final report
ELSPresentation.pdf - PDF file of the final presentation
ELSPresentation.pptx - Editable file of the final presentation
ELSSourceCode.zip - Package of all the Python scripts, HTTP Queries, Shell scripts associated with this project
IMLS LG-37-19-0078-19
Elasticsearch, Information Retrieval
