
handle: 11250/2413906
Information retrieval techniques help us extract information from a huge number of information sources. A web search engine is a typical commercial system that implements information retrieval techniques, and its popularity keeps growing as search demand increases. Users’ requirements for web search can vary widely. They may search for entities such as music, people, locations, or products, or for verticals such as “shopping”, “news”, or “images”. Information about these entities or verticals may be spread across multiple documents and possibly additional sources. As a result, when the objects being retrieved are associated with multiple documents, we need to “fuse” information from those documents. There are normally two ways to do this. One strategy is “early” fusion, where a term-based representation is built for each object (e.g., entity or vertical). The other strategy is “late” fusion, where relevant documents are retrieved first and their scores are then combined. In this project, two general fusion strategies, the object-centric model and the document-centric model, are introduced and implemented for federated search and expert search. Federated search is the task of searching multiple text collections simultaneously. Queries are submitted to the subset of collections most likely to return relevant answers. Fusion-based methods are used to rank these collections by the similarity between the query and each collection. Expert search is the task of locating expertise using associated documents, topics, etc. An expert’s knowledge can be modeled from the associated documents, or topics can be modeled so that the relevant documents can be found. This project also reviews the literature on the federated search, expert search, and blog distillation tasks and their experimental data sets; the last of these is intended for further experiments.
To evaluate the performance of the two fusion-based methods on different tasks, comparison and analysis are carried out both across fusion methods and across probability estimation methods. The effectiveness and efficiency of the search results are the main evaluation factors. Finally, conclusions are drawn based on the performance of the object-centric and document-centric models.
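The contrast between the two fusion strategies described above can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes that the object-centric model corresponds to early fusion and the document-centric model to late fusion, that documents are already tokenized, that every document is associated with its object with equal weight, and that scoring uses a query-likelihood model with Jelinek-Mercer smoothing. All function names, the `lam` parameter, and the toy data are hypothetical.

```python
from collections import Counter


def term_probs(tokens):
    """Maximum-likelihood term distribution of a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def query_likelihood(query, model, background, lam=0.5):
    """P(query | model), smoothed with a background model (assumed: Jelinek-Mercer)."""
    score = 1.0
    for term in query:
        score *= (1 - lam) * model.get(term, 0.0) + lam * background.get(term, 0.0)
    return score


def object_centric(query, docs_by_object, background, lam=0.5):
    """Early fusion: pool each object's documents into one term-based representation."""
    scores = {}
    for obj, docs in docs_by_object.items():
        pooled = [term for doc in docs for term in doc]
        scores[obj] = query_likelihood(query, term_probs(pooled), background, lam)
    return scores


def document_centric(query, docs_by_object, background, lam=0.5):
    """Late fusion: score each document, then combine scores per object (assumed: mean)."""
    scores = {}
    for obj, docs in docs_by_object.items():
        doc_scores = [query_likelihood(query, term_probs(doc), background, lam) for doc in docs]
        scores[obj] = sum(doc_scores) / len(doc_scores)
    return scores


# Hypothetical toy data: two objects (e.g., experts), each with two documents.
docs_by_object = {
    "alice": [["fusion", "search", "ranking"], ["expert", "search"]],
    "bob": [["image", "news"], ["shopping", "news", "search"]],
}
background = term_probs([t for docs in docs_by_object.values() for d in docs for t in d])
query = ["expert", "search"]

early = object_centric(query, docs_by_object, background)
late = document_centric(query, docs_by_object, background)
print(sorted(early, key=early.get, reverse=True))
print(sorted(late, key=late.get, reverse=True))
```

The two functions differ only in where aggregation happens: before scoring (one pooled representation per object) or after scoring (one score per document, then combined). On real collections the two orderings can disagree, which is the kind of difference the comparison in this project examines.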
Master's thesis in Computer science
VDP::Technology: 500::Information and communication technology: 550::Computer technology: 551, datateknikk, informasjonsteknologi, information retrieval, informasjonsgjenfinning
