
Pre-trained Language Models have recently emerged in Information Retrieval as providing the backbone of a new generation of neural systems that outperform traditional methods on a variety of tasks. However, it is still unclear to what extent such approaches generalize in zero-shot conditions. The recent BEIR benchmark provides partial answers to this question by comparing models on datasets and tasks that differ from the training conditions. We aim to address the same question by comparing models under more explicit distribution shifts. To this end, we build three query-based distribution shifResearch goal: How does the performance of COCO-DR compare to other zero-shot dense retrieval methods such as BEIR or ANCE on out-of-domain benchmarks like MS MARCO and NQ?Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.7/10.
