Generating artificial texts for query expansion

Name: Generating artificial texts for query expansion
Creator: Claveau, Vincent
Keywords: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], GPT2, Text generation, generative language model, document retrieval, [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL], query expansion, data augmentation

Data sources (each listed as: Conference object, 2021):

- INRIA2
- HAL-Rennes 1
- INRIA a CCSD electronic archive server

Publication: Conference object, 01 Jan 2021, France, French

Authors: Claveau, Vincent;

Abstract

A well-known way to improve document retrieval performance is to expand the user's query. Several approaches have been proposed in the literature, and some of them yield state-of-the-art results. In this paper, we explore the use of text generation to expand queries automatically. We rely on a well-known neural generative model, GPT-2, which comes with pre-trained models for English and can also be fine-tuned on specific corpora. Through several experiments, we show that text generation is a very effective way to improve the performance of an IR system, by a large margin (+10% MAP), and that it outperforms strong baselines that also rely on query expansion (LM+RM3). This conceptually simple approach can easily be implemented on any IR system thanks to the availability of GPT code and models.
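The abstract does not spell out the expansion pipeline, so the following is only a minimal sketch of the general idea, assuming that generated texts are simply appended to the query before scoring. All names (`expand_query`, `bm25_scores`, `fake_generator`) are hypothetical. The generator is a hand-written stand-in for GPT-2 sampling so that the example runs offline, and a basic BM25 scorer is swapped in for the retrieval model, which this record does not describe.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(query: str, generate: Callable[[str], List[str]]) -> str:
    """Append generated texts to the original query.

    `generate` is a hypothetical hook; in the setting the abstract
    describes, it would wrap GPT-2 sampling seeded with the query.
    """
    return " ".join([query] + generate(query))


def bm25_scores(query: str, docs: List[str],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Repeated query terms (common after expansion) weigh more.
    query_tf = Counter(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term, q_count in query_tf.items():
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            length_norm = 1 - b + b * len(toks) / avg_len
            score += q_count * idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def fake_generator(query: str) -> List[str]:
    """Stand-in for GPT-2 output (assumption: hand-written, not sampled)."""
    return ["The jaguar is a large cat that hunts prey in the rainforest."]


docs = [
    "Big cats such as the leopard hunt prey at night in the rainforest.",
    "The new sedan has a quiet engine and leather seats.",
]
query = "jaguar"
plain = bm25_scores(query, docs)
expanded = bm25_scores(expand_query(query, fake_generator), docs)
print("plain:", plain)
print("expanded:", expanded)
```

In this toy collection neither document contains the word "jaguar", so the unexpanded query cannot separate them. The generated text contributes related vocabulary ("prey", "rainforest"), which lets the animal-related document rank first. Whether such gains carry over to a real collection is an empirical question that the abstract answers with its reported MAP improvement.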

Country

France

Related Organizations

- University of Rennes 1, France
- French National Centre for Scientific Research, France
- Institut de Recherche en Informatique et Systèmes Aléatoires, France
- Université de Rennes 1, France
- University of Southern Brittany, France

Keywords

[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], GPT2, Text generation, generative language model, document retrieval, [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL], query expansion, data augmentation, [INFO.INFO-IR] Computer Science [cs]/Information Retrieval [cs.IR]

Related research products (2)

- gpt-2 software on GitHub (IsRelatedTo)
- gpt-3 software on GitHub (IsRelatedTo)

Impact by BIP!

- Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Open access route: Green

Related to Research communities

INRIA
