
Impact of Textual Data Augmentation on Linguistic Pattern Extraction to Improve the Idiomaticity of Extractive Summaries
The present work aims to develop a text summarisation system for financial texts with a focus on the fluidity of the target language. Linguistic analysis shows that the process of writing summaries should take into account not only terminological and collocational extraction, but also a range of linguistic material, referred to here as the "support lexicon", which plays an important role in the cognitive organisation of the field. On this basis, this paper highlights the relevance of pre-training the CamemBERT model on a French financial dataset to extend its domain-specific vocabulary, and of fine-tuning it on extractive summarisation. We then evaluate the impact of textual data augmentation, which improves the performance of our extractive text summarisation model by up to 6%-11%.
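The abstract does not say which augmentation operations were used, so the following is only a minimal sketch of what textual data augmentation can look like in practice, assuming simple word-level perturbations (random swap and random deletion). The function names `random_swap`, `random_deletion`, and `augment`, along with all parameter values, are hypothetical and are not taken from the paper.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs of positions exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(sentence, n_variants=2, p_delete=0.1, n_swaps=1, seed=0):
    """Produce n_variants perturbed copies of a sentence (illustrative only).

    Assumption: whitespace tokenisation and alternating swap/deletion;
    the paper's actual augmentation strategy may differ entirely.
    """
    rng = random.Random(seed)  # seeded for reproducible variants
    tokens = sentence.split()
    variants = []
    for k in range(n_variants):
        if k % 2 == 0:
            new_tokens = random_swap(tokens, n_swaps, rng)
        else:
            new_tokens = random_deletion(tokens, p_delete, rng)
        variants.append(" ".join(new_tokens))
    return variants


if __name__ == "__main__":
    for variant in augment("le chiffre d'affaires du groupe progresse au premier semestre"):
        print(variant)
```

Each variant would be added to the training set alongside the original sentence and its label. Perturbations of this kind can damage word order, so a system concerned with idiomaticity would likely need more conservative operations than the ones shown here.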
- Université Paris Diderot France
- University of Burgundy France
Microsoft Academic Graph classification: Vocabulary, Computer science, Process (engineering), Lexicon, Linguistics, Field (computer science), Focus (linguistics), Terminology, Corpus linguistics, Relevance (information retrieval)
Linguistic Patterns, Deep learning, Terminology, [SHS.LANGUE] Humanities and Social Sciences/Linguistics, Text summarisation, Corpus Linguistics, Natural Language Processing
