
handle: 10045/86730
Text summarization is the task of condensing a document while keeping its relevant information. Integrated into wider information systems, it can help users access key information without having to read everything, thereby improving efficiency. In this research work, we have developed and evaluated a single-document extractive summarization approach, named SemPCA-Summarizer, which reduces the dimensionality of a document using the Principal Component Analysis (PCA) technique enriched with semantic information. A concept-sentence matrix is built from the input document, and PCA is then used to identify and rank the relevant concepts. These concepts are used to select the most important sentences through different heuristics, leading to various types of summaries. The results show that the generated summaries are very competitive, from both a quantitative and a qualitative viewpoint, indicating that the proposed approach is appropriate for briefly providing key information and thus helps users cope with the huge amount of available information more quickly and efficiently.
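The pipeline described in the abstract (matrix construction, PCA-based concept ranking, sentence selection) can be illustrated with a minimal sketch. This is not the SemPCA-Summarizer implementation: it assumes plain lowercase words longer than three letters stand in for semantic concepts, it stores the matrix as sentences by concepts (the transpose of a concept-sentence matrix), and it uses a single hypothetical heuristic that scores each sentence by the PCA-derived weights of the concepts it contains. All function names are illustrative.

```python
import re
import numpy as np

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def concept_sentence_matrix(sentences):
    # Assumption: words longer than 3 letters act as "concepts".
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks if len(w) > 3})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = np.zeros((len(sentences), len(vocab)))
    for row, toks in enumerate(tokenized):
        for w in toks:
            if w in index:
                matrix[row, index[w]] += 1
    return matrix, vocab

def rank_concepts(matrix, n_components=2):
    # PCA via SVD of the column-centered matrix.
    centered = matrix - matrix.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, len(s))
    var = s[:k] ** 2
    total = var.sum()
    ratio = var / total if total > 0 else np.ones(k) / k
    # Concept weight: absolute loadings, weighted by each component's
    # share of the variance among the retained components.
    return (np.abs(vt[:k]) * ratio[:, None]).sum(axis=0)

def summarize(text, n_sentences=2):
    sentences = split_sentences(text)
    if len(sentences) <= n_sentences:
        return sentences
    matrix, vocab = concept_sentence_matrix(sentences)
    if not vocab:
        return sentences[:n_sentences]
    # Hypothetical heuristic: sum the weights of the concepts in a sentence.
    scores = matrix @ rank_concepts(matrix)
    top = sorted(np.argsort(-scores)[:n_sentences])
    return [sentences[i] for i in top]  # keep original document order
```

Calling `summarize(document_text, n_sentences=3)` returns the three highest-scoring sentences in their original order. Other selection heuristics, such as picking one sentence per top-ranked concept, could be swapped into `summarize` without changing the PCA step.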
This research work has been partially funded by the Generalitat Valenciana and the Spanish Government through the projects PROMETEOII/2014/001, TIN2015-65100-R, and TIN2015-65136-C2-2-R.
Natural language processing, Automatic text summarization, Lenguajes y Sistemas Informáticos, Intelligent information processing, Principal component analysis, Human language technologies, 68-T50, other areas of Computing and Informatics
| selected citations | 4 |
| popularity | Top 10% |
| influence | Average |
| impulse | Average |
