
Converting a sentence to a meaningful vector representation is useful in many NLP tasks; however, very few methods allow that representation to be restored to a human-readable sentence. Being able to generate sentences from vector representations demonstrates the level of information maintained by the embedding representation (in this case, a simple sum of word embeddings). We introduce such a method for moving from this vector representation back to the original sentence. This is done in a two-stage process: first, a greedy algorithm is used to convert the vector to a bag of words; second, a simple probabilistic language model is used to order the words and recover the sentence. To the best of our knowledge, this is the first work to demonstrate quantitatively the ability to reproduce text from a large corpus based directly on its sentence embeddings.
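The first stage described above can be sketched as iterative greedy subtraction: repeatedly pick the vocabulary word whose embedding best reduces the residual vector, then subtract it. This is a minimal illustration, not the authors' implementation; the vocabulary, embedding dimension, and stopping rule below are assumptions for the sketch.

```python
import numpy as np

def greedy_bag_of_words(sentence_vec, vocab, embeddings, max_words=20):
    """Greedily recover a bag of words from a sum-of-embeddings vector.

    At each step, select the word whose embedding, when subtracted,
    leaves the smallest residual; stop when no word shrinks the residual.
    """
    residual = sentence_vec.astype(float).copy()
    bag = []
    for _ in range(max_words):
        # Distance from the residual to every word embedding.
        dists = np.linalg.norm(residual - embeddings, axis=1)
        best = int(np.argmin(dists))
        # Stop once subtracting any word would no longer reduce the residual.
        if dists[best] >= np.linalg.norm(residual):
            break
        bag.append(vocab[best])
        residual -= embeddings[best]
    return bag

# Toy example: a 3-word "sentence" over a random embedding table.
rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat"]
embeddings = rng.normal(size=(3, 8))
target = embeddings[0] + embeddings[1] + embeddings[2]
print(greedy_bag_of_words(target, vocab, embeddings))
```

The second stage, ordering the recovered bag into a sentence, would then score candidate orderings with a probabilistic language model; it is omitted here for brevity.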
