An Extension of the VSM Documents Representation

Article, 23 Apr 2017. Publisher: Agora University of Oradea. Journal: International Journal of Computers Communications & Control, volume 12, page 402 (ISSN: 1841-9836, eISSN: 1841-9836).

Authors: Lucian N. Vintan; Daniel Morariu; Radu George Cretulescu; Maria N. Vintan

doi: 10.15837/ijccc.2017.3.2889


Abstract

In this paper we present a new approach to document representation for use in classification and/or clustering algorithms. Our representation starts from the classical "bag-of-words" model but augments each word with its corresponding part of speech. We thus introduce a new concept, called hyper-vectors, in which each document is represented in a hyper-space where each dimension corresponds to a different part of speech. Along each dimension the document is represented using the Vector Space Model (VSM). In this work we use only five part-of-speech classes: noun, verb, adverb, adjective and others. Each dimension of the hyper-space has its own weight. To compute the similarity between two documents we have developed a new hyper-cosine formula. Several classification experiments are presented as validation cases.
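To make the hyper-vector idea concrete, the sketch below builds one bag-of-words term-frequency vector per part-of-speech dimension and compares two documents dimension by dimension with the ordinary VSM cosine. It is a minimal illustration under stated assumptions, not the authors' implementation: the names `hyper_vector`, `cosine`, `hyper_cosine` and the weight values are hypothetical, POS tagging is assumed to happen upstream (tokens arrive as `(word, pos)` pairs already mapped to the five class names), and the exact hyper-cosine formula is not given in the abstract, so a weighted mean of per-dimension cosines is assumed here.

```python
import math
from collections import Counter

# The five part-of-speech dimensions named in the abstract.
POS_TAGS = ("noun", "verb", "adverb", "adjective", "other")


def hyper_vector(tagged_tokens):
    """Build a hyper-vector: one bag-of-words term-frequency vector per POS.

    tagged_tokens is an iterable of (word, pos) pairs from some upstream
    POS tagger (not shown). Any tag that is not one of the four named
    classes is folded into "other".
    """
    hv = {pos: Counter() for pos in POS_TAGS}
    for word, pos in tagged_tokens:
        key = pos if pos in POS_TAGS else "other"
        hv[key][word.lower()] += 1
    return hv


def cosine(a, b):
    """Classical VSM cosine between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty POS vector contributes no similarity
    return dot / (norm_a * norm_b)


def hyper_cosine(h1, h2, weights):
    """HYPOTHETICAL hyper-cosine: weighted mean of the per-POS cosines.

    This combination rule is an assumption for illustration; the paper's
    actual hyper-cosine formula is not stated in the abstract.
    """
    total = sum(weights[p] for p in POS_TAGS)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[p] * cosine(h1[p], h2[p]) for p in POS_TAGS) / total


# Usage with hypothetical weights favoring nouns and verbs.
weights = {"noun": 0.4, "verb": 0.25, "adverb": 0.1, "adjective": 0.15, "other": 0.1}
d1 = hyper_vector([("cat", "noun"), ("sat", "verb"), ("mat", "noun")])
d2 = hyper_vector([("cat", "noun"), ("sleeps", "verb"), ("mat", "noun")])
print(hyper_cosine(d1, d2, weights))
```

In this example only the noun vectors overlap, so the score is about 0.4 (the noun weight over the total weight). Treating an empty dimension as zero similarity penalizes short documents; averaging only over non-empty dimensions is an equally plausible alternative under the same assumptions.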

Related Organizations

"Lucian Blaga" University of Sibiu
Romania

Impact (BIP!)

Selected citations: 0
Popularity: Average
Influence: Average
Impulse: Average


Fields of Science (3)

social sciences

economics and business


Related research communities: FORTHEM Alliance