Document similarity analysis in Slovak language

Vladimir Hanusniak; Vladimir Smatanik; Milan Straka; Michal Zabovsky

Article . 2016 . Peer-reviewed

Publication type: Article
Date: 01 Nov 2016
Publisher: IEEE
Journal: 2016 International Conference on Information Management and Technology (ICIMTech)

doi: 10.1109/icimtech.2016.7930345

Abstract

Examining data for similar items is one of the fundamental data-mining problems. Similarity search methods can be useful for detecting plagiarism or near-duplicate web pages. The computerized methods developed in recent years focus mainly on the English language. The Slovak language, however, has several specific attributes, so these methods may not be precise enough when applied to it. The objective of this paper is to develop a proof of concept for a document similarity estimation process dedicated to the Slovak language. The complexity of Slovak gives us an opportunity to analyze and adjust the methods' parameters and thus achieve higher accuracy. The text-mining process suggested in this article uses a stop-word list and shingles to accurately measure document similarity. Results are further validated using a date constraint.
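The abstract names stop-word removal and shingling but does not specify the shingle type, the shingle length, or the similarity measure. The sketch below is therefore only an illustration under stated assumptions: word-level shingles of length `k`, Jaccard similarity over the shingle sets, and a tiny hypothetical stop-word list (`STOP_WORDS`) that stands in for whatever list the authors used. The function names are likewise hypothetical.

```python
import re

# Hypothetical, deliberately tiny Slovak stop-word list (illustration only;
# not the list used in the paper).
STOP_WORDS = {"a", "aj", "je", "na", "sa", "v", "vo", "z", "zo", "s", "so", "to", "že"}


def tokenize(text):
    """Lowercase the text and split it into runs of word characters.

    In Python 3, \\w matches Unicode letters, so Slovak diacritics
    (e.g. 'š', 'č', 'ž') stay inside their words.
    """
    return re.findall(r"\w+", text.lower())


def shingles(text, k=3, stop_words=STOP_WORDS):
    """Return the set of k-word shingles of the text after stop-word removal.

    Word-level shingles and k=3 are assumptions, not values from the paper.
    """
    words = [w for w in tokenize(text) if w not in stop_words]
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets; 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(doc_a, doc_b, k=3):
    """Estimate document similarity as the Jaccard index of shingle sets."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k))
```

For example, `similarity("jeden dva tri štyri", "jeden dva tri päť")` compares two shingle sets that share one of three distinct shingles. Varying `k` and the stop-word list is one way to explore the kind of parameter adjustment the abstract mentions; the date constraint is not modeled here because the abstract does not describe it.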

Related Organizations

University of Žilina
Slovakia

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
