
handle: 20.500.12556/DKUM-78896
This master's thesis addresses the problem of finding suitable journals in which authors can publish their scientific articles. In the first part, we focused on extracting knowledge from unstructured data using word embeddings. In the second part, we built a software solution for vectorizing scientific articles and journals. The aim was to determine whether machine learning and text vectorization techniques can be used to measure the similarity between the scientific articles of different authors and journals, and thereby to assess whether an author publishes in appropriate journals. The input corpus was obtained from Scopus, an online database of scientific articles. We analyzed the output of the software solution to answer the research questions posed and, consequently, to accept or reject the stated hypotheses.
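The idea of comparing an article with journals through text vectorization can be illustrated with a minimal sketch. It is not the thesis's software solution, whose details are not given here. It assumes a plain TF-IDF weighting with cosine similarity (the keywords also mention doc2vec, which is not shown), and it assumes each journal is represented by the concatenated text of its articles. The journal names, texts, and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_journals(article, journals):
    """Rank journals (dict: name -> text) by similarity to the article text."""
    names = list(journals)
    vectors = tfidf_vectors([article] + [journals[name] for name in names])
    article_vec, journal_vecs = vectors[0], vectors[1:]
    scores = [(name, cosine(article_vec, vec)) for name, vec in zip(names, journal_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: each journal is stood in for by a short text.
journals = {
    "Journal of Machine Learning (hypothetical)":
        "neural networks word embedding text classification learning models",
    "Marine Biology Letters (hypothetical)":
        "coral reef fish ocean ecosystem species plankton",
}
article = "document embedding and text vectorization with neural models"

ranked = rank_journals(article, journals)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this sketch, a journal that shares vocabulary with the article receives a higher score than one that shares none. Whether such a score reflects a "suitable" journal is exactly the kind of question the thesis analysis addresses.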
word embedding, natural language processing, text vectorization, tf-idf, doc2vec, info:eu-repo/classification/udc/004.85:004.775(043.2)
