4 Research products

  • Research data
  • Other research products
  • 2014-2023
  • Open Access
  • Estonian
  • Digital Humanities and Cultural Heritage

  • Authors: Arutyunyan, David

    The aim of this thesis is to investigate history textbooks published in the Republic of Estonia, in both Estonian and Russian, and used in gymnasiums from 1991 to 2012. The thesis focuses on the following landmarks in the history of Estonia: World War II, Stalin's repressions, the period of Soviet rule, and the collapse of the Soviet Union together with the Estonian sovereignty that followed from it. These periods and landmarks in particular can give rise to differing interpretations of history within the framework of the Republic of Estonia, and Estonian- and Russian-language gymnasium history textbooks can serve as sources and media for such interpretations. The topicality and practical value of the thesis are evident for the following reasons. A person's worldview and perspectives can be influenced and shaped, especially during adolescence. History textbooks, in this case Estonian history textbooks in particular, are a means of creating particular viewpoints and perspectives, and the school, through its curricula, is the institution that performs this function. Young ethnic Estonians and young members of the Russian-speaking minority may hold different perspectives and viewpoints. Moreover, the textbooks in use do not necessarily present the historical facts in the same way, nor do they give a similar understanding of the same significant landmarks. Given these assumptions, it is important to determine whether the history textbooks published in Estonia during this period may have shaped different understandings of past events or, on the contrary, convey one identical worldview and perspective, or allow several alternative interpretations. Conclusions were drawn from an analysis based on comparing and juxtaposing the textbooks.
The conclusions are that Estonian textbooks can shape and create viewpoints that stand in stark contrast to one another. There are differences between the Estonian textbooks written in Russian and those written in Estonian, as well as numerous differences among the textbooks written in Estonian. By contrast, the differences among the textbooks written in Russian are minor and negligible.

    DSpace at Tartu Univ...
  • Authors: Freienthal, Linda; Pelicon, Andraž; Martinc, Matej; Škrlj, Blaž; +8 Authors

    This dataset contains articles from EMBEDDIA media partners, enriched with information added by the tools developed within the EMBEDDIA project:
    - 12,390 Estonian articles from 2019 with tags given by Ekspress Meedia. The complete dataset without the output of the EMBEDDIA tools is available at http://hdl.handle.net/11356/1408
    - 5,000 Croatian articles from autumn 2010 with tags given by 24sata. The complete dataset without the output of the EMBEDDIA tools is available at http://hdl.handle.net/11356/1410
    - 15,264 Latvian articles from 2019 with tags given by Ekspress Meedia. The complete dataset without the output of the EMBEDDIA tools is available at http://hdl.handle.net/11356/1409

    All articles in the dataset were analysed with the texta-mlp Python package (https://pypi.org/project/texta-mlp/) via the EMBEDDIA Media Assistant's Texta Toolkit (https://docs.texta.ee/). The following tools were used:
    - Latin1 and Latin2 Named Entity Recognition modules (Cabrera-Diego et al., 2021; both described in https://aclanthology.org/2021.bsnlp-1.12/). The Latin1 results are in the folder annotated_articles_ner_latin1/ (and also in annotated_articles_all_tools/); the Latin2 results are in annotated_articles_ner_latin2/ (and also in annotated_articles_all_tools/).
    - RAKUN keyword extractor. RAKUN (Škrlj et al., 2019) is an unsupervised keyword extraction system, so it can be applied to any language. It converts the text into a graph, and the most important nodes in that graph mostly turn out to be the keywords. It is described in https://link.springer.com/chapter/10.1007/978-3-030-31372-2_26. The keyword annotations are in the folder annotated_articles_rakun/ (and also in annotated_articles_all_tools/).
    - TNT-KID keyword extractor. TNT-KID (Martinc et al., 2021) is a supervised system for automatic keyword extraction, trained on a corpus of articles with human-assigned keywords. For Croatian, the annotators were 24sata editors; for Estonian, Ekspress Meedia staff; and for Latvian, Latvian Delfi staff. The system is further documented at https://doi.org/10.1017/S1351324921000127. For Croatian, only TNT-KID was applied, while for Estonian and Latvian, TNT-KID combined with TF-IDF, an extension by Koloski et al. (https://aclanthology.org/2021.hackashop-1.4.pdf), was used. The results are in the folder annotated_articles_tnt_kid/ (and also in annotated_articles_all_tools/).
    - Sentiment analysis. Our news sentiment analyser (Pelicon et al., 2020) labels a news article as positive, negative, or neutral using a fine-tuned multilingual BERT model trained on Slovene sentiment-annotated news articles. The system is further documented at https://doi.org/10.3390/app10175993. The results are in the folder annotated_articles_sentiment/ (and also in annotated_articles_all_tools/).

    All data is encoded in JSON Lines format. Each folder has its own README file explaining the structure of its files.

    CLARIN.SI repository
    Other ORP type . 2022
    Data sources: B2FIND
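    The folders described above store their data in JSON Lines format (one JSON object per line), so a folder can be streamed record by record instead of being loaded whole. The Python sketch below assumes the data files carry a `.jsonl` extension and makes no assumption about the keys inside a record, since those are documented in each folder's README; `iter_jsonl` is a hypothetical helper, not part of the dataset or of texta-mlp.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(folder: str, pattern: str = "*.jsonl") -> Iterator[dict]:
    """Yield one parsed record per non-empty line of every matching file.

    Assumption: data files match `pattern` (".jsonl" by default); check the
    folder's README and adjust the pattern if the real file names differ.
    """
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(encoding="utf-8") as handle:
            for line_no, line in enumerate(handle, start=1):
                line = line.strip()
                if not line:
                    continue  # skip blank lines, e.g. a trailing newline
                try:
                    yield json.loads(line)
                except json.JSONDecodeError as err:
                    # Report file and line so a corrupt record is easy to find.
                    raise ValueError(f"{path}:{line_no}: invalid JSON") from err
```

    For example, `sum(1 for _ in iter_jsonl("annotated_articles_all_tools"))` would count the records in a local copy of that folder without holding them all in memory.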
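    RAKUN is described above as turning text into a graph whose most important nodes tend to be the keywords. The Python sketch below illustrates only that general idea with a plain TextRank-style approach: words occurring next to each other are linked, and nodes are ranked with PageRank computed by power iteration. It is not RAKUN's actual algorithm, whose graph construction and node-importance measure are given in the cited paper; `rank_keywords`, the window size, and the damping factor 0.85 are illustrative choices.

```python
import re
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def rank_keywords(text: str, top_k: int = 5, window: int = 2,
                  damping: float = 0.85, iterations: int = 50,
                  stopwords: FrozenSet[str] = frozenset()) -> List[str]:
    """Rank words by PageRank over an undirected word co-occurrence graph."""
    tokens = [t for t in re.findall(r"\w+", text.lower()) if t not in stopwords]
    # Connect each token to the following `window - 1` tokens (no self-loops).
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    nodes = list(neighbours)
    if not nodes:
        return []
    # Power iteration: each node spreads its score evenly over its neighbours.
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        score = {
            n: (1 - damping) / len(nodes)
            + damping * sum(score[m] / len(neighbours[m]) for m in neighbours[n])
            for n in nodes
        }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(nodes, key=lambda n: (-score[n], n))[:top_k]
```

    In the text "alpha beta alpha gamma alpha delta", "alpha" is linked to every other word and therefore ranks first. Because no labelled data is needed, the same code runs on any language, which mirrors the point made above about unsupervised extraction.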
  • Authors: Purver, Matthew; Pollak, Senja; Freienthal, Linda; Kuulmets, Hele-Andra; +2 Authors

    The dataset is an archive of articles from the Ekspress Meedia news site from 2009 to 2019, containing over 1.4M articles, mostly in Estonian (1,115,120 articles) with some in Russian (325,952 articles). Keywords are included for articles after 2015. These datasets are intended for use in EMBEDDIA, an H2020 project.

    The main archive is in the file ee_articles_2009_2019 and contains JSON files of all the Estonian articles from 2009 to May 2019. Other files contain derived versions and subsets (see the README files inside those zip files); in short:
    - eearticles2015-2019: Estonian and Russian articles covering 5 years, with the tags that were missing in the previous versions.
    - eearticles20152019lemmatized and eearticles20092014lemmatized: files preprocessed by TEXTA (contact linda@texta.ee).
    - eeandsttarticlelemmasembeddingsand_usage: w2v embeddings trained by TEXTA (contact linda@texta.ee).

    Description of the Main Dataset (ee_articles_2009_2019)

    There are 12 JSON files:
    - articles_2009_ver2.json: 161,394 articles from 2009
    - articles_2010_ver2.json: 151,033 articles from 2010
    - articles_2011_ver2.json: 168,273 articles from 2011
    - articles_2012_ver2.json: 152,772 articles from 2012
    - articles_2013_ver2.json: 141,012 articles from 2013
    - articles_2014_ver2.json: 128,388 articles from 2014
    - articles_2015_ver2.json: 127,425 articles from 2015
    - articles_2016_ver2.json: 130,704 articles from 2016
    - articles_2017_ver2.json: 119,318 articles from 2017
    - articles_2018_ver2.json: 117,388 articles from 2018
    - articles_2019_Jan-Apr_ver2.json: 35,076 articles from January to April 2019
    - articles_2019_May_ver2.json: 8,329 articles from May 2019
    In total: 1,441,112 articles.

    Each JSON file is a list of dictionaries, one per article. Each dictionary contains the following keys:
    - id (integer): the ID of the article
    - title (string): the title of the article
    - lead (string): the lead of the article (can contain HTML tags)
    - url (string): the URL of the article
    - tags (list of dictionaries or None) [1]: each dictionary represents one tag and contains:
      - domain_id (string) [2]: the ID of the domain
      - id (string): the ID of the tag
      - lang (string): the language of the tag
      - tag (string): the tag itself, e.g. Kert Kingo (a name)
      - translitted_name (string): a modified version of the tag, e.g. kert-kingo
    - rawBody (string): the raw text of the article (contains HTML)
    - bodyText (string): the clean article text (HTML stripped)
    - publishDate (string): the publication date and time of the article
    - categoryPrimary (dictionary or empty list): the dictionary contains either
      - categoryId (integer): the ID of the category
      - categoryName (string): the name of the category (e.g. World)
      - channelId (integer): the ID of the channel
      or
      - articleId (integer): the ID of the article
      - categoryId (integer): the ID of the category
      - categoryName (string): the name of the category (e.g. World)
      - categoryPrimary (boolean): unknown
      - categorySort (integer): unknown
      - categoryUrl (string): the URL of the category
      - categoryVisible (boolean): unknown
      - channelId (integer): the ID of the channel
      - channelUrl (string): the URL of the channel (e.g. 'https://sport.delfi.ee')
      - directoryName (string): unknown
      - parentId (integer): unknown
    - channelLanguage (string or None) [3]: the language of the channel
    - categoryLanguage (int or None) [4]: unknown
    - commentCount (int) [5]: the number of comments
    - relatedArticles (list of integers): the IDs of related articles

    CLARIN.SI repository
    Other ORP type . 2021
    Data sources: B2FIND
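    Each file of the main archive is a single JSON list of article dictionaries, and two keys have variable types: `tags` may be None, and `categoryPrimary` may be a dictionary or an empty list. The Python sketch below handles those variants explicitly; it assumes UTF-8 encoded files and relies only on the keys documented above, and the helper names are hypothetical. Note that `json.load` reads a whole file into memory, which may be substantial for the larger yearly files.

```python
import json
from typing import Any, Dict, List, Optional


def load_articles(path: str) -> List[Dict[str, Any]]:
    """Load one articles_*_ver2.json file (a JSON list of article dicts)."""
    with open(path, encoding="utf-8") as handle:
        articles = json.load(handle)
    if not isinstance(articles, list):
        raise ValueError(f"{path}: expected a JSON list of articles")
    return articles


def tag_names(article: Dict[str, Any]) -> List[str]:
    """Return the tag strings of an article; `tags` may be None or missing."""
    return [t["tag"] for t in (article.get("tags") or []) if "tag" in t]


def category_name(article: Dict[str, Any]) -> Optional[str]:
    """Return categoryName; categoryPrimary is either a dict or an empty list."""
    category = article.get("categoryPrimary")
    if isinstance(category, dict):
        return category.get("categoryName")  # present in both documented shapes
    return None  # empty list (or missing key): no primary category
```

    A typical use is `[category_name(a) for a in load_articles("articles_2018_ver2.json")]`, assuming that file has been extracted locally.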
  • Authors: Shekhar, Ravi; Pollak, Senja; Pelicon, Andraž; Purver, Matthew; +1 Authors

    This dataset is an archive of reader comments on the Ekspress Meedia news site from 2009 to 2019, containing approximately 31M comments, mostly in Estonian, with some in Russian.

    Description of the datasets. There are 11 CSV files:
    - comments_2009.csv: 2,898,438 comments from 2009
    - comments_2010.csv: 2,377,591 comments from 2010
    - comments_2011.csv: 2,729,389 comments from 2011
    - comments_2012.csv: 3,372,776 comments from 2012
    - comments_2013.csv: 3,289,393 comments from 2013
    - comments_2014.csv: 3,195,502 comments from 2014
    - comments_2015.csv: 3,202,592 comments from 2015
    - comments_2016.csv: 2,848,624 comments from 2016
    - comments_2017.csv: 2,838,075 comments from 2017
    - comments_2018.csv: 3,194,597 comments from 2018
    - comments_2019.csv: 1,526,755 comments from 2019 (up to May)
    In total: 31,473,732 comments.

    Columns:
    - comment_id (string): the ID of the comment
    - article_id (string): the ID of the article on which the comment was written
    - created_time (string): the date and time of the comment
    - subject (string): the title of the comment
    - reply_to_comment_id (string): the ID of the parent comment
    - content (string): the comment itself
    - is_anonymous (string): 1 if the comment was published anonymously, 0 if it was published by a registered user
    - is_enabled (string): 1 if the comment was published (online), 0 if it was not. This field is questionable: not all comments have been manually moderated, and there is no additional information from the moderators.
    - channel_language (string): the language of the channel: 'nat' for Estonian, 'rus' for Russian
    - create_user_id (string): the user ID of the commenter ('0' for all blocked comments)
    - moderated_by (string): the ID of the moderator

    CLARIN.SI repository
    Other ORP type . 2021
    Data sources: B2FIND
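    Because every column of the comment files is a string, flags such as is_enabled have to be compared with the string "1" rather than an integer. The Python sketch below streams one yearly file with the standard csv module and keeps comments from a single channel language. It assumes each file has a header row with the column names listed above and uses the default comma delimiter; neither is stated in the description, so check a file first. `iter_comments` is a hypothetical helper.

```python
import csv
from typing import Dict, Iterator


def iter_comments(path: str, language: str = "nat",
                  enabled_only: bool = True) -> Iterator[Dict[str, str]]:
    """Stream rows of a comments_YYYY.csv file, filtered by channel language.

    Assumptions: a header row with the documented column names and the
    default comma delimiter.
    """
    with open(path, encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            if row["channel_language"] != language:
                continue  # 'nat' = Estonian channel, 'rus' = Russian channel
            if enabled_only and row["is_enabled"] != "1":
                continue  # values are strings, so compare against "1"
            yield row
```

    Given the caveat above that is_enabled is a questionable field, `enabled_only=False` may be the safer default for analyses that should not depend on moderation status.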
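    The reply_to_comment_id column makes it possible to rebuild discussion threads from the comment rows. The Python sketch below groups comments into reply trees and measures the depth of a thread. The description does not say how top-level comments are marked, so the code assumes an empty value or "0", and it also treats a comment whose parent is not among the loaded rows (for example, a parent stored in another year's file) as top-level; `build_threads` and `thread_depth` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_threads(rows: List[Dict[str, str]]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (root comment IDs, mapping of comment ID -> IDs of direct replies)."""
    known_ids = {row["comment_id"] for row in rows}
    children: Dict[str, List[str]] = defaultdict(list)
    roots: List[str] = []
    for row in rows:
        parent = (row.get("reply_to_comment_id") or "").strip()
        # Assumption: "" or "0" marks a top-level comment; unknown parents are
        # also treated as top-level so that no comment is silently dropped.
        if parent in ("", "0") or parent not in known_ids:
            roots.append(row["comment_id"])
        else:
            children[parent].append(row["comment_id"])
    return roots, dict(children)


def thread_depth(root: str, children: Dict[str, List[str]]) -> int:
    """Number of levels in the reply tree under `root` (a lone comment has depth 1)."""
    deepest = 0
    seen: Set[str] = set()
    stack = [(root, 1)]
    while stack:  # iterative DFS avoids Python's recursion limit on long threads
        node, depth = stack.pop()
        if node in seen:
            continue  # guard against malformed data containing a cycle
        seen.add(node)
        deepest = max(deepest, depth)
        stack.extend((child, depth + 1) for child in children.get(node, []))
    return deepest
```

    Threads can cross year boundaries, so building them from several yearly files at once reduces the number of replies that end up misclassified as top-level.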
Powered by OpenAIRE graph
Advanced search in
Research products
arrow_drop_down
Searching FieldsTerms
Any field
arrow_drop_down
includes
arrow_drop_down
4 Research products
  • image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    Authors: Arutyunyan, David;

    The aim of this thesis is to investigate Estonian history textbooks published in the Republic of Estonia in Estonian as well as in Russian languages and used in the gymnasium during the period from 1991-2012. The thesis is focused on the following landmarks in the history of Estonia: The World War II, the Stalin's repressions, the time under the Soviet rule, the collapse of the Soviet union and the sovereignty of Estonia stemming from it. It is these periods and landmarks which may originate different theoretical interpretations of history in terms of the framework of Estonian Republic. Estonian as well as Russian textbooks of history for gymnasium can serve as sources and media for interpretations. Topicality and actual value of the thesis today is certain and evident owing to the reasons. Since it is perfectly possible to influence and shape the worldviews and perspectives of a personality especially during one's reaching the age of puberty. History textbooks or as in the case Estonian textbooks on history in particular are the means enabling to create certain viewpoints and perspectives. School plays the role of institution with its curricula realizing the mentioned function. The young among the local Estonians as well as Russian-speaking minorities may have different perspectives and viewpoints. Besides that, the Estonian textbooks used do not necessarily reflect the historical facts the same way, neither give similar understanding of the same landmarks of significance. Given the mentioned assumptions it is important to learn whether the history textbooks published in this particular time lapse in Estonia have possibly influenced in certain way and shaped different understandings of the past events or on the contrary, suggest the same identical worldview and perspective or imply several alternative possible interpretations. There have been made certain conclusions with the help of analysis made on the basis of comparisons and juxtaposing. 
The conclusions found are that Estonian textbooks can shape and create different viewpoints being in stark contrast with one another. There are differences between the Estonians textbooks written in Russian and Estonian as well as numerous differences between the textbooks written in Estonian. It is worth mentioning that the difference between the textbooks written in Russian language are negligible and minor.

    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ DSpace at Tartu Univ...arrow_drop_down
    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    0
    citations0
    popularityAverage
    influenceAverage
    impulseAverage
    BIP!Powered by BIP!
    more_vert
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ DSpace at Tartu Univ...arrow_drop_down
      image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
  • image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    Authors: Freienthal, Linda; Pelicon, Andraž; Martinc, Matej; Škrlj, Blaž; +8 Authors

    This dataset contains articles from EMBEDDIA Media partners with various information added by the tools developed within the EMBEDDIA project: - 12,390 Estonian articles from 2019 with tags given by Ekspress Meedia. The complete dataset without the output of EMBEDDIA tools is available at http://hdl.handle.net/11356/1408 - 5,000 Croatian articles from autumn of 2010 with tags given by 24sata. The complete dataset without the output of EMBEDDIA tools is available at http://hdl.handle.net/11356/1410 - 15,264 Latvian articles from 2019 with tags given by Ekspress Meedia. The complete dataset without the output of EMBEDDIA tools is available at http://hdl.handle.net/11356/1409 All the articles in the dataset have been analysed with texta-mlp Python package (https://pypi.org/project/texta-mlp/) via the EMBEDDIA Media assistant's Texta Toolkit (https://docs.texta.ee/). The tools used to analyse the articles were the following: - Latin1 and Latin2 Name Entity Recognition Tool modules (Cabrera-Diego et al., 2021, both described in https://aclanthology.org/2021.bsnlp-1.12/) . The Latin 1 results can be found folders annotated_articles_ner_latin1/ and annotated_articles_all_tools/, while the Latin 2 results are in annotated_articles_nerlatin2/ or annotated_articles_all_tools/. - RAKUN keyword extractor. RAKUN (Škrlj et al. 2019) is an unsupervised system for keyword extraction, so it can be used for any language. It detects keywords by turning text into a graph and the most important nodes in the graph mostly turn out to be the keywords. It is described in https://link.springer.com/chapter/10.1007/978-3-030-31372-2_26. The keyword annotation results can be found in the folder annotated_articles_rakun/ or annotated_articles_all_tools/. - TNT-KID keyword extractor. TNT-KID (Martinc et al. 2021, ) is a supervised system for automatic keyword extraction. It was trained on a corpus of articles with human-assigned keywords. 
For Croatian, the annotators were 24sata editors, for Estonian the Ekspress Meedia staff and for Latvian the Latvian Delfi staff. The system is further documented at https://doi.org/10.1017/S1351324921000127. For Croatian only TNT-KID was applied, while for Estonian and Latvian, the TNT-KID with TF-IDF, and extension by Koloski et al. (https://aclanthology.org/2021.hackashop-1.4.pdf) was used. The results of applying this tool are found in the folder annotated articles tnt_kid/ or annotated articles all tools/. - Sentiment analysis. Our news sentiment analyser (Pelicon et al. 2020) labels a news article as being of positive, negative, or neutral sentiment, using a fine-tuned multilingual BERT model, which was trained on Slovene sentiment annotated news articles. The system is further documented in https://doi.org/10.3390/app10175993. The results of this tools are found in the folder annotated articles sentiment/ or annotated articles all tools/. All the data is encoded in "JSON Lines" format. Each folder has its own README file which explains the structure of the files.

    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ CLARIN.SI repositoryarrow_drop_down
    image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
    CLARIN.SI repository
    Other ORP type . 2022
    Data sources: B2FIND
    addClaim

    This Research product is the result of merged Research products in OpenAIRE.

  • Authors: Purver, Matthew; Pollak, Senja; Freienthal, Linda; Kuulmets, Hele-Andra; +2 Authors

    The dataset is an archive of articles from the Ekspress Meedia news site from 2009 to 2019, containing over 1.4M articles, mostly in Estonian (1,115,120 articles) with some in Russian (325,952 articles). Keywords are included for articles after 2015. The main archive is in the file ee_articles_2009_2019 and contains JSON files of all the Estonian articles from 2009 to May 2019. These datasets are intended for use in EMBEDDIA, an H2020 project. The other files contain derived versions and subsets (please see the README files inside those zip files); in short:
    - eearticles2015-2019: Estonian and Russian articles covering 5 years, with tags, which were missing in the previous versions.
    - eearticles20152019lemmatized and eearticles20092014lemmatized: files preprocessed by TEXTA (contact linda@texta.ee).
    - eeandsttarticlelemmasembeddingsand_usage: w2v embeddings trained by TEXTA (contact linda@texta.ee).

    Description of the Main Dataset (ee_articles_2009_2019)

    There are 12 JSON files:
    - articles_2009_ver2.json: 161,394 articles from 2009
    - articles_2010_ver2.json: 151,033 articles from 2010
    - articles_2011_ver2.json: 168,273 articles from 2011
    - articles_2012_ver2.json: 152,772 articles from 2012
    - articles_2013_ver2.json: 141,012 articles from 2013
    - articles_2014_ver2.json: 128,388 articles from 2014
    - articles_2015_ver2.json: 127,425 articles from 2015
    - articles_2016_ver2.json: 130,704 articles from 2016
    - articles_2017_ver2.json: 119,318 articles from 2017
    - articles_2018_ver2.json: 117,388 articles from 2018
    - articles_2019_Jan-Apr_ver2.json: 35,076 articles from January to April 2019
    - articles_2019_May_ver2.json: 8,329 articles from May 2019
    In sum: 1,441,112 articles.

    Each JSON file is a list of dictionaries, i.e. each article is represented as a dictionary containing the following:
    - id (integer): the ID of the article
    - title (string): the title of the article
    - lead (string): the lead of the article (can contain HTML tags)
    - url (string): the URL of the article
    - tags (list of dictionaries or None) [1]: each dictionary represents one tag and contains the following:
      - domain_id (string) [2]: the ID of the domain
      - id (string): the ID of the tag
      - lang (string): the language of the tag
      - tag (string): the tag itself, e.g. Kert Kingo (a name)
      - translitted_name (string): a modified version of the tag, e.g. kert-kingo
    - rawBody (string): the raw text of the article (contains HTML)
    - bodyText (string): the clean article text (stripped of HTML)
    - publishDate (string): the publication date and time of the article
    - categoryPrimary (dictionary or empty list): the dictionary contains either
      - categoryId (integer): the ID of the category
      - categoryName (string): the name of the category (e.g. World)
      - channelId (integer): the ID of the channel
      or
      - articleId (integer): the ID of the article
      - categoryId (integer): the ID of the category
      - categoryName (string): the name of the category (e.g. World)
      - categoryPrimary (boolean): unknown
      - categorySort (integer): unknown
      - categoryUrl (string): the URL of the category
      - categoryVisible (boolean): unknown
      - channelId (integer): the ID of the channel
      - channelUrl (string): the URL of the channel (e.g. 'https://sport.delfi.ee')
      - directoryName (string): unknown
      - parentId (integer): unknown
    - channelLanguage (string or None) [3]: the language of the channel
    - categoryLanguage (int or None) [4]: unknown
    - commentCount (int) [5]: the number of comments
    - relatedArticles (list of integers): a list of related articles' IDs

    CLARIN.SI repository
    Other ORP type . 2021
    Data sources: B2FIND

    This Research product is the result of merged Research products in OpenAIRE.
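The article schema above has two optional shapes worth handling explicitly: `tags` may be `None`, and `categoryPrimary` may be an empty list instead of a dictionary (and the dictionary itself comes in two layouts, both carrying categoryName). The following Python sketch, an illustration rather than official loader code, reads one yearly file such as articles_2009_ver2.json and flattens each article into a small record. The `Article` class, the helper names, and the chosen subset of fields are assumptions made for the example.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Optional


@dataclass
class Article:
    # Hypothetical flattened record; only a subset of the documented keys.
    id: int
    title: str
    publish_date: str
    body_text: str
    tags: list[str] = field(default_factory=list)
    category_name: Optional[str] = None
    comment_count: Optional[int] = None


def normalize(raw: dict[str, Any]) -> Article:
    """Flatten one article dictionary, smoothing over the optional fields."""
    # `tags` is either a list of tag dictionaries or None.
    tag_dicts = raw.get("tags") or []
    # `categoryPrimary` is a dictionary (either layout has `categoryName`)
    # or an empty list when no primary category is set.
    category = raw.get("categoryPrimary")
    category_name = category.get("categoryName") if isinstance(category, dict) else None
    return Article(
        id=raw["id"],
        title=raw.get("title") or "",
        publish_date=raw.get("publishDate") or "",
        body_text=raw.get("bodyText") or "",
        tags=[t["tag"] for t in tag_dicts if "tag" in t],
        category_name=category_name,
        comment_count=raw.get("commentCount"),
    )


def load_articles(path: Path) -> list[Article]:
    """Read one yearly file (a JSON list of article dictionaries)."""
    with path.open(encoding="utf-8") as fh:
        return [normalize(raw) for raw in json.load(fh)]
```

Since most yearly files hold well over 100,000 articles including raw HTML, `json.load` reads the whole file into memory at once; on constrained machines a streaming JSON parser may be preferable.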

  • Authors: Shekhar, Ravi; Pollak, Senja; Pelicon, Andraž; Purver, Matthew; +1 Authors

    This dataset is an archive of reader comments on the Ekspress Meedia news site from 2009 to 2019, containing approximately 31M comments, mostly in Estonian, with some in Russian.

    Description of the Datasets

    There are 11 CSV files:
    - comments_2009.csv: 2,898,438 comments from 2009
    - comments_2010.csv: 2,377,591 comments from 2010
    - comments_2011.csv: 2,729,389 comments from 2011
    - comments_2012.csv: 3,372,776 comments from 2012
    - comments_2013.csv: 3,289,393 comments from 2013
    - comments_2014.csv: 3,195,502 comments from 2014
    - comments_2015.csv: 3,202,592 comments from 2015
    - comments_2016.csv: 2,848,624 comments from 2016
    - comments_2017.csv: 2,838,075 comments from 2017
    - comments_2018.csv: 3,194,597 comments from 2018
    - comments_2019.csv: 1,526,755 comments from 2019 (up to May)
    In sum: 31,473,732 comments.

    Columns:
    - comment_id (string): the ID of the comment
    - article_id (string): the ID of the article on which the comment was written
    - created_time (string): the date and time of the comment
    - subject (string): the title of the comment
    - reply_to_comment_id (string): the ID of the parent comment
    - content (string): the comment itself
    - is_anonymous (string): 1 if the comment was published anonymously, 0 if it was published by a registered user
    - is_enabled (string): 1 if the comment was published online, 0 if it was not. This field is questionable: not all comments have been manually moderated, and no additional information is available from the moderators.
    - channel_language (string): the language of the channel, 'nat' for Estonian, 'rus' for Russian
    - create_user_id (string): the user ID of the commenter; '0' for all blocked comments
    - moderated_by (string): the ID of the moderator

    CLARIN.SI repository
    Other ORP type . 2021
    Data sources: B2FIND

    This Research product is the result of merged Research products in OpenAIRE.
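Because every column of the comment CSV files is a string, flags such as is_enabled have to be compared against '1' and '0' rather than booleans. The following Python sketch, built only on the standard `csv` module, loads one yearly file, counts published and unpublished comments per channel_language, and rebuilds reply threads from reply_to_comment_id. It assumes a comma-delimited UTF-8 file with a header row using the column names listed above, and it assumes that top-level comments have an empty or '0' reply_to_comment_id; neither detail is stated in the description, so check the files before relying on them. All function names are hypothetical.

```python
import csv
from collections import Counter, defaultdict
from pathlib import Path
from typing import Iterable

# Assumption: top-level comments carry an empty or "0" reply_to_comment_id.
NO_PARENT = {"", "0"}


def read_comments(path: Path) -> list[dict[str, str]]:
    """Load one yearly CSV; every column is read as a string."""
    with path.open(encoding="utf-8", newline="") as fh:
        return list(csv.DictReader(fh))


def summarize(rows: Iterable[dict[str, str]]) -> Counter:
    """Count published vs. unpublished comments per channel language."""
    counts: Counter = Counter()
    for row in rows:
        status = "published" if row["is_enabled"] == "1" else "unpublished"
        counts[(row["channel_language"], status)] += 1
    return counts


def reply_index(rows: Iterable[dict[str, str]]) -> dict[str, list[str]]:
    """Map each parent comment_id to the ids of its direct replies."""
    children: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        parent = (row.get("reply_to_comment_id") or "").strip()
        if parent not in NO_PARENT:
            children[parent].append(row["comment_id"])
    return dict(children)


def thread_size(root_id: str, children: dict[str, list[str]]) -> int:
    """Number of comments in the thread rooted at root_id (root included)."""
    # Iterative depth-first walk avoids recursion limits on deep reply chains.
    size, stack, seen = 0, [root_id], set()
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue  # guard against malformed cycles in the reply links
        seen.add(cid)
        size += 1
        stack.extend(children.get(cid, []))
    return size
```

With roughly 3 million rows per yearly file, `read_comments` holds a whole year in memory; for lower memory use, iterate over `csv.DictReader` directly and run each pass on a fresh reader.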
