107 Research products, page 1 of 11

  • Research data
  • Research software
  • Other research products
  • Bulgarian

  • Restricted Bulgarian
    Authors: 
    Irina Temnikova; Silvia Gargova; Veneta Kireva; Tsvetelina Stefanova;
    Publisher: Zenodo

    This dataset has been created within Project TRACES (more information: https://traces.gate-ai.eu/). The dataset contains 61411 tweet IDs of tweets, written in Bulgarian, with annotations. The dataset can be used for general purposes or for building lies and disinformation detection applications. The tweets were collected via the Twitter API under academic access between 1 Jan 2020 and 28 June 2022 with the following keywords:
      (Covid OR коронавирус OR Covid19 OR Covid-19 OR Covid_19) - without replies and without retweets
      (Корона OR корона OR Corona OR пандемия OR пандемията OR Spikevax OR SARS-CoV-2 OR бустерна доза) - with replies, but without retweets
    Explanations of which fields can be used as markers of lies (or of intentional disinformation) are provided in our forthcoming paper (please follow the updates on our website: https://traces.gate-ai.eu/?page_id=20).
    The dataset contains the following fields:
      tweet_id: the ID of each tweet
      sentence_count: the number of sentences in the post
      words_per_sentence: the number of words in each sentence
      words_count: the total number of words in each post
      average_words_per_sentence: the average number of words per sentence in the post
      clarin_classla_ner: the named entity tags in the post, by Clarin Classla
      slavic_bert_ner: the social media post, with named entity tags, by Slavic BERT
      slavic_bert_ner_words: the social media post tokenized by Slavic BERT
      ner_count_bert: the number of named entities by type, by Slavic BERT
      ner_count_classla: the number of named entities by type, by Clarin Classla
      ner_count_all_bert: total number of all named entities in the social media post, by Slavic BERT
      ner_count_all_classla: total number of all named entities in the social media post, by Clarin Classla
      NE_in_message_bert: lists of the named entities in the post by type, by Slavic BERT
      NE_in_message_classla: lists of the named entities in the post by type, by Clarin Classla
      count_upos_all: number of words, grouped by part-of-speech tag, by Clarin Classla
      upos_in_message: list of words, grouped by part-of-speech tag, by Clarin Classla
      type_token_ratio: the number of unique word forms divided by the number of all words in the post
      content_word_diversity: the number of unique content words divided by the number of all content words
      passive_voice_count: the number of occurrences of passive voice in the post
      past_tense_count: the number of occurrences of past tense in the post
      negative_count: the number of occurrences of negative forms in the post
      self-ref/pronouns: counts in the post of self-reference pronouns (1st person Sg./Pl.) and other personal pronouns (2nd and 3rd person)
      еmotiveness: (total number of adjectives + total number of adverbs) / (total number of nouns + total number of verbs) (from Zhou et al, 2004)
      pausality: total number of punctuation marks / total number of sentences (from Zhou et al, 2004)
      num_funct_words: total number of function words
      num_conj: total number of conjunctions
      redundancy: total number of function words / total number of sentences (from Zhou et al, 2004)
      volition_words: count in the post of volitional words (will, wish, coerce, impose) in Bulgarian
      expression_time: occurrence of words from the time expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      expression_spatial: occurrence of words from the spatial expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      expression_negative: occurrence of words from the negative expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      expression_cognitive_operations: occurrence of words from the cognitive operations expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      expression_verbs_detail: occurrence of words from the details expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      sense_expressions: occurrence of words from the sense expressions list in the whole post, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      feeling_expressions: occurrence of words from the feeling expressions list in the whole post, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      doubt_confidence_expressions: occurrence of words from the doubt/confidence expressions list in the whole post, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      disc_mark: total number of discourse markers and a list of the detected discourse markers
      generaliz_markers: total number of generalization markers, followed by the list of recognized discourse markers
      attention_expressions: occurrence of words from the attention-attracting expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      duplicate_phrases: repeated words or expressions, a potential characteristic of automatically generated messages (e.g. deepfakes)
      uppercase_middle_words: words with uppercase letters in the middle, a potential characteristic of automatically generated messages (e.g. deepfakes)
      lowercase_beginning_sentences: for each sentence, whether the sentence begins with a lowercase letter (True) or not (False)
      num_of_urls: number of links per post
      num_of_hashtags: number of hashtags per post
      num_of_mentions: number of mentions per post
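
    The ratio-based fields described in this record (type_token_ratio, emotiveness, pausality, redundancy) can be illustrated with a minimal Python sketch. It is not the TRACES implementation: the function signatures, the lowercasing of word forms, the zero-denominator handling, and the grouping of the emotiveness formula as (adjectives + adverbs) / (nouns + verbs) are all assumptions made for illustration.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 for an empty denominator (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def type_token_ratio(tokens):
    """Unique word forms divided by all words; lowercasing is an assumption."""
    forms = [token.lower() for token in tokens]
    return safe_ratio(len(set(forms)), len(forms))


def emotiveness(adjectives, adverbs, nouns, verbs):
    """(adjectives + adverbs) / (nouns + verbs), grouping assumed from the description."""
    return safe_ratio(adjectives + adverbs, nouns + verbs)


def pausality(punctuation_marks, sentences):
    """Punctuation marks divided by sentences."""
    return safe_ratio(punctuation_marks, sentences)


def redundancy(function_words, sentences):
    """Function words divided by sentences."""
    return safe_ratio(function_words, sentences)


# Toy counts, invented for illustration only.
example_tokens = ["the", "cat", "The", "dog"]
ttr = type_token_ratio(example_tokens)   # 3 unique forms out of 4 words
emo = emotiveness(2, 1, 4, 2)            # (2 + 1) / (4 + 2)
pau = pausality(6, 3)
red = redundancy(9, 3)
```

    In practice the counts would come from a part-of-speech tagger; the description names Clarin Classla for the UPOS fields, but how the authors derived these particular counts is not stated here.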

  • Restricted Bulgarian
    Authors: 
    Irina Temnikova; Silvia Gargova; Veneta Kireva; Tsvetelina Stefanova;
    Publisher: Zenodo

    This dataset has been created within Project TRACES (more information: https://traces.gate-ai.eu/). The dataset contains 8791 anonymized Telegram social media posts, written in Bulgarian. The dataset is annotated with general information (named entities, part-of-speech tags, sentence length, etc.) and specific markers signaling details and can be used for general purposes or for building lies, manipulation, and disinformation detection applications. The social media posts have been collected via Telegram Desktop. Explanations of which fields can be used as markers of lies (or of intentional disinformation) are provided in our forthcoming paper (please follow the updates on our website: https://traces.gate-ai.eu/?page_id=20).

  • Restricted Bulgarian
    Authors: 
    Irina Temnikova; Silvia Gargova; Veneta Kireva; Tsvetelina Stefanova;
    Publisher: Zenodo

    This dataset has been created within Project TRACES (more information: https://traces.gate-ai.eu/). The dataset contains 15850 tweet IDs of tweets, written in Bulgarian, with annotations. The dataset can be used for general use or for building lies and disinformation detection applications. The tweets have been collected via Twitter API under academic access between 1 Jan 2020-7 July 2022 and with the following keywords without retweets: (ваксиниран депутат) OR (ваксинирани депутати) (язовири премиер) OR (язовири прокуратура) OR (язовири прокуратурата) ((мвр хемус) OR мвр) (прокуратура OR прокуратурата) (шефът тотото) OR (изпълнителният директор Българския спортен тотализатор) (кирил петков двойно гражданство) OR (премиер двойно гражданство) OR (премиер гражданство) ((Пътна OR загубена OR загуби OR изчезнала) карта газпром) (министър плагиат плагиатство) OR (плагиат плагиатство) ((изслушване главния прокурор) OR (иван гешев)) (фалшива диплома) (златни паспорти) (апартаментгейт OR (къща за гости) OR (къщи за гости) (оръжия OR оръжие) (Украйна OR украина) ((цена OR цени) (газ OR ток OR нафта OR бензин)) (мвр OR данс) (фалшиви новини) (данъци OR данъчни OR данък) ((кораб Царевна) OR Царевна) (Северна Македония) Explanations of which fields can be used as markers of lies (or of intentional disinformation) are provided in our forthcoming paper (please follow the updates on our website: https://traces.gate-ai.eu/?page_id=20). 
    The dataset contains the following fields:
      tweet_id: the ID of each tweet
      sentence_count: the number of sentences in the post
      words_per_sentence: the number of words in each sentence
      words_count: the total number of words in each post
      average_words_per_sentence: the average number of words per sentence in the post
      clarin_classla_ner: the named entity tags in the post, by Clarin Classla
      slavic_bert_ner: the social media post, with named entity tags, by Slavic BERT
      slavic_bert_ner_words: the social media post tokenized by Slavic BERT
      ner_count_bert: the number of named entities by type, by Slavic BERT
      ner_count_classla: the number of named entities by type, by Clarin Classla
      ner_count_all_bert: total number of all named entities in the social media post, by Slavic BERT
      ner_count_all_classla: total number of all named entities in the social media post, by Clarin Classla
      NE_in_message_bert: lists of the named entities in the post by type, by Slavic BERT
      NE_in_message_classla: lists of the named entities in the post by type, by Clarin Classla
      count_upos_all: number of words, grouped by part-of-speech tag, by Clarin Classla
      upos_in_message: list of words, grouped by part-of-speech tag, by Clarin Classla
      type_token_ratio: the number of unique word forms divided by the number of all words in the post
      content_word_diversity: the number of unique content words divided by the number of all content words
      passive_voice_count: the number of occurrences of passive voice in the post
      past_tense_count: the number of occurrences of past tense in the post
      negative_count: the number of occurrences of negative forms in the post
      self-ref/pronouns: counts in the post of self-reference pronouns (1st person Sg./Pl.) and other personal pronouns (2nd and 3rd person)
      еmotiveness: (total number of adjectives + total number of adverbs) / (total number of nouns + total number of verbs) (from Zhou et al, 2004)
      pausality: total number of punctuation marks / total number of sentences (from Zhou et al, 2004)
      num_funct_words: total number of function words
      num_conj: total number of conjunctions
      redundancy: total number of function words / total number of sentences (from Zhou et al, 2004)
      volition_words: count in the post of volitional words (will, wish, coerce, impose) in Bulgarian
      expression_time: occurrence of words from the time expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      expression_spatial: occurrence of words from the spatial expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      expression_negative: occurrence of words from the negative expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      expression_cognitive_operations: occurrence of words from the cognitive operations expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      expression_verbs_detail: occurrence of words from the details expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      sense_expressions: occurrence of words from the sense expressions list in the whole post, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      feeling_expressions: occurrence of words from the feeling expressions list in the whole post, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      doubt_confidence_expressions: occurrence of words from the doubt/confidence expressions list in the whole post, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      disc_mark: total number of discourse markers and a list of the detected discourse markers
      generaliz_markers: total number of generalization markers, followed by the list of recognized discourse markers
      attention_expressions: occurrence of words from the attention-attracting expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      duplicate_phrases: repeated words or expressions, a potential characteristic of automatically generated messages (e.g. deepfakes)
      uppercase_middle_words: words with uppercase letters in the middle, a potential characteristic of automatically generated messages (e.g. deepfakes)
      lowercase_beginning_sentences: for each sentence, whether the sentence begins with a lowercase letter (True) or not (False)
      num_of_urls: number of links per post
      num_of_hashtags: number of hashtags per post
      num_of_mentions: number of mentions per post
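
    The position-weighted expression fields in this record (a weight of 2 when the marker opens the sentence, 1 otherwise) might be computed along the following lines. This is a sketch under stated assumptions, not the authors' code: the function name, whitespace tokenization, the punctuation stripping, the restriction to single-word markers, and the English placeholder markers are all hypothetical. The actual Bulgarian expression lists are not given in the description.

```python
def weighted_marker_score(sentences, markers):
    """Score each sentence: 2 for a marker in first position, 1 for a marker elsewhere."""
    marker_set = {marker.lower() for marker in markers}
    scores = []
    for sentence in sentences:
        score = 0
        # Whitespace tokenization is an assumption; a real pipeline would use a tokenizer.
        for position, word in enumerate(sentence.lower().split()):
            token = word.strip(".,!?;:\"'()")
            if token in marker_set:
                score += 2 if position == 0 else 1
        scores.append(score)
    return scores


# Placeholder English time markers, invented for illustration only.
time_markers = ["yesterday", "today"]
example_scores = weighted_marker_score(
    ["Yesterday we left.", "We left yesterday."],
    time_markers,
)
```

    The first sentence scores 2 because the marker opens it, and the second scores 1 because the marker appears later. Multi-word expressions would need phrase matching rather than the token lookup used here.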

  • Restricted Bulgarian
    Authors: 
    Irina Temnikova; Silvia Gargova; Veneta Kireva; Tsvetelina Stefanova;
    Publisher: Zenodo

    This dataset has been created within Project TRACES (more information: https://traces.gate-ai.eu/). The dataset is in .csv format and contains 32518 tweet IDs of tweets, written in Bulgarian, with annotations. The dataset can be used for general purposes or for building lies and disinformation detection applications (by using the annotations with the linguistic markers of lies). The tweets were collected via the Twitter API under academic access between 1 Jan 2020 and 27 June 2022 with the following keywords:
      (лъжа OR лъжи OR лицемерие OR лъжат OR излъга OR измама OR измамници OR измами OR лъжец OR лъжци)
      (фалшиви OR fakenews OR невярно OR неверни OR подвеждащи OR подвеждащо OR неистини) - without retweets
      (манипулация OR манипулира OR стъкмистика OR крие OR далавераджия OR далавери OR далавера) - without retweets
    Explanations of which fields can be used as markers of lies (or of intentional disinformation) are provided in our forthcoming paper (please follow the updates on our website: https://traces.gate-ai.eu/?page_id=20).
    The dataset contains the following fields:
      tweet_id: the ID of each tweet
      sentence_count: the number of sentences in the post
      words_per_sentence: the number of words in each sentence
      words_count: the total number of words in each post
      average_words_per_sentence: the average number of words per sentence in the post
      clarin_classla_ner: the named entity tags in the post, by Clarin Classla
      slavic_bert_ner: the social media post, with named entity tags, by Slavic BERT
      slavic_bert_ner_words: the social media post tokenized by Slavic BERT
      ner_count_bert: the number of named entities by type, by Slavic BERT
      ner_count_classla: the number of named entities by type, by Clarin Classla
      ner_count_all_bert: total number of all named entities in the social media post, by Slavic BERT
      ner_count_all_classla: total number of all named entities in the social media post, by Clarin Classla
      NE_in_message_bert: lists of the named entities in the post by type, by Slavic BERT
      NE_in_message_classla: lists of the named entities in the post by type, by Clarin Classla
      count_upos_all: number of words, grouped by part-of-speech tag, by Clarin Classla
      upos_in_message: list of words, grouped by part-of-speech tag, by Clarin Classla
      type_token_ratio: the number of unique word forms divided by the number of all words in the post
      content_word_diversity: the number of unique content words divided by the number of all content words
      passive_voice_count: the number of occurrences of passive voice in the post
      past_tense_count: the number of occurrences of past tense in the post
      negative_count: the number of occurrences of negative forms in the post
      self-ref/pronouns: counts in the post of self-reference pronouns (1st person Sg./Pl.) and other personal pronouns (2nd and 3rd person)
      еmotiveness: (total number of adjectives + total number of adverbs) / (total number of nouns + total number of verbs) (from Zhou et al, 2004)
      pausality: total number of punctuation marks / total number of sentences (from Zhou et al, 2004)
      num_funct_words: total number of function words
      num_conj: total number of conjunctions
      redundancy: total number of function words / total number of sentences (from Zhou et al, 2004)
      volition_words: count in the post of volitional words (will, wish, coerce, impose) in Bulgarian
      expression_time: occurrence of words from the time expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      expression_spatial: occurrence of words from the spatial expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      expression_negative: occurrence of words from the negative expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      expression_cognitive_operations: occurrence of words from the cognitive operations expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      expression_verbs_detail: occurrence of words from the details expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      sense_expressions: occurrence of words from the sense expressions list in the whole post, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      feeling_expressions: occurrence of words from the feeling expressions list in the whole post, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      doubt_confidence_expressions: occurrence of words from the doubt/confidence expressions list in the whole post, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      disc_mark: total number of discourse markers and a list of the detected discourse markers
      generaliz_markers: total number of generalization markers, followed by the list of recognized discourse markers
      attention_expressions: occurrence of words from the attention-attracting expressions list in each sentence, with a weight of 2 if the marker is at the beginning of the sentence and 1 if not
      duplicate_phrases: repeated words or expressions, a potential characteristic of automatically generated messages (e.g. deepfakes)
      uppercase_middle_words: words with uppercase letters in the middle, a potential characteristic of automatically generated messages (e.g. deepfakes)
      lowercase_beginning_sentences: for each sentence, whether the sentence begins with a lowercase letter (True) or not (False)
      num_of_urls: number of links per post
      num_of_hashtags: number of hashtags per post
      num_of_mentions: number of mentions per post
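
    The surface-level fields at the end of this record (num_of_urls, num_of_hashtags, num_of_mentions, uppercase_middle_words, lowercase_beginning_sentences) lend themselves to a short regular-expression sketch. The patterns and function names below are assumptions chosen for illustration. In particular, excluding fully uppercase words such as acronyms from uppercase_middle_words is a guess at the intent, not something the description states.

```python
import re

# Simplified patterns; real tweets may need more careful matching.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def count_surface_features(post):
    """Count links, hashtags and mentions; URLs are removed first so fragments are not counted."""
    without_urls = URL_RE.sub(" ", post)
    return {
        "num_of_urls": len(URL_RE.findall(post)),
        "num_of_hashtags": len(HASHTAG_RE.findall(without_urls)),
        "num_of_mentions": len(MENTION_RE.findall(without_urls)),
    }


def uppercase_middle_words(post):
    """Words with an uppercase letter after the first character, skipping all-caps words."""
    words = re.findall(r"\w+", URL_RE.sub(" ", post))
    return [w for w in words if not w.isupper() and any(ch.isupper() for ch in w[1:])]


def lowercase_beginning_sentences(sentences):
    """For each sentence, True if its first character is a lowercase letter."""
    flags = []
    for sentence in sentences:
        stripped = sentence.lstrip()
        flags.append(bool(stripped) and stripped[0].islower())
    return flags


example_post = "This is an oddWord from @user about #news https://example.org/page#top"
features = count_surface_features(example_post)
odd_words = uppercase_middle_words(example_post)
flags = lowercase_beginning_sentences(["hello there.", "Fine."])
```

    Python's `str.isupper` and `str.islower` are Unicode-aware, so the same checks apply to Cyrillic text.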

  • Open Access Bulgarian
    Authors: 
    Ivanov Yanko; Kostov Ilyan;
    Publisher: Zenodo

    A comparative analysis is presented of the current systems for paying compensation (indemnities) to farmers whose animals were destroyed during the eradication of outbreaks of infectious animal diseases in Bulgaria and in EU Member States. The expediency of establishing a National Health Insurance Fund is also considered. Its main activity would be the compulsory health insurance of farm animals against infectious animal diseases, based on the principle of shared responsibility between state institutions and animal owners and taking into account the level of biosecurity of livestock.

  • Open Access Bulgarian
    Authors: 
    Kostov Ilyan;
    Publisher: Zenodo

    This opinion was prepared by the Center for Risk Assessment in the Food Chain. It addresses the appropriateness of disinfecting motor vehicles passing through the border crossings of the Republic of Bulgaria, based on an analysis of the epidemiological situation in the world, Europe and the Mediterranean, and the possibility of stopping prophylactic disinfection at the country's border checkpoints. Based on the assessment, the Center makes the following recommendations: 1. The introduction and cancellation of disinfection at the BIP should be done by order of the Minister of Agriculture on the proposal of the Executive Director of the BFSA according to Art. 126 of the Veterinary Law, or, in case of calamity, by the Minister of Agriculture under Art. 15 of the Plant Protection Law. The imposed measures shall be revoked by an order of the competent authority when the need for their implementation ceases. 2. In order to assess the expediency of introducing or terminating the prophylactic disinfection of motor vehicles passing through the border checkpoints of the Republic of Bulgaria, it is necessary to take into account: (a) information on the spread of contagious animal diseases and plant pests in countries or regions thereof where restrictive measures have been announced in relation to the animal diseases listed in Annex II to Regulation (EU) 2016/429 and the plant pests on the lists of Delegated Regulation (EU) 2019/1702, Implementing Regulation (EU) 2018/2019 and Implementing Regulation (EU) 2019/2072, as well as those listed in EPRO lists 1 and 2; (b) the type of consignments transported and the sanitary status of the countries or regions through which the transport corridors pass; (c) identification of "high-risk" and potentially dangerous biological agents, on the basis of epidemiological, economic and sociological criteria, that could potentially be used as biological weapons when hostilities are taking place near the Bulgarian borders.

  • Open Access Bulgarian
    Authors: 
    Vasileva Madlen; Kostov Ilyan; Georgiev Georgi;
    Publisher: Zenodo

    For the period from 2002 to 2019 in Bulgaria, there is a trend of gradual decline in the incidence of echinococcosis (2002: 832; 2019: 193), as well as in the average annual incidence, from 8.2%₀₀₀ (per 100,000 population) in 2002 to 2.74%₀₀₀ in 2019. However, in recent years Bulgaria has continued to occupy a leading position in the incidence of echinococcosis among European countries, remaining one of the countries with the highest endemicity for E. granulosus s.l. in the EU. A study of the reasons for this high incidence in Bulgaria shows serious gaps in the health culture and behavior of the population, in the context of a prolonged socio-economic crisis, which lead to deteriorating health and quality of life. A worrying fact is that young people are most often affected: children and adolescents aged 10-19 years (morbidity 7.3%). For the period 2009 – 2019, cases of echinococcosis were found in all regions of the country, with most primary cases and recurrences registered in the Plovdiv (963), Burgas (655) and Sliven (631) regions. The average incidence for the period 2000 - 2017 is unevenly distributed across the territory of the country. It varies from 1.6% in Sofia to 15.8%₀₀₀ in Sliven region. A study of the role of environmental and natural factors in the spread of the disease in Bulgaria shows that echinococcosis is acquired in a "home", rural environment and is considered to be more of a "soil / environment transmitted" infection, similar to "classical" helminthiasis, with a probable hand-to-mouth transmission mechanism, while transmission through food / water may be of secondary importance. The golden jackal (Canis aureus) population appears to be on an increasing trend of migration from the EU's Eastern Member States to neighboring western countries, which should be taken into account when considering the potential future spread of E. multilocularis.
    In Bulgaria, as a country with a high prevalence of cystic echinococcosis, all the conditions exist for the spread of alveococcosis in humans and of E. multilocularis in foxes, dogs and other carnivorous mammals. Some of the most important breeders, such as field rodents, are also widespread. The high incidence of echinococcosis in the country compared to other Balkan and European countries, the delay in early diagnosis, the high proportion of cases in childhood and adolescence, based on scientific evidence, and the high proportion of infected pets are grounds for urgent renewal of the National Program for Control of Echinococcosis in Bulgaria, in order to achieve a lasting trend towards its reduction to sporadic cases among humans and animals.

  • Open Access Bulgarian
    Authors: 
    Rangachev, Antoni;
    Publisher: Zenodo

    Some applications of algebra, first lecture of the course (original title: "Някои приложения на алгебрата, Първа лекция от курса")

  • Open Access Bulgarian
    Authors: 
    Telarico, Fabio Ashtar;
    Publisher: Zenodo

    Transcripts accompanying the paper "A nationalist-conservative grammar of change?" in both MS Word format (the line numbers refer to this version) and RData. The latter can be used in the accompanying reproducible example (reprex).

  • Other research product . 2022
    Open Access Bulgarian
    Authors: 
    Bañón, Marta; Esplà-Gomis, Miquel; Forcada, Mikel L.; García-Romero, Cristian; Kuzman, Taja; Ljubešić, Nikola; van Noord, Rik; Pla Sempere, Leopoldo; Ramírez-Sánchez, Gema; Rupnik, Peter; +4 more
    Publisher: Jožef Stefan Institute

    The Bulgarian web corpus MaCoCu-bg 1.0 was built by crawling the ".bg" and ".бг" internet top-level domains in 2021, extending the crawl dynamically to other domains as well. The crawler is available at https://github.com/macocu/MaCoCu-crawler. Considerable effort was devoted to cleaning the extracted text to provide a high-quality web corpus. This was achieved by removing boilerplate (https://corpus.tools/wiki/Justext) and near-duplicated paragraphs (https://corpus.tools/wiki/Onion), and by discarding very short texts as well as texts that are not in the target language. The dataset is characterized by extensive metadata which allows filtering the dataset based on text quality and other criteria (https://github.com/bitextor/monotextor), making the corpus highly useful for corpus linguistics studies, as well as for training language models and other language technologies. In the XML format, each document is accompanied by the following metadata: title, crawl date, url, domain, file type of the original document, distribution of languages inside the document, and a fluency score based on a language model. The text of each document is divided into paragraphs that are accompanied by metadata on whether a paragraph is a heading or not, metadata on the paragraph quality and fluency, the automatically identified language of the text in the paragraph, and information on whether the paragraph contains sensitive information (identified via the Biroamer tool - https://github.com/bitextor/biroamer).
The TSV format delivers sentence-level data, and contains the following metadata: sentence URL, paragraph and sentence ID within the document, a simhash and a quality score, which allow filtering out near-duplicate sentences (all sentences with the same simhash can be deleted, except for the one with the highest quality score), the language of the sentence, information on sentence fluency, and information whether the sentence contains personal or sensitive information (identified via the Biroamer sensitive data and named entity recognizer). Notice and take down: Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please: (1) Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted. (2) Clearly identify the copyrighted work claimed to be infringed. (3) Clearly identify the material that is claimed to be infringing and information reasonably sufficient in order to allow us to locate the material. (4) Please write to the contact person for this resource whose email is available in the full item record. We will comply with legitimate requests by removing the affected sources from the next release of the corpus. This action has received funding from the European Union's Connecting Europe Facility 2014-2020 - CEF Telecom, under Grant Agreement No. INEA/CEF/ICT/A2020/2278341. This communication reflects only the author’s view. The Agency is not responsible for any use that may be made of the information it contains.
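
    The near-duplicate filtering rule described for the TSV format (delete all sentences sharing a simhash except the one with the highest quality score) can be sketched in Python as follows. The dictionary keys "simhash", "quality" and "text" are hypothetical stand-ins; the actual column names in the MaCoCu TSV files are not given in this description and should be checked against the released data.

```python
def drop_near_duplicates(rows):
    """Keep, for each simhash value, only the row with the highest quality score."""
    best = {}
    for row in rows:
        key = row["simhash"]
        # Replace the stored row only when a strictly better one is found,
        # so ties keep the first occurrence.
        if key not in best or row["quality"] > best[key]["quality"]:
            best[key] = row
    return list(best.values())


# Toy rows, invented for illustration only.
example_rows = [
    {"simhash": "a1", "quality": 0.4, "text": "first variant"},
    {"simhash": "a1", "quality": 0.9, "text": "second variant"},
    {"simhash": "b2", "quality": 0.1, "text": "unrelated sentence"},
]
kept = drop_near_duplicates(example_rows)
kept_texts = [row["text"] for row in kept]
```

    A single pass with a dictionary keeps memory proportional to the number of distinct simhash values, which matters for a web-scale corpus. Groups appear in the output in the order their simhash was first seen.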
