
handle: 1822/94079
This document reports a Master’s project, the final project of the fifth year of the Integrated Master’s in Informatics Engineering, carried out at Universidade do Minho in Braga, Portugal. On February 24, 2022, a war between Ukraine and Russia began. The war is devastating and affects many people, both residents of the countries directly involved and those of neighboring countries. As a highly significant event, it receives coverage from many sources worldwide, including traditional print newspapers, online news platforms, social networks, blogs, and television programs. However, this information is scattered across different websites and social networks, which makes analysis very difficult for researchers (in Linguistics, History, the Humanities, etc.) and for curious readers. It is therefore essential to gather the information on a single platform. This work aims to create an online corpus in the Portuguese language about the Ukraine War, based on news from Portuguese online newspapers as well as comments on social media. To that end, a variety of news sources were first considered, and the Portuguese online newspapers “Público” and “Jornal de Negócios” were selected, together with the platform “Reddit”. The required information was obtained through Web Scraping: for each source, an extractor was developed that collected the necessary information and saved it in a JSON file. Natural Language Processing techniques were then used to process the gathered information, and the result was stored in MongoDB, a non-relational database. Finally, a website called GUCO was designed and implemented, allowing users to navigate and explore the created corpus. The GUCO website is available at https://guco.epl.di.uminho.pt/.
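The extraction step described in the abstract (one extractor per source, each saving its results to a JSON file) can be illustrated with a minimal sketch. This is not the project's actual extractor: the class `ArticleParser`, the functions `extract_article` and `save_articles`, the record fields, and the assumption that a headline sits in an `<h1>` element and the body in `<p>` elements are all hypothetical, and the sketch uses only the Python standard library rather than whatever scraping tools the work relied on.

```python
import json
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collects the text of the first <h1> and of every <p> element.

    Assumption: the headline is in <h1> and the body in <p> tags; real
    news pages would need source-specific rules.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._tag = None    # element currently being read: "h1", "p" or None
        self._buffer = []   # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # closing tag of a nested element such as <b>
        # Join the fragments and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if tag == "h1" and not self.title:
            self.title = text
        elif tag == "p" and text:
            self.paragraphs.append(text)
        self._tag = None


def extract_article(html, source, url):
    """Turn one HTML page into a record (hypothetical field names)."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return {
        "source": source,
        "url": url,
        "title": parser.title,
        "text": "\n".join(parser.paragraphs),
    }


def save_articles(articles, path):
    """Write the extracted records to a JSON file, keeping accented characters."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(articles, fh, ensure_ascii=False, indent=2)
```

Records in this shape are plain dictionaries, so they could later be passed to a language-processing step or inserted into a document database such as MongoDB without further conversion.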
Social network analysis, Rebuild the war through the news, Natural language processing, Online corpus, Ukraine war, Web scraping
