The research community has embraced Deep Learning methods for Document Understanding in recent years. Deep Learning methods, and Transformer networks in particular, require access to large datasets. The objective of this thesis was to evaluate a state-of-the-art model for Document Layout Analysis on a public dataset and on a custom dataset. A further objective was to construct a pipeline for building datasets of Visually Rich Documents. The research methodology consisted of a literature study to identify a state-of-the-art model for Document Layout Analysis and a relevant dataset on which to evaluate the chosen model. The literature study also covered how existing datasets in the domain were collected and processed. Finally, an evaluation framework was created. The evaluation showed that the chosen multi-modal transformer network, LayoutLMv2, performed well on the DocBank dataset. The custom-built dataset was limited by class imbalance, although performance was good for the larger classes. The annotation tool and its auto-tagging feature performed well, and the proposed pipeline showed great promise for creating datasets of Visually Rich Documents. In conclusion, this thesis project answers the research questions and suggests two main opportunities. The first is to encourage others to build datasets of Visually Rich Documents using a pipeline similar to the one presented in this thesis. The second is to evaluate whether the visual token information for LayoutLMv2 can be created as part of the transformer network rather than by a separate CNN.
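The effect of class imbalance on an evaluation can be illustrated with per-class metrics. The sketch below is only an illustration: the labels and predictions are invented toy data, not results from the thesis, and `per_class_f1` is a hypothetical helper. It shows how overall accuracy can stay high while the macro-averaged F1 drops because a rare class is never predicted.

```python
from collections import Counter

def per_class_f1(gold, pred):
    """Return {label: F1} computed from parallel lists of gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in set(gold) | set(pred):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores

# Invented toy data: 18 tokens of a large class, 2 tokens of a rare class.
gold = ["paragraph"] * 18 + ["table"] * 2
pred = ["paragraph"] * 20  # the rare class is never predicted

scores = per_class_f1(gold, pred)
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
macro_f1 = sum(scores.values()) / len(scores)
```

Reporting per-class scores alongside an aggregate makes it visible when good overall numbers are carried by the larger classes.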
Pre-trained language models such as BERT have proven particularly effective for natural language processing (NLP) tasks in recent years. However, the enormous computational resources required to train such models make them difficult to use in real-world settings. To overcome this problem, compression methods have emerged in recent years. In this project, some of these neural network compression approaches for text processing are studied, implemented and tested. In our case, the most effective method was Knowledge Distillation, which consists of transferring knowledge from a large neural network, called the teacher, to a small neural network, called the student. There are several variants of this approach, which differ in complexity. Two of them are examined in this project. The first transfers knowledge from any neural network to a smaller bidirectional LSTM, using only the output of the larger model. The second, more complex approach encourages the student model to also learn from the intermediate layers of the teacher model, for incremental knowledge extraction. The ultimate goal of this project is to provide the company’s data scientists with ready-to-use compression methods for future projects that require deep neural networks for NLP.
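As a rough illustration of output-only distillation, the sketch below compares a student's outputs with a teacher's in two common ways: a KL divergence between temperature-softened distributions, and a mean squared error between raw logits. The abstract does not say which objective the project used, so both functions, the temperature value and the toy logits are assumptions for illustration only.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence KL(teacher || student) between the softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(pt * math.log(pt / ps)
               for pt, ps in zip(p_teacher, p_student) if pt > 0)

def logit_mse(student_logits, teacher_logits):
    """Alternative objective: mean squared error between raw logits."""
    return sum((s - t) ** 2
               for s, t in zip(student_logits, teacher_logits)) / len(teacher_logits)

# Invented logits for a three-class problem.
teacher = [4.0, 1.0, -2.0]
close_student = [3.5, 1.2, -1.8]  # roughly agrees with the teacher
far_student = [-2.0, 1.0, 4.0]    # ranks the classes in the opposite order

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
```

In training, such a loss would be minimised with respect to the student's parameters, often combined with the ordinary supervised loss on labelled data.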
QC 20160318
Introduction
The waterfront of Stockholm, one of Europe's fastest-growing cities, stands at the forefront of climate change challenges. As such, there is a pressing need for innovative solutions and resilient urban design. The SOS Climate Waterfront research project gathered international experts and local representatives from different disciplines to work together in May-June 2022 to discuss and explore proposals and to design Sustainable Open Solutions (SOS). This book explores three urban sites in Stockholm that hold significant implications for the city's waterfront: Lövholmen, Frihamnen, and Södra Värtan. During the workshop, SOS Climate Waterfront participants, mainly European researchers, analyzed future challenges, raised new questions, and depicted solutions, which can now contribute to cross-country comparisons in a larger EU framework.
The three sites are not only driven by the demand for more housing but also face crucial issues related to cultural heritage, climate change, landscape ecology, and social development. Achieving a delicate balance between these aspects and economic interests presents a significant task for the city. The waterfront of Stockholm holds substantial relevance in the context of climate change and its impact on coastal areas. Thus, analysis of the Swedish context, based on collected data and on-site knowledge, sustains a deeper understanding of the challenges and opportunities that lie ahead. Stockholm is expected to be affected by the impacts of climate change, including temperature increases, changing precipitation patterns, and the potential for more frequent cloudbursts. While the rising sea level is a long-term challenge rather than an immediate concern, increasing risks of extreme weather events and flooding were taken into consideration.
Stockholm rests on two different bodies of water, at a location where the brackish Baltic Sea (Östersjön in Swedish) meets Lake Mälaren, an important provider of freshwater for the larger Stockholm area. As the lyrics of a popular contemporary Swedish song (by Robert Broberg) describe it: “the city is full of water”. To ensure that the ecological and chemical status is maintained in the face of future challenges from urbanisation and climate change, much attention has been paid to preserving the water quality of Lake Mälaren, a vital water source for two million people. The city values its water and continuously invests in improving the situation (e.g. the new sluice at Slussen). The activities carried out in the SOS Climate Waterfront workshop in Stockholm integrated this relationship to water as well as the continuing land rise, the balance of which adds complexity to sea level modelling and therefore also to the anticipations and scenarios for the future.
In this book, the authors explore innovative strategies and design proposals to tackle these challenges while preserving the cultural identity and heritage value of the sites. Researchers from various European cities, supported by experts and academic lectures, analyze extensive input materials and information, ranging from planning documents and historical records to consultation reports and city visions. By drawing upon multidisciplinary backgrounds and experiences, the researchers identify the socioeconomic and environmental qualities of each site, ultimately developing site design concepts and solutions that address climate change challenges, the maintenance of cultural identities, and the protection of biodiversity.
Throughout the book, the proposed designs emphasize the importance of finding a balance between preserving cultural heritage and the values of local communities, stimulating economic growth, and promoting sustainable urban development. Key elements include the reuse of existing infrastructure, the integration of green-blue schemes, the improvement of biodiversity, and the creation of vibrant and multi-functional neighbourhoods that connect people to each other and their surroundings. While the design solutions present promising approaches, their implementation and the institutional challenges that may arise in specific city contexts remain external to the results presented here. The book acknowledges the need for further research and highlights the shared recognition among the workshop participants regarding the gaps and blind spots in their findings.
The following chapters of the book delve into climate change in Sweden, the role of culture and arts in the environmental movement, and specific case studies and design proposals for each site. By exploring these diverse perspectives, this book aims to contribute to the ongoing discourse on sustainable urban design and planning and to inspire innovative approaches to the complex challenges Stockholm will face in the future.
PART 1 of the book offers a comprehensive understanding of climate change in Sweden, street fishing in Stockholm, and the role of culture and arts in the environmental movement in the Nordic Region and internationally. Furthermore, the lessons from Stockholm and its surroundings in this report draw on presentations made during the workshop by professionals and researchers from various fields. Some of these lessons have been written up as articles, introduced below.
The chapter “Climate change in Sweden” by Magnus Joelsson from the Swedish Meteorological and Hydrological Institute (SMHI) provides an updated analysis with data and the context for discussing climate change in Sweden.
The chapter distinguishes between weather and climate, referring to the expression “Climate is what you expect, weather is what you get”, which Mark Twain is said to have coined. It also calls for action by emphasising that the trend of climate change is expected to continue, both globally and in Sweden. What will happen in the far future still depends on our actions, now and in the future.
The contribution entitled “Urban nature does not stop at the waterfront, neither should urban planning, a case study of street fishing in Stockholm” raises questions about how planning and strategies for waterfront areas in cities should consider perspectives from a wider group of interests. It discusses how urban dwellers live with water, with a focus on recreational fishing and what this use entails. The authors (Anja Moum Rieser from KTH Royal Institute of Technology, and Wieben Johannes Boonstra and Rikard Hedling, both from Uppsala University) go beyond the human-centric view, expanding the gaze to other species’ needs and incorporating the body of water itself in planning for urban waterfront areas.
The chapter “The role of culture and arts in the environmental movement in the Nordic Region and internationally” by Elisavet Papageorgiou and Iwona Preis from Intercult discusses artistic perspectives on sustainability and climate change. It focuses on how art and culture can raise awareness, inspire action, and promote social cohesion around sustainable practices. Drawing on experiences from projects that aim to invite and engage community dialogue, the authors argue that artistic strategies can challenge dominant narratives and promote alternative visions for a sustainable future.
The contribution “Sense the Marsh” by Thelma Dethelfsen from KTH Royal Institute of Technology emphasises the importance of architecture and landscape design in creating adaptive and resilient strategies to manage flooding and sea level rise.
The study focuses on how designs can encourage interaction with and awareness of the surroundings, thereby highlighting the interfaces between humans and nature and raising questions about how flooding can be used as a quality and catalyst to attract more people to an area. The resulting design provides an opportunity to experience nature through design and architectural solutions situated on the border between humans, non-human species and nature.
In PART 2, readers will explore the detailed design proposals developed by different groups for the urban sites in focus. These proposals aim to intertwine sustainability, cultural identity, and economic interests, offering insights into the potential for resilient and vibrant urban spaces. By assessing existing conditions on the three sites analysed in Stockholm (Lövholmen, Frihamnen, and Södra Värtan), the teams participating in the workshop actively contributed to the analysis of the sites and the development of design solutions for the areas, ultimately forming strategies for better preparedness for future challenges and better lives for the inhabitants.
Lövholmen is located in the north-western part of Liljeholmen, one of the major development centres in Stockholm. The area is currently a closed-off industrial site, but the municipality’s intention is to redevelop it into a mixed urban space with homes, workplaces, shops, schools, and more. It is expected that 1500 new homes will be built in the area. Many of the current industrial buildings are empty and in bad shape. While some of these will be replaced with housing, other industrial buildings have heritage value and should be protected during the development, after which a new use should be found for them.
Frihamnen is, together with the Södra Värtan project, part of the larger development of “Norra Djurgårdsstaden”, the Stockholm Royal Seaport. Frihamnen is located to the south of Värtahamnen and is in turn strongly connected to Loudden in the south.
The municipality plans for the area to contain approximately 1700 homes, 4000 workplaces and 75,000 m² of retail and office space. Some of the existing businesses in Frihamnen will remain, but much of the existing infrastructure is planned to be removed. The harbour no longer handles freight shipping, but passenger ships will continue to depart from the harbour (Frihamnspiren).
Södra Värtan is planned to contain 1500 apartments, 20 preschool departments, 155,000 m² of office and retail space, as well as 10,000 m² of parks and a 600 m long waterfront walkway. The new development is intended to co-exist with the activities in the harbour, which creates challenges such as blocking the noise from the cruise ships. The walkways along the waterfront are planned to have shops and restaurants.
The contributions of the articles, together with the SOS Climate Waterfront teams’ analysis of the three sites in Stockholm, provide relevant and timely interdisciplinary efforts to co-create novel solutions and future strategies to manage the climate challenges ahead. The solutions relate to the history of the urban territory, the actors involved (or excluded) and changes in planning ideals over time. A key theme is how to plan inclusively for the future by involving representatives of diverse interests, competences, and future visions for the sites. The consequences of climate change affect these different stakeholders and citizens in a wide range of ways, so including them in the process is crucial. This also means including future generations’ views on urban transformation. The largest challenge is to create novel solutions in which these human interests, as well as those of local nature and non-human species, can be incorporated, in an effort to plan and design for the mitigation and management of the consequences of climate change.
As we embark on this journey of exploration and innovation, we invite readers to delve into the pages of this book, where interdisciplinary research, creative design, and a shared commitment to sustainable urban development and decarbonisation strategies converge. Together, let us envision a future where cities thrive, harmoniously balancing their heritage, environment, and economic aspirations.
QC 20231115
SOS Climate Waterfront: https://cordis.europa.eu/project/id/823901
Part of book: ISBN 978-1-009-10023-6
QC 20221219
QC 20211207
Companies do not exist in isolation. They are embedded in structural relationships with each other. Mapping a given company’s relationships with other companies in terms of competitors, subsidiaries, suppliers, and customers is key to understanding its major risk factors and opportunities. Conventionally, obtaining and staying up to date with this knowledge required highly skilled manual labor, such as a financial analyst reading financial news and reports. However, with the development of Natural Language Processing (NLP) and graph databases, it is now possible to systematically extract and store structured information from unstructured data sources. The current go-to method for effective information extraction uses supervised machine learning models, which require a large amount of labeled training data. Labeling data is usually time-consuming, and labeled data is hard to obtain in a domain-specific area. This project explores an approach to constructing a company domain-specific Knowledge Graph (KG) that contains company-related entities and relationships from U.S. Securities and Exchange Commission (SEC) 10-K filings by combining a pre-trained general NLP model with rule-based patterns in Named Entity Recognition (NER) and Relation Extraction (RE). This approach eliminates the time-consuming data-labeling task of the statistical approach. Evaluated on ten 10-K filings, the model achieves an overall recall of 53.6%, a precision of 75.7%, and an F1-score of 62.8%. The results show that it is possible to extract company information using the hybrid method, which does not require a large amount of labeled training data. However, the project requires a time-consuming process of finding lexical patterns in sentences to extract company-related entities and relationships.
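The idea of rule-based relation extraction can be sketched with a couple of lexical patterns that turn sentences into (head, relation, tail) triples. Everything in the sketch is hypothetical: the patterns, the relation names and the company names are invented, and a simple regular expression for capitalised words stands in for the pre-trained NER component mentioned in the abstract. The last lines check that the reported F1-score is consistent with the reported precision and recall, using F1 = 2PR / (P + R).

```python
import re

# Stand-in for an NER step: a run of capitalised words is treated as a company name.
NAME = r"[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*"

# Hypothetical lexical patterns; real 10-K sentences are far more varied.
PATTERNS = [
    (re.compile(rf"(?P<head>{NAME}) competes with (?P<tail>{NAME})"),
     "COMPETITOR_OF"),
    (re.compile(rf"(?P<head>{NAME}) is a (?:wholly owned )?subsidiary of (?P<tail>{NAME})"),
     "SUBSIDIARY_OF"),
]

def extract_triples(sentence):
    """Return (head, relation, tail) triples for every pattern match in the sentence."""
    triples = []
    for pattern, relation in PATTERNS:
        for match in pattern.finditer(sentence):
            triples.append((match.group("head"), relation, match.group("tail")))
    return triples

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

triples = extract_triples("Acme Corp competes with Globex Inc in the widget market.")

# Consistency check on the figures reported in the abstract.
reported_f1 = f1_score(precision=0.757, recall=0.536)
```

Each triple could then be stored as an edge between two company nodes in a graph database. The cost of the approach shows in `PATTERNS`: every new phrasing needs a new hand-written rule.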
This master's thesis deals with automatic text summarization and how semantic folding can be used as a similarity measure between sentences in the TextRank algorithm. The method was implemented and compared with two common similarity measures: cosine similarity of tf-idf vectors and the number of overlapping terms in two sentences. The linguistic features used in constructing the three methods were stop words, part-of-speech filtering and stemming. Five different part-of-speech filters were used, with different mixtures of nouns, verbs, and adjectives. The three methods were evaluated by summarizing documents from the Document Understanding Conference (DUC) and comparing the results to gold-standard summaries created by human judges. The system summaries were compared with the gold-standard summaries using the ROUGE-1 measure. The algorithm with semantic folding performed worst of the three methods, but only 0.0096 worse in F-score than cosine similarity of tf-idf vectors, which performed best. For semantic folding, the average precision was 46.2% and the recall 45.7% for the best-performing part-of-speech filter.
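To show where a sentence similarity measure plugs into TextRank, the sketch below ranks tokenised sentences with a weighted PageRank iteration and uses term overlap as the similarity. The overlap is normalised by the log sentence lengths, as in the original TextRank formulation; whether the thesis normalised it the same way is not stated, so treat that as an assumption. The damping factor, iteration count and toy sentences are likewise illustrative, and the sentences are assumed to be already stop-word filtered and stemmed.

```python
import math

def overlap_similarity(s1, s2):
    """Number of shared terms, normalised by the log lengths of both sentences."""
    w1, w2 = set(s1), set(s2)
    if not w1 or not w2:
        return 0.0
    denominator = math.log(len(w1)) + math.log(len(w2))
    if denominator == 0:  # both sentences contain a single term
        return 0.0
    return len(w1 & w2) / denominator

def textrank(sentences, similarity, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over the sentence similarity graph."""
    n = len(sentences)
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores

# Invented, pre-tokenised sentences.
sentences = [
    ["cat", "sat", "mat"],
    ["cat", "chased", "mouse"],
    ["cat", "mouse", "mat"],
    ["stock", "market", "fell"],
]
scores = textrank(sentences, overlap_similarity)
best = max(range(len(sentences)), key=scores.__getitem__)
```

Swapping in another measure, such as cosine similarity of tf-idf vectors or a semantic-folding similarity, only requires passing a different `similarity` function. A summary is then formed from the highest-scoring sentences.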
Systematic review of research manuscripts is a common procedure in which research studies pertaining to a particular field or domain are classified and structured in a methodical way. This process involves, among other steps, an extensive review and consolidation of scientific metrics and attributes of the manuscripts, such as citations, type, or venue of publication. Extracting and mapping the relevant publication data is a very laborious task if performed manually. Automating such systematic mapping steps is intended to reduce the human effort required and can therefore potentially reduce the time required for the process.
The objective of this thesis is to automate the data extraction and mapping steps when systematically reviewing studies. The manual process is replaced by novel graph modelling techniques for effective knowledge representation, as well as novel machine learning techniques that aim to learn these representations. This automates the process by characterising the publications on the basis of certain sub-properties and qualities that give the reviewer a quick high-level overview of each research study. The final model is a concept learner that predicts these sub-properties and also addresses the inherent concept drift of novel manuscripts over time. Different models were developed and explored in this study for the development of the concept learner.
Results show that: (1) Graph reasoning techniques that leverage the expressive power of modern graph databases are very effective in capturing the extracted knowledge in a so-called knowledge graph, which allows us to form concepts that can be learned using standard machine learning techniques such as logistic regression, decision trees and neural networks. (2) Neural network models and ensemble models outperformed other standard machine learning techniques such as logistic regression and decision trees on the evaluation metrics. (3) The concept learner is able to detect and avoid concept drift, based on the F1-score and retraining of the model.
A dataset consisting of logs describing the results of tests from a single Build and Test process, used in a Continuous Integration setting, is utilized to automate the categorization of the logs according to failure types. Two different features, words and log keys, are evaluated using unordered document matrices as document representations to determine the viability of log keys. The experiment uses Multinomial Naive Bayes (MNB) classifiers and multi-class Support Vector Machines (SVM) to establish the performance of the different features. The experiment indicates that log keys perform equivalently to words while achieving a great reduction in dictionary size. Three different multi-layer perceptrons are evaluated on the log key document matrices, achieving slightly higher cross-validation accuracies than the SVM. A shallow-and-wide Convolutional Neural Network (CNN) is then designed using temporal sequences of log keys as document representations. The top-performing model of each model architecture is evaluated on a test set, except for the MNB classifiers, which had subpar performance during cross-validation. The test set evaluation indicates that the CNN is superior to the other models.
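To make the idea of an unordered document matrix concrete, the following is a minimal Python sketch and not the implementation used in the thesis. It assumes each log has already been reduced to a sequence of log keys; the key identifiers (`K1`, `K2`, ...) and the helper names `build_vocabulary` and `document_matrix` are hypothetical.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for tokens in documents:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def document_matrix(documents, vocab):
    """Return one row of token counts per document; token order is discarded."""
    rows = []
    for tokens in documents:
        counts = Counter(tokens)
        # dicts preserve insertion order, so columns follow the vocabulary indices
        rows.append([counts.get(tok, 0) for tok in vocab])
    return rows


# Hypothetical logs, each already converted to a sequence of log keys.
logs = [
    ["K1", "K2", "K2", "K7"],
    ["K7", "K1", "K3"],
]
vocab = build_vocabulary(logs)
matrix = document_matrix(logs, vocab)
print(vocab)
print(matrix)
```

Each row can then be fed to a classifier that takes count features. The smaller the set of distinct keys, the fewer columns the matrix has, which is the dictionary-size reduction the abstract refers to.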
The use of Deep Learning methods for Document Understanding has been embraced by the research community in recent years. A requirement for Deep Learning methods, and especially Transformer Networks, is access to large datasets. The objective of this thesis was to evaluate a state-of-the-art model for Document Layout Analysis on a public and a custom dataset. An additional objective was to build a pipeline for creating a dataset specifically for Visually Rich Documents. The research methodology consisted of a literature study to find the state-of-the-art model for Document Layout Analysis and a relevant dataset with which to evaluate the chosen model. The literature study also included research on how existing datasets in the domain were collected and processed. Finally, an evaluation framework was created. The evaluation showed that the chosen multi-modal transformer network, LayoutLMv2, performed well on the Docbank dataset. The custom-built dataset was limited by class imbalance, although performance was good for the larger classes. The annotator tool and its auto-tagging feature performed well, and the proposed pipeline showed great promise for creating datasets with Visually Rich Documents. In conclusion, this thesis project answers the research questions and suggests two main opportunities. The first is to encourage others to build datasets with Visually Rich Documents using a pipeline similar to the one presented in this paper. The second is to evaluate the possibility of creating the visual token information for LayoutLMv2 as part of the transformer network rather than using a separate CNN.
Pre-trained language models such as BERT have proven particularly effective for natural language processing (NLP) tasks. However, the enormous computational resources required to train such models make their use in the real world difficult. To overcome this problem, compression methods have emerged in recent years. In this project, some of these neural network compression approaches for text processing are studied, implemented, and tested. In our case, the most efficient method was Knowledge Distillation, which consists of transferring knowledge from a large neural network, called the teacher, to a small neural network, called the student. There are several variants of this approach, which differ in their complexity. We look at two of them in this project. The first allows knowledge transfer between any neural network and a smaller bidirectional LSTM, using only the output of the larger model. The second, more complex approach encourages the student model to also learn from the intermediate layers of the teacher model for incremental knowledge extraction. The ultimate goal of this project is to provide the company’s data scientists with ready-to-use compression methods for future projects requiring the use of deep neural networks for NLP.
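As a rough illustration of distilling from the teacher's output alone, the sketch below computes a temperature-softened cross-entropy between the teacher's and the student's output distributions. This is one common formulation of a distillation loss and is assumed here purely for illustration; the abstract does not state which loss the project used. The logits, the temperature value, and the function names are all made up.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's softened distribution against the teacher's."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


# Hypothetical three-class logits from a teacher and two candidate students.
teacher = [2.0, 0.0, -1.0]
loss_matching = distillation_loss([2.0, 0.0, -1.0], teacher)
loss_mismatched = distillation_loss([-1.0, 0.0, 2.0], teacher)
print(loss_matching, loss_mismatched)
```

A student whose outputs match the teacher's reaches the minimum of this loss (the entropy of the teacher's softened distribution), while a student that ranks the classes differently is penalised. The second variant described in the abstract would additionally compare intermediate-layer representations, which this sketch does not cover.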
QC 20160318
Introduction

The waterfront of Stockholm, one of Europe's fastest-growing cities, stands at the forefront of climate change challenges. As such, there is a pressing need for innovative solutions and resilient urban design. The SOS Climate Waterfront research project gathered international experts and local representatives from different disciplines to work together in May-June 2022 to discuss and explore proposals and to design Sustainable Open Solutions (SOS). This book explores three urban sites in Stockholm that hold significant implications for the city's waterfront: Lövholmen, Frihamnen, and Södra Värtan. During the workshop, SOS Climate Waterfront participants, mainly European researchers, analyzed future challenges, raised new questions, and depicted solutions, which can now contribute to cross-country comparisons in a larger EU framework. The three sites are not only driven by the demand for more housing but also face crucial issues related to cultural heritage, climate change, landscape ecology, and social development. Achieving a delicate balance between these aspects and economic interests presents a significant task for the city. The waterfront of Stockholm holds substantial relevance in the context of climate change and its impact on coastal areas. Thus, analysis of the Swedish context, based on collected data and on-site knowledge, supports a deeper understanding of the challenges and opportunities that lie ahead. Stockholm is expected to be affected by the impacts of climate change, including temperature increases, changing precipitation patterns, and the potential for more frequent cloudbursts. While the rising sea level is a long-term challenge rather than an immediate concern, the increasing risks of extreme weather events and flooding were taken into consideration.
Stockholm rests on two different bodies of water, at a location where the Baltic Sea (Östersjön in Swedish), with its brackish water, meets Lake Mälaren, which is an important provider of freshwater for the larger Stockholm area. As the lyrics of a popular contemporary Swedish song (by Robert Broberg) describe it: “the city is full of water”. However, to ensure that its ecological and chemical status is maintained in the face of future urbanisation and climate change, much attention has been paid to preserving the water quality of Lake Mälaren, a vital water source for two million people. The city values its water and continuously invests in improving the situation (e.g. the new sluice at Slussen). The activities carried out in the SOS Climate Waterfront workshop in Stockholm integrated this relationship to water as well as the continuing land rise, the balance of which adds complexity to the sea level modelling and therefore also to the anticipations and scenarios for the future. In this book, the authors explore innovative strategies and design proposals to tackle these challenges while preserving the cultural identity and heritage value of the sites. Researchers from various European cities, supported by experts and academic lectures, analyze extensive input materials and information, ranging from planning documents and historical records to consultation reports and city visions. By drawing upon multidisciplinary backgrounds and experiences, the researchers identify the socioeconomic and environmental qualities of each site, ultimately developing site design concepts and solutions that address climate change challenges, the maintenance of cultural identities, and the protection of biodiversity.
Throughout the book, the proposed designs emphasize the importance of finding a balance between the preservation of cultural heritage, the values of local communities, the stimulation of economic growth, and the promotion of sustainable urban development. Key elements include the reuse of existing infrastructure, the integration of green-blue schemes, the improvement of biodiversity, and the creation of vibrant and multi-functional neighbourhoods that connect people to each other and their surroundings. While the design solutions present promising approaches, their implementation and the institutional challenges that may arise in specific city contexts remain outside the scope of the results presented here. The book acknowledges the need for further research and highlights the shared recognition among the workshop participants regarding the gaps and blind spots in their findings. The following chapters of the book delve into climate change in Sweden, the role of culture and arts in the environmental movement, and specific case studies and design proposals for each site. By exploring these diverse perspectives, this book aims to contribute to the ongoing discourse on sustainable urban design and planning and to inspire innovative approaches to the complex challenges Stockholm will face in the future.

PART 1 of the book offers a comprehensive understanding of climate change in Sweden, street fishing in Stockholm, and the role of culture and arts in the environmental movement in the Nordic Region and internationally. Furthermore, the lessons from Stockholm and its surroundings in this report draw on presentations made during the workshop by professionals and researchers from various fields. Some of these lessons have been written up as articles, introduced below. The chapter “Climate change in Sweden” by Magnus Joelsson from the Swedish Meteorological and Hydrological Institute (SMHI) provides an updated analysis with data and the context for discussing climate change in Sweden.
The text makes the distinction between weather and climate, referring to the expression “Climate is what you expect, weather is what you get” that Mark Twain is said to have coined. Moreover, it calls for action by emphasising that the trend of climate change is expected to continue, both globally and in Sweden. What will happen in the far future still depends on our actions, now and in the future. The contribution entitled “Urban nature does not stop at the waterfront, neither should urban planning, a case study of street fishing in Stockholm” raises questions about how planning and strategies for waterfront areas in cities should consider more perspectives from a wider group of interests. It discusses how urban dwellers live with water, with a focus on recreational fishing and what this use entails. The authors (Anja Moum Rieser, from KTH Royal Institute of Technology, and Wieben Johannes Boonstra and Rikard Hedling, both from Uppsala University) go beyond the human-centric view, expanding the gaze to other species’ needs and incorporating the body of water in planning for urban waterfront areas. The chapter “The role of culture and arts in the environmental movement in the Nordic Region and internationally” by Elisavet Papageorgiou and Iwona Preis from Intercult discusses artistic perspectives on sustainability and climate change. It focuses on how art and culture can raise awareness, inspire action, and promote social cohesion around sustainable practices. Drawing on experiences from projects that aim to invite and engage community dialogue, they argue that artistic strategies can challenge dominant narratives and promote alternative visions for a sustainable future. The contribution “Sense the Marsh” by Thelma Dethelfsen from KTH Royal Institute of Technology emphasises the importance of architecture and landscape design in creating adaptive and resilient strategies to manage flooding and sea level rise.
The study focuses on how designs can encourage interaction with and awareness of the surroundings, thereby highlighting the interfaces between humans and nature and raising questions about how flooding can be used as a quality and catalyst to attract more people to an area. The resulting design provides an opportunity to experience nature through design and architectural solutions situated on the border between human, non-human species and nature.

In PART 2, readers will explore the detailed design proposals developed by different groups for the urban sites in focus. These proposals aim to intertwine sustainability, cultural identity, and economic interests, offering insights into the potential for resilient and vibrant urban spaces. By assessing existing conditions on the three sites in Stockholm (Lövholmen, Frihamnen, and Södra Värtan), the teams participating in the workshop actively contributed to the analysis of the sites and the development of design solutions for the areas, ultimately forming strategies for better preparedness for future challenges and better lives for the inhabitants.

Lövholmen is located in the north-western part of Liljeholmen, one of the major development centres in Stockholm. The area is currently a closed-off industrial site, but the municipality’s intention is to redevelop it into a mixed urban space with homes, workplaces, shops, schools, and more. It is expected that 1500 new homes will be built in the area. Many of the current industrial buildings are empty and in bad shape. While some of these will be replaced with housing, other industrial buildings have heritage value and should be protected during the development, after which a new use should be found for them.

Frihamnen is, together with the Södra Värtan project, part of the larger development of “Norra Djurgårdsstaden”, the Stockholm Royal Seaport. Frihamnen is located to the south of Värtahamnen and is in turn strongly connected to Loudden in the south.
The municipality plans for the area to contain approximately 1700 homes, 4000 workplaces and 75,000 m² of retail and office space. Some of the existing businesses in Frihamnen will remain, but much of the existing infrastructure is planned to be removed. The harbour no longer handles freight shipping, but passenger ships will continue to depart from the harbour (Frihamnspiren).

Södra Värtan is planned to contain 1500 apartments, 20 preschool departments, 155,000 m² of office and retail space, as well as 10,000 m² of parks and a 600 m long waterfront walkway. The new development is intended to co-exist with the activities in the harbour, which creates challenges such as blocking the noise from the cruise ships. The walkways along the waterfront are planned to have shops and restaurants.

The articles, together with the SOS Climate Waterfront teams’ analysis of the three sites in Stockholm, provide relevant and timely interdisciplinary efforts to co-create novel solutions and future strategies to manage the climate challenges ahead. The solutions relate to the history of the urban territory, the actors involved (or excluded), and changes over time in planning ideals. A key theme is how to plan inclusively for the future by involving representatives of diverse interests, competences, and future visions for the sites. The consequences of climate change affect these different stakeholders and citizens in a wide range of ways, so including them in the process is crucial. This also includes future generations’ views on urban transformation. The largest challenge is to create novel solutions in which these human interests, as well as those of local nature and non-human species, can be incorporated, in an effort to plan and design for the mitigation and management of the consequences of climate change.
As we embark on this journey of exploration and innovation, we invite readers to delve into the pages of this book, where interdisciplinary research, creative design, and a shared commitment to sustainable urban development and decarbonisation strategies converge. Together, let us envision a future where cities thrive, harmoniously balancing their heritage, environment, and economic aspirations.

QC 20231115. SOS Climate Waterfront: https://cordis.europa.eu/project/id/823901
Part of book: ISBN 978-1-009-10023-6. QC 20221219
QC 20211207
Companies do not exist in isolation. They are embedded in structural relationships with each other. Mapping a given company’s relationships with other companies in terms of competitors, subsidiaries, suppliers, and customers is key to understanding the company’s major risk factors and opportunities. Conventionally, obtaining and staying up to date with this key knowledge was achieved by highly skilled manual labor, such as a financial analyst reading financial news and reports. However, with the development of Natural Language Processing (NLP) and graph databases, it is now possible to systematically extract and store structured information from unstructured data sources. The current go-to method for extracting information effectively uses supervised machine learning models, which require a large amount of labeled training data. Labeling data is usually time-consuming, and labeled data is hard to obtain in a domain-specific area. This project explores an approach to constructing a company domain-specific Knowledge Graph (KG) that contains company-related entities and relationships from U.S. Securities and Exchange Commission (SEC) 10-K filings by combining a pre-trained general NLP model with rule-based patterns in Named Entity Recognition (NER) and Relation Extraction (RE). This approach eliminates the time-consuming data-labeling task of the statistical approach. Evaluated on ten 10-K filings, the model achieves an overall recall of 53.6%, a precision of 75.7%, and an F1-score of 62.8%. The result shows that it is possible to extract company information using hybrid methods, which do not require a large amount of labeled training data. However, the project requires a time-consuming process of finding lexical patterns in sentences to extract company-related entities and relationships.
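The reported F1-score can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. A quick check in Python (the function name is illustrative, not taken from the project):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(0.757, 0.536)
print(round(f1, 3))  # 0.628
```

The result, 0.628, agrees with the 62.8% stated in the abstract.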
This master's thesis deals with automatic text summarization and how semantic folding can be used as a similarity measure between sentences in the TextRank algorithm. The method was implemented and compared with two common similarity measures: cosine similarity of tf-idf vectors and the number of overlapping terms in two sentences. All three methods were implemented, and the linguistic features used in their construction were stop words, part-of-speech filtering, and stemming. Five different part-of-speech filters were used, with different mixtures of nouns, verbs, and adjectives. The three methods were evaluated by summarizing documents from the Document Understanding Conference (DUC) and comparing the results to gold-standard summaries created by human judges. The system summaries were compared with the gold-standard summaries using the ROUGE-1 measure. The algorithm with semantic folding performed worst of the three methods, but its F-score was only 0.0096 lower than that of cosine similarity of tf-idf vectors, which performed best. For semantic folding, the average precision was 46.2% and the average recall 45.7% for the best-performing part-of-speech filter.
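The ROUGE-1 comparison mentioned in the abstract above can be illustrated with a minimal sketch. It assumes whitespace tokenization and lowercasing, which is a simplification and not necessarily the preprocessing used in the thesis; the function name `rouge_1` is hypothetical.

```python
from collections import Counter


def rouge_1(system_summary: str, reference_summary: str) -> dict:
    """Compute ROUGE-1 precision, recall and F-score from unigram overlap."""
    # Naive tokenization (an assumption): lowercase and split on whitespace.
    sys_counts = Counter(system_summary.lower().split())
    ref_counts = Counter(reference_summary.lower().split())

    # Clipped overlap: a unigram counts at most as often as it occurs in both texts.
    overlap = sum((sys_counts & ref_counts).values())
    sys_total = sum(sys_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / sys_total if sys_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score}


scores = rouge_1("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

Precision is the share of system unigrams found in the reference, recall is the share of reference unigrams found in the system summary, and the F-score is their harmonic mean.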
Systematic review of research manuscripts is a common procedure in which research studies pertaining to a particular field or domain are classified and structured in a methodical way. Among other steps, this process involves an extensive review and consolidation of scientific metrics and attributes of the manuscripts, such as citations, type, or venue of publication. Extracting and mapping the relevant publication data is a very laborious task if performed manually. Automating these systematic mapping steps is intended to reduce the human effort required and can therefore reduce the time the process takes. The objective of this thesis is to automate the data extraction and mapping steps when systematically reviewing studies. The manual process is replaced by novel graph modelling techniques for effective knowledge representation, as well as novel machine learning techniques that aim to learn these representations. This automates the process by characterising the publications on the basis of certain sub-properties and qualities that give the reviewer a quick high-level overview of each research study. The final model is a concept learner that predicts these sub-properties and also addresses the inherent concept drift of novel manuscripts over time. Different models were developed and explored in this study for the development of the concept learner. Results show that: (1) Graph reasoning techniques that leverage the expressive power of modern graph databases are very effective in capturing the extracted knowledge in a so-called knowledge graph, which allows concepts to be formed that can be learned using standard machine learning techniques such as logistic regression, decision trees, and neural networks. (2) Neural network models and ensemble models outperformed other standard machine learning techniques, such as logistic regression and decision trees, on the evaluation metrics. (3) The concept learner is able to detect and avoid concept drift, based on the F1 score and retraining of the model.
A dataset consisting of logs describing the results of tests from a single build-and-test process, used in a Continuous Integration setting, is utilized to automate the categorization of logs according to failure type. Two different features, words and log keys, are evaluated using unordered document matrices as document representations to determine the viability of log keys. The experiment uses Multinomial Naive Bayes (MNB) classifiers and multi-class Support Vector Machines (SVM) to establish the performance of the different features. The experiment indicates that log keys perform equivalently to words while achieving a great reduction in dictionary size. Three different multi-layer perceptrons are evaluated on the log key document matrices, achieving slightly higher cross-validation accuracies than the SVM. A shallow-and-wide Convolutional Neural Network (CNN) is then designed using temporal sequences of log keys as document representations. The top-performing model of each architecture is evaluated on a test set, except for the MNB classifiers, which had subpar performance during cross-validation. The test set evaluation indicates that the CNN is superior to the other models.
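The unordered document matrix described in the abstract above can be sketched as a plain count matrix. This is only an illustration under stated assumptions: the logs are taken to be already reduced to log keys, the key names `K1` to `K3` are invented, and the helper names `build_vocabulary` and `document_matrix` are hypothetical rather than taken from the thesis.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct log key to a column index, in sorted order."""
    keys = sorted({key for doc in documents for key in doc})
    return {key: idx for idx, key in enumerate(keys)}


def document_matrix(documents, vocabulary):
    """Return one count vector per document; the order of keys in a log is ignored."""
    matrix = []
    for doc in documents:
        counts = Counter(doc)
        row = [0] * len(vocabulary)
        for key, n in counts.items():
            # Keys not seen when the vocabulary was built are skipped.
            if key in vocabulary:
                row[vocabulary[key]] = n
        matrix.append(row)
    return matrix


# Hypothetical logs, each already reduced to a sequence of log keys.
logs = [["K1", "K2", "K1"], ["K3", "K2"]]
vocab = build_vocabulary(logs)
matrix = document_matrix(logs, vocab)
print(matrix)
```

Rows like these could be passed to a classifier such as MNB or an SVM. A sequence model such as the CNN mentioned in the abstract would instead consume the ordered key sequences directly.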