The use of Deep Learning methods for Document Understanding has been embraced by the research community in recent years. A requirement for Deep Learning methods, and especially for Transformer networks, is access to large datasets. The objective of this thesis was to evaluate a state-of-the-art model for Document Layout Analysis on a public dataset and on a custom dataset. A further objective was to build a pipeline for creating datasets specifically for Visually Rich Documents. The research methodology consisted of a literature study to find the state-of-the-art model for Document Layout Analysis and a relevant dataset on which to evaluate the chosen model. The literature study also covered how existing datasets in the domain were collected and processed. Finally, an evaluation framework was created. The evaluation showed that the chosen multi-modal transformer network, LayoutLMv2, performed well on the DocBank dataset. On the custom-built dataset, results were limited by class imbalance, although performance was good for the larger classes. The annotator tool and its auto-tagging feature performed well, and the proposed pipeline showed great promise for creating datasets of Visually Rich Documents. In conclusion, this thesis project answers the research questions and suggests two main opportunities. The first is to encourage others to build datasets of Visually Rich Documents using a pipeline similar to the one presented in this thesis. The second is to evaluate the possibility of creating the visual token information for LayoutLMv2 as part of the transformer network rather than using a separate CNN.
Natural language processing (NLP) tasks have in recent years been handled particularly effectively with pre-trained language models such as BERT. However, the enormous computational resources required to train such models make them difficult to use in practice. To overcome this problem, compression methods have emerged. In this project, some of these neural network compression approaches for text processing are studied, implemented and tested. In our case, the most effective method was Knowledge Distillation, which consists of transferring knowledge from a large neural network, called the teacher, to a small neural network, called the student. There are several variants of this approach, which differ in complexity. We look at two of them in this project. The first allows knowledge transfer between any neural network and a smaller bidirectional LSTM, using only the output of the larger model. The second, more complex approach encourages the student model to also learn from the intermediate layers of the teacher model for incremental knowledge extraction. The ultimate goal of this project is to provide the company's data scientists with ready-to-use compression methods for future projects that require deep neural networks for NLP.
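The output-only variant of teacher-student distillation described above can be illustrated with a small sketch. It assumes a temperature-softened KL-divergence loss between teacher and student outputs, which is one common formulation; the abstract does not state which loss the project used, and the function names, logits and temperature here are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) computed on temperature-softened outputs."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Multiplying by T^2 is the usual rescaling applied to soft-target losses.
    return temperature ** 2 * kl


# Hypothetical logits for one three-class example.
teacher = [4.0, 1.0, -2.0]
good_student = [3.5, 1.2, -1.8]   # close to the teacher
poor_student = [-2.0, 1.0, 4.0]   # ranks the classes in reverse

loss_good = distillation_loss(teacher, good_student)
loss_poor = distillation_loss(teacher, poor_student)
```

A student whose outputs track the teacher's gets a smaller loss than one that disagrees, and the loss is zero when the two sets of logits are identical. Only the teacher's outputs are needed, which is why this variant works with any teacher architecture.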
Introduction
The waterfront of Stockholm, one of Europe's fastest-growing cities, stands at the forefront of climate change challenges. As such, there is a pressing need for innovative solutions and resilient urban design. The SOS Climate Waterfront research project gathered international experts and local representatives from different disciplines to work together in May-June 2022 to discuss and explore proposals and to design Sustainable Open Solutions (SOS). This book explores three urban sites in Stockholm that hold significant implications for the city's waterfront: Lövholmen, Frihamnen, and Södra Värtan. During the workshop, SOS Climate Waterfront participants, mainly European researchers, analyzed future challenges, raised new questions, and depicted solutions, which can now contribute to cross-country comparisons in a larger EU framework. The three sites are not only driven by the demand for more housing but also face crucial issues related to cultural heritage, climate change, landscape ecology, and social development. Achieving a delicate balance between these aspects and economic interests presents a significant task for the city. The waterfront of Stockholm holds substantial relevance in the context of climate change and its impact on coastal areas. An analysis of the Swedish context, based on collected data and on-site knowledge, therefore supports a deeper understanding of the challenges and opportunities that lie ahead. Stockholm is expected to be affected by the impacts of climate change, including temperature increases, changing precipitation patterns, and the potential for more frequent cloudbursts. While the rising sea level is a long-term challenge rather than an immediate concern, the increasing risks of extreme weather events and flooding were taken into consideration.
Stockholm rests on two different bodies of water, at a location where the brackish Baltic Sea (Östersjön in Swedish) meets Lake Mälaren, an important provider of freshwater for the larger Stockholm area. As the lyrics of a popular contemporary Swedish song (by Robert Broberg) describe it: “the city is full of water”. To maintain the ecological and chemical status of the water in the face of urbanisation and climate change, much attention has been paid to preserving the water quality of Lake Mälaren, a vital water source for two million people. The city values its water and continuously invests in improving the situation (e.g. the new sluice at Slussen). The activities carried out in the SOS Climate Waterfront workshop in Stockholm integrated this relationship to water as well as the continuing land rise, the balance of which adds complexity to sea level modelling and therefore also to the anticipations and scenarios for the future. In this book, the authors explore innovative strategies and design proposals to tackle these challenges while preserving the cultural identity and heritage value of the sites. Researchers from various European cities, supported by experts and academic lectures, analyze extensive input materials and information, ranging from planning documents and historical records to consultation reports and city visions. By drawing upon multidisciplinary backgrounds and experiences, the researchers identify the socioeconomic and environmental qualities of each site, ultimately developing site design concepts and solutions that address climate change challenges, the maintenance of cultural identities, and the protection of biodiversity.
Throughout the book, the proposed designs emphasize the importance of finding a balance between preserving cultural heritage, respecting the values of local communities, stimulating economic growth, and promoting sustainable urban development. Key elements include the reuse of existing infrastructure, the integration of green-blue schemes, the improvement of biodiversity, and the creation of vibrant and multi-functional neighbourhoods that connect people to each other and their surroundings. While design solutions present promising approaches, their implementation and the institutional challenges that may arise in specific city contexts remain external to the results presented here. The book acknowledges the need for further research and highlights the shared recognition among the workshop participants regarding the gaps and blind spots in their findings. The following chapters of the book delve into climate change in Sweden, the role of culture and arts in the environmental movement, and specific case studies and design proposals for each site. By exploring these diverse perspectives, this book aims to contribute to the ongoing discourse on sustainable urban design and planning, and to inspire innovative approaches to the complex challenges Stockholm will face in the future. PART 1 of the book offers a comprehensive understanding of climate change in Sweden, street fishing in Stockholm, and the role of culture and arts in the environmental movement in the Nordic Region and internationally. Furthermore, the lessons from Stockholm and its surroundings in this report draw on presentations made during the workshop by professionals and researchers from various fields. Some of these lessons have been written up as articles, introduced below. The chapter “Climate change in Sweden” by Magnus Joelsson from the Swedish Meteorological and Hydrological Institute (SMHI) provides an updated analysis with data and the context for discussing climate change in Sweden.
The text makes the distinction between weather and climate, referring to the expression “Climate is what you expect, weather is what you get” that Mark Twain is said to have coined. Moreover, it calls for action by emphasising that the trend of climate change is expected to continue, both globally and in Sweden. What will happen in the far future still depends on our actions, now and in the future. The contribution entitled “Urban nature does not stop at the waterfront, neither should urban planning, a case study of street fishing in Stockholm” raises questions about how planning and strategies for waterfront areas in cities should consider more perspectives from a wider group of interests. It discusses how urban dwellers live with water, with a focus on recreational fishing and what this use entails. The authors (Anja Moum Rieser, from KTH Royal Institute of Technology, and Wieben Johannes Boonstra and Rikard Hedling, both from Uppsala University) go beyond the human-centric view, expanding the gaze to other species’ needs and incorporating the body of water in planning for urban waterfront areas. The chapter “The role of culture and arts in the environmental movement in the Nordic Region and internationally” by Elisavet Papageorgiou and Iwona Preis from Intercult discusses artistic perspectives on sustainability and climate change. It focuses on how art and culture can raise awareness, inspire action, and promote social cohesion around sustainable practices. Drawing on experiences from projects that aim to invite and engage community dialogue, they argue that artistic strategies can challenge dominant narratives and promote alternative visions for a sustainable future. The contribution “Sense the Marsh” by Thelma Dethelfsen from KTH Royal Institute of Technology emphasises the importance of architecture and landscape design in creating adaptive and resilient strategies to manage flooding and sea level rise.
The study focuses on how designs can encourage interaction with and awareness of the surroundings, thereby highlighting the interfaces between humans and nature and raising questions about how flooding can be used as a quality and catalyst to attract more people to an area. The resulting design provides an opportunity to experience nature through the design and architectural solutions, situated on the border between humans, non-human species and nature. In PART 2, readers will explore the detailed design proposals developed by different groups for the urban sites in focus. These proposals aim to intertwine sustainability, cultural identity, and economic interests, offering insights into the potential for resilient and vibrant urban spaces. By assessing existing conditions at the three Stockholm sites, Lövholmen, Frihamnen, and Södra Värtan, the teams participating in the workshop actively contributed to the analysis of the sites and the development of design solutions for the areas, ultimately forming strategies for better preparedness for future challenges and better lives for the inhabitants. Lövholmen is located in the north-western part of Liljeholmen, one of the major development centres in Stockholm. The area is currently a closed-off industrial site, but the municipality’s intention is to redevelop it into a mixed urban space with homes, workplaces, shops, schools, and more. It is expected that 1500 new homes will be built in the area. Many of the current industrial buildings are empty and in bad shape. While some of these will be replaced with housing, other industrial buildings have heritage value and should be protected during the development, after which a new use should be found for them. Frihamnen is, together with the Södra Värtan project, part of the larger development of “Norra Djurgårdsstaden”, the Stockholm Royal Seaport. Frihamnen is located to the south of Värtahamnen and is in turn strongly connected to Loudden in the south.
The municipality plans for the area to contain approximately 1700 homes, 4000 workplaces and 75,000 m² of retail and office space. Some of the existing businesses in Frihamnen will remain, but much of the existing infrastructure is planned to be removed. The harbour no longer handles freight shipping, but passenger ships will continue to depart from the harbour (Frihamnspiren). Södra Värtan is planned to contain 1500 apartments, 20 preschool departments, 155,000 m² of office and retail space, as well as 10,000 m² of parks and a 600 m long waterfront walkway. The new development is intended to co-exist with the activities in the harbour, which creates challenges such as blocking the noise from the cruise ships. The walkways along the waterfront are planned to have shops and restaurants. The contributions of the articles, together with the SOS Climate Waterfront teams’ analysis of the three sites in Stockholm, provide relevant and timely interdisciplinary efforts to co-create novel solutions and future strategies to manage the climate challenges ahead. The solutions relate to the history of the urban territory, the actors involved (or excluded), and changes in planning ideals over time. A key theme is how to plan inclusively for the future by involving representatives of diverse interests, competences, and visions for the sites. The consequences of climate change affect these different stakeholders and citizens in a wide range of ways, so including them in the process is crucial. This also means including future generations’ views on urban transformation. The largest challenge is to create novel solutions in which these human interests, as well as those of local nature and non-human species, can be incorporated, in an effort to plan and design for the mitigation and management of the consequences of climate change.
As we embark on this journey of exploration and innovation, we invite readers to delve into the pages of this book, where interdisciplinary research, creative design, and a shared commitment to sustainable urban development and decarbonisation strategies converge. Together, let us envision a future where cities thrive, harmoniously balancing their heritage, environment, and economic aspirations. QC 20231115 SOS Climate Waterfront https://cordis.europa.eu/project/id/823901
Part of book: ISBN 978-1-009-10023-6. QC 20221219
Companies do not exist in isolation. They are embedded in structural relationships with each other. Mapping a given company’s relationships with other companies in terms of competitors, subsidiaries, suppliers, and customers is key to understanding the company’s major risk factors and opportunities. Conventionally, obtaining this key knowledge and staying up to date with it relied on highly skilled manual labor, such as financial analysts reading financial news and reports. However, with the development of Natural Language Processing (NLP) and graph databases, it is now possible to systematically extract and store structured information from unstructured data sources. The current go-to method for extracting information effectively uses supervised machine learning models, which require a large amount of labeled training data. Data labeling is usually time-consuming, and labeled data is hard to obtain in a domain-specific area. This project explores an approach to constructing a company domain-specific Knowledge Graph (KG) that contains company-related entities and relationships from U.S. Securities and Exchange Commission (SEC) 10-K filings by combining a pre-trained general NLP model with rule-based patterns in Named Entity Recognition (NER) and Relation Extraction (RE). This approach eliminates the time-consuming data-labeling task of the statistical approach. Evaluated on ten 10-K filings, the model achieves an overall recall of 53.6%, a precision of 75.7%, and an F1-score of 62.8%. The result shows that it is possible to extract company information using hybrid methods that do not require a large amount of labeled training data. However, the project requires the time-consuming process of finding lexical patterns in sentences in order to extract company-related entities and relationships.
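As a quick consistency check on the figures above, the reported F1-score can be recomputed from the reported precision and recall using the standard harmonic-mean definition. The function and variable names in this sketch are illustrative; only the three percentages come from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported_precision = 0.757  # 75.7 % from the abstract
reported_recall = 0.536     # 53.6 % from the abstract

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.628"
```

The recomputed value rounds to 62.8%, matching the reported F1-score.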
School teachers spend approximately 30 percent of their time grading exams and other assessments. With education becoming increasingly digitized, a research field has emerged that aims to reduce the time spent on grading by automating it. This is an easy task for multiple-choice questions but much harder for open-ended questions requiring free-text answers, even though the latter have been shown to be superior for knowledge assessment and learning consolidation. While previous work has reported promising results of up to 90 percent grading accuracy, it is still problematic to use a system that gives the wrong grade in 10 percent of cases. This has given rise to research focused on assisting teachers in the grading process instead of fully replacing them. Cluster analysis has been the most popular tool for this: similar answers are grouped together so that teachers can process groups of answers at once instead of evaluating each answer one at a time. This approach has been shown to decrease the time spent on grading substantially. However, the methods for performing the clustering vary widely between studies, leaving no apparent choice of methodology for real-world implementation. Using several techniques for pre-processing, text representation and clustering, this work compared various methods for clustering free-text answers by evaluating them on a dataset containing almost 400,000 student answers. The results showed that using all of the tested pre-processing techniques led to the best performance, although the difference compared with minimal pre-processing was small. Sentence embeddings were the best-performing text representation. However, it remains an open question how they should be used when spelling and grammar are part of the assessment, as they lack the ability to identify such errors. A suitable choice of clustering algorithm is one where the number of clusters can be specified, as determining this automatically proved to be difficult. Teachers can then easily adjust the number of clusters based on their judgement, for example if there are too few or too many.
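The idea of clustering free-text answers with a user-specified number of clusters can be sketched as follows. This is only an illustration under stated assumptions: plain k-means stands in for whichever algorithms the thesis compared (the abstract does not name them), bag-of-words counts stand in for the sentence embeddings it found to work best, and the toy answers and function names are hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Lowercased, whitespace-tokenised word counts (minimal pre-processing)."""
    return Counter(text.lower().split())


def vectorise(texts):
    """Turn each text into a count vector over the shared vocabulary."""
    vocab = sorted({word for t in texts for word in bag_of_words(t)})
    return [[bag_of_words(t)[word] for word in vocab] for t in texts]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each answer joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical student answers; k is chosen by the teacher.
answers = [
    "the mitochondria produces energy",
    "photosynthesis makes sugar",
    "mitochondria produces the energy",
    "photosynthesis makes sugar from light",
]
labels = kmeans(vectorise(answers), k=2)
```

Because `k` is an explicit argument, a teacher who finds the groups too coarse or too fine can rerun the clustering with a different value instead of relying on an automatic estimate.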
QC 20211207
The demand for automation of simple tasks is constantly increasing. While some tasks are easy to automate because the logic is fixed and the process is streamlined, other tasks are harder because performing them relies heavily on the judgment of a human expert. Matching a consultant to an offer from a client is one such task, in which the expert is either a manager of the consultants or someone within HR at the company. One way to approach this task is to model the specific domain of interest using natural language processing. If we can capture the relationships between relevant skills and phrases within the specific domain, we could potentially use the resulting embeddings in a consultant-to-offer matching scheme. In this paper, we propose a key phrase-based web scraping approach to collect the data needed for a domain-specific corpus. To retrieve the key phrases needed as prompts for web scraping, we propose using the transformer-based library KeyBERT on limited domain-specific in-house data belonging to the consultancy firm B3 Indes, in order to retrieve the most important phrases in their respective contexts. Facebook's Word2vec-based language model fastText is then trained on the processed corpus to create static word embeddings. We also investigate several approaches for selecting the right key phrases for web scraping, using a comparison against human similarity judgments, and compare the results with a larger pretrained general-domain fastText model. We show that using key phrases to build a domain-specific fastText model could be beneficial compared to using a larger pretrained model, although the results are not consistently conclusive under the current, limited analytical framework. The results also indicate that KeyBERT is beneficial for selecting key phrases compared to random sampling of relevant phrases; however, these results are not conclusive either.
Systematic review of research manuscripts is a common procedure in which research studies pertaining to a particular field or domain are classified and structured in a methodical way. Among other steps, this process involves an extensive review and consolidation of scientific metrics and attributes of the manuscripts, such as citations, type, or venue of publication. Extracting and mapping the relevant publication data is evidently very laborious if performed manually. Automating such systematic mapping steps is intended to reduce the human effort required and can therefore potentially reduce the time the process takes. The objective of this thesis is to automate the data extraction and mapping steps of systematically reviewing studies. The manual process is replaced by novel graph modelling techniques for effective knowledge representation, together with novel machine learning techniques that aim to learn these representations. This ultimately automates the process by characterising publications on the basis of certain sub-properties and qualities that give the reviewer a quick, high-level overview of each research study. The final model is a concept learner that predicts these sub-properties and, in addition, addresses the inherent concept drift of novel manuscripts over time. Different models were developed and explored in this study for the development of the concept learner. Results show that: (1) Graph reasoning techniques that leverage the expressive power of modern graph databases are very effective at capturing the extracted knowledge in a so-called knowledge graph, which allows us to form concepts that can be learned using standard machine learning techniques such as logistic regression, decision trees, and neural networks. (2) Neural network models and ensemble models outperformed other standard machine learning techniques, such as logistic regression and decision trees, on the evaluation metrics. (3) The concept learner is able to detect and avoid concept drift, based on F1 score, by retraining the model.
The history of ski wax appears as a mirror image of society's development, in which Swedish engineering long led the hunt for a universal wax. Now that fluorinated wax has been banned for ecological reasons, perhaps we will once again look for grip in the tar kiln and glide in tallow? QC 20240223
Deep Learning methods for Document Understanding have been embraced by the research community in recent years. Deep Learning methods, and Transformer networks in particular, require access to large datasets. The objective of this thesis was to evaluate a state-of-the-art model for Document Layout Analysis on a public dataset and a custom dataset. An additional objective was to build a pipeline for creating a dataset specifically for Visually Rich Documents. The research methodology consisted of a literature study to identify the state-of-the-art model for Document Layout Analysis and a relevant dataset on which to evaluate the chosen model. The literature study also covered how existing datasets in the domain were collected and processed. Finally, an evaluation framework was created. The evaluation showed that the chosen multi-modal transformer network, LayoutLMv2, performed well on the DocBank dataset. Performance on the custom-built dataset was limited by class imbalance, although the model performed well on the larger classes. The annotation tool and its auto-tagging feature performed well, and the proposed pipeline showed great promise for creating datasets of Visually Rich Documents. In conclusion, this thesis project answers the research questions and suggests two main opportunities. The first is to encourage others to build datasets of Visually Rich Documents using a pipeline similar to the one presented here. The second is to evaluate the possibility of creating the visual token information for LayoutLMv2 within the transformer network rather than with a separate CNN.
Pre-trained language models such as BERT have proven particularly effective for natural language processing (NLP) tasks in recent years. However, the enormous computational resources required to train such models make them difficult to use in practice. To overcome this problem, compression methods have emerged. In this project, several of these neural network compression approaches for text processing are studied, implemented, and tested. In our case, the most effective method was Knowledge Distillation, which consists of transferring knowledge from a large neural network, called the teacher, to a small neural network, called the student. There are several variants of this approach, which differ in complexity. We examine two of them in this project. The first allows knowledge transfer from any neural network to a smaller bidirectional LSTM, using only the output of the larger model. The second, more complex approach encourages the student model to also learn from the intermediate layers of the teacher model for incremental knowledge extraction. The ultimate goal of this project is to provide the company's data scientists with ready-to-use compression methods for future projects that require deep neural networks for NLP.
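As a rough illustration of the first variant, in which the student learns only from the teacher's output, the sketch below computes a soft-target cross-entropy between teacher and student logits. The abstract does not state which loss the thesis used, so the temperature-scaled formulation, the function names, and the example logits are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - largest) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions.

    Hypothetical helper: the temperature value is an assumption, not taken
    from the thesis.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


# Made-up logits for a three-class problem.
teacher = [2.0, 0.5, -1.0]
close_student = [1.8, 0.6, -0.9]
far_student = [0.1, 0.2, 0.3]

print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

The loss is smallest when the student's softened distribution matches the teacher's, so minimising it pushes the student towards the teacher's output behaviour without requiring access to the teacher's internal layers.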
Introduction. The waterfront of Stockholm, one of Europe's fastest-growing cities, stands at the forefront of climate change challenges. As such, there is a pressing need for innovative solutions and resilient urban design. The SOS Climate Waterfront research project gathered international experts and local representatives from different disciplines to work together in May-June 2022 to discuss and explore proposals and to design Sustainable Open Solutions (SOS). This book explores three urban sites in Stockholm that hold significant implications for the city's waterfront: Lövholmen, Frihamnen, and Södra Värtan. During the workshop, SOS Climate Waterfront participants, mainly European researchers, analyzed future challenges, raised new questions, and depicted solutions, which can now contribute to cross-country comparisons in a larger EU framework. The three sites are not only driven by the demand for more housing but also face crucial issues related to cultural heritage, climate change, landscape ecology, and social development. Achieving a delicate balance between these aspects and economic interests presents a significant task for the city. The waterfront of Stockholm holds substantial relevance in the context of climate change and its impact on coastal areas. Thus, analysis of the Swedish context, based on collected data and on-site knowledge, supports a deeper understanding of the challenges and opportunities that lie ahead. Stockholm is expected to be affected by the impacts of climate change, including temperature increases, changing precipitation patterns, and the potential for more frequent cloudbursts. While the rising sea level is a long-term challenge rather than an immediate concern, increasing risks of extreme weather events and flooding were taken into consideration. 
Stockholm rests on two different bodies of water, at the point where the brackish Baltic Sea (Östersjön in Swedish) meets Lake Mälaren, an important source of freshwater for the greater Stockholm area. As the lyrics of a popular contemporary Swedish song (by Robert Broberg) describe it: “the city is full of water”. To ensure that the ecological and chemical status of the water is maintained in the face of future urbanisation and climate change, much attention has been paid to preserving the water quality of Lake Mälaren, a vital water source for two million people. The city values its water and continuously invests in improving the situation (e.g. the new sluice at Slussen). The activities carried out in the SOS Climate Waterfront workshop in Stockholm took into account this relationship to water as well as the continuing land rise, the balance of which adds complexity to sea level modelling and therefore also to the expectations and scenarios for the future. In this book, the authors explore innovative strategies and design proposals to tackle these challenges while preserving the cultural identity and heritage value of the sites. Researchers from various European cities, supported by experts and academic lectures, analyze extensive input materials and information, ranging from planning documents and historical records to consultation reports and city visions. By drawing upon multidisciplinary backgrounds and experiences, the researchers identify the socioeconomic and environmental qualities of each site, ultimately developing site design concepts and solutions that address climate change challenges, the maintenance of cultural identities, and the protection of biodiversity. 
Throughout the book, the proposed designs emphasize the importance of finding a balance between preserving cultural heritage and the values of local communities, stimulating economic growth, and promoting sustainable urban development. Key elements include the reuse of existing infrastructure, the integration of green-blue schemes, the improvement of biodiversity, and the creation of vibrant and multi-functional neighbourhoods that connect people to each other and their surroundings. While the design solutions present promising approaches, their implementation and the institutional challenges that may arise in specific city contexts remain outside the scope of the results presented here. The book acknowledges the need for further research and highlights the shared recognition among the workshop participants of the gaps and blind spots in their findings. The following chapters of the book delve into climate change in Sweden, the role of culture and arts in the environmental movement, and specific case studies and design proposals for each site. By exploring these diverse perspectives, this book aims to contribute to the ongoing discourse on sustainable urban design and planning and to inspire innovative approaches to the complex challenges Stockholm will face in the future. PART 1 of the book offers a comprehensive understanding of climate change in Sweden, street fishing in Stockholm, and the role of culture and arts in the environmental movement in the Nordic Region and internationally. Furthermore, the lessons from Stockholm and its surroundings in this report draw on presentations made during the workshop by professionals and researchers from various fields. Some of these lessons have been written up as articles, which are introduced below. The chapter “Climate change in Sweden” by Magnus Joelsson from the Swedish Meteorological and Hydrological Institute (SMHI) provides an updated analysis, with data and context, for discussing climate change in Sweden. 
The text makes the distinction between weather and climate, referring to the expression “Climate is what you expect, weather is what you get” that Mark Twain is said to have coined. Moreover, it calls for action by emphasising that the trend of climate change is expected to continue, both globally and in Sweden. What will happen in the far future still depends on our actions, now and in the future. The contribution entitled “Urban nature does not stop at the waterfront, neither should urban planning, a case study of street fishing in Stockholm” raises questions about how planning and strategies for waterfront areas in cities should consider more perspectives from a wider group of interests. It discusses how urban dwellers live with water, with a focus on recreational fishing and what this use entails. The authors (Anja Moum Rieser, from KTH Royal Institute of Technology, and Wieben Johannes Boonstra and Rikard Hedling, both from Uppsala University) go beyond the human-centric view, expand the gaze to other species’ needs, and also incorporate the body of water in planning for urban waterfront areas. The chapter “The role of culture and arts in the environmental movement in the Nordic Region and internationally” by Elisavet Papageorgiou and Iwona Preis from Intercult discusses artistic perspectives on sustainability and climate change. It focuses on how art and culture can raise awareness, inspire action, and promote social cohesion around sustainable practices. Drawing on experiences from projects that aim to invite and engage community dialogue, they argue that artistic strategies can challenge dominant narratives and promote alternative visions for a sustainable future. The contribution “Sense the Marsh” by Thelma Dethelfsen from KTH Royal Institute of Technology emphasises the importance of architecture and landscape design in creating adaptive and resilient strategies to manage flooding and sea level rise. 
The study focuses on how designs can encourage interaction with and awareness of the surroundings, thereby highlighting the interfaces between humans and nature and raising questions about how flooding can be used as a quality and catalyst to attract more people to an area. The resulting design provides an opportunity to experience nature through the design and architectural solutions, situated on the border between humans, non-human species, and nature. In PART 2, readers will explore the detailed design proposals developed by different groups for the urban sites in focus. These proposals aim to intertwine sustainability, cultural identity, and economic interests, offering insights into the potential for resilient and vibrant urban spaces. By assessing existing conditions at the three sites analysed in Stockholm (Lövholmen, Frihamnen, and Södra Värtan), the teams participating in the workshop actively contributed to the analysis of the sites and the development of design solutions for the areas, ultimately forming strategies for better preparedness for future challenges and better lives for the inhabitants. Lövholmen is located in the north-western part of Liljeholmen, one of the major development centres in Stockholm. The area is currently a closed-off industrial site, but the municipality’s intention is to redevelop it into a mixed urban space with homes, workplaces, shops, schools, and more. It is expected that 1500 new homes will be built in the area. Many of the current industrial buildings are empty and in bad shape. While some of these will be replaced with housing, other industrial buildings have heritage value and should be protected during the development, after which a new use should be found for them. Frihamnen is, together with the Södra Värtan project, part of the larger development of “Norra Djurgårdsstaden”, the Stockholm Royal Seaport. Frihamnen is located to the south of Värtahamnen and is in turn strongly connected to Loudden in the south. 
The municipality plans for the area to contain approximately 1700 homes, 4000 workplaces, and 75,000 m² of retail and office space. Some of the existing businesses in Frihamnen will remain, but much of the existing infrastructure is planned to be removed. The harbour no longer handles freight shipping, but passenger ships will continue to depart from the harbour (Frihamnspiren). Södra Värtan is planned to contain 1500 apartments, 20 preschool departments, 155,000 m² of office and retail space, as well as 10,000 m² of parks and a 600 m long waterfront walkway. The new development is intended to co-exist with the activities in the harbour, which creates challenges such as blocking the noise from the cruise ships. The walkways along the waterfront are planned to have shops and restaurants. The contributions of the articles, together with the SOS Climate Waterfront teams’ analysis of the three sites in Stockholm, provide relevant and timely interdisciplinary efforts to co-create novel solutions and future strategies to manage the climate challenges ahead. The solutions relate to the history of the urban territory, the actors involved (or excluded), and changes in planning ideals over time. A key theme is how to plan by creating inclusive strategies for the future, involving representatives of diverse interests, competences, and future visions for the sites. The consequences of climate change affect these different stakeholders and citizens in a wide range of ways, so including them in the process is crucial. This also means including future generations’ views on urban transformation. The largest challenge is to create novel solutions in which these human interests, as well as those of local nature and non-human species, can be incorporated, in an effort to plan and design for the mitigation and management of the consequences of climate change. 
As we embark on this journey of exploration and innovation, we invite readers to delve into the pages of this book, where interdisciplinary research, creative design, and a shared commitment to sustainable urban development and decarbonisation strategies converge. Together, let us envision a future where cities thrive, harmoniously balancing their heritage, environment, and economic aspirations. QC 20231115 SOS Climate Waterfront https://cordis.europa.eu/project/id/823901
Part of book: ISBN 978-1-009-10023-6. QC 20221219
Companies do not exist in isolation; they are embedded in structural relationships with each other. Mapping a given company’s relationships with other companies in terms of competitors, subsidiaries, suppliers, and customers is key to understanding its major risk factors and opportunities. Conventionally, obtaining and staying up to date with this knowledge required highly skilled manual labor, such as a financial analyst reading financial news and reports. However, with the development of Natural Language Processing (NLP) and graph databases, it is now possible to systematically extract and store structured information from unstructured data sources. The current go-to method for extracting information effectively uses supervised machine learning models, which require a large amount of labeled training data. Labeling data is usually time-consuming, and labeled data is hard to obtain in a domain-specific area. This project explores an approach to constructing a company-domain-specific Knowledge Graph (KG) that contains company-related entities and relationships from U.S. Securities and Exchange Commission (SEC) 10-K filings by combining a pre-trained general NLP model with rule-based patterns in Named Entity Recognition (NER) and Relation Extraction (RE). This approach eliminates the time-consuming data-labeling task of the statistical approach. Evaluated on ten 10-K filings, the model achieves an overall recall of 53.6%, a precision of 75.7%, and an F1-score of 62.8%. The result shows that it is possible to extract company information using the hybrid method, which does not require a large amount of labeled training data. However, the project requires the time-consuming process of finding lexical patterns in sentences in order to extract company-related entities and relationships.
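The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the function name is a hypothetical helper, and the only inputs are the two percentages quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract, expressed as fractions.
reported_precision = 0.757
reported_recall = 0.536

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.1%}")  # prints "F1 = 62.8%"
```

The result agrees with the stated 62.8%. It also shows that the lower recall pulls the harmonic mean below the simple average of the two figures.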
School teachers spend approximately 30 percent of their time grading exams and other assessments. As education becomes increasingly digitized, a research field has emerged that aims to reduce the time spent on grading by automating it. This is an easy task for multiple-choice questions but much harder for open-ended questions requiring free-text answers, even though the latter have been shown to be superior for knowledge assessment and learning consolidation. While previous work has reported promising results of up to 90 percent grading accuracy, a system that gives the wrong grade in 10 percent of cases is still problematic. This has given rise to research that focuses on assisting teachers in the grading process instead of fully replacing them. Cluster analysis has been the most popular tool for this: similar answers are grouped together so that teachers can process groups of answers at once instead of evaluating each answer one at a time. This approach has been shown to decrease the time spent on grading substantially. However, the methods used to perform the clustering vary widely between studies, leaving no apparent choice of methodology for a real-world implementation. Using several techniques for pre-processing, text representation, and choice of clustering algorithm, this work compared various methods for clustering free-text answers by evaluating them on a dataset containing almost 400 000 student answers. The results showed that using all of the tested pre-processing techniques led to the best performance, although the difference compared to minimal pre-processing was small. Sentence embeddings were the best-performing text representation. However, it remains an open question how they should be used when spelling and grammar are part of the assessment, as they lack the ability to identify such errors. A suitable choice of clustering algorithm is one where the number of clusters can be specified, as determining this automatically proved to be difficult. Teachers can then easily adjust the number of clusters based on their judgement, for example when there are too few or too many.
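The idea of clustering free-text answers with a user-specified number of clusters can be sketched as follows. This is a minimal illustration under stated assumptions, not the method evaluated in the thesis: it uses bag-of-words counts instead of sentence embeddings, a plain k-means with a deterministic start (the first k answers act as initial centroids), and four made-up example answers.

```python
import re
from collections import Counter


def preprocess(answer):
    """Minimal pre-processing: lowercase and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", answer.lower())


def vectorize(answers):
    """Bag-of-words count vectors over a shared, sorted vocabulary."""
    tokenized = [preprocess(a) for a in answers]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[word]) for word in vocab])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors serve as initial centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each answer goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up student answers; k is chosen by the teacher and can be adjusted.
answers = [
    "Plants make sugar from light",
    "The mitochondria produce energy",
    "plants make sugar from light!",
    "Mitochondria produce energy.",
]
labels = kmeans(vectorize(answers), k=2)
print(labels)  # [0, 1, 0, 1]
```

Because `k` is an explicit argument, a teacher-facing tool could expose it directly and re-run the clustering when the groups turn out too coarse or too fine. Note that the pre-processing here also hides differences in spelling-adjacent details such as casing and punctuation, which mirrors the open question about assessing spelling and grammar.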