137 Research products, page 1 of 14

  • Publications
  • 2017-2021
  • Publikationer från KTH (Publications from KTH)
  • Digital Humanities and Cultural Heritage

10 results per page, sorted by Date (most recent)
  • Open Access Russian
    Authors: 
    Arzyutov, Dmitry V.; Anderson, David G.;
    Publisher: KTH, Historiska studier av teknik, vetenskap och miljö
    Country: Sweden

    What does an anthropologist’s archive look like? Where is it located? And is the anthropology of archives important for understanding anthropological thinking today? Here we answer these questions by analysing the various life histories of the archival fragments of one of the most puzzling and influential anthropologists in the history of Russian and Soviet anthropology: Sergei Mikhailovich Shirokogoroff (1887–1939). Shirokogoroff is credited as one of the authors of the etnos theory, one of the main instruments of identity politics in Russia, China, Germany and also, in part, Japan and South Africa. The transnational life histories of Shirokogoroff and his wife Elizaveta [Elizabeth] Nikolaevna (1884–1943), and of their ideas, suggest a conception of the archive not as a single whole but as a collection of forgotten, hidden, obliterated or, on the other hand, scrupulously preserved fragments. These fragments are not centred in one place or organized around any one reading, but they nevertheless represent “partial connections”. Moreover, as we can see today with hindsight, none of these archival fragments lay inert: they have been intertwined in local political and social ontologies. Our text has an autoethnographic quality. While illustrating separate episodes from the life of the Shirokogoroffs, we will also tell of our search for the manuscripts, which forced us onto strange paths and into strange encounters. These greatly deepened our understanding both of the life of documents and of their material links to the lives of researchers. Our article is an attempt to illustrate this complex picture, which in the end allows us to conclude that we have only just begun to understand the workings of the anthropologist’s archive in the history of anthropological thought. QC 20220530

  • Open Access English
    Authors: 
    Daniel Svensson; Sverker Sörlin; Katarina Saltzman;
    Publisher: KTH, Historiska studier av teknik, vetenskap och miljö
    Country: Sweden

    Can walking trails be understood not only as routes to history and heritage, but also as heritage in and of themselves? The paper explores the articulation of trails as a distinct landscape and mobility heritage, bridging the nature-culture divide and building on physical and intellectual movements over time. The authors aim to contribute to a better understanding of the geography of trails and trailscapes by analysing the emergence of the Swedish-Norwegian trail Finnskogleden. The trail is situated in the border region spanning the former county of Hedmark in present-day Innlandet County, south-eastern Norway, and Värmland County in mid-western Sweden, a forested area where Finnish-speaking immigrants settled from the 16th century to the early 20th century. Archives, literature, interviews, and field visits were used to analyse the emergence and governance of the trail. The main finding is the importance of continuous articulation work by local and regional stakeholders, through texts, maps, maintenance, and mobility. In conclusion, the Finn forest trailscape and its mobility heritage can be seen as an articulation of territory over time, a multilayered process drawing on various environing technologies, making the trail a transformative part of a trans-border political geography.
    Rörelsearvet: stigar och leder i hållbar och inkluderande kulturarvsförvaltning (Mobility heritage: paths and trails in sustainable and inclusive heritage management)

  • Publication . Master thesis . Bachelor thesis . 2021
    Open Access
    Authors: 
    González Lopez, Angel Luis;
    Publisher: E.T.S. de Ingenieros Informáticos (UPM)
    Countries: Spain, Sweden

    Code search is one of the most common tasks for developers. The open-source software movement and the rise of social media have made this process easier, thanks to the vast public software repositories available to everyone and the Q&A sites where individuals can get their questions answered. However, when code is poorly documented and hard to search in a repository, or when a private enterprise framework is not publicly available and therefore has no Q&A community to answer questions, searching for code snippets to resolve questions or to learn how to use an API becomes very difficult. To address this problem, this thesis studies the use of natural language for code retrieval. In particular, it studies transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT), which are currently the state of the art in natural language processing but have high latency in information retrieval tasks. This project therefore proposes a multi-stage architecture that seeks to maintain the performance of standard BERT-based models while reducing the high latency usually associated with this type of model. Experiments show that this architecture outperforms previous non-BERT-based models by +0.17 on the Top-1 (Recall@1) metric and reduces latency, with inference times of 5% of those of standard BERT models.
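
The retrieve-then-rerank idea behind a multi-stage architecture, together with the Recall@1 metric, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: `lexical_score` is a hypothetical cheap first stage standing in for a fast retriever, and `jaccard` is a hypothetical placeholder for an expensive BERT-based scorer, which is only called on the top `k` candidates.

```python
def tokens(text):
    """Lower-cased whitespace tokens (a deliberately crude tokenizer)."""
    return set(text.lower().split())


def lexical_score(query, doc):
    """Hypothetical cheap first-stage scorer: number of shared tokens."""
    return len(tokens(query) & tokens(doc))


def jaccard(query, doc):
    """Hypothetical stand-in for an expensive (e.g. BERT-based) re-ranker."""
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q | d) if q | d else 0.0


def multi_stage_search(query, corpus, rerank_score, k=10):
    """Stage 1 keeps the top-k documents by the cheap score;
    stage 2 re-orders only those k documents with the expensive score."""
    by_cheap = sorted(range(len(corpus)),
                      key=lambda i: lexical_score(query, corpus[i]),
                      reverse=True)
    candidates = by_cheap[:k]
    return sorted(candidates,
                  key=lambda i: rerank_score(query, corpus[i]),
                  reverse=True)


def recall_at_1(rankings, gold):
    """Fraction of queries whose top-ranked result is the gold document."""
    hits = sum(1 for ranking, g in zip(rankings, gold) if ranking and ranking[0] == g)
    return hits / len(gold)


if __name__ == "__main__":
    corpus = ["sort a list in place",
              "read a file line by line",
              "sort dictionary by value"]
    print(multi_stage_search("sort list", corpus, jaccard, k=2))  # [0, 2]
```

The latency saving in this design comes from the expensive scorer running k times per query instead of once per document in the corpus.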

  • Publication . Conference object . 2021
    Open Access English
    Authors: 
    Alkathiri, Abdul Aziz; Giaretta, Lodovico; Girdzijauskas, Sarunas; Sahlgren, Magnus;
    Publisher: Zenodo
    Country: Sweden
    Project: EC | RAIS (813162)

    Advanced NLP models require huge amounts of data from various domains to produce high-quality representations. It can therefore be useful for a few large public and private organizations to pool their corpora during training. However, factors such as legislation and users' emphasis on data privacy may prevent centralized orchestration and data sharing among these organizations. For this specific scenario, we therefore investigate how gossip learning, a massively parallel, data-private, decentralized protocol, compares to a shared-dataset solution. We find that applying Word2Vec in a gossip learning framework is viable. Without any tuning, the results are comparable to those of a traditional centralized setting, with a reduction in ground-truth similarity scores of as little as 4.3%. Furthermore, the results are up to 54.8% better than independent local training. QC 20210423
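
Gossip learning replaces a central server with peer-to-peer exchanges of model parameters. The sketch below shows only the merging step, under simplifying assumptions: each node holds a parameter vector (a stand-in for, e.g., its local Word2Vec weights), local training between exchanges is omitted, and at each step a random pair of nodes replaces its parameters with their average. Pairwise averaging preserves the network-wide mean, so the nodes drift toward the average a central server would compute. The names `gossip_round` and `gossip` are illustrative, not taken from the paper.

```python
import random


def gossip_round(models, rng):
    """One gossip exchange: two random nodes replace their parameters with their average."""
    i, j = rng.sample(range(len(models)), 2)
    merged = [(a + b) / 2 for a, b in zip(models[i], models[j])]
    models[i] = merged
    models[j] = list(merged)


def gossip(models, steps=500, seed=0):
    """Run `steps` pairwise exchanges on a copy of the per-node parameter vectors.

    Local training between exchanges is omitted in this sketch."""
    rng = random.Random(seed)
    models = [list(m) for m in models]
    for _ in range(steps):
        gossip_round(models, rng)
    return models


def mean_vector(models):
    """Element-wise mean over all nodes: what a central server would compute."""
    n = len(models)
    return [sum(col) / n for col in zip(*models)]
```

With four nodes starting at `[0, 0]`, `[4, 0]`, `[0, 8]` and `[4, 8]`, every node ends up very close to the global mean `[2, 4]` without any node ever seeing all the others' parameters at once.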

  • Open Access English
    Authors: 
    Chen Feng; John Peponis;
    Publisher: KTH, Arkitektur
    Country: Sweden

    The patterns of syntactic differentiation and their causes and effects are fundamental to space syntax analysis. Often, however, differentiation is taken for granted with no reference to the dynamic process that brings it about. Here, we first show that by measuring the amount of syntactic differentiation, we can better distinguish between types of street networks. We then show that repeated local transformations of a regular street grid lead to different yet largely predictable trajectories of differentiation depending upon the rules used. Finally, we show that different paths to differentiation entail different costs in terms of undesirable properties. This allows us to better assess the likely consequences of design moves and their appropriateness relative to design intentions. QC 20210614
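
One way to make the "amount of syntactic differentiation" concrete is sketched below, under assumptions that are ours rather than the paper's: streets are modelled as an axial graph (nodes are lines, edges join intersecting lines), each line's mean depth is its average breadth-first distance to all other lines, and differentiation is summarised as the coefficient of variation of mean depth. In a regular grid where every horizontal line crosses every vertical one, all lines are equally deep, so differentiation is zero; breaking one line into two segments, a hypothetical local transformation, makes it positive.

```python
from collections import deque
import statistics


def regular_grid(m, n):
    """Axial graph of a regular grid: m horizontal lines h0.., n vertical lines v0..,
    where every horizontal line intersects every vertical line."""
    adj = {f"h{i}": {f"v{j}" for j in range(n)} for i in range(m)}
    adj.update({f"v{j}": {f"h{i}" for i in range(m)} for j in range(n)})
    return adj


def break_vertical(adj, j, cut):
    """Hypothetical local transformation: split line v{j} so that a new segment
    v{j}b takes over its intersections with h{cut}, h{cut+1}, ..."""
    adj = {k: set(v) for k, v in adj.items()}
    m = sum(1 for k in adj if k.startswith("h"))
    line, segment = f"v{j}", f"v{j}b"
    moved = {f"h{i}" for i in range(cut, m)}
    adj[line] -= moved
    adj[segment] = set(moved)
    for h in moved:
        adj[h].discard(line)
        adj[h].add(segment)
    return adj


def mean_depths(adj):
    """Mean BFS depth from each line to every other line (graph assumed connected)."""
    result = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        result[src] = sum(dist.values()) / (len(adj) - 1)
    return result


def differentiation(adj):
    """Coefficient of variation of mean depth: 0 means every line is equally integrated."""
    depths = list(mean_depths(adj).values())
    return statistics.pstdev(depths) / statistics.mean(depths)
```

For example, `differentiation(regular_grid(3, 3))` is 0, while `differentiation(break_vertical(regular_grid(3, 3), 0, 2))` is positive. Repeating such transformations under different rules would trace the kind of differentiation trajectory the abstract describes.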

  • Open Access English
    Authors: 
    Viktor Palmkvist; Elias Castegren; Philipp Haller; David Broman;
    Publisher: KTH, Programvaruteknik och datorsystem, SCS
    Country: Sweden

    When building a new programming language, it can be useful to compose parts of existing languages to avoid repeating implementation work. However, this is problematic already at the syntax level, as composing the grammars of language fragments can easily lead to an ambiguous grammar. State-of-the-art parser tools do not handle ambiguity well: either the grammar cannot be handled at all, or the tools give little help to an end user who writes an ambiguous program. This composability problem is twofold: (i) how can we detect whether the composed grammar is ambiguous, and (ii) if it is, how can we help a user resolve an ambiguous program? In this paper, we depart from the traditional view of unambiguous grammar design and enable a language designer to work with an ambiguous grammar, while giving users the tools needed to handle these ambiguities. We introduce the concept of resolvable ambiguity, wherein a user can resolve an ambiguous program by editing it, as well as an approach to computing the resolutions of an ambiguous program. Furthermore, we present a method based on property-based testing to determine whether a composed grammar is unambiguous, resolvably ambiguous, or unresolvably ambiguous. The method is implemented in Haskell and evaluated on a large set of language fragments selected from different languages. The evaluation shows that (i) the approach can handle significantly more cases of language composition than approaches that ban ambiguity altogether, and (ii) the approach is fast enough to be used in practice. QC 20210520
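
Detecting ambiguity by testing, and resolving it by editing the program, can be illustrated on a toy expression grammar `E -> E + E | E - E | ( E ) | digit`. The sketch below is not the authors' Haskell implementation: it counts parse trees with a memoised interval recursion, and it searches randomly generated programs for one with more than one parse, a brute-force stand-in for property-based testing. All function names are hypothetical.

```python
import random
from functools import lru_cache


def count_parses(tokens):
    """Number of distinct parse trees of `tokens` under the toy ambiguous grammar
    E -> E '+' E | E '-' E | '(' E ')' | digit."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways the token span toks[i:j] can be derived from E.
        if j <= i:
            return 0
        if j - i == 1:
            return 1 if toks[i].isdigit() else 0
        total = 0
        if toks[i] == "(" and toks[j - 1] == ")":
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):
            if toks[k] in ("+", "-"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks))


def random_program(rng, max_operands=5):
    """Random operator chain such as ['3', '-', '1', '+', '4']."""
    toks = [str(rng.randint(0, 9))]
    for _ in range(rng.randint(0, max_operands - 1)):
        toks += [rng.choice("+-"), str(rng.randint(0, 9))]
    return toks


def find_ambiguous_program(trials=200, seed=0):
    """Return a generated program with more than one parse, or None if none is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        toks = random_program(rng)
        if count_parses(toks) > 1:
            return toks
    return None
```

Here `count_parses(list("1-2-3"))` returns 2, so the grammar is ambiguous; the user can resolve that program by editing it to `(1-2)-3`, for which `count_parses` returns 1. In the paper's terms, this ambiguity is resolvable because an edit exists that selects each intended parse.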

  • Open Access English
    Authors: 
    Bubla, Boris;
    Publisher: KTH, Skolan för elektroteknik och datavetenskap (EECS)
    Country: Sweden

    The recent development of massive multilingual transformer networks has resulted in drastic improvements in model performance. These models, however, are so large that they suffer from high inference latency and consume vast computing resources. These properties hinder widespread adoption of the models in industry and in some academic settings. There is thus growing research into reducing their parameter count and increasing their inference speed, with significant interest in knowledge distillation techniques. This thesis uses the existing approach of deep self-attention distillation to develop a task-agnostic distillation of the Language-agnostic BERT Sentence Embedding (LaBSE) model. It also explores the use of the Switch Transformer architecture in distillation contexts. The result is DistilLaBSE, a task-agnostic distillation of LaBSE that is 10 times faster than LaBSE while retaining over 99% cosine similarity of its sentence embeddings on a holdout test set from the same domain as the training samples, namely the OpenSubtitles dataset. DistilLaBSE is also shown to achieve similar scores when embedding data from two other domains, namely English tweets and customer-support banking data. This faster version of LaBSE allows industry practitioners and resource-limited academic groups to apply LaBSE more conveniently in their applications and research tasks.
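
The two quantities behind this kind of distillation can be sketched briefly. `mean_pairwise_cosine` computes the evaluation figure quoted above (average cosine similarity between teacher and student embeddings of the same sentences), and `attention_kl` computes a KL divergence between teacher and student self-attention distributions, the kind of term used as a training objective in deep self-attention distillation (as in MiniLM). Both are minimal pure-Python illustrations with hypothetical names; the thesis's exact loss, layer choices, and pooling are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(teacher, student):
    """Average cosine similarity between teacher and student embeddings,
    paired by sentence (row i of each matrix embeds the same sentence)."""
    return sum(cosine(t, s) for t, s in zip(teacher, student)) / len(teacher)


def attention_kl(p_teacher, q_student, eps=1e-12):
    """KL(teacher || student) for one row of self-attention probabilities;
    eps guards against log(p / 0) when the student assigns zero mass."""
    return sum(p * math.log(p / (q + eps))
               for p, q in zip(p_teacher, q_student) if p > 0)
```

A student whose embeddings point in the same directions as the teacher's scores 1.0 on `mean_pairwise_cosine` even if their magnitudes differ, which is why cosine similarity is a natural retention measure for sentence embeddings.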

  • Open Access English
    Authors: 
    Sverker Sörlin;
    Publisher: KTH, Historiska studier av teknik, vetenskap och miljö
    Country: Sweden
    Project: EC | SPHERE (787516)

    Emerging after World War II, “the environment” as a modern concept entered a phase of institutionalization in science, civil society, and politics in the years around 1970. Part of this was the founding of journals. The majority became “environmental specialist journals”, typically based in established disciplines. Some became “environmental generalist journals”, covering broad areas of knowledge and often with an ambition to be policy-relevant. A significant and early member of the latter category was Ambio, founded in 1972. This article presents an overview of the journal’s first 50 years, focusing on the main changes in scientific content, political context, and editorial direction. A key finding is that the journal reflects an increasing pluralization of “the environment”, with concepts such as global change, climate change, Earth system science, the Anthropocene, resilience, and environmental governance. Another finding is that the journal has itself influenced developments by publishing work on new concepts and ideas.

  • Open Access English
    Authors: 
    Lazarova, Mariya;
    Publisher: KTH, Skolan för elektroteknik och datavetenskap (EECS)
    Country: Sweden

    Nowadays, with the ever-growing number of options in many areas of our lives, it is crucial to have good ways to navigate one's choices, which is why recommendation engines are becoming more important. Recommenders are often based on user-item interactions. In many areas, such as news and podcasts, however, by the time there is enough interaction data for an item, the item has already become irrelevant. Incorporating content features is therefore desirable, as content does not depend on the popularity or novelty of an item. Very often there is text describing an item, so text features are good candidates for features in recommender systems. Within natural language processing (NLP), pre-trained language models based on the Transformer architecture have brought a revolution in recent years, achieving state-of-the-art performance on many language tasks. It is therefore natural to explore how such models can play a role in recommender systems. This work lies at the intersection of NLP and recommender systems: it investigates the effects of adding BERT-based encodings of the titles and descriptions of movies and books to a recommender system. The results show that even off-the-shelf BERT models contain a considerable amount of information on movie and book similarity. They also show that BERT-based representations could be used in a recommender system for user recommendation, combining the best of collaborative and content representations. The thesis shows that adding deep pre-trained language model representations could improve a recommender system's capability to predict good items for users by up to 0.43 AUC-ROC for a shallow model and 0.017 AUC-ROC for a deeper model. It also shows that SBERT can be fine-tuned to encode item similarity, with improvements of up to 0.03 nDCG and up to 0.05 nDCG@10.
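
The metrics reported above can be made concrete with a short sketch. `auc_roc` uses the pairwise (Mann-Whitney) formulation, counting how often a relevant item outscores an irrelevant one, and `ndcg_at_k` uses the common log2 position discount. Both are generic definitions chosen for illustration; the thesis's exact evaluation protocol (candidate sampling, tie handling, gain function) may differ.

```python
import math


def auc_roc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items, in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

For instance, a ranking whose only relevant item sits in third place, `ndcg_at_k([0, 0, 1], 3)`, scores 0.5, while the ideal ordering scores 1.0. An improvement "of 0.05 nDCG@10" is a difference on this 0-to-1 scale.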

  • Open Access English
    Authors: 
    Bereczki, Márk;
    Publisher: KTH, Skolan för elektroteknik och datavetenskap (EECS)
    Country: Sweden

    Recommender systems are widely used in websites and applications to help users find relevant content based on their interests. Graph neural networks, which operate on data represented as a graph, have achieved state-of-the-art results in the field of recommender systems. However, most graph-based solutions face challenges regarding computational complexity or the ability to generalize to new users. We therefore propose a novel graph-based recommender system by modifying Simple Graph Convolution, an approach for efficient graph node classification, and adding the capability to generalize to new users. We build the proposed recommender system to recommend the articles of the Peltarion Knowledge Center. By incorporating two data sources, implicit user feedback based on pageview data and the content of the articles, we propose a hybrid recommender solution. In our experiments, we compare the proposed solution with a matrix factorization approach as well as popularity-based and random baselines, analyse the hyperparameters of our model, and examine its capability to give recommendations to new users who were not part of the training data set. Our model achieves Mean Average Precision and Mean Reciprocal Rank scores slightly lower than, but similar to, those of the matrix factorization approach, and outperforms the popularity-based and random baselines. The main advantages of our model are its computational efficiency and its ability to give relevant recommendations to new users without retraining the model, which are key features for real-world use cases.
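
Simple Graph Convolution (SGC) drops the nonlinearities of a graph convolutional network, so node features can be propagated once, ahead of training, as S^K X, where S = D^(-1/2) (A + I) D^(-1/2), A is the adjacency matrix and D is the degree matrix of A + I. The NumPy sketch below shows that precomputation, plus one assumed way to serve a new user without retraining: represent the user as the mean of the propagated features of the articles they viewed and rank unseen articles by dot product. The new-user rule and all function names are illustrative assumptions, not the thesis's method; `mean_reciprocal_rank` follows the standard MRR definition.

```python
import numpy as np


def sgc_features(adj, features, k=2):
    """Precompute S^k X with S = D^-1/2 (A + I) D^-1/2 (symmetric normalisation with self-loops)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    s = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    x = features
    for _ in range(k):
        x = s @ x
    return x


def recommend_for_new_user(item_emb, viewed, top_n=3):
    """Assumed cold-start rule: user vector = mean of the viewed items' embeddings;
    unseen items are ranked by dot product with it. No retraining is needed."""
    user = item_emb[list(viewed)].mean(axis=0)
    scores = item_emb @ user
    ranked = [int(i) for i in np.argsort(-scores) if int(i) not in viewed]
    return ranked[:top_n]


def mean_reciprocal_rank(rankings, relevant_sets):
    """Average of 1/rank of the first relevant item per user (0 if none is retrieved)."""
    total = 0.0
    for ranking, relevant in zip(rankings, relevant_sets):
        for rank, item in enumerate(ranking, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)
```

Because the propagation step has no trainable parameters, it runs once per graph; this is the source of SGC's computational efficiency, and it is why a new user can be scored from precomputed item vectors alone.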

Advanced search in
Research products
arrow_drop_down
Searching FieldsTerms
Any field
arrow_drop_down
includes
arrow_drop_down
Include:
137 Research products, page 1 of 14
  • Open Access Russian
    Authors: 
    Arzyutov, Dmitry V.; Anderson, David G.;
    Publisher: KTH, Historiska studier av teknik, vetenskap och miljö
    Country: Sweden

    What does an anthropologist’s archive look like? Where is it located? And is the anthropology of archives important for the understanding of anthropological thinking today? Here we answer these questions by analysing the various life histories of the archival fragments of one of the most puzzling and influential anthropologists in the history of Russian and Soviet anthropology: Sergei Mikhailovich Shirokogoroff (1887–1939). Shirokogoroff is credited as being one of the authors of the etnos theory — one of the main instruments of identity politics in Russia, China, Germany and also, in part, Japan and South Africa. The transnational life histories of Shirokogoroff and his wife Elizaveta [Elizabeth] Nikolaevna (1884–1943), and of their ideas, suggests a conception of the archive not as a single whole, but instead as a collection of forgotten, hidden, obliterated, or, on the other hand, scrupulously preserved fragments. These fragments are not centred in one place or organized around any one reading, but they nevertheless represent “partial connections”. Moreover, as we can see today with hindsight, none of these archival fragments lay inert. They have been intertwined in local political and social ontologies. Our text has an autoethnograpic quality. While illustrating separate episodes from the life of the Shirokogoroffs we also will tell of our search for the manuscripts through which we were forced onto strange paths and encounters. These greatly deepened our understanding both of the life of documents and their material links to the lives of researchers. Our article is an attempt to illustrate this complex picture which, in the end, will allow us to conclude that we have only just begun to understand the workings of the anthropologist’s archive in the history of anthropological thought. QC 20220530

  • Open Access English
    Authors: 
    Daniel Svensson; Sverker Sörlin; Katarina Saltzman;
    Publisher: KTH, Historiska studier av teknik, vetenskap och miljö
    Country: Sweden

    Can walking trails be understood not only as routes to history and heritage, but also as heritage in and of themselves? The paper explores the articulation of trails as a distinct landscape and mobility heritage, bridging the nature-culture divide and building on physical and intellectual movements over time. The authors aim to contribute to a better understanding of the geography of trails and trailscapes by analysing the emergence of the Swedish-Norwegian trail Finnskogleden. The trail is situated in the border region spanning the former county of Hedmark in present-day Innlandet County, south-eastern Norway, and Värmland County in mid-western Sweden, a forested area where Finnish-speaking immigrants settled from the 16th century to the early 20th century. Archives, literature, interviews, and field visits were used to analyse the emergence and governance of the trail. The main finding is the importance of continuous articulation work by local and regional stakeholders, through texts, maps, maintenance, and mobility. In conclusion, the Finn forest trailscape and its mobility heritage can be seen as an articulation of territory over time, a multilayered process drawing on various environing technologies, making the trail a transformative part of a trans-border political geography. Rörelsearvet: stigar och leder i hållbar och inkluderande kulturarvsförvaltning

  • Publication . Master thesis . Bachelor thesis . 2021
    Open Access
    Authors: 
    González Lopez, Angel Luis;
    Publisher: E.T.S. de Ingenieros Informáticos (UPM)
    Countries: Spain, Sweden

    Code Search is one of the most common tasks for developers. The open-source software movement and the rise of social media have made this process easier thanks to the vast public software repositories available to everyone and the Q&A sites where individuals can resolve their doubts. However, in the case of poorly documented code that is difficult to search in a repository, or in the case of private enterprise frameworks that are not publicly available, so there is not a community on Q&A sites to answer questions, searching for code snippets to solve doubts or learn how to use an API becomes very complicated. In order to solve this problem, this thesis studies the use of natural language in code retrieval. In particular, it studies transformer-based models, such as Bidirectional Encoder Representations from Transformers (BERT), which are currently state of the art in natural language processing but present high latency in information retrieval tasks. That is why this project proposes a multi-stage architecture that seeks to maintain the performance of standard BERT-based models while reducing the high latency usually associated with the use of this type of framework. Experiments show that this architecture outperforms previous non- BERT-based models by +0.17 on the Top 1 (or Recall@1) metric and reduces latency with inference times 5% of those of standard BERT models. Kodsökning är en av de vanligaste uppgifterna för utvecklare. Rörelsen för öppen källkod och de sociala medierna har gjort denna process enklare tack vare de stora offentliga programvaruupplagorna som är tillgängliga för alla och de Q&A-webbplatser där enskilda personer kan lösa sina tvivel. 
När det gäller dåligt dokumenterad kod som är svår att söka i ett arkiv, eller när det gäller ramverk för privata företag som inte är offentligt tillgängliga, så att det inte finns någon gemenskap på Q&AA-webbplatser för att besvara frågor, blir det dock mycket komplicerat att söka efter kodstycken för att lösa tvivel eller lära sig hur man använder ett API. För att lösa detta problem studeras i denna avhandling användningen av naturligt språk för att hitta kod. I synnerhet studeras transformatorbaserade modeller, såsom BERT, som för närvarande är den senaste tekniken inom behandling av naturliga språk men som har hög latenstid vid informationssökning. Därför föreslås i detta projekt en arkitektur i flera steg som syftar till att bibehålla prestandan hos standard BERT-baserade modeller samtidigt som den höga latenstiden som vanligtvis är förknippad med användningen av denna typ av ramverk minskas. Experiment visar att denna arkitektur överträffar tidigare icke-BERT-baserade modeller med +0,17 på Top 1 (eller Recall@1) och minskar latensen, med en inferenstid som är 5% av den för standard BERT-modeller.

  • Publication . Conference object . 2021
    Open Access English
    Authors: 
    Alkathiri, Abdul Aziz; Giaretta, Lodovico; Girdzijauskas, Sarunas; Sahlgren, Magnus;
    Publisher: Zenodo
    Country: Sweden
    Project: EC | RAIS (813162)

    Advanced NLP models require huge amounts of data from various domains to produce high-quality representations. It is useful then for a few large public and private organizations to join their corpora during training. However, factors such as legislation and user emphasis on data privacy may prevent centralized orchestration and data sharing among these organizations. Therefore, for this specific scenario, we investigate how gossip learning, a massively-parallel, data-private, decentralized protocol, compares to a shared-dataset solution. We find that the application of Word2Vec in a gossip learning framework is viable. Without any tuning, the results are comparable to a traditional centralized setting, with a reduction in ground-truth similarity scores as low as 4.3%. Furthermore, the results are up to 54.8% better than independent local training. QC 20210423

  • Open Access English
    Authors: 
    Chen Feng; John Peponis;
    Publisher: KTH, Arkitektur
    Country: Sweden

    The patterns of syntactic differentiation and their causes and effects are fundamental to space syntax analysis. Often, however, differentiation is taken for granted with no reference to the dynamic process that brings it about. Here, we first show that by measuring the amount of syntactic differentiation, we can better distinguish between types of street networks. We then show that repeated local transformations of a regular street grid lead to different yet largely predictable trajectories of differentiation depending upon the rules used. Finally, we show that different paths to differentiation entail different costs in terms of undesirable properties. This allows us to better assess the likely consequences of design moves and their appropriateness relative to design intentions. QC 20210614

  • Open Access English
    Authors: 
    Viktor Palmkvist; Elias Castegren; Philipp Haller; David Broman;
    Publisher: KTH, Programvaruteknik och datorsystem, SCS
    Country: Sweden

    When building a new programming language, it can be useful to compose parts of existing languages to avoid repeating implementation work. However, this is problematic already at the syntax level, as composing the grammars of language fragments can easily lead to an ambiguous grammar. State-of-the-art parser tools cannot handle ambiguity truly well: either the grammar cannot be handled at all, or the tools give little help to an end-user who writes an ambiguous program. This composability problem is twofold: (i) how can we detect if the composed grammar is ambiguous, and (ii) if it is ambiguous, how can we help a user resolve an ambiguous program? In this paper, we depart from the traditional view of unambiguous grammar design and enable a language designer to work with an ambiguous grammar, while giving users the tools needed to handle these ambiguities. We introduce the concept of resolvable ambiguity wherein a user can resolve an ambiguous program by editing it, as well as an approach to computing the resolutions of an ambiguous program. Furthermore, we present a method based on property-based testing to identify if a composed grammar is unambiguous, resolvably ambiguous, or unresolvably ambiguous. The method is implemented in Haskell and evaluated on a large set of language fragments selected from different languages. The evaluation shows that (i) the approach can handle significantly more cases of language compositions compared to approaches which ban ambiguity altogether, and (ii) that the approach is fast enough to be used in practice. QC 20210520

  • Open Access English
    Authors: 
    Bubla, Boris;
    Publisher: KTH, Skolan för elektroteknik och datavetenskap (EECS)
    Country: Sweden

    The recent development of massive multilingual transformer networks has resulted in drastic improvements in model performance. These models, however, are so large that they suffer from high inference latency and consume vast computing resources. Such properties hinder widespread adoption of the models in industry and in some academic settings. There is therefore growing research into reducing their parameter count and increasing their inference speed, with significant interest in knowledge distillation techniques. This thesis uses the existing approach of deep self-attention distillation to develop a task-agnostic distillation of the Language-agnostic BERT Sentence Embedding (LaBSE) model. It also explores the use of the Switch Transformer architecture in distillation contexts. The result is DistilLaBSE, a task-agnostic distillation of LaBSE that is 10 times faster than LaBSE while retaining over 99% cosine similarity of its sentence embeddings on a held-out test set from the same domain as the training samples, namely the OpenSubtitles dataset. DistilLaBSE is also shown to achieve similar scores when embedding data from two other domains, namely English tweets and customer-support banking data. This faster version of LaBSE allows industry practitioners and resource-limited academic groups to apply a more convenient version of LaBSE to their various applications and research tasks.
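    The retention figure for DistilLaBSE is expressed as cosine similarity between teacher (LaBSE) and student sentence embeddings. The plain-Python sketch below shows one way to compute such a score, assuming embeddings arrive as equal-length lists of floats paired by sentence. Loading the actual models is out of scope, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_cosine(teacher_embs, student_embs):
    """Average per-sentence cosine similarity between teacher and student
    embeddings; 1.0 means the student reproduces every teacher direction."""
    if len(teacher_embs) != len(student_embs) or not teacher_embs:
        raise ValueError("need equally many, non-empty embedding lists")
    pairs = zip(teacher_embs, student_embs)
    return sum(cosine(t, s) for t, s in pairs) / len(teacher_embs)
```

    Because cosine similarity ignores vector length, a student whose embeddings are scaled copies of the teacher's still scores 1.0. Only the direction of each embedding has to match.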

  • Open Access English
    Authors: 
    Sverker Sörlin;
    Publisher: KTH, Historiska studier av teknik, vetenskap och miljö
    Country: Sweden
    Project: EC | SPHERE (787516)

    Having emerged after World War II, “the environment” as a modern concept entered a phase of institutionalization in science, civil society, and politics in the years around 1970. Part of this was the founding of journals. The majority became “environmental specialist journals”, typically based in established disciplines. Some became “environmental generalist journals”, covering broad knowledge areas and often aiming to be policy-relevant. A significant and early member of the latter category was Ambio, founded in 1972. This article presents an overview of the journal’s first 50 years, focusing on the main changes in scientific content, political context, and editorial direction. A key finding is that the journal reflects an increasing pluralization of “the environment”, with concepts such as global change, climate change, Earth system science, the Anthropocene, resilience, and environmental governance. Another finding is that the journal has itself influenced developments by publishing work on new concepts and ideas.

  • Open Access English
    Authors: 
    Lazarova, Mariya;
    Publisher: KTH, Skolan för elektroteknik och datavetenskap (EECS)
    Country: Sweden

    Nowadays, with the ever-growing number of options in many areas of our lives, it is crucial to have good ways to navigate one's choices. This is why recommendation engines are becoming increasingly important. Recommenders are often based on user-item interactions. In many areas, such as news and podcasts, however, by the time there is enough interaction data for an item, the item has already become irrelevant. This is why incorporating content features is desirable, as content does not depend on the popularity or novelty of an item. Very often there is text describing an item, so text features are good candidates for features within recommender systems. Within Natural Language Processing (NLP), pre-trained language models based on the Transformer architecture have brought about a revolution in recent years, achieving state-of-the-art performance on many language tasks. It is therefore natural to explore how such models can play a role within recommender systems. This work lies at the intersection of NLP and recommender systems and investigates the effects of adding BERT-based encodings of the titles and descriptions of movies and books to a recommender system. The results show that even off-the-shelf BERT models contain a considerable amount of information on movie and book similarity. They also show that BERT-based representations could be used in a recommender system for user recommendation, combining the best of collaborative and content representations. In this thesis, it is shown that adding deep pre-trained language model representations could improve a recommender system's ability to predict good items for users, by up to 0.43 AUC-ROC for a shallow model and 0.017 AUC-ROC for a deeper model. It is also shown that SBERT can be fine-tuned to encode item similarity, with improvements of up to 0.03 in nDCG and up to 0.05 in nDCG@10.
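    nDCG and nDCG@10 are standard ranking metrics. The plain-Python sketch below computes nDCG@k. It assumes graded relevance scores listed in the order the system ranked the items, and uses the common linear-gain formulation. Some implementations use the exponential gain 2^rel - 1 instead, and the abstract does not say which variant the thesis uses.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k ranked items (linear gain).

    The item at 0-based position `rank` is discounted by log2(rank + 2).
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering, in [0, 1]."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

    For instance, ranking the only relevant item second instead of first gives `ndcg_at_k([0, 1], 2)` = 1 / log2(3), roughly 0.63. A perfect ordering always scores 1.0.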

  • Open Access English
    Authors: 
    Bereczki, Márk;
    Publisher: KTH, Skolan för elektroteknik och datavetenskap (EECS)
    Country: Sweden

    Recommender systems are widely used in websites and applications to help users find relevant content based on their interests. Graph neural networks, which operate on data represented as a graph, have achieved state-of-the-art results in the field of recommender systems. However, most graph-based solutions face challenges regarding computational complexity or the ability to generalize to new users. We therefore propose a novel graph-based recommender system by modifying Simple Graph Convolution, an approach for efficient graph node classification, and adding the capability of generalizing to new users. We build our proposed recommender system for recommending the articles of Peltarion Knowledge Center. By incorporating two data sources, implicit user feedback based on pageview data and the content of articles, we propose a hybrid recommender solution. In our experiments, we compare our proposed solution with a matrix factorization approach as well as popularity-based and random baselines, analyse the hyperparameters of our model, and examine the capability of our solution to give recommendations to new users who were not part of the training data set. Our model achieves slightly lower but similar Mean Average Precision and Mean Reciprocal Rank scores compared with the matrix factorization approach, and outperforms the popularity-based and random baselines. The main advantages of our model are its computational efficiency and its ability to give relevant recommendations to new users without retraining the model, which are key features for real-world use cases.
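    Simple Graph Convolution precomputes node features as S^K X, where S = D^(-1/2) (A + I) D^(-1/2) is the symmetrically normalized adjacency matrix with self-loops, and then applies a linear classifier. The plain-Python sketch below shows only that propagation step. It also adds a hypothetical helper, `recommend_for_new_user`, which scores unseen items for a new user by averaging the propagated features of the articles they viewed. That helper is an illustrative assumption, not the modification proposed in the thesis.

```python
import math

def normalized_adjacency(adj):
    """Return S = D^-1/2 (A + I) D^-1/2 for an undirected graph.

    `adj[i]` lists the neighbours of node i; nodes are numbered 0..n-1.
    """
    n = len(adj)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loop
        for j in adj[i]:
            a[i][j] = 1.0
    deg = [sum(row) for row in a]  # degrees including the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(m, x):
    """Dense product of an n x n matrix and an n x d feature matrix."""
    return [[sum(m[i][k] * x[k][c] for k in range(len(x))) for c in range(len(x[0]))]
            for i in range(len(m))]

def sgc_features(adj, features, k=2):
    """Propagate node features k hops: S^k X, with no nonlinearity in between."""
    s = normalized_adjacency(adj)
    out = features
    for _ in range(k):
        out = matmul(s, out)
    return out

def recommend_for_new_user(item_feats, viewed, top_n=1):
    """Hypothetical cold-start helper: average the propagated features of
    viewed items and rank unseen items by dot product, with no retraining."""
    dim = len(item_feats[0])
    user = [sum(item_feats[i][c] for i in viewed) / len(viewed) for c in range(dim)]
    scores = {i: sum(u * f for u, f in zip(user, feat))
              for i, feat in enumerate(item_feats) if i not in viewed}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

    Because S^K X is computed once up front, the only training left is a linear model. This is where SGC gets its efficiency.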
