System log files are filled with logged events, status codes, and other messages. By analyzing the log files, the systems current state can be determined, and find out if something during its execution went wrong. Log file analysis has been studied for some time now, where recent studies have shown state-of-the-art performance using machine learning techniques. In this thesis, document classification solutions were tested on log files in order to classify regular system runs versus abnormal system runs. To solve this task, supervised and unsupervised learning methods were combined. Doc2Vec was used to extract document features, and Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based architectures on the classification task. With the use of the machine learning models and preprocessing techniques the tested models yielded an f1-score and accuracy above 95% when classifying log files.
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::2c3ba77a9da0053d5f182ebbd0047c13&type=result"></script>');
-->
</script>
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::2c3ba77a9da0053d5f182ebbd0047c13&type=result"></script>');
-->
</script>
The history of California is in many ways a story about water, and the outsized effect that droughts, floods, and seasonal precipitation rates have had on the political and economic development of the state over the past 170 years. This thesis uses discourse analysis of historical and ongoing negotiations that have been presented in federal and state reports, narratives, case laws and legislation to explore how the discourse around water politics has been shaped in the state. From this, an antiessentialist environmental history develops around the relationship between overdrafted groundwater basins in the Central Valley and the agriculture industry located there. Finally, this thesis explores what the future of a waterscape built during the capitalization of modern society may look like as we move towards a new regime of nature.
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::9250b1029645feb5862d42d10f21bc14&type=result"></script>');
-->
</script>
Green | |
bronze |
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::9250b1029645feb5862d42d10f21bc14&type=result"></script>');
-->
</script>
Macroeconomic forecasting is a classic problem, today most often modeled using time series analysis. Few attempts have been made using machine learning methods, and even fewer incorporating unconventional data, such as that from social media. In this thesis, a Generative Adversarial Network (GAN) is used to predict U.S. unemployment, beating the ARIMA benchmark on all horizons. Furthermore, attempts at using Twitter data and the Natural Language Processing (NLP) model DistilBERT are performed. While these attempts do not beat the benchmark, they do show promising results with predictive power. The models are also tested at predicting the U.S. stock index S&P 500. For these models, the Twitter data does improve the accuracy and shows the potential of social media data when predicting a more erratic index with less seasonality that is more responsive to current trends in public discourse. The results also show that Twitter data can be used to predict trends in both unemployment and the S&P 500 index. This sets the stage for further research into NLP-GAN models for macroeconomic predictions using social media data. Makroekonomiska prognoser är sedan länge en svår utmaning. Idag löses de oftast med tidsserieanalys och få försök har gjorts med maskininlärning. I denna uppsats används ett generativt motstridande nätverk (GAN) för att förutspå amerikansk arbetslöshet, med resultat som slår samtliga riktmärken satta av en ARIMA. Ett försök görs också till att använda data från Twitter och den datorlingvistiska (NLP) modellen DistilBERT. Dessa modeller slår inte riktmärkena men visar lovande resultat. Modellerna testas vidare på det amerikanska börsindexet S&P 500. För dessa modeller förbättrade Twitterdata resultaten vilket visar på den potential data från sociala medier har när de appliceras på mer oregelbunda index, utan tydligt säsongsberoende och som är mer känsliga för trender i det offentliga samtalet. Resultaten visar på att Twitterdata kan användas för att hitta trender i både amerikansk arbetslöshet och S&P 500 indexet. Detta lägger grunden för fortsatt forskning inom NLP-GAN modeller för makroekonomiska prognoser baserade på data från sociala medier.
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::82a954971dcf102f5e320c5296806d1b&type=result"></script>');
-->
</script>
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::82a954971dcf102f5e320c5296806d1b&type=result"></script>');
-->
</script>
handle: 11012/196459
Phoebe Apperson Hearst měla velmi úspěšného vlastního syna Williama, který byl ale více jako jeho otec: tvrdý obchodník. Našla však jemnou, uměleckou duši v malíři Orrinu Peckovi (1860–1921), který byl údajně gay a který ji, ještě za života své vlastní matky, začal oslovovat „má druhá mámo.“ Na základě podrobného výzkumu jejich vzájemné korespondence v Peckově pozůstalosti se můžeme ptát, jak moc si byla progresivní, bohatá žena 19. století, jakou byla Phoebe Hearst, vědoma Peckovy sexuality a pokud ano, jestli s tím neměla problém, nebo šlo o nevyřčené tajemství mezi nimi? Jejich příběh představí historik umění Ladislav Zikmund-Lender. Phoebe Apperson Hearst had a very successful son of William, but he was more like his father: a tough businessman. However, she found a delicate, artistic soul in the painter Orrin Peck (1860–1921), who was allegedly gay and who, while still his own mother's life, began to address her as “my second mother.” Based on a detailed study of their correspondence in Peck's estate, we may ask how much a progressive, rich 19th-century woman like Phoebe Hearst was aware of Peck's sexuality, and if so, if she had no problem with it, or was it an unspoken secret between them? Their story will be presented by art historian Ladislav Zikmund-Lender.
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=11012/196459&type=result"></script>');
-->
</script>
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=11012/196459&type=result"></script>');
-->
</script>
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::2d397ff8773ee1eb01b5cfbea0b99cfc&type=result"></script>');
-->
</script>
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::2d397ff8773ee1eb01b5cfbea0b99cfc&type=result"></script>');
-->
</script>
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______361::8a3157dd5f95e0779f6e9a42242afb5b&type=result"></script>');
-->
</script>
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______361::8a3157dd5f95e0779f6e9a42242afb5b&type=result"></script>');
-->
</script>
Prediction and understanding of the regulatory effects of non-coding DNA is an extensive research area in genomics. Convolutional neural networks have been used with success in the past to predict regulatory features, making chromatin feature predictions based solely on non-coding DNA sequences. Non-coding DNA shares various similarities with the human spoken language. This makes Language models such as the transformer attractive candidates for deciphering the non-coding DNA language. This thesis investigates how well the transformer model, usually used for NLP problems, predicts chromatin features based on genome sequences compared to convolutional neural networks. More specifically, the CNN DeepSEA, which is used for regulatory feature prediction based on noncoding DNA, is compared with the transformer DNABert. Further, this study explores the impact different parameters and training strategies have on performance. Furthermore, other models (DeeperDeepSEA and DanQ) are also compared on the same tasks to give a broader comparison value. Lastly, the same experiments are conducted on modified versions of the dataset where the labels cover different amounts of the DNA sequence. This could prove beneficial to the transformer model, which can understand and capture longrange dependencies in natural language problems. The replication of DeepSEA was successful and gave similar results to the original model. Experiments used for DeepSEA were also conducted on DNABert, DeeperDeepSEA, and DanQ. All the models were trained on different datasets, and their results were compared. Lastly, a Prediction voting mechanism was implemented, which gave better results than the models individually. The results showed that DeepSEA performed slightly better than DNABert, regarding AUC ROC. The Wilcoxon Signed-Rank Test showed that, even if the two models got similar AUC ROC scores, there is statistical significance between the distribution of predictions. This means that the models look at the dataset differently and might be why combining their prediction presents good results. Due to time restrictions of training the computationally heavy DNABert, the best hyper-parameters and training strategies for the model were not found, only improved. The Datasets used in this thesis were gravely unbalanced and is something that needs to be worked on in future projects. This project works as a good continuation for the paper Whole-genome deep-learning analysis identifies contribution of non-coding mutations to autism risk, Which uses the DeepSEA model to learn more about how specific mutations correlate with Autism Spectrum Disorder. Arbetet kring hur icke-kodande DNA påverkar genreglering är ett betydande forskningsområde inom genomik. Convolutional neural networks (CNN) har tidigare framgångsrikt använts för att förutsäga reglerings-element baserade endast på icke-kodande DNA-sekvenser. Icke-kod DNA har ett flertal likheter med det mänskliga språket. Detta gör språkmodeller, som Transformers, till attraktiva kandidater för att dechiffrera det icke-kodande DNA-språket. Denna avhandling undersöker hur väl transformermodellen kan förutspå kromatin-funktioner baserat på gensekvenser jämfört med CNN. Mer specifikt jämförs CNN-modellen DeepSEA, som används för att förutsäga reglerande funktioner baserat på icke-kodande DNA, med transformern DNABert. Vidare undersöker denna studie vilken inverkan olika parametrar och träningsstrategier har på prestanda. Dessutom jämförs andra modeller (DeeperDeepSEA och DanQ) med samma experiment för att ge ett bredare jämförelsevärde. Slutligen utförs samma experiment på modifierade versioner av datamängden där etiketterna täcker olika mängder av DNA-sekvensen. Detta kan visa sig vara fördelaktigt för transformer modellen, som kan förstå beroenden med lång räckvidd i naturliga språkproblem. Replikeringen av DeepSEA experimenten var lyckad och gav liknande resultat som i den ursprungliga modellen. Experiment som användes för DeepSEA utfördes också på DNABert, DeeperDeepSEA och DanQ. Alla modeller tränades på olika datamängder, och resultat på samma datamängd jämfördes. Slutligen implementerades en algoritm som kombinerade utdatan av DeepDEA och DNABERT, vilket gav bättre resultat än modellerna individuellt. Resultaten visade att DeepSEA presterade något bättre än DNABert, med avseende på AUC ROC. Wilcoxon Signed-Rank Test visade att, även om de två modellerna fick liknande AUC ROC-poäng, så finns det en statistisk signifikans mellan fördelningen av deras förutsägelser. Det innebär att modellerna hanterar samma information på olika sätt och kan vara anledningen till att kombinationen av deras förutsägelser ger bra resultat. På grund av tidsbegränsningar för träning av det beräkningsmässigt tunga DNABert hittades inte de bästa hyper-parametrarna och träningsstrategierna för modellen, utan förbättrades bara. De datamängder som användes i denna avhandling var väldigt obalanserade, vilket måste hanteras i framtida projekt. Detta projekt fungerar som en bra fortsättning för projektet Whole-genome deep-learning analysis identifies contribution of non-coding mutations to autism risk, som använder DeepSEA-modellen för att lära sig mer om hur specifika DNA-mutationer korrelerar med autismspektrumstörning.
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::6f380e822b7dde0b9283e89e3634d6e9&type=result"></script>');
-->
</script>
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::6f380e822b7dde0b9283e89e3634d6e9&type=result"></script>');
-->
</script>
This thesis evaluates the performance of different machine learning approaches to log classification based on a dataset derived from simulating intrusive behavior towards an enterprise web application. The first experiment consists of performing attacks towards the web app in correlation with the logs to create a labeled dataset. The second experiment consists of one unsupervised model based on a variational autoencoder and four super- vised models based on both conventional feature-engineering techniques with deep neural networks and embedding-based feature techniques followed by long-short-term memory architectures and convolutional neural networks. With this dataset, the embedding-based approaches performed much better than the conventional one. The autoencoder did not perform well compared to the supervised models. To conclude, embedding-based ap- proaches show promise even on datasets with different characteristics compared to natural language.
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::e6c966ee8d251e8969ab4ddcfd7f759e&type=result"></script>');
-->
</script>
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::e6c966ee8d251e8969ab4ddcfd7f759e&type=result"></script>');
-->
</script>
Foreign policy is a constant balancing between playing an active role on the international stage versus taking a more cautious stance. This study compares foreign policy development of three conservative parties during the Cold war, in three different nations with vastly different geopolitical importance. The parties compared in the study are the American Republican party, the British Conservative party and the Swedish Conservative party. This study uses party platform and election manifestos from three separate election years as the main sources for comparison. The comparison covers the early 1950’s, middle 1960’s and finally the early 1980’s. The party programs are thereafter analyzed with help of Lane Crothers model to further classify the foreign policy development of each party during the given time period. In conclusion, the study indicates a general move toward an increasingly active involvement on the international stage. The Republican party represents a foreign policy predominantlybased on military and diplomatic capacity to safeguard American foreign interest in relation to the Soviet Union. The British Conservative party as well as the Swedish Conservative party represent a foreign policy strategy with higher emphasis on international economic cooperation based on the liberal market economy.
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::195ac39d9475e771da624c414e70bc21&type=result"></script>');
-->
</script>
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::195ac39d9475e771da624c414e70bc21&type=result"></script>');
-->
</script>
The South African democratic transition in the 1990s represents one of the clearest cases of practical implementation of constitutional engineering. The process was aimed to the creation of the principle of national unity in the fundamental text first, hoping it would be mirrored consequently by a popular sentiment. Within this context, the Bill of Rights, included in the second chapter of the final text, affirmed itself as the most relevant document that emerged from the country's nation-building process. This thesis aims to compare the influences that the national and international components of the South African transitional legal culture had on the drafting of the Bill of Rights, through the investigation of their historical and political dynamics. The analysis highlights that the liberal component characterizes the majority of the text, while being, however, declined on the neo-liberal international doctrine, while the African customary law is recognized within the cultural rights but remains subjected to the requirement of conformity with the liberal provisions.
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::d10d2272775e413358818af97445d131&type=result"></script>');
-->
</script>
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::d10d2272775e413358818af97445d131&type=result"></script>');
-->
</script>
System log files are filled with logged events, status codes, and other messages. By analyzing the log files, the systems current state can be determined, and find out if something during its execution went wrong. Log file analysis has been studied for some time now, where recent studies have shown state-of-the-art performance using machine learning techniques. In this thesis, document classification solutions were tested on log files in order to classify regular system runs versus abnormal system runs. To solve this task, supervised and unsupervised learning methods were combined. Doc2Vec was used to extract document features, and Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based architectures on the classification task. With the use of the machine learning models and preprocessing techniques the tested models yielded an f1-score and accuracy above 95% when classifying log files.
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::2c3ba77a9da0053d5f182ebbd0047c13&type=result"></script>');
-->
</script>
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::2c3ba77a9da0053d5f182ebbd0047c13&type=result"></script>');
-->
</script>
The history of California is in many ways a story about water, and the outsized effect that droughts, floods, and seasonal precipitation rates have had on the political and economic development of the state over the past 170 years. This thesis uses discourse analysis of historical and ongoing negotiations that have been presented in federal and state reports, narratives, case laws and legislation to explore how the discourse around water politics has been shaped in the state. From this, an antiessentialist environmental history develops around the relationship between overdrafted groundwater basins in the Central Valley and the agriculture industry located there. Finally, this thesis explores what the future of a waterscape built during the capitalization of modern society may look like as we move towards a new regime of nature.
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::9250b1029645feb5862d42d10f21bc14&type=result"></script>');
-->
</script>
Green | |
bronze |
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::9250b1029645feb5862d42d10f21bc14&type=result"></script>');
-->
</script>
Macroeconomic forecasting is a classic problem, today most often modeled using time series analysis. Few attempts have been made using machine learning methods, and even fewer incorporating unconventional data, such as that from social media. In this thesis, a Generative Adversarial Network (GAN) is used to predict U.S. unemployment, beating the ARIMA benchmark on all horizons. Furthermore, attempts at using Twitter data and the Natural Language Processing (NLP) model DistilBERT are performed. While these attempts do not beat the benchmark, they do show promising results with predictive power. The models are also tested at predicting the U.S. stock index S&P 500. For these models, the Twitter data does improve the accuracy and shows the potential of social media data when predicting a more erratic index with less seasonality that is more responsive to current trends in public discourse. The results also show that Twitter data can be used to predict trends in both unemployment and the S&P 500 index. This sets the stage for further research into NLP-GAN models for macroeconomic predictions using social media data. Makroekonomiska prognoser är sedan länge en svår utmaning. Idag löses de oftast med tidsserieanalys och få försök har gjorts med maskininlärning. I denna uppsats används ett generativt motstridande nätverk (GAN) för att förutspå amerikansk arbetslöshet, med resultat som slår samtliga riktmärken satta av en ARIMA. Ett försök görs också till att använda data från Twitter och den datorlingvistiska (NLP) modellen DistilBERT. Dessa modeller slår inte riktmärkena men visar lovande resultat. Modellerna testas vidare på det amerikanska börsindexet S&P 500. För dessa modeller förbättrade Twitterdata resultaten vilket visar på den potential data från sociala medier har när de appliceras på mer oregelbunda index, utan tydligt säsongsberoende och som är mer känsliga för trender i det offentliga samtalet. Resultaten visar på att Twitterdata kan användas för att hitta trender i både amerikansk arbetslöshet och S&P 500 indexet. Detta lägger grunden för fortsatt forskning inom NLP-GAN modeller för makroekonomiska prognoser baserade på data från sociala medier.
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::82a954971dcf102f5e320c5296806d1b&type=result"></script>');
-->
</script>
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::82a954971dcf102f5e320c5296806d1b&type=result"></script>');
-->
</script>
handle: 11012/196459
Phoebe Apperson Hearst měla velmi úspěšného vlastního syna Williama, který byl ale více jako jeho otec: tvrdý obchodník. Našla však jemnou, uměleckou duši v malíři Orrinu Peckovi (1860–1921), který byl údajně gay a který ji, ještě za života své vlastní matky, začal oslovovat „má druhá mámo.“ Na základě podrobného výzkumu jejich vzájemné korespondence v Peckově pozůstalosti se můžeme ptát, jak moc si byla progresivní, bohatá žena 19. století, jakou byla Phoebe Hearst, vědoma Peckovy sexuality a pokud ano, jestli s tím neměla problém, nebo šlo o nevyřčené tajemství mezi nimi? Jejich příběh představí historik umění Ladislav Zikmund-Lender. Phoebe Apperson Hearst had a very successful son of William, but he was more like his father: a tough businessman. However, she found a delicate, artistic soul in the painter Orrin Peck (1860–1921), who was allegedly gay and who, while still his own mother's life, began to address her as “my second mother.” Based on a detailed study of their correspondence in Peck's estate, we may ask how much a progressive, rich 19th-century woman like Phoebe Hearst was aware of Peck's sexuality, and if so, if she had no problem with it, or was it an unspoken secret between them? Their story will be presented by art historian Ladislav Zikmund-Lender.
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=11012/196459&type=result"></script>');
-->
</script>
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=11012/196459&type=result"></script>');
-->
</script>
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::2d397ff8773ee1eb01b5cfbea0b99cfc&type=result"></script>');
-->
</script>
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______681::2d397ff8773ee1eb01b5cfbea0b99cfc&type=result"></script>');
-->
</script>
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______361::8a3157dd5f95e0779f6e9a42242afb5b&type=result"></script>');
-->
</script>
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=od_______361::8a3157dd5f95e0779f6e9a42242afb5b&type=result"></script>');
-->
</script>
Prediction and understanding of the regulatory effects of non-coding DNA is an extensive research area in genomics. Convolutional neural networks have been used with success in the past to predict regulatory features, making chromatin feature predictions based solely on non-coding DNA sequences. Non-coding DNA shares various similarities with the human spoken language. This makes Language models such as the transformer attractive candidates for deciphering the non-coding DNA language. This thesis investigates how well the transformer model, usually used for NLP problems, predicts chromatin features based on genome sequences compared to convolutional neural networks. More specifically, the CNN DeepSEA, which is used for regulatory feature prediction based on noncoding DNA, is compared with the transformer DNABert. Further, this study explores the impact different parameters and training strategies have on performance. Furthermore, other models (DeeperDeepSEA and DanQ) are also compared on the same tasks to give a broader comparison value. Lastly, the same experiments are conducted on modified versions of the dataset where the labels cover different amounts of the DNA sequence. This could prove beneficial to the transformer model, which can understand and capture longrange dependencies in natural language problems. The replication of DeepSEA was successful and gave similar results to the original model. Experiments used for DeepSEA were also conducted on DNABert, DeeperDeepSEA, and DanQ. All the models were trained on different datasets, and their results were compared. Lastly, a Prediction voting mechanism was implemented, which gave better results than the models individually. The results showed that DeepSEA performed slightly better than DNABert, regarding AUC ROC. The Wilcoxon Signed-Rank Test showed that, even if the two models got similar AUC ROC scores, there is statistical significance between the distribution of predictions. This means that the models look at the dataset differently and might be why combining their prediction presents good results. Due to time restrictions of training the computationally heavy DNABert, the best hyper-parameters and training strategies for the model were not found, only improved. The Datasets used in this thesis were gravely unbalanced and is something that needs to be worked on in future projects. This project works as a good continuation for the paper Whole-genome deep-learning analysis identifies contribution of non-coding mutations to autism risk, Which uses the DeepSEA model to learn more about how specific mutations correlate with Autism Spectrum Disorder. Arbetet kring hur icke-kodande DNA påverkar genreglering är ett betydande forskningsområde inom genomik. Convolutional neural networks (CNN) har tidigare framgångsrikt använts för att förutsäga reglerings-element baserade endast på icke-kodande DNA-sekvenser. Icke-kod DNA har ett flertal likheter med det mänskliga språket. Detta gör språkmodeller, som Transformers, till attraktiva kandidater för att dechiffrera det icke-kodande DNA-språket. Denna avhandling undersöker hur väl transformermodellen kan förutspå kromatin-funktioner baserat på gensekvenser jämfört med CNN. Mer specifikt jämförs CNN-modellen DeepSEA, som används för att förutsäga reglerande funktioner baserat på icke-kodande DNA, med transformern DNABert. Vidare undersöker denna studie vilken inverkan olika parametrar och träningsstrategier har på prestanda. Dessutom jämförs andra modeller (DeeperDeepSEA och DanQ) med samma experiment för att ge ett bredare jämförelsevärde. Slutligen utförs samma experiment på modifierade versioner av datamängden där etiketterna täcker olika mängder av DNA-sekvensen. Detta kan visa sig vara fördelaktigt för transformer modellen, som kan förstå beroenden med lång räckvidd i naturliga språkproblem. Replikeringen av DeepSEA experimenten var lyckad och gav liknande resultat som i den ursprungliga modellen. Experiment som användes för DeepSEA utfördes också på DNABert, DeeperDeepSEA och DanQ. Alla modeller tränades på olika datamängder, och resultat på samma datamängd jämfördes. Slutligen implementerades en algoritm som kombinerade utdatan av DeepDEA och DNABERT, vilket gav bättre resultat än modellerna individuellt. Resultaten visade att DeepSEA presterade något bättre än DNABert, med avseende på AUC ROC. Wilcoxon Signed-Rank Test visade att, även om de två modellerna fick liknande AUC ROC-poäng, så finns det en statistisk signifikans mellan fördelningen av deras förutsägelser. Det innebär att modellerna hanterar samma information på olika sätt och kan vara anledningen till att kombinationen av deras förutsägelser ger bra resultat. På grund av tidsbegränsningar för träning av det beräkningsmässigt tunga DNABert hittades inte de bästa hyper-parametrarna och träningsstrategierna för modellen, utan förbättrades bara. De datamängder som användes i denna avhandling var väldigt obalanserade, vilket måste hanteras i framtida projekt. Detta projekt fungerar som en bra fortsättning för projektet Whole-genome deep-learning analysis identifies contribution of non-coding mutations to autism risk, som använder DeepSEA-modellen för att lära sig mer om hur specifika DNA-mutationer korrelerar med autismspektrumstörning.