Since the explosion of the internet, fake news has been a persistent cause for concern: its proliferation online hinders access to reliable information. This work investigates the effectiveness of several machine learning methods for fake news identification. We train and evaluate five models: Support Vector Machine (SVM), Logistic Regression, Random Forest, Long Short-Term Memory (LSTM), and Naive Bayes. To assess generalizability, we evaluate the models on two distinct datasets, extracting textual features from the news articles and measuring performance with established metrics. The investigation sheds light on the strengths and limitations of each model for fake news classification, contributing to the development of more robust detection systems. We further examine how different machine learning paradigms, classical supervised learning (Logistic Regression, Random Forest, SVM, Naive Bayes) versus deep learning (LSTM), affect detection accuracy. This comparative analysis provides valuable insight into the most suitable approach for the challenging task of fake news identification.
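As a rough sketch of the classical side of such a comparison, the snippet below trains four of the five models on TF-IDF features with scikit-learn. The file name, column names, and hyperparameters are illustrative assumptions rather than the authors' actual pipeline, and the LSTM is omitted because it requires sequence inputs rather than bag-of-words features.

```python
# Minimal sketch of a fake-news model comparison; "news.csv" with
# "text" and "label" columns is a hypothetical input file.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("news.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# Shared textual features: TF-IDF over unigrams, fit on the training split only
vectorizer = TfidfVectorizer(max_features=50_000, stop_words="english")
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVM": LinearSVC(),
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=200),
}
for name, model in models.items():
    model.fit(X_train_vec, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test_vec)))
```

Running the same loop over a second dataset would give the cross-dataset generalizability check the abstract describes.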
The Abstractive News Captions with High-level cOntext Representation (ANCHOR) dataset contains more than 70,000 samples sourced from five different news media organizations. It can be used for vision-and-language tasks such as text-to-image generation and image caption generation.
This is training and testing data for the detection of hunting pits in airborne laser scanning data. The data is split into three parts: (1) data for transfer learning, consisting of radar imagery of impact craters on the moon; (2) data for training and testing the machine learning model; and (3) data from a separate demonstration area used to evaluate the model. The lunar data (1) were used to pre-train a machine learning model before training on the real hunting-pit data from earth (2), and the demonstration data (3) was used to visually evaluate the final model. All code used to create this dataset and train the machine learning models can be found at https://github.com/williamlidberg/Detection-of-hunting-pits-using-airborne-laser-scanning-and-deep-learning; the code is also included in the file "code.zip".
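The sketch below illustrates the pre-train/fine-tune workflow in PyTorch under stated assumptions: a deliberately tiny stand-in network and random placeholder tensors take the place of the real lunar-crater and hunting-pit patches. The actual architecture and training code are in the GitHub repository linked above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Tiny stand-in segmentation network; the real project uses a deeper model.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)

def train(loader, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()

# Random tensors stand in for (1) lunar radar patches with crater masks
# and (2) airborne-laser patches with hunting-pit masks.
lunar = DataLoader(TensorDataset(torch.randn(32, 1, 64, 64),
                                 torch.rand(32, 1, 64, 64).round()), batch_size=8)
earth = DataLoader(TensorDataset(torch.randn(32, 1, 64, 64),
                                 torch.rand(32, 1, 64, 64).round()), batch_size=8)

train(lunar, epochs=5, lr=1e-3)               # step 1: pre-train on lunar craters
torch.save(model.state_dict(), "pretrained.pt")
model.load_state_dict(torch.load("pretrained.pt"))
train(earth, epochs=5, lr=1e-4)               # step 2: fine-tune on hunting pits
```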
handle: 11382/565732
Overview

The Corpus of Resolutions: UN Security Council (CR-UNSC) collects and presents, for the first time in human- and machine-readable form, all resolutions, drafts, and meeting records of the UN Security Council, including detailed metadata, as published by the UN Digital Library and revised by the authors.

The United Nations Security Council (UNSC) is the most influential of the principal UN organs. Composed of five permanent and ten non-permanent members, its functioning is constrained by the political context in which it operates. During the Cold War, the complex political relationships between the permanent members and their veto powers significantly limited the capacity of the UNSC to address violations of international peace and security, with only 646 resolutions passed from 1946 to 1989. Since the 1990s, the activity of the Security Council has increased dramatically, producing 2721 resolutions up to the end of 2023. The length, complexity, and thematic breadth of the resolutions have also increased, prompting calls to redefine it as a quasi-legislative body.

Under Articles 24 and 25 of the UN Charter, member states have conferred upon the UNSC the "primary responsibility for the maintenance of international peace and security" and have agreed "to accept and carry out" its decisions. The discharge of this function is carried out through the powers bestowed upon it under Chapter VI of the UN Charter, "Pacific Settlement of Disputes", Chapter VII, "Action with Respect to Threats to the Peace, Breaches of the Peace, and Acts of Aggression", Chapter VIII, "Regional Arrangements", and Chapter XII, "International Trusteeship System". Under the peace and security mandate, its areas of activity cover disarmament, pacific settlement of disputes, enforcement, and, until 1994, strategic areas under a trusteeship agreement. Its functions also pertain to the proper working of the United Nations, covering issues of membership, the appointment of the Secretary-General, the election of judges of the International Court of Justice (ICJ), the calling of special and emergency sessions of the General Assembly, and the amendment of the Charter and of the ICJ Statute.

Please refer to the Codebook for a detailed explanation of the dataset and instructions on how to make use of it.

Updates

The CR-UNSC will be updated at least once per year. In case of serious errors an update will be provided at the earliest opportunity and a highlighted advisory issued on the Zenodo page of the current version. Minor errors will be documented in the GitHub issue tracker and fixed with the next scheduled release. The CR-UNSC is versioned according to the day of the last run of the data pipeline, in the ISO format YYYY-MM-DD; its initial release version is 2024-05-03. Notifications regarding new and updated data sets will be published on my academic website at www.seanfobbe.com or on the Fediverse at @seanfobbe@fediscience.org

Changelog

- New variant: EN_TXT_BEST, containing a write-out of the English resolution texts equivalent to the "text" variable in the CSV file
- New diagrams: bar charts of the top M49 regions and sub-regions of countries mentioned in resolution texts
- Fixed naming mix-up of the BIBTEX and GRAPHML zip archives
- Fixed whitespace character detection in citation extraction (adds ca. 10% more citations)
- Fixed improper merging of weights in the citation network
- Fixed "cannot xtfrm data frames" warning
- Improved regex detection for certain geographic entities
- Improved Codebook (headings, citation network docs)

Key Metrics

- Version: 2024-05-19
- Scope: UNSC Resolutions from 1 (1946) up to and including 2722 (2024)
- Tokens: 3,704,016 (English resolution texts)
- Languages: English, French, Spanish, Arabic, Chinese, Russian

Features

- 82 variables
- Resolution texts in all six official UN languages (English, French, Spanish, Arabic, Chinese, Russian)
- Draft texts of resolutions in English
- Meeting record texts in English
- URLs to draft texts in all other languages (French, Spanish, Arabic, Chinese, Russian)
- URLs to meeting record texts in all other languages (French, Spanish, Arabic, Chinese, Russian)
- Citation data as GraphML (UNSC-to-UNSC and UNSC-to-UNGA resolutions)
- Bibliographic database in BibTeX/OSCOLA format for e.g. Zotero, EndNote and JabRef
- Extensive Codebook explaining the uses of the dataset
- Compilation Report and Quality Assurance Report explaining the construction and validation of the data set
- Publication-quality diagrams for teaching, research and all other purposes (PDF for printing, PNG for web)
- Open and platform-independent file formats (CSV, PDF, TXT, GraphML)
- Software version-controlled with Docker
- Publication of the full data set (Open Data)
- Publication of the full source code (Open Source)
- Data published under a Public Domain waiver (CC Zero 1.0)
- Source code is Free Software published under the GNU General Public License Version 3 (GNU GPL v3)
- Secure cryptographic signatures for all files in the version of record (SHA2-256 and SHA3-512)

Recommended Variants

- Traditional scholars: ALL_PDF_Resolutions, EN_TXT_BEST, BIBTEX_OSCOLA
- Quantitative scholars: ALL_CSV_FULL, EN_TXT_BEST, CITATIONS_GRAPHML

Please refer to the Codebook for details on each variant. The ZIP archives include texts in all languages unless noted in the filename. We strongly recommend using the CSV files for quantitative analysis; if you find CSV hard to use and want to analyze only the text of resolutions, the EN_TXT_BEST variant is a mix of expert-revised OCR and born-digital texts equivalent to the "text" variable in the CSV file.

Compilation Report and Quality Assurance Report

With every compilation of the full data set, an extensive Compilation Report and a detailed Quality Assurance Report are created and published in PDF format. The Compilation Report includes the source code of the pipeline architecture, comments and explanations of design decisions, relevant computational results, exact timestamps, and a table of contents with clickable internal hyperlinks to each section. The Quality Assurance Report contains a count of all hard tests and expectations, additional visualizations, and documented test results for all soft tests that require further interpretation. The Compilation Report, Quality Assurance Report, and source code are published under the following DOI: https://zenodo.org/doi/10.5281/zenodo.7319783

Attribution and Copyright

This data is derived from the United Nations Digital Library at https://digitallibrary.un.org. Records were accessed and downloaded on 13 and 26 March 2024, with additional work on revisions and corrections up to and including the date given as the version number. Pursuant to UN Administrative Instruction ST/AI/189/Add.9/Rev.2 of 17 September 1987, all official records and United Nations documents (including resolutions, compilations of resolutions, drafts, and meeting records) are in the public domain. We wish to honor the letter and spirit of this UN policy. To ensure the widest possible distribution of official UN documents and to promote the international rule of law, we waive any copyright that might have accrued by creating the dataset under a Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.

Disclaimer

This data set is an academic initiative and is not associated with or endorsed by the United Nations or any of its constituent organs and organizations.

Author Websites

- Personal Website of Seán Fobbe
- Personal Website of Lorenzo Gasbarri
- Personal Website of Niccolò Ridi

Contact

Did you discover any errors? Do you have suggestions on how to improve the data set? You can either post these to the Issue Tracker on GitHub or contact Seán Fobbe via https://seanfobbe.com/contact/
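For the quantitative route, a minimal sketch of loading the recommended variants might look like the following. The exact file names inside the ZIP archives are assumptions; the Codebook remains the authoritative reference for the layout and variable definitions.

```python
import pandas as pd
import networkx as nx

# ALL_CSV_FULL: one row per resolution, 82 variables including "text".
# The file name below is an assumed example, not the guaranteed name.
resolutions = pd.read_csv("CR-UNSC_2024-05-19_ALL_CSV_FULL.csv")
print(resolutions.shape)

# CITATIONS_GRAPHML: UNSC-to-UNSC and UNSC-to-UNGA citation network,
# assumed here to be stored as a directed graph (citing -> cited).
citations = nx.read_graphml("CR-UNSC_2024-05-19_CITATIONS.graphml")
most_cited = sorted(citations.in_degree(), key=lambda kv: kv[1], reverse=True)[:10]
print(most_cited)
```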
This paper delves into the transformative intersection of emerging technologies and digital libraries, illuminating a path toward an enriched and accessible knowledge landscape. Focusing on Artificial Intelligence (AI), Machine Learning (ML), Natural Language Processing (NLP), Augmented Reality (AR), and Virtual Reality (VR), the study explores how these technologies redefine digital library experiences. AI and ML algorithms power intuitive content curation and recommendation, reshaping how users interact with digital resources. NLP bridges the gap between the intricacies of human language and digital systems, enhancing search functionality and making information retrieval seamless. AR overlays digital information onto the physical world, expanding interactive learning possibilities, while VR immerses users in virtual environments, opening new educational paradigms. The paper critically examines the practical integration of these technologies, ensuring digital libraries not only preserve vast knowledge repositories but also present information in engaging and accessible formats. Through AI-driven metadata generation and content tagging, digital libraries can be systematically organized and enriched, improving search accuracy. These innovations preserve the past while illuminating a future where knowledge is universally accessible, fostering curiosity, learning, and exploration. Beyond its theoretical exploration of these technologies, the study also examines the perceptions of library users in practice, ensuring a user-centric approach to shaping the digital libraries of tomorrow. This research contributes to the evolving landscape of digital libraries, paving the way for inclusive, immersive, and engaging knowledge experiences for diverse users worldwide.
This is the repository of all research data for the PhD thesis of doctoral candidate Nan Bai of the Faculty of Architecture and the Built Environment at Delft University of Technology, titled 'Sensing the Cultural Significance with AI for Social Inclusion: A Computational Spatiotemporal Network-based Framework of Heritage Knowledge Documentation using User-Generated Data', to be defended on October 5th, 2023.

Social inclusion has been growing as a goal in heritage management. Whereas the 2011 UNESCO Recommendation on the Historic Urban Landscape (HUL) called for tools of knowledge documentation, social media already functions as a platform for online communities to involve themselves actively in heritage-related discussions. Such discussions happen both in "baseline scenarios", when people calmly share their experiences of the cities they live in or travel to, and in "activated scenarios", when radical events trigger their emotions. To organize, process, and analyse the massive unstructured multi-modal (mainly images and texts) user-generated data from social media efficiently and systematically, Artificial Intelligence (AI) is shown to be indispensable. This thesis explores the use of AI in a methodological framework to include the contribution of a larger and more diverse group of participants through user-generated data. It is an interdisciplinary study integrating methods and knowledge from heritage studies, computer science, social sciences, network science, and spatial analysis. AI models were applied, nurtured, and tested to analyse the massive information content and derive the knowledge of cultural significance perceived by online communities. The framework was tested in case-study cities including Venice, Paris, Suzhou, Amsterdam, and Rome for the baseline and/or activated scenarios. The AI-based methodological framework proposed in this thesis is shown to be able to collect information in cities and map the communities' knowledge of cultural significance, fulfilling the expectation and requirement of HUL and informing future socially inclusive heritage management processes.

Some parts of these data are published as GitHub repositories:

WHOSe Heritage: The data of Chapter_3_Lexicon is published at https://github.com/zzbn12345/WHOSe_Heritage, which is also the code for the paper "WHOSe Heritage: Classification of UNESCO World Heritage Statements of 'Outstanding Universal Value' Documents with Soft Labels", published in Findings of EMNLP 2021 (https://aclanthology.org/2021.findings-emnlp.34/).

Heri Graphs: The data of Chapter_4_Datasets is published at https://github.com/zzbn12345/Heri_Graphs, the code and dataset for the paper "Heri-Graphs: A Dataset Creation Framework for Multi-modal Machine Learning on Graphs of Heritage Values and Attributes with Social Media", published in the ISPRS International Journal of Geo-Information, covering the collection, preprocessing, and rearrangement of data on heritage values and attributes in three cities with canal-related UNESCO World Heritage properties: Venice, Suzhou, and Amsterdam.

Stones Venice: The data of Chapter_5_Mapping is published at https://github.com/zzbn12345/Stones_Venice, the code and dataset for the paper "Screening the stones of Venice: Mapping social perceptions of cultural significance through graph-based semi-supervised classification", published in the ISPRS Journal of Photogrammetry and Remote Sensing, mapping cultural significance in the city of Venice.
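As a self-contained illustration of graph-based semi-supervised classification, the technique named in the Stones_Venice paper, the toy sketch below propagates a handful of known labels across a k-nearest-neighbour graph with scikit-learn's LabelSpreading. It uses synthetic data and is not the thesis code.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

# Synthetic two-class data standing in for nodes with perception labels
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# Pretend only three labels per class are known; -1 marks unlabelled nodes
rng = np.random.default_rng(0)
known = np.concatenate([rng.choice(np.where(y == c)[0], 3, replace=False)
                        for c in (0, 1)])
labels = np.full(len(y), -1)
labels[known] = y[known]

# Labels spread over a kNN graph until the assignment stabilizes
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, labels)
print("fraction recovered:", (model.transduction_ == y).mean())
```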
This dataset contains the underlying data used to analyse the SmartCulTour identified interventions described in the abstracts and practice videos of D6.2. The data collection established:

- Context and background
- Rationale for the intervention
- Resources and tools
- Expected economic impact
- Expected social impact
- Expected cultural impact
- Expected environmental impact
- Success conditions

These criteria were analysed by the SmartCulTour local experts and were meant to feed into the updated taxonomy of cultural tourism interventions of D3.4. The interventions described are linked to the practice videos available in the Living Labs section of the SmartCulTour website (www.smartcultour.eu) and pertain to:

- Rotterdam Living Lab: Planning for the future of Hoek van Holland & Bospolder-Tussendijken
- Scheldeland Living Lab: Hof van Coolhem: social employment and care project in tourism
- Scheldeland Living Lab: Bornem Castle: upgrades historical exhibitions & creates visitor centre
- Scheldeland Living Lab: Steam train Dendermonde-Puurs: volunteers protecting industrial heritage
- Utsjoki Living Lab: Traces in Utsjoki: inspiring respectful visitor behaviour in nature areas
- Utsjoki Living Lab: Placemaking as a technique to support meaningful visitor experiences
- Huesca Living Lab: The Somontano Wine Route: a resilient strategy for Huesca
- Huesca Living Lab: The Río Vero Cultural Park: from Palaeolithic human history to the present
- Split Living Lab: Making traditional Easter bread (Sirnica) in Solin
- Split Living Lab: The cultural heritage of Sinj: the story of Alka
- Vicenza Living Lab: Vicenza: the city of Palladio
- Vicenza Living Lab: The international library "La Vigna" becomes an open-innovation Living Lab
Data and materials that accompany the paper "Contestable Camera Cars: A Speculative Design Exploration of Public AI That Is Open and Responsive to Dispute". Applying a provisional framework for contestable AI, we use speculative design to create a concept video of a contestable camera car. Using this concept video, we then conduct semi-structured interviews with 17 civil servants who work with AI, employed by a large northwestern European city. The resulting data is analysed using reflexive thematic analysis to identify the main challenges facing the implementation of contestability in public AI. This study was approved by the TU Delft Human Research Ethics Committee.

The data consists of:
- Concept video design brief (PDF)
- Concept video script (PDF)
- Concept video storyboards (PNG)
- Concept video (MP4)
- Expert assessment interview guide (PDF)
- Expert assessment grading form (PDF)
- Expert assessment completed grading forms (PDF)
- Expert assessment tabulated scores (XLSX)
- Expert assessment informal analysis report (PDF)
- Civil servant interviews guide (PDF)
- Civil servant interviews summaries (TXT)
- Civil servant interviews code book (PDF)
Harmony is a data harmonisation project that uses Natural Language Processing to help researchers make better use of existing data by supporting the harmonisation of measures and items used across different studies. Harmony is a collaboration between the University of Ulster, University College London, the Universidade Federal de Santa Maria in Brazil, and Fast Data Science Ltd. You can read more at https://harmonydata.org, and there is a live demo at https://app.harmonydata.org/. These are the datasets used to validate Harmony: the Excel file is McElroy et al.'s data, and the zip file contains the English and Portuguese GAD-7s.
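A minimal sketch of the kind of item matching such harmonisation involves, not Harmony's actual API, is to embed questionnaire items and compare them by cosine similarity. The embedding model name below is an assumption; the two example items are from the public English GAD-7.

```python
from sentence_transformers import SentenceTransformer, util

# Two GAD-7 items and two hypothetical items from another instrument
gad7 = ["Feeling nervous, anxious or on edge",
        "Not being able to stop or control worrying"]
other = ["I worry a lot of the time",
         "I often feel tense and restless"]

# Multilingual model chosen so English and Portuguese items could be compared
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
sims = util.cos_sim(model.encode(gad7), model.encode(other))
print(sims)  # higher values suggest candidate matches for harmonisation
```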
Annual Reports Assessment Dataset

This dataset will help investors, merchant bankers, credit rating agencies, and the community of equity research analysts explore annual reports in a more automated, time-saving way. It comprises the following sub-datasets:

a) PDFs and corresponding OCR text of 100 Indian annual reports. These are the annual reports of the 100 largest companies listed on the Bombay Stock Exchange; the OCRed text totals 12.25 million words.

b) Example sentences with corresponding classes. The author defined 16 topics widely used in the investment community as classes:

- Accounting Standards
- Accounting for Revenue Recognition
- Corporate Social Responsibility
- Credit Ratings
- Diversity, Equity and Inclusion
- Electronic Voting
- Environment and Sustainability
- Hedging Strategy
- Intellectual Property Infringement Risk
- Litigation Risk
- Order Book
- Related Party Transaction
- Remuneration
- Research and Development
- Talent Management
- Whistle Blower Policy

These classes should help generate ideas and investment decisions, as well as identify red flags and early warning signs of trouble when everything appears to be proceeding smoothly.

About the data:
- "scrips.json" is a JSON file with the names of the companies: "SC_CODE" is the BSE scrip ID, "SC_NAME" is the listed company's name, and "NET_TURNOV" is the turnover on the day of consideration.
- "source_pdf" is a folder containing both the PDFs and the OCR output from Tesseract.
- "raw_pdf.zip" contains the raw PDFs, which can be used to try another OCR engine.
- "ocr.zip" contains a JSON file, "annual_report_content.json", with the OCR text for each PDF; it is an array of 100 elements, each with the two keys "file_name" and "content".
- "classif_data_rank_freezed.json" is used for the evaluation of results and contains each "sentence" and its corresponding "class".

The author released this dataset for analysing annual reports, together with class labels and examples that equity research analysts can use. In the future, the author plans to expand the number of class labels and examples.
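A short sketch of how these files might be used follows. The JSON keys come from the description above, while the classifier choice, its settings, and the naive sentence splitting are illustrative assumptions.

```python
import json
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

with open("annual_report_content.json") as f:
    reports = json.load(f)    # 100 items: {"file_name": ..., "content": ...}
with open("classif_data_rank_freezed.json") as f:
    labelled = json.load(f)   # items: {"sentence": ..., "class": ...}

# Train a simple topic classifier on the labelled example sentences
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit([x["sentence"] for x in labelled], [x["class"] for x in labelled])

# Flag sentences in one report for a topic of interest; the class string
# is assumed to match the list above exactly.
sentences = reports[0]["content"].split(". ")
for s, c in zip(sentences, clf.predict(sentences)):
    if c == "Litigation Risk":
        print(s)
```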