Powered by OpenAIRE graph
UiS Brage
Master thesis, 2023
Data sources: UiS Brage

Machine learning to detect corporate greenwashing

Author: Lien, Audun Stjernelund

Abstract

This master thesis develops an automatic approach to detecting corporate greenwashing. Doing so requires two steps: collecting data, and fact-checking the green claims found in that data. Data collection is done by web scraping. The web scrapers in this thesis were designed to extract comprehensive information about companies from their websites and reports, using two datasets as benchmarks. The Fauna dataset was scraped with a recursive web scraper that extracted data from the sub-pages linked from each company's website. The CICERO Shades of Green dataset was scraped with a scraper that visited each link in the dataset and extracted the text of each report produced by CICERO. The collected datasets were then preprocessed to make them compatible with machine learning models. The texts scraped for the Fauna dataset were often excessively long because the websites contain so much information; these texts were summarized with a Transformer model, and irrelevant texts were removed from the dataset manually. For the CICERO dataset, text augmentation was applied to expand the dataset and to investigate its impact on model performance. To address the limited availability of data, transfer-learning techniques, including zero-, one-, and two-shot learning, were applied to both the Fauna and CICERO datasets. These techniques leverage pre-trained models so that they can learn from a small amount of labeled data. In addition, fine-tuned models were implemented specifically for the CICERO dataset to provide a basis for comparison. These fine-tuned models outperformed the transfer-learning models, suggesting that training large models on limited training data remains an effective approach.
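The recursive scraping step described for the Fauna dataset can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the thesis's actual scraper. The same-domain restriction, the `max_depth` and `max_pages` limits, and the injectable `fetch` function are assumptions added here so that the crawl terminates and can be tested offline, and all function names are hypothetical. Starting from a company's start page, it follows links depth-first and returns the visible text of each page, keyed by URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Hypothetical helper names; not taken from the thesis.


class _LinkAndTextParser(HTMLParser):
    """Collects <a href> targets and visible text, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def fetch_html(url, timeout=10):
    """Download a page and decode it using the charset from the response headers."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def scrape_site(start_url, max_depth=2, max_pages=50, fetch=fetch_html):
    """Recursively visit same-domain sub-pages and return {url: page_text}."""
    domain = urlparse(start_url).netloc
    seen = set()
    pages = {}

    def visit(url, depth):
        # Stop on revisits, beyond the depth limit, or once enough pages are collected.
        if url in seen or depth > max_depth or len(pages) >= max_pages:
            return
        seen.add(url)
        try:
            html = fetch(url)
        except (OSError, ValueError, LookupError):
            return  # skip unreachable pages, bad URLs, or unknown charsets
        parser = _LinkAndTextParser()
        parser.feed(html)
        parser.close()  # flush any text still buffered by the parser
        pages[url] = " ".join(parser.text)
        for href in parser.links:
            target = urljoin(url, href).split("#", 1)[0]  # resolve and drop fragments
            if urlparse(target).netloc == domain:  # stay on the company's own site
                visit(target, depth + 1)

    visit(start_url, 0)
    return pages
```

Because `fetch` is a parameter, the traversal can be exercised against canned HTML without any network access. The depth and page limits keep the crawl bounded on large corporate sites.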

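Zero-, one-, and two-shot learning differ only in how many labeled examples the pre-trained model sees before the text it must classify. One common way to realize this with a generative model is in-context prompting, which the sketch below illustrates by building such a prompt. It rests on assumptions: the prompt wording, the `build_k_shot_prompt` name, and the example labels and texts are hypothetical, and the thesis may have used a different mechanism.

```python
from typing import Sequence, Tuple


def build_k_shot_prompt(
    text: str,
    labels: Sequence[str],
    examples: Sequence[Tuple[str, str]] = (),
    k: int = 0,
) -> str:
    """Build a k-shot classification prompt (k=0: zero-shot, k=1: one-shot, ...).

    Hypothetical helper: the first k (text, label) pairs are shown as
    demonstrations before the text to classify.
    """
    if k < 0 or k > len(examples):
        raise ValueError("k must be between 0 and len(examples)")
    for _, label in examples[:k]:
        if label not in labels:
            raise ValueError(f"unknown label in examples: {label!r}")
    blocks = ["Classify the text into one of: " + ", ".join(labels) + "."]
    for ex_text, ex_label in examples[:k]:
        blocks.append(f"Text: {ex_text}\nLabel: {ex_label}")
    blocks.append(f"Text: {text}\nLabel:")  # the model is expected to complete this
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical labels and demonstrations, for illustration only.
    labels = ["substantiated", "unsubstantiated"]
    demos = [
        ("Emissions fell 40% since 2015, verified by a third party.", "substantiated"),
        ("Our products are 100% eco-friendly.", "unsubstantiated"),
    ]
    print(build_k_shot_prompt("We are committed to a greener planet.", labels, demos, k=2))
```

Changing only `k` switches between the zero-, one-, and two-shot settings while keeping the rest of the prompt fixed, which makes the three settings directly comparable.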
Country
Norway
Impact (BIP!)
selected citations: 0; popularity: Average; influence: Average; impulse: Average
Green