ZENODO
Doctoral thesis . 2020
License: CC BY NC ND
Data sources: ZENODO
ZENODO
Thesis . 2020
License: CC BY NC ND
Data sources: Datacite
Lucerne Open Repository
Doctoral thesis . 2020
License: CC BY NC ND

Deep Inverse Cooking

Authors: Bravin, Marc

Abstract

Medical images are widely used in hospitals for the diagnosis and treatment of many diseases, such as skin cancer or diabetic retinopathy. Machine learning algorithms have recently been shown to outperform human doctors in a broad variety of diagnostic tasks. A diagnosis is often posed as a semantic segmentation problem, where models are trained to classify each pixel of an image, or as a multi-label classification task, where the output is a set of tags. However, both types of output are hard to interpret because they give no reasoning about how the decisions were reached. A diagnosis made by a medical doctor is different. When a family doctor refers a patient to a specialist, the doctor expects a medical report in which the specialist explains the diagnosis. Likewise, the output of a neural network would be more useful if augmented by a medical report written in natural language. Recently, there has been so much progress in the development of image-to-text models that the task of automatically generating medical reports can now be considered feasible. However, such models require a large amount of paired data, i.e. images paired with medical reports. To the best of the author's knowledge, there is no publicly available dataset of such paired data. In order to experiment with image-to-text models, the domain was switched from medicine to cooking, where such data is plentiful. A dataset consisting of 0.9M recipes and 1.3M images was acquired by crawling five different cooking platforms. Since the majority of the recipes originate from community cooking websites, an extensive data cleaning pipeline had to be implemented. This allowed the number of unique ingredients to be reduced from 1M to 1.3k at the cost of dropping some recipes. Using this dataset, a multi-task neural network model was implemented, trained and evaluated. It generates a list of ingredients (cf. medical features), a title and cooking instructions (cf. medical report) based on an image of a dish.
The model uses a VGG-16 encoder to extract image features. Given these features, a transformer-based decoder generates a list of ingredients. Finally, an additional transformer decoder generates the recipe title as well as the cooking instructions by processing the image and ingredient features simultaneously. Evaluation on unseen test data showed that the model achieves an F1 score of 38.62% for ingredient prediction, a BLEU1 score of 7.17% for title generation and a BLEU4 score of 6.15% for instruction generation. A comparison of the inverse cooking model's architecture with medical image captioning systems from the literature shows several similarities. Therefore, it is expected that the proposed model can be adapted and extended for generating medical reports in the future.
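
The abstract reports an F1 score for the ingredient list and BLEU scores for the generated text but does not say how these were computed or aggregated. As a rough illustration only, the following sketch shows one common way to compute a set-based F1 for a single recipe and a sentence-level BLEU against a single reference. The function names `ingredient_f1` and `bleu`, the whitespace tokenisation, the lack of smoothing and the example recipe are all assumptions made for this sketch, not details taken from the thesis.

```python
from collections import Counter
import math


def ingredient_f1(predicted, reference):
    """F1 between predicted and reference ingredients, both treated as sets."""
    pred, ref = set(predicted), set(reference)
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=1):
    """Sentence-level BLEU with uniform weights over 1..max_n-grams.

    Assumes a single reference and applies no smoothing, so any n-gram
    order without a match yields a score of 0.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clipped counts: a candidate n-gram is credited at most as often
        # as it occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    if len(candidate) > len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(sum(log_precisions) / max_n)


# Hypothetical example recipe, for illustration only.
f1 = ingredient_f1(["flour", "sugar", "egg"], ["flour", "egg", "butter", "milk"])
title_bleu1 = bleu("chocolate chip cookies".split(),
                   "chewy chocolate chip cookies".split(), max_n=1)
print(f"F1 = {f1:.4f}, BLEU1 = {title_bleu1:.4f}")
```

Corpus-level scores such as those quoted above are usually obtained by pooling counts or averaging over all test recipes; which of these the thesis uses is not stated in the abstract.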

+ Publication ID: hslu_78709 + Contribution type: Report + Language: English + Last updated: 2020-07-16 16:35:45

Country
Switzerland
Keywords

Machine Learning, Deep Learning, Computer Vision, Transformers, Convolutional Neural Networks, Image Captioning, Cooking, Supervised Learning, Natural Language Processing

Impact (BIP!)

  • Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator.
  • Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
  • Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
  • Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Usage (OpenAIRE UsageCounts)

  • Views: 472
  • Downloads: 55
Open access route: Green