edoc-Server. Open-Ac...
https://dx.doi.org/10.18452/21...
Master thesis . 2020
License: CC BY NC
Data sources: Datacite

An Image Classification Tool of Wikimedia Commons

Authors: Huang, Sisi;

Abstract

Manually labelling massive datasets of images collected from webpages is time-consuming and exhausting. A tool that classifies such unlabelled images automatically would greatly reduce this burden. In this thesis we aim to extract significant features from images and to automate the annotation of unlabelled images. Because images vary so widely, we focus on the problem of chart image classification. Chart images appear frequently in documents and are a common tool for visualizing relationships within data. In particular, chart types can be distinguished by their patterns and shapes. To address this problem we propose machine learning models that extract image features automatically and predict the images' labels. Convolutional neural networks are popular models for image classification, so our goal is to apply them to chart images. We pursue two approaches to implementing convolutional neural networks: transfer learning and self-trained models (networks trained from scratch). On a set of test data, a transfer learning model based on the pre-trained VGG-16 model achieves a test accuracy of up to 0.65. The self-trained models are LeNet-5, Alex blocks and VGG blocks, the latter two derived from AlexNet and VGG. However, the self-trained models perform worse than transfer learning: their highest prediction accuracy is only 0.47.
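The transfer learning idea in the abstract (reuse a pre-trained feature extractor, train only a new classifier on top, then report test accuracy) can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration: `frozen_features` is a hypothetical hand-written stand-in for a frozen convolutional base and is not VGG-16, the striped toy images and the two classes are invented, and the logistic-regression head is one simple choice of classifier, not necessarily the one used in the thesis.

```python
import math


def frozen_features(image):
    """Hypothetical stand-in for a frozen pre-trained base (NOT VGG-16).

    Returns [variance of row means, variance of column means].
    """
    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    row_means = [sum(row) / len(row) for row in image]
    col_means = [sum(col) / len(col) for col in zip(*image)]
    return [variance(row_means), variance(col_means)]


def stripes(intensity, horizontal, size=4):
    """Toy 'chart': alternating bright/dark stripes, horizontal or vertical."""
    rows = [[intensity if r % 2 == 0 else 0.0] * size for r in range(size)]
    return rows if horizontal else [list(col) for col in zip(*rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(features, labels, lr=5.0, epochs=200):
    """Train only the new logistic-regression head (full-batch gradient descent)."""
    weights, bias = [0.0] * len(features[0]), 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(features, labels):
            error = y - sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
        bias += lr * grad_b / n
    return weights, bias


def predict(head, x):
    """Return class 1 if the head's score is non-negative, else class 0."""
    weights, bias = head
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0.0 else 0


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Label 1 = horizontal stripes, label 0 = vertical stripes (hypothetical classes).
train = [(stripes(a, h), int(h)) for a in (0.6, 0.8, 1.0) for h in (True, False)]
test = [(stripes(a, h), int(h)) for a in (0.7, 0.9) for h in (True, False)]

# The extractor is never updated; only the head's weights and bias are learned.
head = train_head([frozen_features(img) for img, _ in train], [y for _, y in train])
test_predictions = [predict(head, frozen_features(img)) for img, _ in test]
test_accuracy = accuracy(test_predictions, [y for _, y in test])
print(f"test accuracy: {test_accuracy:.2f}")
```

The toy data is linearly separable in the extracted features, so the head separates the held-out stripes easily; real chart images are far harder, which is consistent with the much lower accuracies reported above. The `accuracy` function is the same metric in both cases: the share of test images whose predicted label equals the true label.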

Manually labelling huge datasets consisting of images from webpages is very time-consuming and also tiring. A tool that could classify these unlabelled images automatically would be of enormous benefit. The aim of this thesis is to extract distinctive features from images and in this way to automate their classification. We concentrate on statistical charts. Statistical charts appear frequently in documents and are used as a general tool for visualizing relationships within data. They differ in particular in their patterns and shapes. We propose machine learning models that automatically extract features from images and can predict the type of statistical chart. Convolutional neural networks are popular models for classifying images. In this thesis we investigate two variants of implementing convolutional neural networks: transfer learning and self-trained models. The model based on VGG-16 already reaches an accuracy of 0.65. By contrast, the self-trained models perform worse than the transfer learning models; their best accuracy is only 0.47.

Country
Germany
Keywords

Machine Learning, Transfer learning, ddc:000, Image Labelling, 000 Computer science, information science, general works, Convolutional Neural Network

  • BIP! impact indicators
    Selected citations: 1
    Popularity: Average
    Influence: Average
    Impulse: Average
  • OpenAIRE UsageCounts
    Views: 202
    Downloads: 224