
Descripción de escenas por medio de aprendizaje profundo [Scene description by means of deep learning]

Author: Rincón Núñez, Adalberto
Abstract


This document describes the work carried out to reproduce published results in image captioning (automatic image description) obtained by experts in the field. The work was done on the free software platform Python using the TensorFlow library, and culminated in the training of a recurrent neural network (RNN) that receives an image as input and outputs the same image annotated with a written description of its content. Training required two different kinds of data set: one covering a wide range of images and another providing their descriptions. Both were taken from COCO (Common Objects in Context), a large database for object detection, segmentation, and captioning; specifically, the data sets released for the 2015 "COCO captioning challenge" were used. These data sets were first pre-processed: features were extracted from each image with a convolutional neural network (CNN), in this case the Inception network, and stored in a file that was later used to train the RNN. The document also explains the steps that allowed the same code to be used to train a network that describes images in Spanish, and concludes with the training of a network that describes custom environments through the creation of new data sets. Finally, the trained networks were tested. The results were satisfactory, with descriptions matching the supplied images, although some descriptions contained errors such as the gender of the person or the color of objects in the image.
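The pipeline the abstract describes (CNN features precomputed and saved, then fed to an RNN decoder that emits a caption token by token) can be sketched in miniature. This is an illustrative sketch, not the thesis code: the toy vocabulary, dimensions, and randomly initialized weights are all assumptions standing in for a trained Inception + RNN model.

```python
# Minimal sketch of an RNN caption decoder that consumes a precomputed
# image-feature vector, mirroring the pipeline described in the abstract.
# Vocabulary, sizes, and weights are illustrative assumptions, not the
# thesis's trained model.
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["<start>", "a", "person", "rides", "a", "bicycle", "<end>"]
FEAT_DIM, HID_DIM, V = 8, 16, len(VOCAB)

# Randomly initialized parameters stand in for trained weights.
W_feat = rng.normal(0, 0.1, (HID_DIM, FEAT_DIM))  # CNN features -> initial state
W_emb  = rng.normal(0, 0.1, (V, HID_DIM))         # token embeddings
W_hh   = rng.normal(0, 0.1, (HID_DIM, HID_DIM))   # recurrent weights
W_out  = rng.normal(0, 0.1, (V, HID_DIM))         # hidden state -> vocab logits

def generate_caption(features, max_len=10):
    """Greedy decoding: seed the state with image features, emit tokens until <end>."""
    h = np.tanh(W_feat @ features)            # init hidden state from CNN features
    token = VOCAB.index("<start>")
    words = []
    for _ in range(max_len):
        h = np.tanh(W_hh @ h + W_emb[token])  # one recurrent step
        token = int(np.argmax(W_out @ h))     # pick the most likely next token
        if VOCAB[token] == "<end>":
            break
        words.append(VOCAB[token])
    return " ".join(words)

# Stands in for Inception features loaded from the saved file.
fake_features = rng.normal(size=FEAT_DIM)
print(generate_caption(fake_features))
```

With untrained weights the output is gibberish; in the actual project the weights would be learned from the COCO caption pairs, and the feature vector would come from the saved Inception outputs rather than random noise.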

Undergraduate degree project (Mechatronic Engineering) -- Universidad Autónoma de Occidente, 2018

Mechatronic Engineer

Undergraduate

Country: Colombia
Keywords: Neural networks (Computers), Image description, Mechatronics Engineering

  • BIP! impact indicators (provided by BIP!):
    • Selected citations (derived from selected sources; an alternative to the "Influence" indicator, which reflects the overall/total impact of an article in the research community at large, based on the underlying citation network, diachronically): 0
    • Popularity (the "current" impact/attention, the "hype", of an article in the research community at large, based on the underlying citation network): Average
    • Influence (the overall/total impact of an article in the research community at large, based on the underlying citation network, diachronically): Average
    • Impulse (the initial momentum of an article directly after its publication, based on the underlying citation network): Average