IEEE Access
Article . 2021 . Peer-reviewed
License: CC BY
Data sources: Crossref, UnpayWall, DOAJ, Datacite
Open Access: gold
Deep Neural Networks Using Residual Fast-Slow Refined Highway and Global Atomic Spatial Attention for Action Recognition and Detection

Authors: Manh-Hung Ha; Oscal Tzyh-Chiang Chen


Abstract



In this work, we propose two Deep Neural Networks, DNN-1 and DNN-2, based on a Fast-Slow Refined Highway (FSRH) and Global Atomic Spatial Attention (GASA) to effectively recognize and detect actions. The proposed DNN-1 comprises a 3D Convolutional Neural Network (3DCNN), a Residual FSRH (R_FSRH), a reduction layer, and a classification layer for action recognition. For action detection, which involves subject-region extraction and classification, the proposed DNN-2 consists of a 3DCNN, a region proposal network, R_FSRH, GASA, and a classification-localization layer. The 3DCNN uses the layers from the input up to the "Mixed-3c" layer of the pre-trained Inflated 3D (I3D) network as the backbone. The FSRH is composed of two Refined Highway (RH) units that extract a pair of features from fast and slow actions, where each RH applies temporal attention from a non-local 3D convolution and an affine transform from a temporal bilinear inception. In R_FSRH, multiple cascaded FSRHs with different residual connections were investigated to determine an effective configuration. GASA sequentially computes and concatenates the correlation features between an atomic subject and the other subjects to effectively discover high-level semantic information. In ablation studies, extensive experiments were conducted to demonstrate the superior performance of the proposed DNN-1 and DNN-2 on five challenging video datasets: JHMDB-21, UCF101-24, Traffic Police (TP), Charades, and AVA. Notably, to the best of our knowledge, the proposed DNN-1 achieves state-of-the-art performance on UCF101-24 and TP, and DNN-2 achieves state-of-the-art performance on AVA and the second best on Charades. Therefore, the DNN-1 and DNN-2 proposed herein can serve as outstanding context-aware engines for various video understanding applications.
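
To make the fast-slow highway idea above concrete, here is a minimal PyTorch sketch. The abstract does not specify layer sizes, the gating formulation, or how the fast and slow streams are sampled, so the channel count, the stride-based temporal split, the sigmoid highway gate, and the additive residual fusion below are illustrative assumptions rather than the authors' implementation; the non-local temporal attention and temporal bilinear inception inside each RH are omitted.

```python
# Minimal sketch of the Fast-Slow Refined Highway (FSRH) idea, in PyTorch.
# ASSUMPTIONS: the channel count (480, matching I3D's "Mixed-3c" output),
# the sigmoid highway gate, the stride-2 slow sampling, and the residual
# fusion are illustrative guesses; the abstract does not specify them.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RefinedHighway(nn.Module):
    """One RH unit: a 3D-conv transform blended with its input by a
    learned gate (a classic highway formulation, assumed here)."""

    def __init__(self, channels: int):
        super().__init__()
        self.transform = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.gate = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = F.relu(self.transform(x))
        g = torch.sigmoid(self.gate(x))  # gate values in (0, 1)
        return g * t + (1.0 - g) * x     # gated mix of transform and carry


class FSRH(nn.Module):
    """A fast-slow pair of RH units; the slow path sees a temporally
    strided view of the features (assumed sampling scheme)."""

    def __init__(self, channels: int, slow_stride: int = 2):
        super().__init__()
        self.fast = RefinedHighway(channels)
        self.slow = RefinedHighway(channels)
        self.slow_stride = slow_stride

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fast = self.fast(x)                            # full temporal rate
        slow = self.slow(x[:, :, ::self.slow_stride])  # reduced temporal rate
        # Upsample the slow features back to the fast temporal length, then
        # fuse the streams additively (residual-style, as in R_FSRH).
        slow = F.interpolate(slow, size=fast.shape[2:], mode="trilinear",
                             align_corners=False)
        return x + fast + slow


# Example: features shaped (batch, channels, time, height, width), e.g. an
# I3D "Mixed-3c" feature map for a 16-frame clip (sizes are hypothetical).
feats = torch.randn(1, 480, 16, 28, 28)
print(FSRH(480)(feats).shape)  # torch.Size([1, 480, 16, 28, 28])
```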
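
Likewise, a hedged sketch of the GASA step: the abstract states only that correlation features between an atomic subject and the other subjects are sequentially computed and concatenated, so the scaled dot-product correlation and the feature dimensions below are assumptions chosen for illustration.

```python
# Illustrative sketch of the Global Atomic Spatial Attention (GASA) step.
# ASSUMPTION: the scaled dot-product form and the feature sizes are guesses
# made for illustration, not the authors' exact formulation.
import torch
import torch.nn.functional as F


def gasa(subjects: torch.Tensor) -> torch.Tensor:
    """subjects: (num_subjects, dim) pooled per-subject region features,
    e.g. taken from the region proposal network's outputs."""
    # Pairwise correlation between every atomic subject and all subjects.
    scores = subjects @ subjects.T / subjects.shape[1] ** 0.5
    attn = F.softmax(scores, dim=-1)  # normalize per atomic subject
    context = attn @ subjects         # attended other-subject features
    # Concatenate each subject's own feature with its global context.
    return torch.cat([subjects, context], dim=-1)  # (num_subjects, 2 * dim)


regions = torch.randn(5, 256)  # e.g. 5 detected subjects, 256-d features
print(gasa(regions).shape)     # torch.Size([5, 512])
```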


Keywords

Action recognition, attention mechanism, highway network, residual network, inception network, 3D CNN, convolutional neural network, artificial neural network, deep learning, object detection, feature extraction, affine transformation, neural network architectures, human action recognition and pose estimation, computer vision and pattern recognition, deep learning in computer vision and image recognition, artificial intelligence

  • Impact indicators by BIP!
    selected citations: 17
    These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Top 10%
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Top 10%
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Top 10%
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.