ZENODO
Other literature type . 2024
License: CC BY
Data sources: ZENODO
ZENODO
Article . 2020
License: CC BY
Data sources: Datacite
ZENODO
Other literature type . 2024
License: CC BY
Data sources: Datacite

Towards Safe AI: Ensuring Security in Machine Learning and Reinforcement Learning Models

Authors: Shah, Harshal

Abstract

The rapid integration of artificial intelligence (AI) into critical systems has amplified concerns about its safety and security. Machine Learning (ML) and Reinforcement Learning (RL), while enabling advanced decision-making capabilities, are susceptible to a range of threats, including adversarial attacks, data poisoning, and model exploitation. These vulnerabilities not only compromise system integrity but also pose significant risks in applications such as healthcare, finance, and autonomous systems. This paper explores a comprehensive framework for ensuring the security of ML and RL models, emphasizing proactive and reactive strategies. We begin by identifying common attack vectors in ML and RL, illustrating real-world examples of security breaches. A taxonomy of these threats is presented, categorizing them based on their origin, impact, and detectability. Building on this, the paper highlights cutting-edge techniques for securing AI models, including robust model architectures, adversarial training, differential privacy, and federated learning. The role of explainable AI (XAI) in uncovering potential vulnerabilities is also examined, alongside mechanisms for enhancing model interpretability. Furthermore, the unique challenges posed by RL systems, such as the exploitation of reward mechanisms and policy manipulation, are discussed. Solutions tailored to RL, including dynamic reward shaping and environment-aware defenses, are proposed. The paper also delves into regulatory and ethical considerations, advocating for standardized frameworks and cross-industry collaboration to ensure AI safety. By integrating theoretical insights with practical recommendations, this study provides a roadmap for researchers and practitioners to fortify ML and RL systems against evolving threats. The ultimate goal is to foster trust and resilience in AI technologies, ensuring their safe deployment in diverse domains.

Keywords

Machine Learning, Artificial Intelligence
