Powered by OpenAIRE graph
ZENODO
Dataset . 2026
License: CC BY
Data sources: ZENODO
Data sources: Datacite

SmartHome-IoT-Federated-Anomaly-Dataset

A Real-World Heterogeneous Smart-Home IoT Dataset for Federated Anomaly Detection
Authors: Jarjis, Alend; Becerikli, Yasar;

Abstract

This is a real-world heterogeneous smart-home IoT dataset designed to support anomaly detection research under both centralized machine learning (ML) and federated learning (FL) paradigms. The data were collected from a residential deployment consisting of four Raspberry Pi devices (Pi-4 to Pi-7), each equipped with a different sensor configuration drawn from temperature, humidity, motion (PIR), accelerometer (ADXL345), and gas (MQ) sensors. Because the devices do not share the same sensors, their feature spaces differ and the data are naturally not independent and identically distributed (non-IID) across devices.

The dataset contains over 7 million multivariate time-series records with a sampling interval of approximately 2–2.5 seconds. It captures realistic IoT characteristics, including temporal irregularities, missing values, sensor noise, and a highly imbalanced anomaly distribution (~0.6% anomalies).

Anomalies were introduced using a controlled marker-based injection framework that simulates real-world conditions such as sensor faults, environmental changes, device stress, and network disturbances. In addition, retroactive anomaly scenarios are provided for selected time periods (e.g., holiday intervals) to support reproducibility and controlled experimentation.

A comprehensive integrity-aware data curation pipeline was applied, including timestamp normalization, detection and correction of malformed entries, duplicate removal, anomaly label validation, and audit logging. The final curated dataset ensures high data quality and full traceability.
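The curation steps named in the abstract (timestamp normalization, handling of malformed entries, duplicate removal, label validation, audit logging) can be illustrated with a minimal sketch. This is not the authors' pipeline: the field names `device`, `timestamp`, and `anomaly` are hypothetical, naive timestamps are assumed to be UTC, and malformed rows are simply dropped here rather than corrected.

```python
from datetime import datetime, timezone


def curate(rows):
    """Normalize timestamps to UTC, drop malformed rows and duplicates, count each action.

    `rows` is a list of dicts with hypothetical keys "device", "timestamp", "anomaly".
    """
    audit = {"malformed": 0, "duplicates": 0, "kept": 0}
    seen = set()
    cleaned = []
    for row in rows:
        try:
            ts = datetime.fromisoformat(row["timestamp"])
            label = int(row["anomaly"])
        except (KeyError, TypeError, ValueError):
            audit["malformed"] += 1  # unparsable timestamp or label
            continue
        if label not in (0, 1):
            audit["malformed"] += 1  # label validation: only binary labels accepted
            continue
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
        ts = ts.astimezone(timezone.utc)
        key = (row.get("device"), ts)
        if key in seen:
            audit["duplicates"] += 1  # same device and instant already kept
            continue
        seen.add(key)
        cleaned.append({**row, "timestamp": ts.isoformat(), "anomaly": label})
    audit["kept"] = len(cleaned)
    return cleaned, audit
```

Returning the audit counts alongside the cleaned rows is one simple way to keep every removal traceable; a fuller pipeline would likely also record which rows were affected.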
The dataset is released in multiple formats to support different research settings:
- A centralized ML-ready dataset
- Device-level partitions representing federated learning clients (Pi-4 to Pi-7)
- Experimental subsets for lightweight benchmarking

Key characteristics:
- Real-world IoT deployment
- Device-level heterogeneity
- Non-IID data distribution
- Multivariate time-series structure
- Highly imbalanced anomaly classes
- Fully reproducible preprocessing pipeline

This dataset provides a realistic benchmark for evaluating anomaly detection methods in distributed IoT environments, particularly for federated learning under practical constraints.
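To make the device-level partitioning and the non-IID, imbalanced character of the data concrete, the sketch below groups records into one federated client per device and reports each client's size, anomaly rate, and observed feature set. The record layout (a `device` key, a binary `anomaly` key, and one key per sensor reading) is an assumption for illustration and may not match the released files.

```python
from collections import defaultdict


def partition_by_device(records):
    """Group records into one list per device, i.e. one federated client each."""
    clients = defaultdict(list)
    for rec in records:
        clients[rec["device"]].append(rec)
    return dict(clients)


def client_summary(records):
    """Per-client record count, anomaly rate, and the sensor features actually present."""
    summary = {}
    for device, recs in partition_by_device(records).items():
        features = sorted(
            {
                key
                for rec in recs
                for key, value in rec.items()
                if key not in ("device", "anomaly") and value is not None
            }
        )
        summary[device] = {
            "n": len(recs),
            "anomaly_rate": sum(r["anomaly"] for r in recs) / len(recs),
            "features": features,
        }
    return summary
```

Differing `features` lists across clients reflect the heterogeneous sensor configurations, and differing `anomaly_rate` values indicate label skew; both are worth checking before choosing a federated aggregation strategy.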

Keywords

non-IID data, data curation, machine learning, federated learning, multivariate time series, edge computing, cybersecurity, smart home, Raspberry Pi, IoT dataset, time-series data, anomaly detection, sensor data
