
This dataset consists of 34,269,178 network packet samples extracted from the IoT-Zoo testbed. It covers a 28,800-second (8-hour) run of a heterogeneous IoT environment featuring 43 distinct device profiles across the Urban Observatory, Industrial, e-Health, and Smart Farming domains.

Technical Specifications
The dataset is the result of a synchronized fusion of the outputs of two network analysis tools, Scapy and Tshark, so that each packet is described by fields from both. Unlike flow-based datasets, this is a packet-level collection: each row represents an individual network frame.

Dataset Characteristics
Total Samples: 34,269,178 packets.
Total Features: 17 columns.
Trace Duration: 28,800 seconds (8 hours).
Device Heterogeneity: Covers telemetry from multiple domains with preserved temporal dynamics.
Application Semantics: Includes structured payloads (JSON/XML) replayed from real-world datasets.

Column Definitions (Schema)
pkt_index: Unique sequential identifier for each packet.
ip_ttl: Time-to-live value from the IPv4 header; decremented by 1 at each router hop.
tcp_seq: TCP sequence number, which identifies the position of the segment's data within the TCP byte stream.
tcp_flags_str: Human-readable TCP flag mnemonics (e.g., PA, S, A) extracted via Scapy.
frame.time_epoch: High-precision Unix timestamp of packet arrival.
frame.len: Total length of the Ethernet frame in bytes.
ip_src / ip_dst: Source and destination IPv4 addresses.
ip_proto: IP protocol number identifying the encapsulated transport-layer protocol (e.g., 6 for TCP, 17 for UDP).
tcp.src_port / dst_port: Layer 4 source and destination ports (e.g., 1883 for MQTT).
tcp_flags_hex: Raw TCP flags in hexadecimal format (0x00000000), suited to numerical Machine Learning input.
_ws.col.protocol: Highest-layer protocol identified by Tshark's protocol dissectors (deep packet inspection), e.g., MQTT, NTP, DNS, RTSP.
mqtt.topic: The MQTT publish/subscribe topic, which identifies the originating topic and device. Populated only for MQTT packets; empty otherwise.
mqtt.msgtype: MQTT control packet type (e.g., CONNECT, PUBLISH, PINGREQ).
mqtt.qos: MQTT Quality of Service level (0, 1, or 2).
mqtt.len: MQTT message length in bytes, as reported by Tshark's MQTT dissector (the Remaining Length field, covering the variable header and payload).

Intended Use
This CSV is ready for downstream Machine Learning tasks such as:
Anomaly Detection: Using frame.len and inter-arrival times (IAT) derived from frame.time_epoch to identify volumetric or timing-based attacks.
Protocol Classification: Leveraging _ws.col.protocol and tcp_flags_hex to identify IoT-specific behaviors.
Security Research: Serving as a baseline for legitimate IoT traffic patterns in heterogeneous environments.
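
To illustrate how tcp_flags_hex relates to tcp_flags_str, the following Python sketch decodes the hex value into one 0/1 feature per flag plus a mnemonic string. It assumes the hex value carries the standard TCP flag bits in its low 9 bits (FIN = 0x01 up to NS = 0x100) and that Scapy's letter order "FSRPAUECN" is used, which is consistent with the PA, S, and A examples in the schema. The function names are hypothetical and not part of the dataset or of Scapy.

```python
# Minimal sketch: decode tcp_flags_hex (e.g. "0x00000018") into per-flag bits
# and a Scapy-style mnemonic (e.g. "PA").
# Assumption: standard TCP flag bits occupy the low 9 bits of the value.

# Scapy-style TCP flag letters, ordered from bit 0 (FIN) upward.
TCP_FLAG_LETTERS = "FSRPAUECN"


def parse_flags_hex(value: str) -> int:
    """Convert a hex string such as '0x00000018' to an int; empty -> 0.

    Assumption: non-TCP rows may leave this column empty.
    """
    value = value.strip()
    return int(value, 16) if value else 0


def flags_to_mnemonic(flags: int) -> str:
    """Return the set flags as letters, lowest bit first (0x18 -> 'PA')."""
    return "".join(
        letter for bit, letter in enumerate(TCP_FLAG_LETTERS) if flags & (1 << bit)
    )


def flags_to_bits(flags: int) -> list[int]:
    """One 0/1 feature per flag bit, usable as numeric ML input."""
    return [(flags >> bit) & 1 for bit in range(len(TCP_FLAG_LETTERS))]


if __name__ == "__main__":
    for raw in ["0x00000018", "0x00000002", "0x00000010", "0x00000012"]:
        f = parse_flags_hex(raw)
        print(raw, flags_to_mnemonic(f), flags_to_bits(f))
```

Expanding the flags into separate bit features, rather than feeding the raw integer to a model, avoids implying an ordering between unrelated flag combinations.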

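For the anomaly-detection use case, a minimal stdlib-only Python sketch might derive inter-arrival times from frame.time_epoch and aggregate frame.len into 1-second bins, then flag bins whose byte volume deviates strongly from the mean. At roughly 1,190 packets per second on average (34,269,178 packets over 28,800 seconds), the sketch streams the CSV row by row rather than loading it at once. It assumes the CSV header uses the schema column names verbatim and that rows are in arrival order; the 1-second bin width, the z-score threshold of 3, and all function names are illustrative choices, not part of the dataset.

```python
import csv
import math
import sys
from collections import defaultdict
from typing import Iterable, Iterator


def inter_arrival_times(timestamps: Iterable[float]) -> Iterator[float]:
    """Yield the gap in seconds between each packet and the previous one."""
    prev = None
    for ts in timestamps:
        if prev is not None:
            yield ts - prev
        prev = ts


def bytes_per_second(rows: Iterable[tuple[float, int]]) -> dict[int, int]:
    """Sum frame.len into 1-second bins keyed by floor(frame.time_epoch)."""
    bins: dict[int, int] = defaultdict(int)
    for ts, length in rows:
        bins[math.floor(ts)] += length
    if not bins:
        return {}
    # Fill seconds with no traffic so quiet periods also count as observations.
    return {s: bins.get(s, 0) for s in range(min(bins), max(bins) + 1)}


def zscore_outliers(values: dict[int, int], threshold: float = 3.0) -> list[int]:
    """Return the keys whose value lies more than `threshold` std devs from the mean."""
    if len(values) < 2:
        return []
    mean = sum(values.values()) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values.values()) / len(values))
    if std == 0:
        return []
    return sorted(k for k, v in values.items() if abs(v - mean) / std > threshold)


def read_time_and_len(path: str) -> Iterator[tuple[float, int]]:
    """Stream (frame.time_epoch, frame.len) pairs from the CSV."""
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            yield float(row["frame.time_epoch"]), int(row["frame.len"])


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # path to the dataset CSV (supplied by the user)
    gaps = inter_arrival_times(ts for ts, _ in read_time_and_len(path))
    print("largest inter-arrival gap (s):", max(gaps, default=0.0))
    per_sec = bytes_per_second(read_time_and_len(path))
    print("seconds with unusual byte volume:", zscore_outliers(per_sec))
```

A single global z-score is a simplification; because the environment mixes many device profiles, grouping by ip_src or by device before computing statistics would probably give a more meaningful baseline.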