
doi: 10.7302/26893
Pedestrians are among the most vulnerable road users, facing a high risk of injury in vehicle-pedestrian collisions. To make safe autonomous driving a reality, accurately detecting pedestrians from real-world sensor data is crucial. In this thesis, I propose a series of deep learning–based algorithms aimed at improving pedestrian safety by analyzing real-world time-series data. In part one, I detect non-walking pedestrian activities by explicitly incorporating temporal information through pedestrian trajectories; in part two, I perform pedestrian detection by leveraging sequences of multimodal images taken at different time steps, implicitly incorporating temporal information to capture motion patterns and temporal context.

The first part of this work introduces two novel unsupervised approaches for detecting walking and non-walking events via anomaly detection. In this context, pedestrians engaged in non-walking activities, such as running, biking, or skateboarding, are treated as anomalous because they often cover longer distances over short periods, increasing the risk of collisions. Accurately identifying non-walking pedestrians is therefore critical for pedestrian safety. Both methods follow three main steps: (1) pedestrian detection, (2) trajectory prediction, and (3) anomaly detection. For pedestrian detection, pedestrians are identified in RGB camera images by extracting either bounding boxes or human-pose coordinates, which are then used to construct their trajectories. For trajectory prediction, the predictor is trained exclusively on walking data and forecasts future pedestrian movements from the observed trajectories. Finally, for anomaly detection, I compare predicted and actual trajectories: large prediction errors indicate anomalous (non-walking) behavior, enabling fully unsupervised activity detection. Both approaches achieve competitive results.

The second part of this work addresses pedestrian detection in both daytime and nighttime conditions.
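The anomaly-detection step of part one, comparing a walking-trained predictor's forecast against the observed trajectory, can be sketched as follows. This is a minimal illustration, not the thesis implementation: the displacement-error metric, the threshold value, and the toy trajectories are all assumptions for demonstration.

```python
import numpy as np

def anomaly_score(predicted, actual):
    """Mean Euclidean displacement error between predicted and actual
    future trajectories, each of shape [T, 2] (time steps of x, y)."""
    return float(np.linalg.norm(predicted - actual, axis=1).mean())

def is_anomalous(predicted, actual, threshold=0.5):
    """Flag non-walking behavior when the predictor, trained only on
    walking data, fails to forecast the observed trajectory.
    The threshold here is illustrative, not a value from the thesis."""
    return anomaly_score(predicted, actual) > threshold

# A walking-like trajectory the walking-trained predictor forecasts well...
walk_pred = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]])
walk_true = np.array([[0.0, 0.0], [0.55, 0.02], [1.05, 0.01]])

# ...versus a faster-moving (e.g. biking) trajectory it under-predicts.
bike_true = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]])

print(is_anomalous(walk_pred, walk_true))  # small error -> walking
print(is_anomalous(walk_pred, bike_true))  # large error -> non-walking
```

Because the predictor only ever sees walking motion during training, its error is small on walking trajectories and large on anything else, which is what makes the scheme fully unsupervised.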
While RGB cameras perform well in well-lit environments, their performance deteriorates significantly in low-light or nighttime scenarios. Thermal imagery, by contrast, captures pedestrians' heat signatures, enabling detection under challenging lighting conditions. This section proposes two multimodal methods that fuse sequences of thermal and visible images. Both methods leverage image sequences rather than single frames, which improves detection accuracy for heavily occluded pedestrians. A key challenge in multimodal fusion is the spatial misalignment between thermal and visible image pairs, which can degrade performance if not properly addressed. The first method, MambaST, assumes well-aligned image sequences and focuses on real-time fusion of thermal and visible images for pedestrian detection. The second method, Strip-Fusion, inspired by multi-layer perceptrons, is robust to both well-aligned and misaligned sequences and uses a Kullback–Leibler divergence loss to encourage the feature distribution of the less reliable modality to resemble that of the more reliable one. This is followed by a post-processing step that ensures a pedestrian is detected by both the thermal and visible detection heads. Experimental results demonstrate that the proposed methods achieve competitive performance on both well-aligned and misaligned datasets, highlighting their effectiveness and adaptability.

The first part of the thesis distinguishes between walking and non-walking pedestrians using visible images. The approach can be generalized to detect anomalous activities, a core problem in computer vision. It extends the current literature on prediction-based anomaly detection and can benefit safety-critical applications such as autonomous driving, surveillance, and human-robot interaction. The second part of the thesis uses both visible and thermal images to improve pedestrian detection in challenging lighting conditions.
This work extends the current literature and proposes new efficient and robust approaches to address spatial-temporal multispectral pedestrian detection.
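The KL-divergence alignment idea described for Strip-Fusion can be sketched as follows. Everything in this snippet is an illustrative assumption rather than a thesis detail: real features would be multi-dimensional network activations, the softmax conversion and the direction of the KL term are choices made here for demonstration, and which modality counts as "more reliable" depends on conditions (e.g. thermal at night).

```python
import numpy as np

def softmax(x):
    """Turn a 1-D feature vector into a probability distribution."""
    e = np.exp(x - x.max())
    return e / e.sum()

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) between two discrete distributions."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def alignment_loss(reliable_feat, unreliable_feat):
    """Pull the less reliable modality's feature distribution toward
    the more reliable one's, in the spirit of the Strip-Fusion KL loss.
    The reliable distribution is treated as a fixed target here."""
    p = softmax(reliable_feat)    # target, e.g. thermal features at night
    q = softmax(unreliable_feat)  # modality being regularized
    return kl_divergence(p, q)

rng = np.random.default_rng(0)
f = rng.normal(size=16)
print(alignment_loss(f, f))      # identical features -> loss is 0
print(alignment_loss(f, 2 * f))  # mismatched features -> positive loss
```

Minimizing such a loss pushes the weaker modality's features toward the stronger one's distribution, so the two detection heads receive more consistent evidence before the post-processing step reconciles their outputs.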
Pedestrian Detection, Engineering, Low Light Conditions, Mechanical Engineering, Computer Science, Engineering (General), FOS: Mechanical engineering, Transportation, Video Anomaly Detection, Trajectory Prediction
