Powered by OpenAIRE graph
ZENODO
Dataset . 2025
License: CC BY NC ND
Data sources: Datacite

Mobility Networked Time Series Benchmark Datasets

Authors: Na, Jihye; Nam, Youngeun; Lee, Jae-Gil; Yoon, Susik; Song, Hwanjun; Lee, Byung Suk

Abstract

Overview
Human mobility is crucial for urban planning (e.g., public transportation) and epidemic response strategies. However, because of the limitations of existing mobility datasets, existing research often fails to integrate spatial dynamics, temporal trends, and other contextual views into a comprehensive perspective. To bridge this gap, we introduce MOBINS (MOBIlity Networked time Series), a novel dataset collection designed for networked time-series forecasting of dynamic human movements. MOBINS features diverse and explainable datasets that capture various mobility patterns across different transportation modes in four cities and two countries. The datasets cover both the transportation and epidemic domains at the administrative-area level. Our experiments with nine baseline methods reveal that the choice of model backbone has a significant impact on performance across the six proposed datasets. MOBINS is a valuable resource for advancing urban mobility research, and the dataset collection is available at DOI 10.5281/zenodo.14590709.

Benchmark Code
GitHub: https://github.com/kaist-dmlab/MOBINS

Benchmark Baseline List
Linear-based: DLinear, NLinear
RNN-based: SegRNN
Transformer-based: Informer, Reformer, PatchTST
CNN-based: TimesNet
GNN-based: STGCN, MPNNLSTM

Detailed Benchmark Results
The GitHub repository contains MOBINS_Results.pdf. It reports the detailed benchmark results of MOBINS in terms of MAE and MSE, together with their standard deviations.
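MAE and MSE are the mean absolute and mean squared differences between forecasts and ground truth. The NumPy sketch below computes both. Two choices in it are our assumptions rather than the benchmark's documented protocol: the standard deviation is taken over repeated runs (e.g., different random seeds), and it is the population standard deviation (ddof=0).

```python
import numpy as np


def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error over all elements (nodes, horizons, samples)."""
    return float(np.mean(np.abs(y_pred - y_true)))


def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error over all elements."""
    return float(np.mean((y_pred - y_true) ** 2))


def summarize_runs(scores: list[float]) -> tuple[float, float]:
    """Mean and population standard deviation (ddof=0) of a metric across runs.

    Aggregating over repeated runs (e.g. random seeds) is an assumption here.
    """
    arr = np.asarray(scores, dtype=float)
    return float(arr.mean()), float(arr.std())


if __name__ == "__main__":
    y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
    y_pred = np.array([[1.5, 2.0], [2.0, 4.0]])
    print(mae(y_true, y_pred))  # 0.375
    print(mse(y_true, y_pred))  # 0.3125
```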
Code Licence
Our code implementation is released under the MIT License.

Code Reference
DLinear: https://github.com/cure-lab/LTSF-Linear
NLinear: https://github.com/cure-lab/LTSF-Linear
SegRNN: https://github.com/lss-1138/SegRNN
Informer: https://github.com/zhouhaoyi/Informer2020
Reformer: https://github.com/lucidrains/reformer-pytorch
PatchTST: https://github.com/yuqinie98/PatchTST
TimesNet: https://github.com/thuml/TimesNet
STGCN: https://github.com/hazdzz/STGCN
MPNNLSTM: https://github.com/geopanag/pandemic_tgnn

Benchmark Datasets

Dataset Descriptions
Dataset        | Location | Nodes | Edges | Node unit                         | Daily movements   | Daily amounts      | Interval | Time range            | Frames | Target dim.
Transportation | Seoul    | 128   | 290   | Station-based administrative area | SmartCard: 2.68M  | In/Out-flow: 4.02M | 1 hour   | 01/01/2022-12/31/2023 | 17520  | 16640
Transportation | Busan    | 60    | 121   | Station-based administrative area | SmartCard: 0.63M  | In/Out-flow: 0.75M | 1 hour   | 01/01/2021-12/31/2023 | 26280  | 3720
Transportation | Daegu    | 61    | 123   | Station-based administrative area | SmartCard: 0.10M  | In/Out-flow: 0.34M | 1 hour   | 01/01/2021-12/31/2023 | 26280  | 3843
Transportation | NYC      | 5     | 12    | Borough                           | Taxi: 0.10M       | Ridership: 3.03M   | 1 hour   | 02/01/2022-03/31/2024 | 17280  | 30
Epidemic       | Korea    | 16    | 45    | City & Province                   | SmartCard: 13.41M | Infection: 25834   | 1 day    | 01/20/2020-08/31/2023 | 1320   | 272
Epidemic       | NYC      | 5     | 12    | Borough                           | Taxi: 2418        | Infection: 2038    | 1 day    | 03/01/2020-12/31/2023 | 1401   | 30

Formats of Datasets (MOBINS.zip)
Every dataset is provided in CSV format and has three components:
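The Frames and Target dim. columns appear to follow from the other columns in the table. Frames = (days in the time range) × (steps per day). Target dim. = n² + n·d: the flattened n × n OD matrix plus d time-series variables per node, with d = 2 for INFLOW/OUTFLOW and d = 1 for RIDERSHIP or INFECTION. This is our inference, not a formula stated by the authors, and the sketch below checks it against the table.

The target dimension matches all six rows. The frame count matches every row except Transportation-NYC. Its listed range spans 790 days, which would give 18,960 hourly frames rather than the 17,280 reported. That row is therefore checked for target dimension only.

```python
from datetime import date

# (name, n_nodes, d_vars, start, end, steps_per_day, frames, target_dim),
# copied from the Dataset Descriptions table. d_vars is inferred from the
# VARIABLE_NAME list: INFLOW/OUTFLOW -> 2, RIDERSHIP or INFECTION -> 1.
ROWS = [
    ("Transportation-Seoul", 128, 2, date(2022, 1, 1), date(2023, 12, 31), 24, 17520, 16640),
    ("Transportation-Busan", 60, 2, date(2021, 1, 1), date(2023, 12, 31), 24, 26280, 3720),
    ("Transportation-Daegu", 61, 2, date(2021, 1, 1), date(2023, 12, 31), 24, 26280, 3843),
    ("Epidemic-Korea", 16, 1, date(2020, 1, 20), date(2023, 8, 31), 1, 1320, 272),
    ("Epidemic-NYC", 5, 1, date(2020, 3, 1), date(2023, 12, 31), 1, 1401, 30),
]


def n_frames(start: date, end: date, steps_per_day: int) -> int:
    """Number of time steps in an inclusive date range."""
    return ((end - start).days + 1) * steps_per_day


def target_dim(n_nodes: int, d_vars: int) -> int:
    """Flattened OD matrix (n * n) plus d time-series variables per node."""
    return n_nodes * n_nodes + n_nodes * d_vars


for name, n, d, start, end, spd, frames, dim in ROWS:
    assert n_frames(start, end, spd) == frames, name
    assert target_dim(n, d) == dim, name
    print(f"{name}: frames={frames}, target_dim={dim} OK")

# Transportation-NYC (target dimension only): 5 nodes, RIDERSHIP -> 25 + 5.
print(target_dim(5, 1))  # 30
```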
SPATIAL_NETWORK.csv: n × n, where n = number of nodes
  Column names: INDEX, N0, N1, …, N(n−1)
  INDEX values: N0, N1, …, N(n−1)
NODE_TIME_SERIES_FEATURES.csv: (t × p) × (n × d), where t = number of timestamps per day, p = total period (in days), and d = number of time-series variables
  Column names: datetime, N0_{VARIABLE_NAME}, N1_{VARIABLE_NAME}, …, N(n−1)_{VARIABLE_NAME}
  VARIABLE_NAME values: INFLOW and OUTFLOW for Transportation-[Seoul, Busan, Daegu]; RIDERSHIP for Transportation-NYC; INFECTION for Epidemic-[Korea, NYC]
OD_MOVEMENTS.csv: (t × p) × (n × n)
  Column names: N0_N0, N0_N1, N0_N2, …, N(n−1)_N(n−2), N(n−1)_N(n−1)

Metadata
The GitHub repository provides the metadata in MOBINS_Meta.pdf.

Metadata for Transportation Datasets
Each file contains information about a single node or a node pair. For simplicity, the metadata describes only the i-th node. We omit a detailed description for Transportation-[Busan, Daegu] because their CSV file structures are identical to those of Transportation-Seoul and differ only in the number of nodes. Transportation-NYC follows a similar structure, except for the node time-series variable (ridership).

Metadata for Epidemic Datasets
Each file contains information about a single node or a node pair. For simplicity, the metadata describes only the i-th node. Both datasets share the same structure for node time-series features, OD movements, and spatial networks.

Data Licence
The Transportation-[Seoul, Busan, Daegu, NYC] and Epidemic-NYC datasets are released under the CC BY-NC 4.0 International License. The Epidemic-Korea dataset is released under the CC BY-NC-ND 4.0 International License.

How to Curate MOBINS

Composition
The MOBINS dataset collection consists of mobility networked time-series data for forecasting tasks in two domains: Transportation-[Seoul, Busan, Daegu, NYC] and Epidemic-[Korea, NYC].
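The CSV layout above can be reshaped into arrays for forecasting models. The pandas sketch below assumes three things:

- Column names contain no spaces (e.g., N0_N1, N0_INFLOW).
- Nodes are numbered N0 to N(n−1).
- SPATIAL_NETWORK.csv uses INDEX as its row-label column.

The helper names are hypothetical and are not part of the MOBINS code.

```python
import numpy as np
import pandas as pd


def od_to_tensor(od: pd.DataFrame, n_nodes: int) -> np.ndarray:
    """Reshape OD_MOVEMENTS rows (columns 'N{i}_N{j}') into a (T, n, n) array.

    Entry [t, i, j] is the movement from node i to node j at step t.
    Columns are selected explicitly, so an extra column (e.g. a timestamp)
    or a different column order does not break the reshape.
    """
    cols = [f"N{i}_N{j}" for i in range(n_nodes) for j in range(n_nodes)]
    return od[cols].to_numpy().reshape(len(od), n_nodes, n_nodes)


def node_features_to_tensor(ts: pd.DataFrame, n_nodes: int, variables: list[str]) -> np.ndarray:
    """Reshape NODE_TIME_SERIES_FEATURES columns 'N{i}_{VAR}' into (T, n, d)."""
    cols = [f"N{i}_{v}" for i in range(n_nodes) for v in variables]
    return ts[cols].to_numpy().reshape(len(ts), n_nodes, len(variables))


def adjacency_from_csv(path: str) -> np.ndarray:
    """Load SPATIAL_NETWORK.csv (INDEX column + N0..N(n-1)) as an n x n array."""
    return pd.read_csv(path, index_col="INDEX").to_numpy()


if __name__ == "__main__":
    # Tiny synthetic example (not real MOBINS data): n = 2 nodes, T = 3 steps.
    n = 2
    od = pd.DataFrame(
        np.arange(3 * n * n).reshape(3, n * n),
        columns=[f"N{i}_N{j}" for i in range(n) for j in range(n)],
    )
    tensor = od_to_tensor(od, n)
    print(tensor.shape)     # (3, 2, 2)
    print(tensor[0, 0, 1])  # 1 (N0 -> N1 at step 0)
```

For Transportation-Seoul, for example, the call would be `node_features_to_tensor(ts, 128, ["INFLOW", "OUTFLOW"])`.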
Each dataset comprises three key components: (1) OD movements, (2) a spatial network, and (3) time series. These datasets capture the temporal evolution of OD movements and time series within a fixed spatial network. OD movements represent the volume of movement between pairs of nodes, while the time series represent time-varying features of each node. Because OD movements and time series are highly correlated and complement each other, the datasets provide a comprehensive view of mobility patterns.

Collection Process
All datasets in MOBINS are collected from reliable sources, including government agencies, local governments, public transportation operators, and smart card companies. These sources provide publicly accessible data downloads based on their administrative systems. Source data from the smart transit card information system is accessed through API calls at the administrative-area level (e.g., neighborhoods or provinces) to match the spatial resolution of the time series. Data from the Korea Public Data Portal is either free to use without restriction or covered by the CC BY license. For sources without an explicit license, we confirmed research use through inquiries by phone or email. Data from the Korea Disease Control and Prevention Agency was used after obtaining permission, and its numerical values were not modified.

Preprocessing/Cleaning/Labeling
Each dataset in MOBINS draws its OD movements and its time series from different sources. To ensure consistent spatial and temporal resolution, we align the two sources using Python. In the Transportation-[Seoul, Busan, Daegu] datasets, we use station-based administrative areas as spatial node units and treat all stations within the same administrative area as a single node. For the Transportation-NYC dataset, we use boroughs as spatial node units to align the spatial resolution of taxi zones and stations.
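Treating stations in the same administrative area as one node amounts to a group-and-sum. The sketch below applies this to station-level flows and to station-to-station trips. It is hypothetical: the long-format column names (datetime, station, inflow, outflow, origin, destination, count) and the station-to-area mapping are assumptions, and the source data layout may differ.

```python
import pandas as pd


def aggregate_stations(flows: pd.DataFrame, station_to_area: dict[str, str]) -> pd.DataFrame:
    """Sum station-level in/out-flows into one node per administrative area.

    flows: long format with hypothetical columns 'datetime', 'station',
    'inflow', 'outflow'. Returns one row per (datetime, area).
    """
    df = flows.assign(area=flows["station"].map(station_to_area))
    if df["area"].isna().any():
        missing = df.loc[df["area"].isna(), "station"].unique()
        raise KeyError(f"stations without an area mapping: {list(missing)}")
    return df.groupby(["datetime", "area"], as_index=False)[["inflow", "outflow"]].sum()


def aggregate_od(od: pd.DataFrame, station_to_area: dict[str, str]) -> pd.DataFrame:
    """Sum station-to-station trips into area-to-area trips.

    od: long format with hypothetical columns 'datetime', 'origin',
    'destination', 'count'. Trips within one area become self-loops.
    """
    df = od.assign(
        origin=od["origin"].map(station_to_area),
        destination=od["destination"].map(station_to_area),
    )
    if df[["origin", "destination"]].isna().any().any():
        raise KeyError("OD record references a station without an area mapping")
    return df.groupby(["datetime", "origin", "destination"], as_index=False)["count"].sum()
```

Trips between two stations in the same area end up with origin equal to destination. These would correspond to the diagonal columns (e.g., N0_N0) in OD_MOVEMENTS.csv. Unmapped stations raise an error instead of being silently dropped by `groupby`.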
In the Epidemic-Korea dataset, the source infection case data is collected at the city and province levels, so we use OD movements at the city and province levels to match this spatial resolution. Similarly, for the Epidemic-NYC dataset, we use OD movements at the borough level to keep the spatial node units consistent. Once the spatial resolution is determined, we generate the spatial network at that resolution. Regarding the temporal aspect, the source OD movements of Transportation-[Busan, Daegu, NYC] are recorded at intervals of less than 15 minutes. Nevertheless, we set the frequency to 1 hour in MOBINS to match the frequency of the time-series data. Integrating two sources with positive or negative correlations allows the data to be interpreted and forecast from various contextual perspectives.

On the Korea Public Data Portal, the source OD movements of the Transportation-Seoul dataset are missing for 14 days: 07/01/2022 to 07/06/2022, 07/13/2022, 07/20/2022, 08/06/2022, 08/07/2022, 09/13/2022, 10/31/2022, 11/01/2022, and 12/04/2022. These missing days are filled with additional OD movement information from the smart transit card information system. Meanwhile, the source OD movements from the NYC taxi dataset contain abnormal taxi records. To provide clean NYC OD movements, we remove every taxi record whose drop-off timestamp minus pick-up timestamp is less than 0 seconds or more than 6 hours. To facilitate future data updates, we keep backups of the raw source data.
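The three cleaning steps above can be sketched with pandas:

- aggregating sub-hourly OD counts to 1 hour;
- back-filling missing Transportation-Seoul days from a second source;
- dropping taxi trips shorter than 0 seconds or longer than 6 hours.

This is a minimal illustration, not the MOBINS preprocessing code. Three things are our assumptions:

- The DatetimeIndex layout.
- Summation as the aggregation for counts.
- The default trip columns tpep_pickup_datetime / tpep_dropoff_datetime. These are the names used in TLC yellow-taxi trip records, and other TLC files use different names.

```python
import pandas as pd


def to_hourly(od: pd.DataFrame, time_col: str = "datetime") -> pd.DataFrame:
    """Aggregate sub-hourly OD counts (e.g. 15-minute bins) into 1-hour totals.

    Assumes counts are additive, so summing within each hour is appropriate.
    """
    return od.set_index(time_col).resample("1h").sum()


def fill_missing_days(
    primary: pd.DataFrame, backup: pd.DataFrame, start: str, end: str
) -> tuple[pd.DataFrame, pd.DatetimeIndex]:
    """Fill whole days absent from `primary` with the same days from `backup`.

    Both frames are indexed by timestamps and share the same OD columns.
    Only rows of missing days are taken from `backup`, so existing days in
    `primary` are never overwritten. Returns (filled frame, missing days).
    """
    expected = pd.date_range(start, end, freq="D")
    present = primary.index.normalize().unique()
    missing = expected.difference(present)
    patch = backup.loc[backup.index.normalize().isin(missing)]
    return primary.combine_first(patch).sort_index(), missing


def drop_abnormal_trips(
    trips: pd.DataFrame,
    pickup_col: str = "tpep_pickup_datetime",
    dropoff_col: str = "tpep_dropoff_datetime",
    max_duration: pd.Timedelta = pd.Timedelta(hours=6),
) -> pd.DataFrame:
    """Keep trips whose duration (drop-off minus pick-up) lies in [0 s, 6 h]."""
    duration = trips[dropoff_col] - trips[pickup_col]
    keep = (duration >= pd.Timedelta(0)) & (duration <= max_duration)
    return trips.loc[keep]
```

Passing only the missing days from the backup source to `combine_first` keeps the primary source authoritative wherever it has data.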
Data Reference

References of Origin-Destination Movements
Transportation-Seoul: Korea Public Data Portal and Smart Transit Card Information System
Transportation-[Busan, Daegu]: Smart Transit Card Information System
Transportation-NYC: NYC Taxi and Limousine Commission (TLC)
Epidemic-Korea: Smart Transit Card Information System
Epidemic-NYC: NYC Taxi and Limousine Commission (TLC)

References of Time Series
Transportation-Seoul: Korea Public Data Portal (Seoul subway lines 1-8 and line 9)
Transportation-[Busan, Daegu]: Korea Public Data Portal (Busan and Daegu)
Transportation-NYC: NYC Data Portal
Epidemic-Korea: Korea Disease Control and Prevention Agency
Epidemic-NYC: NYC Health

Note: All source websites offer an official English version except the Smart Transit Card Information System and the Korea Disease Control and Prevention Agency. We therefore describe how to contact or use these two sources.
Uses of Smart Transit Card Information System: please contact stcis@kotsa.or.kr.
Time Series of Epidemic-Korea: direct download link. To contact the source, please use this official English link.

Code Reference
We implemented our benchmark code based on the Time Series Library (TSLib).
Citation
@inproceedings{na2025mobility,
  title={Mobility Networked Time Series Benchmark Datasets},
  author={Na, Jihye and Nam, Youngeun and Yoon, Susik and Song, Hwanjun and Lee, Byung Suk and Lee, Jae-Gil},
  booktitle={ICWSM},
  year={2025},
}

Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) (No. 2023R1A2C2003690).