
Abstract In recent years, the collection of spatio-temporal data that captures human movements has increased tremendously due to the advancements in hardware and software systems capable of collecting person-specific data. The bulk of the data collected by these systems has numerous applications, or it can simply be used for general data analysis. Therefore, publishing such big data is greatly beneficial for data recipients. However, in its raw form, the collected data contains sensitive information pertaining to the individuals from which it was collected and must be anonymized before publication. In this paper, we study the problem of privacy-preserving passenger trajectories publishing and propose a solution under the rigorous differential privacy model. Unlike sequential data, which describes sequentiality between data items, handling spatio-temporal data is a challenging task due to the fact that introducing a temporal dimension results in extreme sparseness. Our proposed solution introduces an efficient algorithm, called SafePath, that models trajectories as a noisy prefix tree and publishes ϵ-differentially-private trajectories while minimizing the impact on data utility. Experimental evaluation on real-life transit data in Montreal suggests that SafePath significantly improves efficiency and scalability with respect to large and sparse datasets, while achieving comparable results to existing solutions in terms of the utility of the sanitized data.
transportation, Smart city, sparse data, 000, Computer Sciences, Trajectory data, Sparse data, Transportation, 004, Differential privacy, smart city, differential privacy, trajectory data
transportation, Smart city, sparse data, 000, Computer Sciences, Trajectory data, Sparse data, Transportation, 004, Differential privacy, smart city, differential privacy, trajectory data
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 48 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
