
Abstract

Objective: Many sources provide tourism data, but the data is often unstructured or published in ways that hinder easy, automatic, or unsupervised collection. In such cases, a methodology based on data science techniques gives researchers a significant advantage: its tools let them extract, process, and analyze the information efficiently. Although the methodology applies to many disciplines, this work presents a specific case focused on tourism in Spain.

Methodology: Using data science techniques such as graph analysis and unsupervised machine learning, we collect and process data on the origins and numbers of tourists in Spain with Python, R, and VOSviewer. The analysis uncovers the primary tourism sources and origin-country patterns, and examines Andalusia in greater depth because of its high tourist influx.

Results: The study reveals the key sources of tourism to Spain and patterns in visitor behavior. Visualizations illustrate tourist origins, visit numbers, and interactions. Andalusia is examined in detail by visit counts and origin countries.

Conclusions: Applying data science yields insights into Spanish tourism, identifying its core sources and clarifying origin-country interactions. These findings can inform strategic decisions and improve Spain's tourism promotion and management.
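
As a rough illustration of the graph analysis mentioned in the methodology, tourist flows can be modeled as a weighted directed graph from origin countries to destination regions. The sketch below is only an assumption about how such a step might look, not the authors' implementation: the figures are invented toy values, and the function names (`build_graph`, `weighted_in_degree`, `origin_shares`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy figures, NOT data from the study:
# (origin country, destination region, number of visitors).
flows = [
    ("United Kingdom", "Andalusia", 300),
    ("Germany", "Andalusia", 200),
    ("France", "Andalusia", 100),
    ("United Kingdom", "Catalonia", 150),
    ("France", "Catalonia", 250),
]


def build_graph(edges):
    """Build a weighted directed graph: origin -> {region: visitors}."""
    graph = defaultdict(dict)
    for origin, region, visitors in edges:
        # Accumulate weights in case the same edge appears more than once.
        graph[origin][region] = graph[origin].get(region, 0) + visitors
    return graph


def weighted_in_degree(graph):
    """Total visitors arriving in each region (sum of incoming edge weights)."""
    totals = defaultdict(int)
    for targets in graph.values():
        for region, visitors in targets.items():
            totals[region] += visitors
    return dict(totals)


def origin_shares(graph, region):
    """Fraction of a region's visitors contributed by each origin country."""
    counts = {origin: targets[region]
              for origin, targets in graph.items() if region in targets}
    total = sum(counts.values())
    return {origin: count / total for origin, count in counts.items()}


graph = build_graph(flows)
arrivals = weighted_in_degree(graph)
andalusia_shares = origin_shares(graph, "Andalusia")
top_origin = max(andalusia_shares, key=andalusia_shares.get)
```

With real data, the same structure could be exported as an edge list for visualization or passed to a clustering step; which tools the study used for each stage is not specified here.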
