
We are currently witnessing a tremendous increase in the amount of data produced and collected across numerous applications and scientific disciplines. Recent advances in sensing instruments allow the observation of diverse real-world phenomena at a scale and granularity that was never before possible, which in turn enables data-driven scientific discovery. In other words, we are now in a position to exploit this wealth of data to refine existing theoretical models, or build new ones, that describe and explain various phenomena in the real world. Examples range from environmental and earth sciences (such as seismology, volcanology, oceanography, and the study of water and ice resources), to astrophysics (such as cosmological networks and gravitational waves) and smart cities (such as transport, and the impact of human activities on the environment), where datasets can often grow to terabytes (TB), or even petabytes (PB), in size.

Extracting knowledge from these data requires analysis tasks that become increasingly complex as the amount of data, the number of observed variables, and the levels of noise in the measurements (e.g., when measuring weak signals) grow. We therefore need novel methods that can cope with both the scale of the data (TB to PB) and the complexity of the tasks (noise removal, detection of weak signals, classification of complex patterns, and detection of anomalies or abnormal events) that arise across applications in different domains. To address these challenges, we turn our attention to new classes of Artificial Intelligence (AI) and Machine Learning (ML) techniques (both supervised and unsupervised, with an emphasis on deep learning), which have produced very promising initial results.

In this project, we propose to fund 15 PhD positions for interdisciplinary work in the areas mentioned above, developing novel methods at the intersection of AI/ML and large-scale data management.
The objective is to address the needs of real-world applications by performing complex analytics tasks at scale, and to advance the corresponding state-of-the-art solutions. The differentiating factor is the focus on data-intensive problems that address fundamental challenges in modern science, industry, and society.