
handle: 2117/426742
The ability to extract valuable information from data is crucial for organizations and individuals who want to remain competitive in a constantly evolving, data-driven environment. However, some of them lack the skills required to make appropriate use of existing data analytics tools and methods. The problem is aggravated when the users are domain experts who are completely unfamiliar with data analytics terminology, because existing assistant tools, such as AutoML or Intelligent Discovery Assistants, require them to state their analytical intent (i.e., the type of data analysis they want to perform). To address this problem, we propose to capture the underlying analytical intent from textual problem descriptions by leveraging Large Language Models (LLMs). To this end, we propose a hierarchical categorization of analytical intents, along with a data collection methodology for obtaining analytical problem descriptions for each of them, so that different approaches to extracting such intents from text can be validated. Next, we compare the performance of state-of-the-art approaches with that of LLMs, and then study how the performance of different LLMs varies with their characteristics and with the source of the validation data. Finally, we develop a prototype to showcase how our method could interact with existing AutoML systems.
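As an illustration of the idea only, the following Python sketch shows one way a hierarchical set of intents could be turned into an LLM prompt and how a model reply could be checked against it. The two-level taxonomy, the function names `build_prompt` and `parse_intent`, and the prompt wording are all hypothetical assumptions made for this example; they are not the categorization or the method described in the paper, and no LLM is actually called.

```python
# Hypothetical two-level taxonomy, invented for this sketch.
# The categorization proposed in the paper is not reproduced here.
INTENT_TAXONOMY = {
    "prediction": ["classification", "regression"],
    "description": ["clustering", "anomaly detection"],
}


def leaf_intents(taxonomy):
    """Map each leaf intent to its parent category."""
    return {leaf: parent for parent, leaves in taxonomy.items() for leaf in leaves}


def build_prompt(description, taxonomy=INTENT_TAXONOMY):
    """Compose a classification prompt that lists the allowed intents."""
    options = "\n".join(
        f"- {parent}: {', '.join(leaves)}" for parent, leaves in taxonomy.items()
    )
    return (
        "Choose the single analytical intent that best matches the problem.\n"
        f"Allowed intents by category:\n{options}\n\n"
        f"Problem description: {description}\n"
        "Answer with one intent name only."
    )


def parse_intent(response, taxonomy=INTENT_TAXONOMY):
    """Return (category, intent) if the reply names a known leaf, else None."""
    answer = response.strip().lower().rstrip(".")
    parents = leaf_intents(taxonomy)
    if answer in parents:
        return parents[answer], answer
    return None
```

In such a setup, the string returned by `build_prompt` would be sent to whichever LLM is under evaluation, and `parse_intent` would reject any reply that falls outside the taxonomy before the intent is passed on to a downstream tool.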
Gerard Pons is supported by the EU’s Horizon Programme call, under Grant Agreement No. 101093164 (ExtremeXP), and Besim Bilalli is partially supported by the DOGO4ML project, funded by the Spanish Ministerio de Ciencia e Innovación under the funding scheme PID2020-117191RB-I00/AEI/10.13039/501100011033.
Peer Reviewed
UPC subject areas::Computer science::Information systems, Large language models, UPC subject areas::Computer science::Artificial intelligence::Natural language, Analytical intents, Data science
