
doi: 10.3233/faia250577
Data surrounds us, data-driven decisions are becoming increasingly common, and applications of the KDD (Knowledge Discovery from Data) process proposed by [7] are becoming crucial to the construction of the new digital society. The main step of the KDD process is Data Mining, where the data-driven models are actually trained and built. Many data-driven methodologies can be used there to extract knowledge from data [8] [4], among them advanced multivariate techniques such as Principal Component Analysis (PCA). In recent years it has become clear that a very relevant step in the KDD process is the interpretation of data mining results, especially when KDD is applied to real situations where decision-making support is intended, and explainable AI has become a new central research field. This paper proposes a methodology for the automatic elicitation of the latent topic represented in a principal component. The proposal is based on automating the process that multivariate experts follow to interpret the factorial components. It relies on a machine-readable metadata model that describes the data, through which semantic elements can be transferred to the machine and used for the interpretation. The proposal builds on [2], where regular expressions are used to generate automatic verbal descriptions of the interpretation of a PCA axis, based on the relevant contributions of the numerical variables and modalities projected in the factorial map. Nevertheless, one of the main goals of PCA is to elicit latent variables and determine the main topic represented in each dimension [1]. Even though the proposal of [2] successfully provides a verbal description of the principal components, more research is required to abstract the meaning of a principal component into a term or tag that synthesizes the verbal description.
To take this further step in the interpretation process, the machine-readable meta-information model developed in [3] is used to raise the level of conceptualization provided by the system. This meta-information model contains the information needed to address our goals. The proposal is applied to education, on a dataset containing scores in basic competencies of children in primary and secondary education. A key conclusion is that academic performance and learning progression emerge as the dominant themes in the first factorial plane, and that they can be automatically identified using the developed methodology.
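The general idea of tagging a principal component from variable contributions and metadata can be illustrated with a minimal sketch. It is not the authors' implementation: the toy scores, the variable names, the `metadata` dictionary (a hypothetical stand-in for the meta-information model of [3]), and the "above-average contribution" selection rule are all assumptions made for illustration. The first axis of a normalized PCA is approximated by power iteration on the correlation matrix, using only the Python standard library.

```python
import math

def standardize(rows):
    """Center and scale each column to zero mean and unit variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)]
            for i in range(p)]

def first_axis(corr, iters=500):
    """Unit-norm leading eigenvector of corr, found by power iteration."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def elicit_tag(names, contributions, metadata):
    """Return the metadata topic with the largest summed contribution
    among variables contributing more than the average 1/p (assumed rule)."""
    threshold = 1.0 / len(names)
    weights = {}
    for name, c in zip(names, contributions):
        if c > threshold:
            topic = metadata[name]
            weights[topic] = weights.get(topic, 0.0) + c
    return max(weights, key=weights.get)

# Hypothetical toy data: three competency scores and one unrelated variable.
names = ["math", "reading", "science", "commute_minutes"]
rows = [
    [4, 5, 4, 30],
    [6, 6, 7, 10],
    [8, 7, 8, 25],
    [5, 4, 5, 15],
    [9, 9, 9, 20],
    [3, 4, 3, 20],
]
# Hypothetical metadata: each variable is annotated with a topic.
metadata = {
    "math": "academic performance",
    "reading": "academic performance",
    "science": "academic performance",
    "commute_minutes": "logistics",
}

axis = first_axis(correlation_matrix(standardize(rows)))
# In normalized PCA, the contribution of variable j to an axis equals the
# squared j-th coordinate of the unit eigenvector; contributions sum to 1.
contributions = [x * x for x in axis]
tag = elicit_tag(names, contributions, metadata)
print(tag)
```

In this toy setting the three competency scores are strongly correlated and dominate the first axis, so the tag is drawn from their shared metadata topic. A real metadata model would carry richer semantics than a single topic string per variable.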
