The aim of this work is to show the possibilities of applying artificial intelligence methodology and structural causal modelling (Structural Causal Model, SCM) to determine the causal functional dependencies of biological features on abiotic parameters in complex ecological systems. The main task is to investigate an SCM for the dependence of chlorophyll concentration on physical and chemical parameters in the northern Adriatic Sea during the period from 1965 to 2015. The experimental data are the outcome of a long-term and extensive investigation within the EU project “LTER Northern Adriatic Sea” and are freely available under the EU Open Science policy as a large (“Big Data”) database with 108 687 samples and 43 descriptors.
A Bayes network (BN) in the form of a directed acyclic graph (DAG) is proposed as the mathematical model. The graph structure was determined by the Hamilton-Schmidt conditional independence test (HSCI) at a significance level of α = 0.05. The SCM shows that the direct causal variables for chlorophyll concentration are temperature, salinity, pH, and the concentrations of nitrogen, phosphorus, and silica. d-separation was applied to the BN graph to block confounding and contra-causal back-door interference when estimating the causal dependencies of biological features on abiotic parameters. The causal functions were determined as marginal distributions modelled by Bayes neural networks (BNN) with a single interior layer for interpolation. The largest direct negative causal effect on chlorophyll A is that of temperature (−0.07 μg chlorophyll A/°C). The model also indicates a positive causal dependence of dissolved oxygen on chlorophyll A (0.2 mg DO2/μg chlorophyll A).
A nonparametric comparative analysis of chlorophyll A and abiotic parameters was also carried out between the Croatian part of the Adriatic and the northern Adriatic Sea (Slovenia and Italy). The comparison was based on medians to avoid the pronounced influence of outliers caused by hydrodynamic effects. The median concentration of dissolved oxygen was 5.8 mg O2/l in the Croatian part and 5.5 mg O2/l in the Slovenian and Italian part, and the median temperature was T = 14.6 °C compared to T = 15.1 °C. The median abundance of dinoflagellates was 3 cells/l in the Croatian part compared to 5 cells/l in the Slovenian and Italian part, and the difference is more pronounced in the frequency and magnitude of “hot spot” outliers. The median chlorophyll A concentrations do not differ significantly (0.65 and 0.90 μg/l), but outliers are considerably more frequent and larger in the Italian and Slovenian Adriatic. A significant difference was also observed in the SiO4 distribution, with a large number of high-concentration samples in the western Adriatic.
Random forest (RF) models of decision trees were applied to predict biological features from abiotic data. The RF models were validated by 5-fold cross-validation, with the relative percentage prediction error evaluated on the held-out (“new”) data. The mean relative errors were 6.5 % for chlorophyll, 17.4 % for pheopigment, 18.8 % for diatoms, 17.4 % for dinoflagellates, and 12.1 % for coccolithophores. For each predictive model, the five most important predictors were determined, accounting for 95 % of the importance.
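The idea of blocking a back-door path, which the abstract invokes for the causal estimates, can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the single confounder `z`, the linear relationships, the noise levels, and the helper names `ols_slope`, `residuals`, and `simulate` are hypothetical and are not taken from the paper, which uses Bayes neural networks rather than linear regression. The value −0.07 is borrowed from the abstract only as the "true" effect planted in the simulation.

```python
import random


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = ols_slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate(n=20000, seed=0, true_effect=-0.07):
    """Synthetic linear SCM (hypothetical): z -> t, z -> c, t -> c."""
    rng = random.Random(seed)
    # z: an assumed common cause of temperature and chlorophyll
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # t: temperature in degrees C, driven by z plus noise
    t = [15.0 + 4.0 * zi + rng.gauss(0.0, 1.0) for zi in z]
    # c: chlorophyll, driven by t (planted causal effect) and by z
    c = [2.0 + true_effect * ti + 0.5 * zi + rng.gauss(0.0, 0.2)
         for ti, zi in zip(t, z)]
    return z, t, c


z, t, c = simulate()

# Naive estimate: the back-door path t <- z -> c is left open.
naive = ols_slope(t, c)

# Adjusted estimate: condition on z by regressing it out of both
# variables first (Frisch-Waugh-Lovell), which closes the back door
# in this linear setting.
adjusted = ols_slope(residuals(z, t), residuals(z, c))

print(f"naive slope:    {naive:+.3f}")
print(f"adjusted slope: {adjusted:+.3f}")
```

In this simulation the naive slope comes out positive because the confounder pushes temperature and chlorophyll in the same direction, while the adjusted slope lands close to the planted −0.07. The sketch shows only why the adjustment set matters; it says nothing about which variables form that set in the actual DAG.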
gold |
citations | 0 | |
popularity | Average | |
influence | Average | |
impulse | Average |