publication . Other literature type . Article . 2019

Statistical significance approximation for local similarity analysis of dependent time series data

Yihui Luan; Fengzhu Sun; Fang Zhang
Open Access
  • Published: 28 Jan 2019
  • Publisher: Springer Science and Business Media LLC
Abstract
Background: Local similarity analysis (LSA) of time series data has been extensively used to investigate the dynamics of biological systems in a wide range of environments. Recently, a theoretical method was proposed to approximately calculate the statistical significance of local similarity (LS) scores. However, the method assumes that the time series data are independent and identically distributed, an assumption that can be violated in many problems.
Results: In this paper, we develop a novel approach to accurately approximate the statistical significance of LSA for dependent time series data using a nonparametric kernel estimate of the long-run variance. We also investigate an alternative...
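As an illustration of the "nonparametric kernel estimated long-run variance" mentioned in the abstract, the following is a minimal sketch of one common estimator of this kind. The Bartlett kernel, the fixed `bandwidth` parameter, and the function names `autocovariance` and `long_run_variance` are assumptions made for illustration; the abstract does not state which kernel, bandwidth rule, or input series the authors use.

```python
def autocovariance(x, lag):
    """Sample autocovariance of x at a non-negative lag (normalised by n)."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n


def long_run_variance(x, bandwidth):
    """Bartlett-kernel estimate of the long-run variance of a series.

    Approximates gamma(0) + 2 * sum_{k>=1} gamma(k) by weighting lag k
    with 1 - k / (bandwidth + 1) and dropping lags beyond `bandwidth`.
    The kernel and bandwidth are illustrative assumptions, not the
    paper's stated choices.
    """
    if bandwidth < 0 or bandwidth >= len(x):
        raise ValueError("bandwidth must satisfy 0 <= bandwidth < len(x)")
    estimate = autocovariance(x, 0)
    for k in range(1, bandwidth + 1):
        weight = 1.0 - k / (bandwidth + 1)
        estimate += 2.0 * weight * autocovariance(x, k)
    return estimate


# Hypothetical toy series: with bandwidth 0 the estimate is the ordinary
# sample variance; a positive bandwidth adds the weighted autocovariances.
series = [1.0, 2.0, 3.0, 4.0]
iid_variance = long_run_variance(series, 0)
dependent_variance = long_run_variance(series, 1)
```

With `bandwidth` set to 0 the estimator reduces to the ordinary sample variance, which corresponds to treating the observations as independent; larger bandwidths let serial dependence change the variance used in a significance approximation.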
Subjects
Free text keywords: Methodology Article, Data-driven local similarity analysis, Long-run variance, Nonparametric kernel estimate, Statistical significance, Biochemistry, Applied Mathematics, Molecular Biology, Structural Biology, Computer Science Applications, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5, Kernel (linear algebra), Time series, Similarity analysis, Statistical model, False positive paradox, Nonparametric statistics, Pattern recognition, Type I and type II errors, Artificial intelligence, Computer science