
Myelodysplastic syndromes have increased in frequency and incidence in the American population, but patient prognosis has not improved significantly over the last decade. Such improvements could be realized if biomarkers for accurate diagnosis and prognostic stratification were successfully identified. In this study, we propose a method that associates two state-of-the-art array technologies, single nucleotide polymorphism (SNP) arrays and gene expression arrays, with gene motifs considered as transcription factor-binding sites (TFBS). We are particularly interested in SNP-containing motifs that are introduced by genetic variation and mutation and act as TFBS. The potential regulatory effect of such motifs arises only when certain mutations occur. These motifs can be identified from a group of co-expressed genes with copy number variation. We then used a sliding window to identify motif candidates near SNPs on gene sequences. The candidates were filtered by coarse thresholding followed by fine statistical testing. Using the regression-based LARS-EN algorithm and a level-wise sequence combination procedure, we identified 28 SNP-containing motifs as candidate TFBS. We confirmed 21 of the 28 motifs with ChIP-chip fragments in the TRANSFAC database. Another six motifs were validated by searching TRANSFAC for binding fragments on co-regulated genes. The identified motifs and the genes in which they are located can be considered potential biomarkers for myelodysplastic syndromes. Thus, our proposed method, a novel strategy for associating two data categories, is capable of integrating information from different sources to identify reliable candidate regulatory SNP-containing motifs introduced by genetic variation and mutation.
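The sliding-window and coarse-thresholding steps can be illustrated with a minimal sketch. The abstract does not state the window length, coordinate convention, or threshold, so everything here is an assumption: SNP positions are 0-based, each SNP is a single-base substitution, candidate motifs have a fixed length `k`, and the coarse threshold is taken to be a minimum number of co-expressed genes in which a candidate appears. The function names `snp_windows` and `coarse_filter` are hypothetical, and the later fine statistical testing, LARS-EN selection, and level-wise combination steps are not shown.

```python
from collections import Counter


def snp_windows(sequence: str, snp_pos: int, alt_allele: str, k: int) -> list[str]:
    """Return every length-k window of `sequence` that covers `snp_pos`,
    with the alternate allele substituted at that position.

    Hypothetical simplification: one SNP per sequence, a single-base
    substitution, and 0-based coordinates.
    """
    if not 0 <= snp_pos < len(sequence):
        raise ValueError("SNP position outside sequence")
    if len(alt_allele) != 1:
        raise ValueError("only single-base substitutions are handled")
    mutated = sequence[:snp_pos] + alt_allele + sequence[snp_pos + 1:]
    # Slide the window start so the SNP falls at every possible offset,
    # while keeping the window inside the sequence.
    first = max(0, snp_pos - k + 1)
    last = min(snp_pos, len(mutated) - k)
    return [mutated[start:start + k] for start in range(first, last + 1)]


def coarse_filter(genes: list[tuple[str, int, str]], k: int, min_genes: int) -> dict[str, int]:
    """Keep SNP-containing k-mers that occur in at least `min_genes` genes.

    `genes` holds (sequence, snp_pos, alt_allele) tuples for a hypothetical
    group of co-expressed genes; the threshold rule is an assumption.
    """
    support: Counter[str] = Counter()
    for sequence, snp_pos, alt in genes:
        # Count each candidate at most once per gene.
        support.update(set(snp_windows(sequence, snp_pos, alt, k)))
    return {motif: n for motif, n in support.items() if n >= min_genes}


if __name__ == "__main__":
    toy_genes = [("ACGTACGT", 3, "G"), ("TCGAAT", 3, "G")]
    print(coarse_filter(toy_genes, k=3, min_genes=2))
```

With the toy input, both mutated sequences share the windows `CGG` and `GGA`, so only those two candidates survive a threshold of two genes. Counting a candidate once per gene (rather than once per occurrence) keeps a single repetitive sequence from dominating the support count.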
Keywords: DNA Copy Number Variations; Genotype; Loci; Variants; Uniparental Disomy; Polymorphism, Single Nucleotide; Chromosomal Abnormalities; Regression; Databases, Genetic; Genes, Regulator; Humans; Genome-Wide Association; Population; Oligonucleotide Array Sequence Analysis; Binding Sites; Gene Expression Profiling; Profiles; Myelodysplastic Syndromes; Microarrays; Association study; genetic variation and mutation; Variable Selection; Oncology; factor-binding sites; Original Article; transcription; Algorithms; Transcription Factors
