ZENODO
Dataset . 2023
License: CC BY
Data sources: Datacite

[Datasets] Evaluation of the predictive ability of supervised learning models for the classification of patients with colorectal cancer

Authors: Roman-Naranjo, Pablo

Abstract

DATASETS INFO

Dataset on colorectal cancer and hydroxymethylation levels, ready to be used in machine learning algorithms. This dataset was generated using data from Walker NJ, Rashid M, Yu S, et al. [Dataset] Hydroxymethylation profile of cell free DNA is a biomarker for early colorectal cancer. Accessed May 17, 2023. https://zenodo.org/record/5170265#.ZGSpgHZBxD-.

ABSTRACT

Colorectal cancer (CRC) is the second most common cause of cancer death, accounting for 9.5% of all cancer deaths. In addition to patient age, other potential risk factors should be considered to correctly identify the target population for CRC screening programmes. Identifying these risk factors would allow a personalised and accurate approach for each patient and help improve the survival rate. The main objective of this study was therefore to identify useful risk biomarkers for the early detection of CRC using machine learning algorithms.

For this purpose, we compared the predictive ability of several supervised machine learning models, including gradient boosting, support vector machines (SVM) and random forest, using a public dataset on hydroxymethylation levels in enhancer regions in CRC patients, AAR and controls. In addition, we evaluated the suitability of K-means for identifying CRC patient subgroups in this dataset.

The results suggest that the best supervised model for differentiating CRC patients from controls using hydroxymethylation data was an SVM model with a linear kernel, which reached a sensitivity of 58% with the specificity fixed at 95%, improving on the model presented in the article from which the dataset was extracted. In addition, enhancers that regulate the expression of genes such as MYSM1 or SP1, or that regulate genes encoding proteins involved in pathways such as the TGF-β and integrin pathways, were identified as the most relevant enhancers when classifying samples as CRC or control.

K-means, in turn, identified 6 clusters among the samples in the hydroxymethylation dataset. Two of these clusters were mainly composed of CRC samples; however, they were not associated with a specific stage of development, and the clusters lay very close to one another without a clear separation.

We conclude that hydroxymethylation data were useful for the identification of CRC biomarkers, with promising results from supervised machine learning approaches. However, these results should be interpreted as preliminary and require validation in an external cohort and molecular analysis of the biomarkers identified.
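The headline metric in the abstract, sensitivity at a fixed specificity of 95%, can be illustrated with a minimal sketch. The function name `sensitivity_at_specificity`, the decision rule (a sample is called CRC when its score exceeds the threshold) and the toy scores below are assumptions made for illustration; they are not the study's code or data, and any classifier score (for example an SVM decision value) could stand in for `scores`.

```python
def sensitivity_at_specificity(scores, labels, target_specificity=0.95):
    """Return (threshold, sensitivity, specificity) for a binary classifier.

    scores: higher means "more likely CRC" (assumed convention).
    labels: 1 for CRC, 0 for control.
    The threshold is the lowest control score at which the fraction of
    controls scoring at or below it reaches the target specificity.
    """
    controls = sorted(s for s, y in zip(scores, labels) if y == 0)
    cases = [s for s, y in zip(scores, labels) if y == 1]
    if not controls or not cases:
        raise ValueError("need at least one control and one case")

    for t in controls:  # ascending, so the first hit is the lowest threshold
        specificity = sum(s <= t for s in controls) / len(controls)
        if specificity >= target_specificity:
            sensitivity = sum(s > t for s in cases) / len(cases)
            return t, sensitivity, specificity
    # Not reached: the largest control score always gives specificity 1.0.
    raise RuntimeError("no threshold found")


# Toy data (hypothetical): 20 controls and 5 CRC samples.
control_scores = [i / 100 for i in range(1, 21)]   # 0.01 ... 0.20
case_scores = [0.05, 0.15, 0.25, 0.30, 0.40]
scores = control_scores + case_scores
labels = [0] * len(control_scores) + [1] * len(case_scores)

threshold, sensitivity, specificity = sensitivity_at_specificity(scores, labels)
print(threshold, sensitivity, specificity)
```

Fixing the specificity first and then reading off the sensitivity makes models comparable at the same false-positive rate, which is why the abstract reports 58% sensitivity "after setting the specificity to 95%" rather than a single accuracy figure.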

Related to Research communities
Cancer Research