ZENODO
Article . 2020
License: CC BY
Data sources: Datacite

Pathway-extended multigene expression signatures of chemotherapy responses to tyrosine kinase inhibitors: supporting data and program code

Authors: A.J. Bagchee-Clark; Eliseos J Mucaki; Tyson Whitehead; Peter K Rogan

Abstract

This Zenodo archive is associated with the research article "Pathway-extended multigene expression signatures of chemotherapy responses to tyrosine kinase inhibitors," which has been submitted for publication. It provides the source code, compiled versions, and example inputs and outputs for several programs used in that paper. It also provides all database files used in the study (drug sensitivity data, gene expression and copy number data, etc.).

MFAPreselection is a novel software program that implements a network-based search of biochemical pathways to extend machine learning-based gene signatures derived from curated, peer-reviewed sources. It is a Haskell-based program designed to perform Multiple Factor Analysis (MFA) on biochemical data (e.g. gene expression and copy number data) for a pre-selected set of genes, together with drug sensitivity data from cell lines (e.g. GI50s), in order to identify the genes that show a direct or inverse correlation to drug response. The program performs "pathway extension", in which genes that are biologically related to the initial gene set (those which passed the set MFA correlation thresholds) are also evaluated by MFA. This extension process can be repeated to evaluate the genes related to the first set of expanded genes, and so on. When completed, the program outputs a list of genes that passed the set MFA correlation angle threshold (i.e. the "angle cutoff"). For each gene it provides the correlation angle between drug sensitivity and the gene's expression / copy number, indicates the expansion step in which the gene was matched, and briefly describes how the gene is related to (at least one of) the original pre-selected genes.

1. "MFAPreselection-and-Library-Files.zip"

This archive contains the compiled version of MFAPreselection, all required library files, and an example configuration file. MFAPreselection was designed to be run on SHARCNET, a network of multiple high performance Unix-based supercomputers. It is run by invoking the program ("./MFAPreselection"), which then reads its instructions from the file "config.txt", which should be located in the same folder. This configuration file is tab-delimited and has the following structure:

    drug             Drugname
    genesInitial     GENE1 GENE2 ... (tab-delimited set of initial genes associated with drug)
    aliasesFile      ./Path-To-Data-Files/GeneNames-Association.Pseudonyms.txt
    relationsFile    ./Path-To-Data-Files/PathwayCommons.OneNode.InteractionFile.txt.sif
    gi50sFile        ./Path-To-Data-Files/GI50-Data.txt
    copiesFile       ./Path-To-Data-Files/CopyNumber-Data.txt
    expressionsFile  ./Path-To-Data-Files/GeneExpression-Data.txt
    angleCutoff      10
    stepsCutoff      2
    circleOutput     True
    mfaInput         False
    mfaOutput        True
    svmInput         True
    aliasOutput      True

The settings have the following meanings:

"angleCutoff" is the maximum MFA correlation angle for a gene to be considered correlated to GI50.
"stepsCutoff" is the maximum gene association distance allowed by the program (e.g. '1' means MFAPreselection will look for genes related to your input gene set; '2' means it will also look for genes related to the genes found in '1').
"circleOutput" sets the program to generate MFA circle plots (this can add significant time to each run).
"mfaInput" sets the program to create a file containing the GI50 / expression / copy number input data.
"mfaOutput" generates a file called "MFA.tsv", which reports the correlation angle of GI50 to expression and copy number for all genes analyzed (it also provides the "step" of each gene and its relation to the initial gene set).
"svmInput" generates files with GI50, gene expression and copy number data organized in a particular format for our machine learning programs.
"aliasOutput" generates a file which reports all events where a gene alias was used.

2. "MFAPreselection-Data-Files.zip"

This folder contains all database files used in the study first describing MFAPreselection: drug sensitivity data (GI50s), gene expression and copy number data, the gene pseudonym associations file, and the interactions file. A description of each file is given below.

"PathwayCommons.OneNode.InteractionFile.txt.sif"
This file contains associations for all genes from PathwayCommons. Two examples of interactions in the file:

    A1BG controls-expression-of A2M
    A1BG interacts-with ABCC6

"GeneNames-Association.Pseudonyms.txt"
A file (from genecards.org) which lists official gene names (second column) and their older pseudonyms/aliases (ninth column; multiple aliases are pipe-delimited).

"All Gene Expression Data.txt" and "All Copy Number Data.txt"
These files contain all gene expression and copy number values computed by Daemen et al. (2013). Rows are genes, columns are the cell line names.

"GI50-Data.txt"
This file consists of a table with all GI50 values for all of the cell lines tested (from Daemen et al., 2013). Rows are the cell line names, and columns are the GI50 values. Cell lines without a GI50 for a particular drug appear as 'N/A' and will be skipped by MFAPreselection.

3. "MFAPreselection-Source-Code.zip"

This folder contains the source code (written in the Haskell programming language) for the program MFAPreselection. The 'Documentation' folder contains a README file (MS-Word) describing the program in more detail, and a diagram (pdf) showing how data flows within the program.

4. "Automated-regularValidation_multiclassSVM-Job-Submitter-and-Data-Organizer.v2.zip"

This folder contains multiple programs (written in Perl and MATLAB) that were used to perform traditional validation of multiple pathway-extended (PE) high performance models (derived for an individual cancer drug) within a command-line environment. Contents include the example input data files necessary to run these programs, as well as documentation that describes the function of each program, the input files it requires, and the output it provides. This archive also includes "Parentage-MFA-Path-Source-Program.Simple-Output-Version.pl", which finds spurious associations made by MFAPreselection due to conflicting gene pseudonyms. Please note that these programs were designed for the SHARCNET high-performance supercomputer, which uses the Slurm Workload Manager to handle job submissions. They may require some modification to work on other types of systems.

5. "Ensemble-Averaging-of-Predictions-By-regularValidation_multiclassSVM.zip"

The provided Perl / MATLAB hybrid program was written to perform ensemble machine learning-based averaging of multiple PE high performance models derived for an individual cancer drug. The program requires the output of the model validation program "regularValidation_multiclassSVM.m", first described in Zhao et al., 2018 and made available in a separate Zenodo archive. One could also use the output of the validation programs provided in this archive (4. "Automated-regularValidation_multiclassSVM-Job-Submitter-and-Data-Organizer.v2.zip"). This folder provides the example input data files necessary to run the program, as well as the documentation file "Description-of-Ensemble-Averaging-Program.docx", which describes the program (including the contents of the required input files). This program was also designed for the SHARCNET high-performance supercomputer, which uses the Slurm Workload Manager to handle job submissions.
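
As an illustration of the tab-delimited "config.txt" layout described under item 1, the following Python sketch reads such a file into a dictionary. It is not the parser used by MFAPreselection (which is written in Haskell); the function name `parse_config` and the type conversions are assumptions made for this example, and only the key names come from the description above.

```python
# Hypothetical reader for the tab-delimited config.txt layout described above.
# Key names come from the archive description; everything else is illustrative.
LIST_KEYS = {"genesInitial"}
BOOL_KEYS = {"circleOutput", "mfaInput", "mfaOutput", "svmInput", "aliasOutput"}
NUMBER_KEYS = {"angleCutoff": float, "stepsCutoff": int}


def parse_config(text):
    """Turn 'key<TAB>value[<TAB>value...]' lines into a dictionary."""
    config = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, *values = [field.strip() for field in line.split("\t")]
        values = [v for v in values if v]
        if key in LIST_KEYS:
            config[key] = values  # an empty gene list is kept as []
            continue
        if not values:
            raise ValueError(f"missing value for {key!r}")
        if key in BOOL_KEYS:
            config[key] = values[0] == "True"
        elif key in NUMBER_KEYS:
            config[key] = NUMBER_KEYS[key](values[0])
        else:
            config[key] = values[0]  # drug name or file path
    return config


EXAMPLE_CONFIG = "\n".join([
    "drug\tDrugname",
    "genesInitial\tGENE1\tGENE2",
    "gi50sFile\t./Path-To-Data-Files/GI50-Data.txt",
    "angleCutoff\t10",
    "stepsCutoff\t2",
    "circleOutput\tTrue",
    "mfaInput\tFalse",
])

config = parse_config(EXAMPLE_CONFIG)
```

Keeping "genesInitial" as a list while every other key holds a single value mirrors the structure shown in the example configuration, where only the gene line carries more than one tab-separated field.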
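
The data files described under item 2 can be read with a few lines of Python. The sketch below assumes that both files are tab-delimited and that "GI50-Data.txt" has a header row naming each column; neither detail is stated in the description, so treat them as assumptions. The function names, gene names, cell line names and drug names in the example are hypothetical.

```python
import csv
import io


def load_alias_map(text):
    """Map each alias to the set of official gene names that list it.

    Official name: second column. Aliases: ninth column, pipe-delimited.
    """
    alias_to_official = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 9:
            continue  # row too short to carry an alias column
        official = row[1].strip()
        for alias in row[8].split("|"):
            alias = alias.strip()
            if alias:
                alias_to_official.setdefault(alias, set()).add(official)
    return alias_to_official


def load_gi50(text, drug):
    """Return {cell line: GI50} for one drug column, skipping 'N/A' cells."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)  # assumed header row with one name per column
    column = header.index(drug)
    values = {}
    for row in reader:
        if len(row) <= column:
            continue
        cell = row[column].strip()
        if cell == "N/A" or not cell:
            continue  # no measurement for this cell line and drug
        values[row[0].strip()] = float(cell)
    return values


# Hypothetical miniature inputs in the assumed layout.
ALIASES = "\n".join([
    "\t".join(["1", "GENE1", "", "", "", "", "", "", "OLD1|SHARED"]),
    "\t".join(["2", "GENE2", "", "", "", "", "", "", "SHARED"]),
])
GI50S = "\n".join([
    "CellLine\tDrugA\tDrugB",
    "LINE1\t5.2\tN/A",
    "LINE2\tN/A\t6.1",
])

alias_map = load_alias_map(ALIASES)
ambiguous = {alias for alias, names in alias_map.items() if len(names) > 1}
drug_a = load_gi50(GI50S, "DrugA")
```

Storing a set of official names per alias makes conflicting pseudonyms visible (here "SHARED" points to two genes), which is the kind of ambiguity the archive description says can lead to spurious associations.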
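
The "pathway extension" procedure described in the abstract can be pictured as a breadth-first walk over the interaction file that stops after "stepsCutoff" steps and only expands genes that passed the correlation test. The Python sketch below is a simplified reading of that description, not the Haskell implementation: the MFA test is replaced by a caller-supplied function `passes_cutoff`, interactions are treated as undirected (an assumption), and GENE_X and GENE_Y are hypothetical gene names added to the two example interactions quoted above.

```python
from collections import defaultdict


def parse_sif(lines):
    """Index 'SOURCE relation TARGET' lines by gene, in both directions."""
    neighbours = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        source, _relation, target = parts
        interaction = " ".join(parts)
        neighbours[source].append((target, interaction))
        neighbours[target].append((source, interaction))
    return neighbours


def extend_pathway(initial_genes, neighbours, passes_cutoff, steps_cutoff):
    """Return {gene: (step, how it was reached)} for genes passing the test."""
    found = {}
    seen = set(initial_genes)  # genes already evaluated once
    frontier = []
    for gene in initial_genes:
        if passes_cutoff(gene):
            found[gene] = (0, "initial gene")
            frontier.append(gene)
    for step in range(1, steps_cutoff + 1):
        next_frontier = []
        for gene in frontier:
            for other, interaction in neighbours.get(gene, []):
                if other in seen:
                    continue
                seen.add(other)
                if passes_cutoff(other):
                    found[other] = (step, interaction)
                    next_frontier.append(other)
        frontier = next_frontier  # only passing genes are expanded further
    return found


SIF_LINES = [
    "A1BG controls-expression-of A2M",
    "A1BG interacts-with ABCC6",
    "A2M interacts-with GENE_X",
    "GENE_X interacts-with GENE_Y",
]

# Stand-in for the MFA test: pretend every gene except ABCC6 passes.
result = extend_pathway(
    ["A1BG"], parse_sif(SIF_LINES), lambda gene: gene != "ABCC6", steps_cutoff=2
)
```

With a step cutoff of 2, the walk reaches A2M at step 1 and GENE_X at step 2, while GENE_Y (three steps away) is never evaluated. Each result carries the interaction through which the gene was reached, similar in spirit to the relation reported in "MFA.tsv".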
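
To give a feel for what an angle cutoff means, the sketch below treats the correlation angle as the angle between two mean-centered data vectors, whose cosine equals their Pearson correlation. This is only an intuition aid: MFA derives its angles from a projection onto principal axes, which this example does not perform, and the rule used here for inverse correlation (an angle within the cutoff of 180 degrees) is an assumption rather than something stated in the archive description. The function names are hypothetical.

```python
import math


def correlation_angle(xs, ys):
    """Angle in degrees between two mean-centered vectors of equal length."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    dot = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("a constant vector has no defined correlation angle")
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cosine))


def passes_angle_cutoff(angle, cutoff):
    """Assumed rule: near 0 is direct, near 180 is inverse correlation."""
    return angle <= cutoff or angle >= 180.0 - cutoff


gi50 = [1.0, 2.0, 3.0, 4.0]
direct = correlation_angle(gi50, [2.0, 4.0, 6.0, 8.0])
inverse = correlation_angle(gi50, [8.0, 6.0, 4.0, 2.0])
unrelated = correlation_angle(gi50, [1.0, -1.0, -1.0, 1.0])
```

Under this reading, a cutoff of 10 keeps only genes whose profile is almost perfectly aligned or anti-aligned with the GI50 profile, which is why small changes in the cutoff can change the size of the resulting gene list considerably.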

Keywords

Machine Learning, Support Vector Machine, Gene Signatures, Validation, Biochemical pathways, Tyrosine Kinase Inhibitors, Systems biology, Molecular Diagnostics

Related to Research communities
Cancer Research