ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO
ZENODO
Dataset . 2026
License: CC BY
Data sources: Datacite
ZENODO
Dataset . 2025
License: CC BY
Data sources: Datacite

Tree reconstruction guarantees from CRISPR-Cas9 lineage tracing data using Neighbor-Joining

Authors: Prillo, Sebastian

Abstract

# Tree reconstruction guarantees from CRISPR-Cas9 lineage tracing data using Neighbor-Joining

Our simulated trees, paired with lineage tracing data, cover a large number of lineage tracing regimes and are used to assess the performance of our distance correction method. Relative to version 1 of the dataset, this version increases the number of trees (i.e. repetitions) from 50 to 250 and adds trees with noisy character matrices, which allow probing the robustness of algorithms to technical effects such as sequencing errors; please see below for more details.

## Description of the data and file structure

For each lineage tracing regime, 250 simulations are performed. All trees have exactly 400 leaves and were simulated as described in the manuscript. The 'default' regime consists of:

* 40 characters.
* Mutation rate adjusted to obtain an expected 50% mutated entries in the character matrix.
* 100 indel states.
* 20% missing data, with 10% coming from heritable epigenetic silencing and 10% coming from sequencing dropouts. (This does not include missing data further introduced by double-resection events, which we also simulate.)

Each lineage tracing regime is obtained by perturbing this 'default' regime, varying one of the above parameters at a time. Specifically, we vary:

* number of characters (a.k.a. barcodes) in the set {10, 20, 40, 60, 90, 150} (with 40 being the default)
* number of states in the set {5, 10, 25, 50, 100, 500, 1000} (with 100 being the default)
* expected proportion mutated in the set {10%, 30%, 50%, 70%, 90%} (with 50% being the default)
* percent missing from epigenetic silencing and sequencing dropouts in the set {0%, 10%, 20%, 30%, 40%, 50%, 60%} (with 20% being the default), with the percent coming from sequencing dropouts fixed to 10% (except when the total is 0%, in which case it is set to 0%)
* noise in the character matrix, which probes an algorithm's robustness to effects such as sequencing errors and other artifacts. To simulate this noise, for each entry $X_{ij} \ge 1$ in the character matrix, with probability $p$ we replace it by another state drawn uniformly at random from the set $\{1, 2, \dots, \text{number\_of\_states}\} \setminus \{X_{ij}\}$. We call $p$ the "sequencing error fraction". We vary $p$ in the set {0, 0.001, 0.003, 0.01, 0.03, 0.1} (with 0 being the default).

The data from each simulation is stored in a directory named after the parameter that was varied and its value. For example, the simulated data when the number of barcodes is 30 is stored under "trees/number_of_cassettes/30/". In this directory, for each repetition, we have three files:

* tree_{repetition}_character_matrix.csv : Contains the lineage tracing data in csv format.
* tree_{repetition}_newick.txt : Contains the tree in newick format, with branch lengths.
* tree_{repetition}_CassiopeiaTree.pkl : Contains the pickled CassiopeiaTree object from the simulation, which in particular contains the fitness of different nodes in the tree, ancestral lineage tracing barcodes, etc. It is not necessary for reproducing any of our results, but we provide it in case it is convenient.

## Code/Software

We have additionally open-sourced a repository allowing seamless reproduction of all results in our paper, here: https://github.com/songlab-cal/nj-theory
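
The sequencing-error noise described in the data description can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used to generate the dataset. The function name `add_sequencing_noise`, the list-of-lists matrix representation, and the `seed` argument are hypothetical, and the sketch assumes that entries below 1 encode unmutated or missing sites and are therefore left untouched.

```python
import random


def add_sequencing_noise(matrix, number_of_states, p, seed=None):
    """Return a noisy copy of a character matrix (list of lists of ints).

    Each entry x >= 1 is, with probability p, replaced by a state drawn
    uniformly from {1, ..., number_of_states} minus {x}. Entries below 1
    are copied unchanged (assumption: they mark unmutated or missing sites).
    """
    if p > 0 and number_of_states < 2:
        raise ValueError("need at least 2 states to replace an entry")
    rng = random.Random(seed)
    noisy = []
    for row in matrix:
        new_row = []
        for x in row:
            if x >= 1 and rng.random() < p:
                # Draw one of the number_of_states - 1 alternatives:
                # sample from 1..S-1, then shift values >= x up by one
                # so that x itself is skipped and the draw stays uniform.
                y = rng.randint(1, number_of_states - 1)
                if y >= x:
                    y += 1
                new_row.append(y)
            else:
                new_row.append(x)
        noisy.append(new_row)
    return noisy


if __name__ == "__main__":
    # Toy matrix; p = 0.1 is the largest sequencing error fraction in the dataset.
    toy = [[0, 3, 7, -1], [12, 0, 7, 5]]
    print(add_sequencing_noise(toy, number_of_states=100, p=0.1, seed=0))
```

Sampling from `number_of_states - 1` values and shifting past the current state avoids rejection sampling while keeping every alternative state equally likely.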
