
# Synthetic Eco-Evolutionary Dynamics in Simple Molecular Environment

Dataset: experimental files for the "oligo1" dataset, in FASTQ format.

## Description of the data and file structure

Each file is named `RX_06_RY.fastq`, where:

* `X` is the round (from 1 to 24);
* `Y` is either 1 or 2, for reverse/forward sequences.

The `oligo1_R1`/`oligo1_R2` files contain the data for cycle 0.

## Sharing/Access information

Analysis code and other materials can also be found here.

## Code/Software

This code was written by Francesco Mambretti (2021-2023). It is meant to analyze the experimental FASTQ files from the SEDES experiment.

\------------------------------------- REQUIREMENTS -------------------------------------

* `python3` with `numpy`, `matplotlib`, `itertools`, `more_itertools`, `biopython`, `pandas`, `difflib`
* a `C++` compiler supporting (at least) `C++11`

\------------------------------------- input_params.py -------------------------------------

1. First, modify `input_params.py`, setting:
   * `key1`: `"oligo1"`, `"oligo2"`, `"negative"` or `"seriesN"` -> selects the dataset (others can be added)
   * `key2`: `"R1"`, `"R2"` or `"R1R2"` -> selects the reading direction(s); these should give similar but not identical results, due to experimental imperfections
   * `key3`: `"fw"`, `"rev"` or `"all"` -> selects only forward, only reverse, or all sequences
   * `key_filter`: `True`/`False` -> whether to apply (`True`) or not (`False`) a special criterion to filter the data. The default criterion excludes PCR by-products.
   * `key_no_cut`: `True`/`False` -> whether to print full sequences (`True`) or cut sequences with the primer bases removed (`False`)

These options can be set either manually or via an external script, such as the included `loop.py`.
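The file naming scheme and the `key2` setting can be related with a few lines of Python. This is only a sketch under stated assumptions, not the logic of `read_fastq.py`: it assumes that `key2` selects files by their `Y` index (`"R1"` -> 1, `"R2"` -> 2, `"R1R2"` -> both) and that the cycle-0 files carry a `.fastq` suffix; `parse_fastq_name` and `select_files` are hypothetical names.

```python
from __future__ import annotations

import re
from pathlib import Path

# Naming scheme from the data description: "R<X>_06_R<Y>.fastq" for rounds
# 1-24 and "oligo1_R<Y>.fastq" for cycle 0 (the ".fastq" suffix on the
# cycle-0 files is an assumption).
ROUND_RE = re.compile(r"^R(\d+)_06_R([12])\.fastq$")
CYCLE0_RE = re.compile(r"^oligo1_R([12])\.fastq$")


def parse_fastq_name(name: str) -> tuple[int, int]:
    """Return (round, Y) for a dataset filename; round 0 is the initial pool."""
    base = Path(name).name  # ignore any leading directories
    m = ROUND_RE.match(base)
    if m:
        return int(m.group(1)), int(m.group(2))
    m = CYCLE0_RE.match(base)
    if m:
        return 0, int(m.group(1))
    raise ValueError(f"unrecognized filename: {name!r}")


def select_files(names: list[str], key2: str) -> list[str]:
    """Hypothetical: keep files whose Y matches key2 ("R1", "R2" or "R1R2"),
    ordered numerically by round (so R2 comes before R12)."""
    wanted = {"R1": {1}, "R2": {2}, "R1R2": {1, 2}}[key2]
    picked = [n for n in names if parse_fastq_name(n)[1] in wanted]
    return sorted(picked, key=parse_fastq_name)
```

For example, `select_files(["R12_06_R1.fastq", "R2_06_R2.fastq", "oligo1_R1.fastq", "R2_06_R1.fastq"], "R1")` returns `['oligo1_R1.fastq', 'R2_06_R1.fastq', 'R12_06_R1.fastq']`. Sorting on the parsed `(round, Y)` tuple rather than on the raw string avoids the lexicographic trap where `R12` would sort before `R2`.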
To use `loop.py`, edit `input_params.py` and set: `key1="$KEY1$" key2="$KEY2$" key3="$KEY3$" key_filter=$KEY_FILTER$ key_no_cut=$KEY_NO_CUT$`

Optionally, other parameters can be modified:

* colors of the RSA histograms
* `True`/`False` for creating (or not) abundance histograms for unique strands
* minimum quality of the reads
* `l` -> resource length (defaults to 20 bases)
* `subset_steps` -> analyze only the first `subset_steps` steps, for faster analysis of incomplete datasets
* `use_stop` -> whether to actually restrict the analysis to the first `subset_steps` steps (`True`/`False`)
* `n` -> number of top-`n` strands for the analysis of dominant individuals
* `random_seq=50` -> number of random nucleotides (default); currently used only in the definition of `L`
* `cap_size=25` -> size of the fixed sequences at the two ends; not really used
* `extra_end=1` -> sometimes there is an extra base; older code versions needed it, but it is currently ignored
* `L=random_seq+cap_size+extra_end` -> maximum length, including the caps and the extra final base, i.e. the length of the predators (with the defaults above, `L=76`); can be edited (e.g. for the N series)
* `lower_bound=L-6` -> strands with fewer than `lower_bound` bases are discarded; can be edited

Another editable parameter is `results_folder`, which can be changed if some results need to be saved separately.

\------------------------------------- compilation -------------------------------------

1. Run `make all` to generate the C++ executables (the code uses C++17, but C++11 compatibility should be enough).

\------------------------------------- read_fastq.py -------------------------------------

1. Execute `python3 read_fastq.py`, which processes the FASTQ files and generates text files and plots with the outcomes of the analyses. `read_fastq.py` in turn calls:
   * `find_MCO_serial.x`: executable of the corresponding `C++` code for the Maximum Consecutive Overlap (MCO) calculation between strands (see for its definition and related discussions).
   * `find_equal_pair.x`: detects the number of consecutive identical bases between two strands passed on the command line. It is a simplified version based on the same routines as `find_MCO_serial`, used to detect aliens.
   * `module_functions.py`: processes FASTQ files, filters sequences, sorts them by abundance, reverse-complements strands, tracks the abundance of the top-`n` most abundant ones across cycles and computes their cross-MCO matrix.
   * `main_plot.py`: generates text files and plots for the RSA histograms, the Shannon entropy associated with them, the evolution of the top-`n` strands, the fraction of the total population covered by the top-`n` individuals and the 2D histogram of (MCO, MCO_2nd). It calls `module_plots.py`.
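The per-strand quantities listed above can be illustrated with a short Python sketch. It is not the code of `module_functions.py`, `main_plot.py` or the `C++` executables, and all function names are hypothetical. Two assumptions are worth flagging: the entropy is taken over the relative abundances of unique sequences with a natural logarithm, and MCO is approximated as the longest run of consecutive complementary bases between two antiparallel strands (the paper gives the authoritative definition).

```python
from __future__ import annotations

import math
from collections import Counter

# Watson-Crick complements; other characters (e.g. N) are left unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA strand written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def rank_by_abundance(reads: list[str], n: int) -> list[tuple[str, int]]:
    """Top-n unique sequences with their counts, most abundant first."""
    return Counter(reads).most_common(n)


def top_n_fraction(reads: list[str], n: int) -> float:
    """Fraction of all reads covered by the top-n unique sequences."""
    return sum(count for _, count in rank_by_abundance(reads, n)) / len(reads)


def shannon_entropy(reads: list[str]) -> float:
    """Assumed form: -sum p_i ln p_i over the relative abundances of unique sequences."""
    counts = Counter(reads)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def longest_common_run(a: str, b: str) -> int:
    """Longest run of consecutive identical bases shared by a and b
    (longest common substring, by dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the diagonal run
                best = max(best, curr[j])
        prev = curr
    return best


def mco(a: str, b: str) -> int:
    """Assumed MCO: longest stretch of a that is perfectly complementary
    to a stretch of b when the two strands pair antiparallel."""
    return longest_common_run(a, reverse_complement(b))
```

For instance, `mco("ACGTTT", "AAACGT")` returns 6, because the two strands are fully complementary when paired antiparallel. Comparing `a` with the reverse complement of `b` turns the pairing question into a plain longest-common-substring search, which is the same kind of "consecutive identical bases" count that `find_equal_pair.x` is described as computing.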
The understanding of eco-evolutionary dynamics, and in particular of the mechanism by which species emerge, is still fragmentary and in need of test-bench model systems. To this aim, we developed a variant of SELEX in vitro selection to study the evolution of a population of ∼10^15 single-strand DNA oligonucleotide ‘individuals’. We begin with a seed of random sequences, which we select via affinity capture from ∼10^12 DNA oligomers of fixed sequence (‘resources’) over which they compete. At each cycle (‘generation’), the ecosystem is replenished via PCR amplification of the survivors. Massive parallel sequencing indicates that across generations the variety of sequences (‘species’) drastically decreases, while some of them become populous and dominate the ecosystem. The simplicity of our approach, in which survival is granted by hybridization, enables a quantitative investigation of fitness through a statistical analysis of binding energies. We find that the strength of individual-resource binding dominates the selection in the first generations, while inter- and intra-individual interactions become important in later stages, in parallel with the emergence of prototypical forms of mutualism and parasitism.
Please see the paper; all the methods are explained there. Our experimental design relies on a selective capture mechanism in which magnetic beads carrying single-stranded DNA filaments of fixed length and sequence target the DNA individuals of a DNA library according to their level of complementarity. The selected sequences are amplified via PCR, sequenced, and analysed with the in-house code also included in this repository.
FOS: Biological sciences, DNA, DNA sequencing, Evolutionary ecology
