# Software for optimizing treatment to slow the spatial propagation of invasive species: code and results

## Description:

The complete code and simulation results for finding the optimal treatment of a population front, to slow its propagation to a speed v. The algorithm is described in the paper "Optimizing strategies for slowing the spread of invasive species" by Adam Lampert (PLOS Computational Biology, DOI: 10.1371/journal.pcbi.1011996). Some of the results are shown in Figs. 2-5 of that paper.

## Authorship:

The code was written by Adam Lampert, Institute of Environmental Sciences, Robert H. Smith Faculty of Agriculture, Food and Environment, the Hebrew University of Jerusalem, Israel.

## Installation:

Running the Matlab code requires Matlab 2021b for Windows (or a similar version of Matlab).

## Running the code – general model:

1. Extract all files from "general_model_code.zip" into a single folder.
2. Open "main.m" and "calc_cost.m" using Matlab.
3. Change the parameter values, run "main.m", and wait until Matlab completes the execution.

## Running the code – spongy moth model:

1. Extract all files from "spongy_moth_model_code.zip" into a single folder.
2. Open "main_F2.m" using Matlab.
3. Change the parameter values, run the code, and wait until Matlab completes the execution.

## Description of the data files:

The results for the general model's simulations are given as raw data in the folder "general_model_simulation_results.zip". The data files can be accessed with Matlab. Some of these results are shown in the main article, Fig. 4.

The results for the spongy moth model simulations are given as raw data in the folder "spongy_moth_model_simulation_results.zip". The data files can be accessed with Matlab. Some of these results are shown in the main article, Fig. 5.
Each data file in "general_model_simulation_results.zip" and in "spongy_moth_model_simulation_results.zip" includes the simulation results for a given set of parameters. The name of the file specifies the parameter values used. Specifically, for the general model, the file name indicates the values of α and v used for the simulation. For the spongy moth model, the file name indicates first the value of (kλ₀) and then the values of v used for the simulation.

Each data file includes the following variables:

* *n_front:* an array that includes the value of the population front (n-opt) as a function of the location (x).
* *treatment:* an array that includes the value of the optimal treatment (A-opt) as a function of the location (x).
* *Nx:* size of the n_front and the treatment arrays.

The general model data files also include the following parameter value:

* *Dt:* time resolution (Δt)

Spongy moth model data files also include the following parameter values:

* *num_moves:* number of spatial steps the front moves per time unit (equivalent to v).
* *delta:* the spatial resolution (σ).
* *lambda:* the parameter λ₀.

Slowing the spread of invasive species is a major challenge. How can we achieve this goal in the most cost-effective manner? This package includes the complete code and simulation results that help find the optimal, most cost-effective treatment to slow the spread of a propagating species. This package accompanies the paper "Optimizing strategies for slowing the spread of invasive species" by Adam Lampert (PLOS Computational Biology, DOI: 10.1371/journal.pcbi.1011996). The file general_model_code.zip contains the code for the general model; the file spongy_moth_model_code.zip contains the code for the spongy moth model; the file general_model_simulation_results.zip contains the results for the general model; and the file spongy_moth_model_simulation_results.zip contains the results for the spongy moth model.
The code for the simulations was written in Matlab and the simulation results were obtained by running the code. Opening the code and results requires an installation of Matlab (2021b for Windows or a similar version).
Rare-earth monopnictides are a family of materials simultaneously displaying complex magnetism, strong electronic correlation, and topological band structure. The recently discovered emergent arc-like surface states in these materials have been attributed to the multi-wave-vector antiferromagnetic order, yet the direct experimental evidence has been elusive. Here we report the observation of non-collinear antiferromagnetic order with multiple modulations using spin-polarized scanning tunneling microscopy. Moreover, we discover a hidden spin-rotation transition of single-to-multiple modulations 2 K below the Néel temperature. The hidden transition coincides with the onset of the surface state splitting observed by our angle-resolved photoemission spectroscopy measurements. Single modulation gives rise to a band inversion with induced topological surface states in a local momentum region while the full Brillouin zone carries trivial topological indices, and multiple modulation further splits the surface bands via non-collinear spin tilting, as revealed by our calculations. The direct evidence of the non-collinear spin order in NdSb not only clarifies the mechanism of the emergent topological surface states but also opens up a new paradigm of control and manipulation of band topology with magnetism.

# Data for: Hidden non-collinear spin-order induced topological surface states

## Description of the data and file structure

There are three files in the dataset: Dataset.zip, filter.ipf, and DriftCorrection.ipf.

filter.ipf is the IgorPro procedure for filtering out high-frequency noise of topographic images.

DriftCorrection.ipf is the IgorPro procedure for drift correction of topographic images by the Lawler-Fujita algorithm.

Dataset.zip contains folders arranged by the figures in the article "Hidden non-collinear spin-order induced topological surface states" to be published in Nature Communications.
Each folder contains the STM raw data for plotting the STM images in the corresponding figure. Please refer to the article for a more detailed description of the data and methods. The STM data were collected with an Omicron LT-STM at 4 K. MTRX files were exported to IgorPro files with the software Vernissage. Data analysis was performed in IgorPro. The figures were plotted with Origin and arranged in Adobe Illustrator. Use Vernissage to open the head file ending with "_0001.mtrx" and inspect the data contained within. The data can be exported as IgorPro files and further analyzed in IgorPro. IPF files are IgorPro procedures for data analysis.
Mesocosm core setup and sampling procedure

Samples were obtained during the AQUACOSM VIMS-Ehux mesocosm experiment in Raunefjorden near Bergen, Norway (60°16′11″N; 5°13′07″E), in May 2018. Seven bags were filled with 11 m³ of water from the fjord, containing natural plankton communities. Algal blooms were induced by nutrient addition and monitored for 24 days, as previously described [23]. Ten samples were collected from four bags, as follows: from bag 3, on days 15 and 20 (named B3T15 and B3T20, respectively); from bag 4, on days 13, 15, 19, and 20 (named B4T13, B4T15, B4T19, and B4T20, respectively); from bag 6, on day 17 (named B6T17); and from bag 7, on days 16, 17, and 18 (named B7T16, B7T17, and B7T18, respectively). Samples were initially filtered as follows: 2 liters of water were filtered with a 20 µm mesh and collected in a glass bottle. The cells were then concentrated through gentle gravity filtration on a 3 µm polycarbonate filter (Whatman), mounted on a reusable bottle top filter holder (Thermo Fisher). The biomass on the filter was regularly resuspended by gentle pipetting. For samples B7T16, B7T18, B4T15, B3T15, B6T17, B7T17, and B4T19, the 2 liters of seawater were concentrated down to 100 ml, distributed in two 50 ml tubes, which corresponds to a 200 times concentration. For B4T13, the concentration factor was 140 times. For B4T20 and B3T20, the concentration factor was 100 times. The different concentration factors are explained by filter clogging and various field constraints, including processing time. For all samples except B3T20, the 50 ml tubes were centrifuged for 4 min at 2500g, after which the supernatant was discarded. Pellets corresponding to the same day and same bag were pooled and resuspended in a final volume of 200 µl of chilled PBS. 1800 µl of pre-chilled high-performance liquid chromatography (HPLC) grade 100% methanol was added drop by drop to the concentrated biomass.
For B3T20, the concentrated biomass was centrifuged for 4 min at 2500g and resuspended in 100 µl of chilled PBS, to which 900 µl of chilled HPLC grade 100% methanol was added. Then, samples were incubated for 15 minutes on ice and stored at -80°C until further analysis.

Library preparation and RNA-seq sequencing using 10X Genomics

For analysis by 10X Genomics, tubes were defrosted and gently mixed, and 1.7 ml of the samples were transferred into an Eppendorf Lowbind tube and centrifuged at 4°C for 3 min at 3000g. The PBS/methanol mix was discarded and replaced by 400 µl of PBS. Cell concentration was measured using an iCyt Eclipse flow cytometer (SONY) based on forward scatter. Cell concentration ranged from 1044 cells ml⁻¹ to 9855 cells ml⁻¹. All concentrations were brought to 1000 cells ml⁻¹ to target a recovery of 7000 cells, according to the 10X Genomics Cell Suspension Volume Calculator Table provided in the user guide. The cellular suspension was loaded onto a Next GEM Chip G targeting 7000 cells and then run on a Chromium Controller instrument to generate the GEM emulsion (10x Genomics). Single-cell 3' RNA-seq libraries were generated according to the manufacturer's protocol (10x Genomics Chromium Single Cell 3' Reagent Kit User Guide v3/v3.1 Chemistry) on different occasions: B4T19 and B7T17 in January 2020 and B3T15, B3T20, B4T13, B4T15, B4T20, B6T17, B7T16, and B7T18 in August 2020, with 12 cycles for cDNA amplification and 15 cycles for library amplification. Library concentrations and quality were measured using the Qubit dsDNA High Sensitivity Assay kit (Life Technologies, Carlsbad, CA). Libraries were pooled according to targeted cell number, aiming for a minimum of 20,000 reads per cell. Pooled libraries were sequenced using the NextSeq® 500 High Output kit (75 cycles).
Bioinformatic pipeline

A step-by-step description of the bioinformatic pipeline from this step onward, including all in-house scripts used, is detailed in the GitHub repository under github.com/vardilab/host-virus-pairing.

Detection of infected cells in the single-cell RNA-seq data using a custom viral genes database

To detect viral transcripts, a reference was built from a database of highly conserved genes [6] from all NCLDV in the Giant Virus Database [9], such as family B DNA polymerase, RNA polymerase subunits, and the major capsid protein. The genes were clustered using CD-HIT v. 4.6.6 at 90% nucleotide identity to remove redundancy [43]. From this database of 34866 genes, a reference was created using the 10X Genomics Cell Ranger mkref command. The Cell Ranger Software Suite (v. 5.0.0) was used to perform barcode processing (demultiplexing) and single-cell unique molecular identifier (UMI) counting on the raw reads from 47391 cells using the count script (default parameters), with the deduplicated NCLDV database as a reference. For downstream analysis, 972 cells that highly expressed multiple NCLDV genes and were considered "highly infected" were selected. These "highly infected" cells were selected based on the following criteria: (a) the cell expresses in total ≥10 viral UMIs [22,24], (b) expression of more than one viral gene (>1), (c) expression of at least one gene with a UMI count greater than one (>1). Cell selection was wrapped using an in-house script (choose_cells.py).

Identifying the taxonomy of individual cells by sequence homology to ribosomal RNA

Raw reads from each cell were pulled by the cell's unique barcode identifier using seqtk v. 1.2. Reads were then trimmed (command: trim_galore --phred33 -j 8 --length 36 -q 5 --stringency 1 --fastqc -e 0.1), and poly-A was removed (command: trim_galore --polyA -j 1 --length 36), using TrimGalore (v. 0.6.5), a Cutadapt wrapper [44]. Trimmed reads from each cell were assembled using rnaSPAdes 3.15 [45] with k-mer sizes 21 and 33.
Raw read pulling, trimming, and assembly were wrapped using an in-house script (assemble_cells.sh). To identify the taxonomy of the cells, assembled contigs from each cell were matched against 18S rRNA sequences from the Protist Ribosomal Reference (PR2) [46] and metaPR2 [47]. To remove redundancy, the sequences in each database were clustered using CD-HIT v. 4.6.6 at 99% identity [43]. Contigs were filtered using SortMeRNA v. 4.3.6 [48] with default parameters against the PR2 database and then aligned to the PR2 and metaPR2 databases using Blastn [49], at 99% identity, E-value ≤ 10⁻¹⁰ and alignment length of at least 100 bp. Contigs were ranked by their bitscore, and only the best hit was kept for each contig. Each contig was assigned to one of the following taxonomic groups that were prevalent in the sample: the classes Bacillariophyta, Prymnesiophyceae, Chrysophyceae, MAST-3, and Katablepharidaceae, and the divisions Pseudofungi, Lobosa (Amoebozoa), Ciliophora (ciliates), Dinoflagellata, and Cercozoa. Contigs that matched other groups were assigned as "other eukaryotes". Contigs that matched more than one of these taxonomic groups were considered non-specific or chimeric and were therefore ignored. This downstream analysis of the BLAST results was wrapped using an in-house script (Sankey_wrapper_extended.ipynb). To avoid detection of doublets and predators, cells that transcribe 18S rRNA transcripts homologous to more than one taxonomic group were conservatively omitted. Of the 972 infected cells detected, 418 (43%) were omitted because we could not assemble specific 18S rRNA contigs from them or because their identity was ambiguous. None of the cells that were assigned "other eukaryotes" had contigs with conflicting annotations (contigs matching different classes).
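The filtering and assignment logic described above can be illustrated with a minimal Python sketch. It is not the authors' Sankey_wrapper_extended.ipynb; it only assumes BLAST tabular rows with the fields qseqid, sseqid, pident, length, evalue, and bitscore, plus two hypothetical lookup tables (`contig_to_cell` and `subject_to_group`) that map contigs to cell barcodes and reference sequences to taxonomic groups. The sketch applies the cell-level rule (cells whose contigs point to more than one group are left unassigned).

```python
from collections import defaultdict

# Thresholds quoted in the text: 99% identity, E-value <= 1e-10, length >= 100 bp.
MIN_IDENTITY = 99.0
MAX_EVALUE = 1e-10
MIN_LENGTH = 100


def best_hits(rows):
    """Return the highest-bitscore hit per contig among hits passing the thresholds."""
    best = {}
    for row in rows:
        passes = (row["pident"] >= MIN_IDENTITY
                  and row["evalue"] <= MAX_EVALUE
                  and row["length"] >= MIN_LENGTH)
        if not passes:
            continue
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


def assign_cells(rows, contig_to_cell, subject_to_group):
    """Assign one group per cell; cells whose contigs disagree map to None."""
    groups_per_cell = defaultdict(set)
    for contig, hit in best_hits(rows).items():
        # Subjects outside the listed groups fall back to "other eukaryotes".
        group = subject_to_group.get(hit["sseqid"], "other eukaryotes")
        groups_per_cell[contig_to_cell[contig]].add(group)
    return {cell: next(iter(groups)) if len(groups) == 1 else None
            for cell, groups in groups_per_cell.items()}


# Toy input (all identifiers are made up for illustration).
hits = [
    {"qseqid": "c1", "sseqid": "s_dino", "pident": 99.5, "length": 150, "evalue": 1e-50, "bitscore": 270.0},
    {"qseqid": "c1", "sseqid": "s_cerc", "pident": 99.1, "length": 120, "evalue": 1e-40, "bitscore": 200.0},
    {"qseqid": "c2", "sseqid": "s_cerc", "pident": 99.2, "length": 130, "evalue": 1e-45, "bitscore": 230.0},
    {"qseqid": "c3", "sseqid": "s_dino", "pident": 97.0, "length": 200, "evalue": 1e-60, "bitscore": 300.0},
    {"qseqid": "c4", "sseqid": "s_dino", "pident": 99.9, "length": 300, "evalue": 1e-90, "bitscore": 500.0},
]
contig_to_cell = {"c1": "cellA", "c2": "cellA", "c3": "cellB", "c4": "cellB"}
subject_to_group = {"s_dino": "Dinoflagellata", "s_cerc": "Cercozoa"}
assignments = assign_cells(hits, contig_to_cell, subject_to_group)
print(assignments)
```

In this toy input, cellA carries contigs pointing to two different groups and is therefore left unassigned, while cellB keeps a single group because its low-identity contig is filtered out. Cells with no passing contig do not appear in the result at all.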
Identifying the infecting virus using a homology search against a custom protein database

To identify transcripts derived from giant viruses, reads from the detected 972 infected cells were compared to a custom protein database using a translated alignment approach. To ensure that as many giant viruses as possible were represented, a database was constructed by combining RefSeq v. 207 [50] with all predicted proteins in the Giant Virus Database [9]. The proteins were then masked with tantan [51] (using the -p option), and the database was generated with the lastdb command (using parameters -c, -p). To identify the infecting virus, the raw sequencing reads in each of the 972 single-cell transcriptomes were compared to the constructed database using LASTAL v. 959 [52] (parameters -m 100, -F 15, -u 2) with best matches retained. The same procedure was done for the assembled transcripts from each cell to identify viral transcripts. The results were analyzed at different taxonomic levels, consistent with the Giant Virus Database (for giant viruses) or NCBI taxonomy [33] (everything else). The 754 cells whose best matching virus was coccolithovirus were omitted from the downstream analysis since EhV-infected cells were already reported to be abundant in the algal bloom [25], and our analysis aims to explore other host-virus pairs.

Plotting host-virus pairs in a Sankey plot for host cells and their infecting giant viruses

Of the 218 cells detected as infected by viruses other than EhV, 71 were selected that could be identified using assembled 18S rRNA transcripts and have at least 10 reads aligned to one of the virus families (Supplementary Data 1). Only links representing at least 10% of the aligned reads in each cell are shown in order to highlight the strong links. The Sankey plot was constructed using Holoviews v. 1.15.4; see sankey_wrapper.ipynb in the GitHub repository.
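The two thresholds used for the Sankey plot (at least 10 reads aligned to one virus family, and links of at least 10% of a cell's aligned reads) can be sketched as follows. This is an illustrative reading of the text, not the code in sankey_wrapper.ipynb; in particular, it assumes that the 10% is taken relative to all reads of the cell that aligned to any virus family, and the family labels are placeholders.

```python
def is_eligible(reads_per_family, min_reads=10):
    """True if at least one virus family has min_reads or more aligned reads."""
    return any(n >= min_reads for n in reads_per_family.values())


def strong_links(reads_per_family, min_fraction=0.10):
    """Return {family: fraction of the cell's aligned reads} for links at or above min_fraction."""
    total = sum(reads_per_family.values())
    if total == 0:
        return {}
    return {family: n / total
            for family, n in reads_per_family.items()
            if n / total >= min_fraction}


# Hypothetical per-cell read counts; the family labels are placeholders.
cell_reads = {"family_A": 45, "family_B": 4, "family_C": 1}
links = strong_links(cell_reads) if is_eligible(cell_reads) else {}
print(links)
```

Here only the link to family_A survives, because the other two families account for 8% and 2% of the cell's aligned reads.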
Phylogenetic trees of viral and host marker genes

For phylogenetic analysis, 31 cells were chosen based on a strong correlation (≥90% of viral reads matched one virus family) between the host and a virus. To obtain reference 18S rRNA sequences to include in a phylogeny, all transcripts assembled from these cells were compared to the PR2 database [46] using BLASTN v. 2.9.0+ (parameters -perc_identity 95, -evalue 10⁻¹⁰, -max_target_seqs 20, -max_hsps 1). Sequences shorter than 1000 bp were removed from the reference, and the remainder of the sequences were de-replicated with CD-HIT v. 4.7 [43] (-c 0.99) to prevent the inclusion of excessive nearly identical references. Sequences were aligned with Muscle5 [53] (default parameters), and diagnostic trees were created with FastTree 2.1.10 [54] for quick visualization of trees and for pruning long branches. The final phylogenetic trees were constructed with IQ-TREE v. 2.1.2 [52] (parameters -m GTR+F+G4 -alrt 1000 -T AUTO --runs 10). To identify major capsid protein sequences in the single-cell transcriptomes, proteins were first predicted using FragGeneScanRs v. 1.1.0 [56] (parameters -t, illumina_10). The resulting protein sequences were compared to MCP proteins in the Giant Virus Database with BLASTP v. 2.12.0+ (parameters -evalue 10⁻³, -max_target_seqs 20, -max_hsps 1) as well as to a custom MCP HMM that was previously designed [6], using hmmsearch in the HMMER3 v. 3.3.2 package [57] (E-value ≤ 10⁻³). The results of these searches were manually inspected, and sequences were subsequently aligned with Muscle5 (default parameters). As with the 18S rRNA sequences, diagnostic trees were first made with FastTree 2.1.10 and long branches were pruned before making a final tree with IQ-TREE v. 2.1.2 (parameters -m LG+F+G4 -alrt 1000 -T AUTO --runs 10). Cells for which transcripts are present in both viral and host trees were denoted (Supplementary Data 4).
All the code used to produce the trees is in the folder "marker_gene_trees" in the GitHub repository.

Single-cell RNA-seq data alignment to a custom reference

A new host-virus reference database was curated from the transcriptome of the infected cells (Fig. 2). Repetitive sequences were removed using BBduk (BBtools 38.90) [58]. An additional long repetitive sequence was removed manually. A database of E. huxleyi and EhV genes, which were shown to be abundant in the samples [25], was also added to this reference to specifically detect E. huxleyi cells and to avoid a non-specific alignment of reads from these cells to other contigs. For EhV, the predicted CDSs in EhVM1 were used as a reference [59]. For the host, an integrated transcriptome reference of E. huxleyi was used as a reference [60]. Viral transcripts in the database were identified using a homology search against a custom protein database as described above. A reference was created from the database using the Cell Ranger mkref command. Raw reads were aligned to this reference database using 10X Genomics Cell Ranger v. 5.0.0 count analysis.

Preprocessing of transcript abundance and dimensionality reduction

A total of 28,656 cells from the 10 samples were initially aligned to the reference database. Cells with zero UMIs and cells with the lowest 1% number of UMIs, as compared with the distribution of transcripts per cell in the entire dataset, were removed for downstream analyses. To prevent cases of doublet or multiplet cells, which can be biological (cell digestion) or technical (fused cells), cells with the highest 1% number of UMIs were also removed. The raw UMIs of 28,015 cells were further preprocessed using the Python package scprep v. 1.0.10: low-expressing genes were filtered with filter.filter_rare_genes and min_cells=2.
This number was chosen because we did not want to include genes mapped to only one cell, but we also did not want to exclude low-expressed genes, as they might represent gene expression of low-abundance organisms. Expression was normalized by cell library size with normalize.library_size_normalize, and the data was scaled with transform.sqrt. Preprocessing was wrapped in an in-house script; see 00.01.filter_normalize_scale_single_cell_data.py in the GitHub repository. To represent the cells in two dimensions based on their gene expression profiles, dimensionality reduction was performed using scprep v. 1.1.0 package PCA (method='svd', eps=0.1), and Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction was conducted using the UMAP method in the manifold package of the Python library scikit-learn v. 0.24.1 (minimum distance=0.4, spread=2, number of neighbors=7). Dimensionality reduction was wrapped in an in-house script (00.02.dimentionality_reduction_single_cell_data.py).

Assigning taxonomy to each detected cell using rRNA homology search

To identify the taxonomy of each detected cell, reads from each cell were assembled independently. The taxonomy of the cells was determined by 18S rRNA homology to one of the following groups, which were abundant in the population: the classes Bacillariophyta (diatoms), Prymnesiophyceae, Chrysophyceae, MAST-3, and Katablepharidaceae, and the divisions Ciliophora (ciliates), Dinoflagellata, and Cercozoa. Other taxonomic groups were clustered under "Other eukaryotes". 16,358 cells were identified this way, and 11,657 cells that could not be identified were excluded from the plot for convenience. Cells with 18S rRNA contigs homologous to more than one taxonomic group were also conservatively omitted. As described above, cells expressing at least 10 viral UMIs were considered infected [1,2]. This section was wrapped in a Jupyter notebook (Coexpression_wrapper_extended.ipynb).
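The three preprocessing steps named above (rare-gene filtering with min_cells=2, library-size normalization, and square-root scaling) can be illustrated with a standard-library Python sketch. It re-implements the idea rather than calling scprep, so it is not the authors' script; the rescale constant used here is an assumption chosen to keep the toy numbers readable.

```python
import math


def filter_rare_genes(counts, min_cells=2):
    """Keep gene columns with a nonzero count in at least min_cells cells."""
    n_genes = len(counts[0]) if counts else 0
    keep = [g for g in range(n_genes)
            if sum(1 for cell in counts if cell[g] > 0) >= min_cells]
    return [[cell[g] for g in keep] for cell in counts], keep


def library_size_normalize(counts, rescale=10_000):
    """Scale each cell (row) so that its counts sum to rescale."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        normalized.append([c * rescale / total if total else 0.0 for c in cell])
    return normalized


def sqrt_transform(values):
    """Apply a square-root transform to every entry."""
    return [[math.sqrt(v) for v in cell] for cell in values]


# Toy UMI matrix: rows are cells, columns are genes.
umis = [
    [4, 0, 1],
    [12, 3, 0],
    [0, 0, 5],
]
filtered, kept_genes = filter_rare_genes(umis, min_cells=2)
normalized = library_size_normalize(filtered, rescale=100)  # rescale value is illustrative
scaled = sqrt_transform(normalized)
print(kept_genes, normalized, scaled)
```

The second gene is detected in only one cell and is dropped, which mirrors the stated reason for choosing min_cells=2: genes seen in a single cell are excluded, while genes seen in as few as two cells are retained.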
Identifying the Leucocryptos host and its virus using homology search

To better identify the detected Katablepharidaceae cells and to identify their infecting virus, 26 infected Katablepharidaceae cells from bag #4, day 20, were selected. Reads from these cells were retrieved using the unique molecular identifier and then trimmed using TrimGalore v. 0.6.5, a Cutadapt wrapper [44]. Trimming was wrapped in an in-house script; see pull_trim_clean.sh in the GitHub repository. Trimmed read files from all these cells were concatenated into one file and assembled together using rnaSPAdes v. 3.15 [45]. To identify the specific Katablepharidaceae host, assembled contigs were matched against the PR2 rRNA database using blastn at 90% identity, E-value ≤ 10⁻¹⁰, and alignment length ≥ 100 bp. Contigs best matched an unknown Katablepharidaceae (>99% nucleotide identity), but after removing unidentified genera, these contigs best matched (>95% nucleotide identity) the Katablepharidaceae species Leucocryptos marina. Transcripts that matched classes other than Katablepharidaceae were matched against the entire NCBI database using the NCBI web server [61]. They, too, mostly matched Katablepharidaceae genes, specifically 28S rRNA or internal transcribed spacer (ITS) sequences (Supplementary Data 3). To identify the specific infecting virus, transcripts were matched against an NCLDV gene marker database [6] at 90% identity, E-value ≤ 10⁻¹⁰, and alignment length ≥ 100 bp. After finding homology to Leucocryptos and the virus GVMAG-M-3300020187-27 [1], gene expression was calculated using RSEM v. 1.3.1 [62] (rsem-calculate-expression -p 10 --bowtie2 --fragment-length-mean 58). The genomic features of the virus were taken from Schulz (2020) [1], and the viral genome was plotted using ShinyCircos v. 2.0 [63]. Gene expression in the plot is measured in expected counts after log2 transformation. The relative abundance data in Fig.
4 was obtained from 18S rRNA amplicon sequencing of the 2-20 µm size fraction in bag #4 during the mesocosm experiment [23]. Days 19, 22, and 23 were sampled twice; all other days were sampled once. In Fig. 4c, relative abundance is calculated per taxon as a fraction of all amplicon sequencing variants (ASVs), excluding metazoans. Fig. 4d shows the fraction of Katablepharidaceae out of all ASVs matching Katablepharidaceae (excluding metazoans). E. huxleyi abundance was measured by flow cytometry based on high side scatter and high chlorophyll signals. These data were obtained from the source data of the same study [23].

Phylogenetic tree of Katablepharidaceae ASVs and 18S rRNA genes

To verify the taxonomy of the ASVs, a phylogenetic tree was constructed of 89 ASVs identified as Katablepharidaceae, selected 18S rRNA sequences of Katablepharidaceae and other species from the PR2 database, and the longest single-cell assembled contig from the infected Katablepharidaceae cells. Sequences were aligned with ClustalOmega v. 1.2.4 (default parameters) [64]. A diagnostic tree was first made with FastTree 2.1.10 [54] for pruning long branches before making the final tree with IQ-TREE [55]. All but three ASVs and one PR2 sequence clustered together with the assembled Leucocryptos transcript, verifying the taxonomy of 97% of the ASVs used in the relative abundance analysis (Extended Data Fig. 4).

Phylogenetic trees of viral heat-shock proteins and metacaspase

To examine the evolutionary history of the heat-shock proteins encoded in GVMAG-M-3300020187-27, phylogenetic trees of these proteins were constructed together with homologs present in eukaryotes, bacteria, archaea, and other giant viruses. For this, a custom database of proteins from reference genomes was compiled from EggNOG v. 5.0 [65] (eukaryotes), bacteria and archaea (the Genome Taxonomy Database v. 95) [66], and other giant viruses (the Giant Virus Database [9]).
For bacterial and archaeal genomes in the GTDB, proteins were predicted first with Prodigal v. 2.6.3 [67] using default parameters. Proteins were searched against Pfam models for each protein using hmmsearch with the noise cutoff (--cut_nc), and sequences were subsequently aligned with ClustalOmega v. 1.2.3 (default parameters). Phylogenetic trees were constructed using IQ-TREE v. 2.1.2 [55] (parameters -m TEST -bb 1000 -T 6 --runs 10) using ultrafast bootstraps and with the best model determined with ModelFinder [68]. Substitution models used for the phylogenetic trees: Bax-1 - VT+F+R7; Metacaspase - VT+R7; HSP90 - LG+F+R10; HSP70 - LG+F+R10.

# Single-cell RNA-seq of the rare virosphere reveals the native hosts of giant viruses in the marine environment

Supplementary files used in the project. These are the main intermediate files that can help reproduce the data. For a detailed description of how to reproduce and use these files, go to the GitHub site: [https://github.com/vardilab/host-virus-pairing](https://github.com/vardilab/host-virus-pairing)

## Description of the data and file structure

Fromm_2023_Data_Availability

* Sequences
  * GVDB.markergenes.90.fna # De-duplicated database of NCLDV marker genes
  * Transcripts_Cells_Ehux-EhV.95.fasta # The host-virus reference, based on the single-cell transcriptomes of infected cells, to which we added genes from EhV and E. huxleyi.
  * Transcripts_Katablepharidacea.fasta # Transcripts assembled from a highly infected subpopulation of Katablepharidaceae
* Blast_results
  * first_cells.transcripts.edit.metaPR2.csv # Blastn results of single-cell transcripts assembled from highly infected cells (~970) detected against the metaPR2 database.
  * first_cells.transcripts.edit.PR2.csv # Blastn results of single-cell transcripts assembled from highly infected cells (~970) detected against the PR2 database.
  * all_cells.transcripts.edit.metaPR2.csv # Blastn results of single-cell transcripts assembled from all cells (~28,000) detected against the metaPR2 database.
  * all_cells.transcripts.edit.PR2.csv # Blastn results of single-cell transcripts assembled from all cells (~28,000) detected against the PR2 database.
  * cells.filtered.blastx.csv # Blastx results of single-cell assembled transcripts against the RefSeq database (to find viral transcripts).

DATA-SPECIFIC INFORMATION FOR: all_cells.transcripts.edit.metaPR2.csv

1. Number of variables: 12
2. Number of rows: 1409286
3. Variable List:
   * qseqid: query or source (gene) sequence id
   * sseqid: subject or target (reference genome) sequence id
   * pident: percentage of identical positions
   * length: alignment length (sequence overlap)
   * mismatch: number of mismatches
   * gapopen: number of gap openings
   * qstart: start of alignment in query
   * qend: end of alignment in query
   * sstart: start of alignment in subject
   * send: end of alignment in subject
   * evalue: expect value
   * bitscore: bit score

DATA-SPECIFIC INFORMATION FOR: all_cells.transcripts.edit.PR2.csv

1. Number of variables: 12
2. Number of rows: 1549087
3. Variable List:
   * qseqid: query or source (gene) sequence id
   * sseqid: subject or target (reference genome) sequence id
   * pident: percentage of identical positions
   * length: alignment length (sequence overlap)
   * mismatch: number of mismatches
   * gapopen: number of gap openings
   * qstart: start of alignment in query
   * qend: end of alignment in query
   * sstart: start of alignment in subject
   * send: end of alignment in subject
   * evalue: expect value
   * bitscore: bit score

DATA-SPECIFIC INFORMATION FOR: first_cells.transcripts.edit.metaPR2.csv

1. Number of variables: 12
2. Number of rows: 56763
3. Variable List:
   * qseqid: query or source (gene) sequence id
   * sseqid: subject or target (reference genome) sequence id
   * pident: percentage of identical positions
   * length: alignment length (sequence overlap)
   * mismatch: number of mismatches
   * gapopen: number of gap openings
   * qstart: start of alignment in query
   * qend: end of alignment in query
   * sstart: start of alignment in subject
   * send: end of alignment in subject
   * evalue: expect value
   * bitscore: bit score

DATA-SPECIFIC INFORMATION FOR: first_cells.transcripts.edit.PR2.csv

1. Number of variables: 12
2. Number of rows: 67864
3. Variable List:
   * qseqid: query or source (gene) sequence id
   * sseqid: subject or target (reference genome) sequence id
   * pident: percentage of identical positions
   * length: alignment length (sequence overlap)
   * mismatch: number of mismatches
   * gapopen: number of gap openings
   * qstart: start of alignment in query
   * qend: end of alignment in query
   * sstart: start of alignment in subject
   * send: end of alignment in subject
   * evalue: expect value
   * bitscore: bit score

DATA-SPECIFIC INFORMATION FOR: cells.filtered.blastx.csv

1. Number of variables: 10
2. Number of rows: 2021
3. Variable List:
   * qseqid: query or source (gene) sequence id
   * sseqid: subject or target (reference genome) sequence id
   * pident: percentage of identical positions
   * bitscore: bit score
   * domain: domain of life of the subject
   * phylum: phylum of the subject
   * family: family of the subject
   * genus: genus of the subject
   * species: species of the subject
   * cell barcode: cell barcode of the query

* UMI_tables_first_cells
  * data_raw.pickle.gz # Data combined from all UMI tables of fastq files mapped to the NCLDV marker gene database
* UMI_tables_scatterplot
  * data_raw.pickle.gz # Data combined from all UMI tables of fastq files mapped to the host-virus reference
  * data.pickle.gz # Preprocessed data combined from all UMI tables of fastq files mapped to the host-virus reference
  * metadata_dimentionality_reduction_1_1.2_.pickle.gz # Metadata of preprocessed data, after dimensionality reduction.

## Access information

The PR2 database can be accessed from here: [https://github.com/pr2database/pr2database](https://github.com/pr2database/pr2database)

Giant viruses (phylum Nucleocytoviricota) are globally distributed in aquatic ecosystems. They play significant roles as evolutionary drivers of eukaryotic plankton and regulators of global biogeochemical cycles. However, we lack knowledge about their native hosts, hindering our understanding of their lifecycle and ecological importance. Here, we used single-cell RNA-seq and samples from an induced E. huxleyi bloom during a mesocosm experiment to link giant viruses with their protist hosts. We observe active giant virus infections in multiple host lineages, including members of the algal groups Chrysophyceae and Prymnesiophyceae, as well as heterotrophic flagellates in the class Katablepharidaceae. Katablepharids were infected with a rare Imitevirales-07 giant virus lineage expressing cell fate regulation genes.
Analysis of the temporal dynamics of this host-virus interaction indicated a role for the Imitevirales-07 in the collapse of the host Katablepharid population. Our results demonstrate that single-cell RNA-seq can be used to identify previously undescribed host-virus interactions and study their ecological relevance.
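
The 12-column alignment tables described in the variable lists above could be read along the following lines. This is a minimal sketch, not the authors' pipeline: it assumes the CSV files are comma-separated with the columns in the listed order, the `has_header` switch is hypothetical because the record does not say whether a header row is present, and the three example rows are made up for illustration.

```python
import csv
import io

# Column names taken from the variable list in this record.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")


def read_blast_table(handle, has_header=False):
    """Parse a 12-column comma-separated alignment table into typed dicts."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # assumption: skip one header row if the file has it
    rows = []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        row = dict(zip(BLAST_COLUMNS, fields))
        for name in INT_COLUMNS:
            row[name] = int(row[name])
        for name in FLOAT_COLUMNS:
            row[name] = float(row[name])
        rows.append(row)
    return rows


def best_hit_per_query(rows):
    """Keep the highest-bitscore hit for each query sequence id."""
    best = {}
    for row in rows:
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


# Made-up rows for illustration only; they are not taken from the dataset.
example = io.StringIO(
    "geneA,ref1,98.5,200,3,0,1,200,11,210,1e-100,360\n"
    "geneA,ref2,91.0,180,16,1,5,184,40,219,2e-60,250\n"
    "geneB,ref3,99.0,150,1,0,1,150,1,150,5e-80,275\n"
)
hits = best_hit_per_query(read_blast_table(example))
```

To use it on a real file, pass an open file handle (for example `open("first_cells.transcripts.edit.PR2.csv", newline="")`) instead of the in-memory example, and set `has_header` after inspecting the first line.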
The mere presence of predators causes prey organisms to display predation-avoidance strategies. Predator presence is often communicated through predator-released chemical signals. Ovipositing female mosquitoes of several species are repelled by unknown signals released from larvivorous fish. It was previously suggested that in many cases, a predator’s microbiota plays an important role in the release of these signals; however, this mechanism is still poorly understood. In this study, we looked into the effects of the microbiota originating from the larvivorous Gambusia affinis (Baird and Girard) on the oviposition behavior of gravid female mosquitoes. We used fish with altered microbiota and bacterial isolates in a set of outdoor mesocosm experiments to address this aim. We show that interference with the fish microbiota significantly reduces the fish’s repellent effect. We further show that the bacterium Pantoea pleuroti, isolated from the skin of the fish, repels oviposition of Culex laticinctus (Edwards) and Culiseta longiareolata (Macquart) mosquitoes similarly to the way in which live fish repel them. Our results highlight the importance of bacteria in the interspecies interactions of their hosts. Furthermore, this finding may lead to the development of an ecologically friendly mosquito repellent that may reduce the use of larvivorous fish for mosquito control.

# Fish microbiota repel ovipositing mosquitoes

[https://doi.org/10.5061/dryad.9p8cz8wqf](https://doi.org/10.5061/dryad.9p8cz8wqf)

The dataset includes data from 4 field experiments described in Figures 2-5. It includes the distribution of mosquito egg rafts from two species, *Culex laticinctus* and *Culiseta longiareolata*. Egg rafts were collected from pools as described in the Methods section.

## Description of the data and file structure

The data describe the total number of egg rafts collected over all dates in each of the pools. Each pool is a combination of the “block” and “treatment” variables. The other columns present the dependent variables, i.e., the total number of egg rafts for each mosquito species. The experimental duration is presented at the bottom and consists of the beginning and ending dates plus the total day count.
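
The block-by-treatment layout described above lends itself to a simple per-treatment summary. The sketch below is only an illustration under stated assumptions: the column names (`block`, `treatment`, `culex`, `culiseta`), the CSV export, and every number in the example are hypothetical and not taken from the dataset files. Rows without a treatment label or with non-numeric counts, such as the duration rows at the bottom of each sheet, are skipped.

```python
import csv
import io
from collections import defaultdict


def egg_rafts_by_treatment(handle, species_columns):
    """Sum egg-raft counts per treatment across blocks."""
    totals = defaultdict(lambda: dict.fromkeys(species_columns, 0))
    for row in csv.DictReader(handle):
        treatment = (row.get("treatment") or "").strip()
        counts = [(row.get(col) or "").strip() for col in species_columns]
        # Skip footer rows (e.g. experiment duration) and incomplete rows.
        if not treatment or not all(c.isdigit() for c in counts):
            continue
        for col, count in zip(species_columns, counts):
            totals[treatment][col] += int(count)
    return dict(totals)


# Hypothetical export of one experiment; all values are invented.
example = io.StringIO(
    "block,treatment,culex,culiseta\n"
    "1,control,12,7\n"
    "1,fish,3,1\n"
    "2,control,9,5\n"
    "2,fish,2,0\n"
    "total days,,10,\n"
)
totals = egg_rafts_by_treatment(example, ["culex", "culiseta"])
```

Keeping the species columns as a parameter means the same function works if a sheet reports only one of the two species.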
# Rapid and chemically diverse C transfer from trees to mycorrhizal fruit bodies in the forest

[https://doi.org/10.5061/dryad.34tmpg4s5](https://doi.org/10.5061/dryad.34tmpg4s5)

## Description of the data and file structure

Ectomycorrhizal fungi (EMF) are common belowground tree symbionts, supplying trees with water and nutrients. In return, large amounts of C assimilated by trees can be allocated into EMF. However, the chemical forms in which the C is transferred from trees to fungi under field conditions are mostly unknown. In this study, we aimed to unravel the fate of tree-derived C in EMF. We conducted 13CO2 pulse labeling of *Pinus halepensis* trees in two forest sites with adjacent EMF sporocarps, combined with a non-targeted metabolomics profiling of root and sporocarp tissues. 13C was measured in sporocarps of *Tricholoma terreum* and *Suillus collinitus* up to 3 m from pine stems. Here we provide a table with soil properties (pH, salinity, and mineral composition) under the study trees at the two forest sites, and three tables showing P-values of isotopes of labeled semi-polar metabolites identified in samples of *Pinus* roots, *Suillus* fruit bodies, and *Tricholoma* fruit bodies, comparing before and after 13C labeling. Below is a description of each of the data tables.

#### **Hosted via Dryad**

**Rapaport_et_al_Soil_properties.xlsx** Soil properties (pH, salinity, and mineral composition) under the study trees at the two forest sites. Measurements were done on soil at 0-10 cm and 10-20 cm depths (excluding trees 24 and 29 in Charuvit forest, where soil was too shallow). EC, electric conductivity (dS m-1); SOC, soil organic carbon (%); all mineral values (Cl, Ca, Mg, N-NO3, N-NH4, Olsen P) are in mg kg-1.

#### **Hosted via Zenodo**

**Table S2** Soil properties (pH, salinity, and mineral composition) under the study trees at the two forest sites. Measurements were done on soil at 0-10 cm and 10-20 cm depths (excluding trees 24 and 29 in Charuvit forest, where soil was too shallow). EC, electric conductivity (dS m-1); SOC, soil organic carbon (%); mineral values are in mg kg-1.

**Table S6** P-values of isotopes of labeled semi-polar metabolites of root samples comparing before and after 13C labeling. The isotope shown is the most significant one from each metabolite. FDR correction was performed. TCA, tricarboxylic acid cycle.

**Table S7** P-values of isotopes of labeled semi-polar metabolites of samples from *Suillus* comparing before and after 13C labeling. The isotope shown is the most significant one from each metabolite. FDR correction was performed. TCA, tricarboxylic acid cycle.

**Table S8** P-values of isotopes of labeled semi-polar metabolites in *Tricholoma* samples comparing before and after 13C labeling. The isotope shown is the most significant one from each metabolite. FDR correction was performed. TCA, tricarboxylic acid cycle.

## Sharing/Access information

The data belong to an accepted paper in Functional Ecology. A DOI link will be added.

Ectomycorrhizal fungi (EMF) are common belowground tree symbionts, supplying trees with water and nutrients. In return, large amounts of C assimilated by trees can be allocated into EMF. However, the chemical forms in which the C is transferred from trees to fungi under field conditions are mostly unknown. In this study, we aimed to unravel the fate of tree-derived C in EMF. We conducted 13CO2 pulse labeling of Pinus halepensis trees in two forest sites with adjacent EMF sporocarps, combined with a non-targeted metabolomics profiling of root and sporocarp tissues. 13C was measured in sporocarps of Tricholoma terreum and Suillus collinitus up to 3 m from pine stems. C was assimilated in the labeled trees’ needles and transferred to their roots.
Starting from day 2 after labeling, the C was transferred to adjacent sporocarps, peaking on day 5. We identified more than 100 different labeled metabolites of different chemical groups present in roots and sporocarps. Of them, 17 were common to pine roots and both EMF species, and an additional 8 were common to roots and one of the two EMFs. The major labeled metabolites in the root tips were amino acids and tricarboxylic acid intermediates. The major labeled metabolites in sporocarps were amino acids, nucleotides, and fatty acids. We also identified labeled carbohydrates in all tissues. Labeling patterns diverged across different tissues, which can hint at how the C was transferred. Considering the young tree as a sole C source for these sporocarps, and with a diurnal assimilation of 5.4 g C, the total monthly C source is ~165 g C. On average, there were 10 sporocarps around each tree, each requiring ~1 g C. Therefore, a 10 g C investment would make 6% of total tree C allocation, and about 12% of net primary productivity. Overall, we found that this significant and ubiquitous transfer of metabolites from tree roots to EMF sporocarps is more rapid and chemically diverse than once thought.
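
The carbon budget quoted in the abstract can be retraced with a few lines of arithmetic. In this sketch the month length of 30.5 days is an assumption (the abstract only gives the rounded monthly total of ~165 g C), and the net primary productivity value is back-calculated from the stated 12% share rather than taken from the source.

```python
def monthly_carbon_budget(daily_assimilation_g, days_per_month,
                          n_sporocarps, c_per_sporocarp_g):
    """Return (monthly C source, total sporocarp C, sporocarp share of the source)."""
    monthly_source_g = daily_assimilation_g * days_per_month
    sporocarp_total_g = n_sporocarps * c_per_sporocarp_g
    return monthly_source_g, sporocarp_total_g, sporocarp_total_g / monthly_source_g


# Values from the abstract: 5.4 g C per day, 10 sporocarps, ~1 g C each.
# days_per_month = 30.5 is an assumption chosen to match the ~165 g C figure.
monthly_source_g, sporocarp_total_g, share = monthly_carbon_budget(5.4, 30.5, 10, 1.0)

# Back-calculation (not stated in the source): the NPP implied by a 12% share.
implied_npp_g = sporocarp_total_g / 0.12

print(f"monthly C source: {monthly_source_g:.1f} g C")
print(f"sporocarp share of C allocation: {share:.1%}")
print(f"implied monthly NPP: {implied_npp_g:.1f} g C")
```

Under these assumptions the implied net primary productivity is roughly half of the monthly assimilation, which is why the same 10 g C corresponds to about twice the percentage when expressed against NPP.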
Materials

The bile acids ursodeoxycholic acid (UDCA), hyodeoxycholic acid (HDCA), taurodeoxycholic acid (TDCA), taurocholic acid (TCA), cholic acid (CA), taurochenodeoxycholic acid (TCDCA), glycocholic acid (GCA), β-muricholic acid (BMA), tauro-β-muricholic acid (TBMCA), α-muricholic acid (AMA), deoxycholic acid (DCA), glycoursodeoxycholic acid (GUDCA), glycohyodeoxycholic acid (GHDCA), glycochenodeoxycholic acid (GCDCA) and glycine-β-muricholic acid (GBMCA), as well as an internal standard stock solution containing a mixture of CA-d4, GCA-d4 and DCA-d4, were obtained from Cayman Chemical (Ann Arbor, Michigan, USA). Stock solutions of 1 mg/mL in 100% methanol were diluted to obtain calibration curves for concentrations ranging from 15.6 to 1000 nM. Percoll was obtained from GE Healthcare Life Sciences (Chicago, Illinois, USA). Antibodies against CD4, Ly6G and CD90.2 were obtained from BD Pharmingen (San Diego, CA, USA) and antibodies against CD8α, NK1.1, B220 and Ly6C from BioLegend (San Diego, CA, USA). Goat anti-collagen I antibody for staining was obtained from Southern Biotech (Birmingham, AL, USA). Rabbit anti-Cytokeratin 19 (KRT 19) antibody was obtained from Abcam (Boston, MA, USA). Biotinylated Goat Anti-Rabbit IgG Antibody was obtained from Vector Laboratories, Inc. (Burlingame, CA, USA). Biotinylated HA-binding protein for staining HA was obtained from EMD Millipore (Burlington, MA, USA). LIVE/DEAD fixable viability dye was from Life Technologies (Carlsbad, CA, USA). Biliatresone was synthesized as previously described [31]. Stock solutions were made in DMSO and diluted in 1X PBS for gavage.

Animal experiments

We used BALB/c mice obtained from Jackson Laboratories as the animal model.
All animal experiments were conducted following the National Institutes of Health policy, and the study was approved by the Institutional Animal Care and Use Committee at the University of Pennsylvania under protocol #804862, ensuring that all procedures were performed ethically and with minimal harm to the animals. To investigate the effects of biliatresone, pregnant female mice were administered either 15 mg/kg of biliatresone or vehicle (containing an equivalent concentration of DMSO, at 0.3 ml/kg) via gavage on days 14 and 15 post mating. All gavaging procedures were carried out with utmost consideration to ensure humane and ethical treatment throughout the experiment. Animals were closely monitored post-gavaging for any signs of distress, and no instances of stressed animals were recorded during the observation period. We used a dosage of 15 mg/kg of biliatresone in order to avoid pregnancy loss, which was reported by Yang et al. [1] with higher doses. The average litter size in the biliatresone-treated group was 6, comparable to the control group. We chose E14 and E15 for biliatresone treatment to align with the time of hepatoblast differentiation and early biliary network formation [2]. Half of the pups from each litter (chosen randomly) were euthanized at P5, and the other half were euthanized at P21. All animals were handled with humane care. Carbon dioxide was utilized as the primary method for euthanasia. Additionally, P5 pups underwent a secondary physical method of euthanasia through decapitation, while other animals underwent cervical dislocation as the secondary method. We did not determine sex given that anogenital distance measurements at P5 lack accuracy, but our random selection of pups for euthanasia at a specific day typically yields a balanced male-female distribution. Mother mice and P21 pups were euthanized on the same day, and blood, EHBDs, and liver samples were collected from all the animals. Placental tissue was not collected.
Although in utero exposure is the most likely route of biliatresone toxicity in pups, we could not rule out transfer in milk [3] and therefore kept nursing mothers with pups until P21, then euthanized both. Serum samples were analyzed for albumin, alkaline phosphatase (ALP) and aspartate transaminase (AST) levels, as well as bile acid concentrations. Liver samples were analyzed for bile acids and immune cells. EHBD and liver samples were fixed and stained for further analysis. There were 13-15 total pups in each group; however, due to limited serum sample volumes, especially in P5 pups, not all analyses could be performed on all samples.

Histochemistry and immunostaining

After collection, EHBDs and liver samples were fixed in 10% formalin and embedded in paraffin. The embedded samples were then sectioned at 5 µm thickness, and slides were stained with Hematoxylin and Eosin (H&E). Standard protocols were followed for processing the slides. For antibody staining, EHBD sections were deparaffinized with xylene and rehydrated through a graded series of alcohols and distilled water. Antigen retrieval was performed in 10 mM citric acid buffer (pH 6.0). Sections were blocked with 5% bovine serum albumin and permeabilized with 0.4% Triton X-100 prior to antibody incubation. Sections were stained for collagen I, HA and DAPI as described in [4]. For collagen, Cy3 anti-goat secondary antibodies were used, and for HA-binding protein, Cy2-streptavidin (1:500, Vector Laboratories). Liver sections were stained for KRT 19 and labelled with diaminobenzidine. Sections were incubated with 3% H2O2 to quench endogenous peroxidases and blocked with StartingBlock™ T20/PBS Blocking Buffer (Thermo Fisher Scientific, Waltham, MA, USA) and Avidin D and Biotin Blocking Reagents, prior to incubation with primary KRT 19 antibodies (1:500) overnight at 4 °C.
The next day, sections were incubated with secondary antibodies (1:500) for 30 minutes at 37 °C and visualized using an Avidin-Biotin Complex detection system (Vector Elite Kit, Vector Laboratories, Burlingame, CA, USA). Signals were developed by a diaminobenzidine substrate kit for peroxidases (Vector Laboratories) and counterstained with hematoxylin.

Histology assessment

For bile duct H&E-stained slides, a qualitative assessment of damage was performed by grading the slides as normal or abnormal based on several features. These features included the presence of luminal debris, marked inflammation, detachment of surface epithelium, and signs of regeneration (including multi-layered surface epithelium and peribiliary gland expansion). Similarly, liver H&E-stained slides were graded as normal or abnormal based on the presence of bile duct damage, ductular reaction and the presence of bile plugs. The grading was performed independently by two researchers, IDJ and NDT, with more than eight animals per group being analyzed.

Image analysis

Image analysis of stained sections was performed with Fiji ImageJ and QuPath v0.2.0 software. The QuPath selection tool was used to calculate the area of the biliary submucosa that was occupied by HA (based on HA-binding protein staining) relative to the entire submucosal area. As a second measure for the thickness of the HA layer, the width between the lumen and the HA-collagen interface was measured in at least 5 different places, and was adjusted relative to the entire thickness of the bile duct wall. For KRT 19 stained samples, the QuPath pixel classification tool was utilized to measure the KRT 19 positive area relative to the field area. The number of KRT 19-positive foci per portal triad was counted manually.

Liver immunology

Intrahepatic leukocytes were isolated by Percoll density gradient centrifugation and stained with LIVE/DEAD fixable viability dye, or with antibodies against CD4, CD8α, NK1.1, B220, Ly6C, Ly6G, and CD90.2.
All samples were separated on a MACSQuant flow cytometer (Miltenyi Biotec, Gaithersburg, MD, USA) and analyzed using FlowJo software version 10.6 (Tree Star) (Fig 4A).

Sample processing for HPLC

Bile acids were extracted from homogenized liver and serum samples as described [5,6]. Separations were performed on a Waters BEH C18 Column (2.1 mm × 50 mm, 1.7 μm). Mobile phase A was water with 0.1% formic acid, and mobile phase B was methanol with 0.1% formic acid, at a flow rate of 0.4 mL/min. The gradient started at 5% B and was changed to 40% B over 2 min, then to 99% B over 2 min, held constant for 3 min, and then returned to the initial composition to re-equilibrate the column, for a total chromatographic separation time of 12 min. Analysis was conducted on a Thermo Q Exactive HF coupled to an Ultimate 3000 UHPLC interfaced with a heated electrospray ionization (HESI-II) source. The instrument was operated in negative ion mode, alternating between full scan from 250-800 m/z at a resolution of 120,000 and parallel reaction monitoring at 60,000 resolution with a precursor isolation window of 0.7 m/z. Since sample amounts were limited, not all analyses were performed on all samples. Bile acid values were normalized to average values obtained for control pups in each set of experiments. Some bile acids could not be detected in all samples; for purposes of the analysis, only those bile acids detected in at least 5 samples were considered.

Statistical analysis

Statistical significance was calculated by one- and two-tailed Student’s t-tests. Differences in variance were tested using the F test [7]. The number of samples tested for each experiment is given in parentheses in the graphs. All data are shown as boxplots.

References

1. Yang Y, Wang J, Zhan Y, Chen G, Shen Z, Zheng S, et al. The synthetic toxin biliatresone causes biliary atresia in mice. Lab Investig. 2020;100: 1425–1435. doi:10.1038/s41374-020-0467-7
2. Su X, Shi Y, Zou X, Lu ZN, Xie G, Yang JYH, et al. Single-cell RNA-Seq analysis reveals dynamic trajectories during mouse liver development. BMC Genomics. 2017;18: 946. doi:10.1186/s12864-017-4342-x
3. Kotb MA, Kotb A, Talaat S, Shehata SM, El Dessouki N, Elhaddad AA, et al. Congenital aflatoxicosis, mal-detoxification genomics & ontogeny trigger immune-mediated Kotb disease biliary atresia variant: SANRA compliant review. Med (United States). 2022;101: E30368. doi:10.1097/MD.0000000000030368
4. de Jong IEM, Hunt ML, Chen D, Du Y, Llewellyn J, Gupta K, et al. A fetal wound healing program after intrauterine bile duct injury may contribute to biliary atresia. J Hepatol. 2023;79: 1396–1407. doi:10.1016/j.jhep.2023.08.010
5. Huang J, Bathena SPR, Csanaky IL, Alnouti Y. Simultaneous characterization of bile acids and their sulfate metabolites in mouse liver, plasma, bile, and urine using LC-MS/MS. J Pharm Biomed Anal. 2011;55: 1111–1119. doi:10.1016/j.jpba.2011.03.035
6. Ghaffarzadegan T, Essén S, Verbrugghe P, Marungruang N, Hållenius FF, Nyman M, et al. Determination of free and conjugated bile acids in serum of Apoe(−/−) mice fed different lingonberry fractions by UHPLC-MS. Sci Rep. 2019;9: 3800. doi:10.1038/s41598-019-40272-8
7. Mohr DL, Wilson WJ, Freund RJ. Statistical Methods. New Delhi: Wiley Blackwell; 2021. doi:10.1016/B978-0-12-823043-5.00015-1

# Low-dose biliatresone treatment of pregnant mice causes subclinical biliary disease in their offspring: evidence for a spectrum of neonatal injury

This dataset comprises data collected from pups born to mothers administered either biliatresone or a vehicle control via gavage. Analysis was conducted on pups at postnatal days 5 and 21, with mothers analyzed alongside pups at postnatal day 21. Physical measurements, liver, bile duct, and serum samples were collected. Serum samples were utilized for assessing serum biochemistry relevant to liver diseases and bile acid content.
Liver samples underwent analysis for cytokeratin-19 (KRT19) staining, immune profiling, and bile acid measurements. Bile duct samples were subjected to hyaluronic acid staining.

## Description of the data and file structure

The data are structured in accordance with the organization of figures in our manuscript, with two comparison groups designated: "D" representing the vehicle control group and "B" representing the biliatresone-treated group. Each cell value within the groups represents data obtained from an individual pup. High-performance liquid chromatography (HPLC) measurements are presented as normalized area ratios for specific bile acids. The first cell in each sheet provides a brief background on the experiment and the specific measurement presented in the sheet.

## Sheet 1

Sheet title: Fig 1B.P5 serum biochemistry

Description: Pregnant mothers (E14 and E15) were treated with 15 mg/kg of biliatresone or vehicle control (DMSO), and P5 pups were analyzed. Serum biochemistry and physical parameters for P5 pups. The number of pups is shown in parentheses. D denotes DMSO treated, B denotes Biliatresone treated. Each cell denotes the value from an individual pup. ALP (U/L), AST (U/L), Albumin (g/dL), and Weight (gm) were analyzed and are reported.

## Sheet 2

Sheet title: Fig 1D.P5 KRT19 quantification

Description: Pregnant mothers (E14 and E15) were treated with 15 mg/kg of biliatresone or vehicle control (DMSO), and P5 pups were analyzed. Quantification of the number of cytokeratin 19 (KRT19) positive foci per portal triad and KRT19 positive area per field in P5 pups, with the number of pups indicated in parentheses. D denotes DMSO treated, B denotes Biliatresone treated. Each cell denotes the value from an individual pup. KRT 19 +ve Area per field (%) and KRT 19 +ve duct/PT were analyzed and are reported. PT: Portal triad.
## Sheet 3

Sheet title: Fig 1F.P5 hyaluronic acid

Description: Pregnant mothers (E14 and E15) were treated with 15 mg/kg of biliatresone or vehicle control (DMSO), and P5 pups were analyzed. Quantification of submucosal area stained by hyaluronic acid (HA)-binding protein. The number of P5 pups is shown in parentheses. D denotes DMSO treated, B denotes Biliatresone treated. Each cell denotes the value from an individual pup. HA content (%) is reported in this sheet.

## Sheet 4

Sheet title: Fig 2B.P21 serum biochemistry

Description: Pregnant mothers (E14 and E15) were treated with 15 mg/kg of biliatresone or vehicle control (DMSO), and P21 pups were analyzed. Serum biochemistry and physical parameters for P21 pups. The number of pups is shown in parentheses. D denotes DMSO treated, B denotes Biliatresone treated. Each cell denotes the value from an individual pup. ALP (U/L), AST (U/L), Albumin (g/dL), and Weight (gm) were analyzed and are reported.

## Sheet 5

Sheet title: Fig 2D.P21 KRT19 quantification

Description: Pregnant mothers (E14 and E15) were treated with 15 mg/kg of biliatresone or vehicle control (DMSO), and P21 pups were analyzed. Quantification of the number of cytokeratin 19 (KRT19) positive foci per portal triad and KRT19 positive area per field in P21 pups, with the number of pups indicated in parentheses. D denotes DMSO treated, B denotes Biliatresone treated. Each cell denotes the value from an individual pup. KRT 19 +ve Area per field (%) and KRT 19 +ve duct/PT were analyzed and are reported. PT: Portal triad.

## Sheet 6

Sheet title: Fig 3A.P5 liver bile acid

Pregnant mothers (E14 and E15) were treated with 15 mg/kg of biliatresone or vehicle control (DMSO). Livers from P5 pups were analyzed. Relative amounts of various bile acids from P5 liver are reported here. Individual bile acids were normalized to the corresponding mean from control pups from each experiment. Each column denotes the value from an individual pup. D denotes DMSO treated. B denotes Biliatresone treated.
Empty cells represent values that could not be determined. Each row denotes an individual bile acid that was determined. The bile acid abbreviations can be found in the Methods section.

## Sheet 7

Sheet title: Fig 3B.P5 serum bile acid

Pregnant mothers (E14 and E15) were treated with 15 mg/kg of biliatresone or vehicle control (DMSO). Serum from P5 pups was analyzed. Relative amounts of various bile acids from P5 serum are reported here. Individual bile acids were normalized to the corresponding mean from control pups from each experiment. Each column denotes the value from an individual pup. D denotes DMSO treated. B denotes Biliatresone treated. Empty cells represent values that could not be determined. Each row denotes an individual bile acid that was determined. The bile acid abbreviations can be found in the Methods section.

## Sheet 8

Sheet title: Fig 3C.P21 liver bile acid

Pregnant mothers (E14 and E15) were treated with 15 mg/kg of biliatresone or vehicle control (DMSO). Livers from P21 pups were analyzed. Relative amounts of various bile acids from P21 liver are reported here. Individual bile acids were normalized to the corresponding mean from control pups from each experiment. Each column denotes the value from an individual pup. D denotes DMSO treated. B denotes Biliatresone treated. Empty cells represent values that could not be determined. Each row denotes an individual bile acid that was determined. The bile acid abbreviations can be found in the Methods section.

## Sheet 9

Sheet title: Fig 3D.P21 serum bile acid

Pregnant mothers (E14 and E15) were treated with 15 mg/kg of biliatresone or vehicle control (DMSO). Serum from P21 pups was analyzed. Relative amounts of various bile acids from P21 serum are reported here. Individual bile acids were normalized to the corresponding mean from control pups from each experiment. Each column denotes the value from an individual pup. D denotes DMSO treated. B denotes Biliatresone treated.
Empty cells represent values that could not be determined. Each row denotes an individual bile acid that was determined. The bile acid abbreviations can be found in the Methods section.

## Sheet 10

Sheet title: Fig 4B.P21 Immune cells

Pregnant mothers (E14 and E15) were treated with 15 mg/kg of biliatresone or vehicle control (DMSO). Livers from P21 pups were analyzed. Quantification showing numbers of T-cells, CD4 cells, CD8 cells, B-cells, monocytes and neutrophils in livers isolated from P21 pups. The number of pups is shown in parentheses. Each cell denotes the value from an individual pup. D denotes DMSO treated. B denotes Biliatresone treated.

## Sheet 11

Sheet title: Fig 5BMother serum biochemistry

Pregnant mothers (E14 and E15) were treated with 15 mg/kg of biliatresone or vehicle control (DMSO). Mothers were euthanized along with P21 pups, and serum biochemistry and physical parameters were measured for both control and biliatresone-treated mothers. Each cell denotes the value from an individual mother. D denotes DMSO treated. B denotes Biliatresone treated.

## Sheet 12

Sheet title: Fig Fig 5C.Mother immune cells

Pregnant mothers (E14 and E15) were treated with 15 mg/kg of biliatresone or vehicle control (DMSO). Mothers were euthanized along with P21 pups, and the numbers of immune cells in livers isolated from both control and biliatresone-treated mothers were quantified. Quantification showing numbers of T-cells, CD4 cells, CD8 cells, B-cells, monocytes and neutrophils in livers isolated from mothers.

Biliary atresia is a neonatal disease characterized by damage, inflammation, and fibrosis of the liver and bile ducts and by abnormal bile metabolism. It likely results from a prenatal environmental exposure that spares the mother and affects the fetus.
Our aim was to develop a model of fetal injury by exposing pregnant mice to low-dose biliatresone, a plant toxin implicated in biliary atresia in livestock, and then to determine whether there was a hepatobiliary phenotype in their pups. Pregnant mice were treated orally with 15 mg/kg/d biliatresone for 2 days. Histology of the liver and bile ducts, serum bile acids, and liver immune cells of pups from treated mothers were analyzed at P5 and P21. Pups had no evidence of histological liver or bile duct injury or fibrosis at either timepoint. In addition, growth was normal. However, serum levels of glycocholic acid were elevated at P5, suggesting altered bile metabolism, and the serum bile acid profile became increasingly abnormal through P21, with enhanced glycine conjugation of bile acids. There was also immune cell activation observed in the liver at P21. These results suggest that prenatal exposure to low doses of an environmental toxin can cause subclinical disease including liver inflammation and aberrant bile metabolism even in the absence of histological changes. This finding suggests a wide potential spectrum of disease after fetal biliary injury.
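
The bile acid sheets (Fig 3A to 3D) report values normalized to the control-pup mean, and the methods state that only bile acids detected in at least 5 samples were considered. The sketch below illustrates one way to express those two steps. It is not the authors' analysis code: the function names are hypothetical, `None` stands in for an empty cell, the detection count is assumed to be taken across both groups (the source does not say whether it is per group), and the numbers are invented.

```python
from statistics import mean

MIN_DETECTED = 5  # "detected in at least 5 samples", per the methods


def normalize_to_control(values_by_group, control="D"):
    """Divide every detected value by the mean of the detected control values.

    values_by_group maps a group label ("D" = DMSO, "B" = biliatresone) to a
    list of raw values for one bile acid; None marks an undetected value.
    """
    detected_controls = [v for v in values_by_group[control] if v is not None]
    if not detected_controls:
        raise ValueError("no detected control values to normalize against")
    control_mean = mean(detected_controls)
    return {
        group: [None if v is None else v / control_mean for v in values]
        for group, values in values_by_group.items()
    }


def is_reportable(values_by_group, min_detected=MIN_DETECTED):
    """Assumption: count detected samples across both groups combined."""
    detected = sum(
        v is not None for values in values_by_group.values() for v in values
    )
    return detected >= min_detected


# Invented raw values for a single bile acid, for illustration only.
raw = {"D": [2.0, 4.0, None], "B": [6.0, 3.0, 9.0]}
normalized = normalize_to_control(raw)
keep = is_reportable(raw)
```

Normalizing within each experiment, as the sheet descriptions specify, would mean calling `normalize_to_control` once per experiment rather than on pooled values.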
# Integrin restriction by miR-34 protects germline progenitors from cell death during aging

The dataset contains 16 samples. Each sample contains 2 fastq.gz files from 2 lanes. There are 4 experimental groups:

* w1118 1 day
* w1118 30 day
* mir34 knockout 1 day
* mir34 knockout 30 day

Each group contains 4 biological replicates.

## Description of the data and file structure

w1118 1 day Rep1:

* w1118_1day_Rep1_L6.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype w1118, Age 1 day, Replicate #1, Lane 6
* w1118_1day_Rep1_L7.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype w1118, Age 1 day, Replicate #1, Lane 7

w1118 1 day Rep2:

* w1118_1day_Rep2_L6.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype w1118, Age 1 day, Replicate #2, Lane 6
* w1118_1day_Rep2_L7.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype w1118, Age 1 day, Replicate #2, Lane 7

w1118 1 day Rep3:

* w1118_1day_Rep3_L6.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype w1118, Age 1 day, Replicate #3, Lane 6
* w1118_1day_Rep3_L7.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype w1118, Age 1 day, Replicate #3, Lane 7

w1118 1 day Rep4:

* w1118_1day_Rep4_L6.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype w1118, Age 1 day, Replicate #4, Lane 6
* w1118_1day_Rep4_L7.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype w1118, Age 1 day, Replicate #4, Lane 7

w1118 30 days Rep1:

* w1118_30day_Rep1_L6.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype w1118, Age 30 days, Replicate #1, Lane 6
* w1118_30day_Rep1_L7.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype w1118, Age 30 days, Replicate #1, Lane 7

w1118 30 days Rep2:

* w1118_30day_Rep2_L6.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype w1118, Age 30 days, Replicate #2, Lane 6
* w1118_30day_Rep2_L7.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype w1118, Age 30 days, Replicate #2, Lane 7

w1118 30 days Rep3:

* w1118_30day_Rep3_L6.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype w1118, Age 30 days, Replicate #3, Lane 6
* w1118_30day_Rep3_L7.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype w1118, Age 30 days, Replicate #3, Lane 7

w1118 30 days Rep4:

* w1118_30day_Rep4_L6.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype w1118, Age 30 days, Replicate #4, Lane 6
* w1118_30day_Rep4_L7.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype w1118, Age 30 days, Replicate #4, Lane 7

mir34 KO 1 day Rep1:

* mir34_KO_1day_Rep1_L6.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype mir34 knockout, Age 1 day, Replicate #1, Lane 6
* mir34_KO_1day_Rep1_L7.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype mir34 knockout, Age 1 day, Replicate #1, Lane 7

mir34 KO 1 day Rep2:

* mir34_KO_1day_Rep2_L6.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype mir34 knockout, Age 1 day, Replicate #2, Lane 6
* mir34_KO_1day_Rep2_L7.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype mir34 knockout, Age 1 day, Replicate #2, Lane 7

mir34 KO 1 day Rep3:

* mir34_KO_1day_Rep3_L6.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype mir34 knockout, Age 1 day, Replicate #3, Lane 6
* mir34_KO_1day_Rep3_L7.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype mir34 knockout, Age 1 day, Replicate #3, Lane 7

mir34 KO 1 day Rep4:

* mir34_KO_1day_Rep4_L6.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype mir34 knockout, Age 1 day, Replicate #4, Lane 6
* mir34_KO_1day_Rep4_L7.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype mir34 knockout, Age 1 day, Replicate #4, Lane 7

mir34 KO 30 days Rep1:

* mir34_KO_30day_Rep1_L6.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype mir34 knockout, Age 30 days, Replicate #1, Lane 6
* mir34_KO_30day_Rep1_L7.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype mir34 knockout, Age 30 days, Replicate #1, Lane 7

mir34 KO 30 days Rep2:

* mir34_KO_30day_Rep2_L6.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype mir34 knockout, Age 30 days, Replicate #2, Lane 6
* mir34_KO_30day_Rep2_L7.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype mir34 knockout, Age 30 days, Replicate #2, Lane 7

mir34 KO 30 days Rep3:

* mir34_KO_30day_Rep3_L6.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype mir34 knockout, Age 30 days, Replicate #3, Lane 6
* mir34_KO_30day_Rep3_L7.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype mir34 knockout, Age 30 days, Replicate #3, Lane 7

mir34 KO 30 days Rep4:

* mir34_KO_30day_Rep4_L6.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype mir34 knockout, Age 30 days, Replicate #4, Lane 6
* mir34_KO_30day_Rep4_L7.fastq.gz: RNA-Sequencing data from Drosophila testes, Genotype mir34 knockout, Age 30 days, Replicate #4, Lane 7

During aging, regenerative tissues must dynamically balance the two opposing processes of proliferation and cell death. While many microRNAs are differentially expressed during aging, their roles as dynamic regulators of tissue regeneration have yet to be described. We show that in the highly regenerative Drosophila testis, miR-34 levels are significantly elevated during aging. miR-34 modulates germ cell death and protects the progenitor germ cells from accelerated aging. However, miR-34 is not expressed in the progenitors themselves but rather in neighboring cyst cells that kill the progenitors. Transcriptomics followed by functional analysis revealed that during aging, miR-34 modifies integrin signaling by limiting the levels of the heterodimeric integrin receptor αPS2 and βPS subunits. In addition, we found that in cyst cells, this heterodimer is essential for inducing phagoptosis and degradation of the progenitor germ cells. Together, these data suggest that the miR-34 – integrin signaling axis acts as a sensor of progenitor germ cell death to extend progenitor functionality during aging.

Illumina cDNA libraries were prepared from 1 µg total RNA extracted from testes of young and aged control w1118 and miR-34 null mutants. Sequencing libraries were prepared using INCPM mRNA Sequence Single-Read. Sixty reads were sequenced on two lanes of an Illumina HiSeq apparatus. The output was ~22 million reads per sample. Poly-A/T stretches and Illumina adapters were trimmed from the reads using cutadapt. Resulting reads shorter than 30 bp were discarded. Reads were mapped to the Drosophila melanogaster dmel reference genome using STAR, supplied with gene annotations downloaded from FlyBase (r6.18), with the EndToEnd option and outFilterMismatchNoverLmax set to 0.04.
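
The FASTQ file names in this dataset follow a regular pattern (genotype, age, replicate, lane), so sample metadata can be recovered from the name alone. The following is a minimal Python sketch of that idea, not part of the deposited data; the regular expression is inferred from the names listed in this README, and the function names `parse_fastq_name` and `expected_file_names` are hypothetical.

```python
import re

# Pattern inferred from the file names listed in this README,
# e.g. "mir34_KO_30day_Rep2_L7.fastq.gz" (assumption: every file follows it).
FASTQ_NAME = re.compile(
    r"^(?P<genotype>w1118|mir34_KO)_(?P<age>\d+)day"
    r"_Rep(?P<rep>\d+)_L(?P<lane>\d+)\.fastq\.gz$"
)


def parse_fastq_name(name):
    """Return genotype, age in days, replicate and lane for one file name."""
    match = FASTQ_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return {
        "genotype": match.group("genotype"),
        "age_days": int(match.group("age")),
        "replicate": int(match.group("rep")),
        "lane": int(match.group("lane")),
    }


def expected_file_names():
    """Names implied by 2 genotypes x 2 ages x 4 replicates x 2 lanes."""
    return [
        f"{genotype}_{age}day_Rep{rep}_L{lane}.fastq.gz"
        for genotype in ("w1118", "mir34_KO")
        for age in (1, 30)
        for rep in (1, 2, 3, 4)
        for lane in (6, 7)
    ]
```

Because each sample was sequenced on two lanes, grouping the parsed records by genotype, age and replicate gives the pairs of files that belong to the same sample.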
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=10.5061/dryad.tdz08kq58&type=result"></script>');
-->
</script>
# Cryptic diversity of cellulose-degrading gut bacteria in industrialized humans

Data source files for all figures and supplementary material

## Description of the data and file structure

* **Data S1 Figure 1B tree.nwk:** An unrooted phylogenetic tree, computed with the maximum likelihood method, of 62 selected genomes and MAGs, using the sequence of the ScaC scaffoldin as a phylotyping marker. The file contains the Newick format used to create the phylogenetic tree.
* **Data S2 Figure 1C data.xlsx:** Genomic dissimilarity computed by Mash distance within the newly identified ruminococcal cellulosomal species, and pairwise comparisons to each other as well as to the ruminal R. flavefaciens species and the human species, R. champanellensis. Each column represents the Mash distances.
* **Data S3 Figure 2A.xlsx:** Observed collective prevalence of the MAGs for fiber-degrading strains in various human, ape and NHP cohorts. Values in columns represent the number of positive individuals, and percentages are also given for each host category.
* **Data S4 Figure 2Ci.xlsx:** Distribution of each human cellulosomal strain (R. champanellensis, R. hominiciens, R. ruminiciens and R. primaciens) across the sample cohorts. For each genome (column A), sample (column B) and host (column C), average fold and number of strains are given.
* **Data S5 Figure 2Cii.xlsx:** Distribution of the human cellulosomal strains among the human- and NHP-positive samples. In the 1st sheet, for each genome and host, the number of samples is given (column C) as well as the percentage of positive samples (column D). The 2nd sheet includes the significant indval statistics for each genome group.
* **Data S6 Figure 3A host tree.nwk:** The Newick format used to create the phylogenetic host tree in Figure 3A, a phylogenetic tree of the mammalian host species.
* **Data S7 Figure 3A tree.nwk:** The Newick format used to create the phylogenetic tree in Figure 3A, a core protein phylogenetic tree illustrating the co-speciation hypothesis.
* **Data S8 Figure 3B tree.nwk:** The Newick format used to create the phylogenetic tree in Figure 3B, a phylogenetic tree of 197 concatenated core proteins.
* **Data S9 Figure 4C.xlsx:** Concentration of reducing sugars is given for the two enzymes at the different time points. Comparative cellulolytic activity of ruminococcal GH5 orthologs of either human (R. primaciens) or rumen origin (R. flavefaciens FD-1). Enzyme samples were examined using microcrystalline cellulose (Avicel) as the substrate at 37°C.
* **Data S10 Figure 5A:** Presence/absence (0 or 1) is given for each MAG and gene cluster (column 1). Analysis (PCA) of the overall predicted ORFs of the MAGs.
* **Data S11 Figure 5B.xlsx:** Column A, verticality values for common genes; column B, verticality values for specific genes. Rank distribution of verticality values for core proteins across the three host types versus host-specific proteins indicates that specific genes are likely to be transferred via horizontal gene transfer within a given type of host.
* **Data S12 Figure 5C:** In each column, the number of GH genes for each specific MAG. Analysis of the fibrolytic system [indicating glycoside hydrolase (GH) families] of the MAGs, according to their hosts.
* **Data S13 Figure 5D:** Transcript expression of fibrolytic genes for the 3 hosts, 3 individuals each. Analysis of the expression of the fibrolytic system, as examined by transcriptomic analysis of three fecal samples of the three hosts (macaque, human and sheep rumen).
* **Data S14 Figure 5E middle panel:** Number of specific gene copies for each specific MAG. The GH families that significantly distinguish the strains associated with the three gut ecosystems, as determined by the Kruskal-Wallis test (p < 0.05 after FDR correction).
* **Data S15 Figure 5E right panel:** Transcript expression of specific genes for the 3 hosts, 3 individuals each. Statistically significant GH expression (metatranscripts in FPKM) between the three types of hosts.
* **Data S16 Figure 5E verticality values.xlsx:** Verticality values are given for the indicated genes, that is, for each of the statistically significant GH families.
* **Data S17 Supplementary Figure S1.xlsx:** Prevalence of the fibrolytic strains in 1989 gut metagenomic samples. Sheet 1: presence is given as 1 in column C for a specific genome (column B), host group (column D), host animal (column E) and run (column A). Average fold is given in column E, lifestyle in G, country in H, number of strains in the samples in I, and host category in J. Sheet 2 is the raw data. Sheet 3 is the same as sheet 2 but filtered at 20% coverage on the covered percent column (column T).
* **Data S18 Supplementary Figure S2A:** Prevalence and abundance of the fibrolytic strains in various human and NHP gut samples. Prevalence of the cellulosomal strains (R. hominiciens, R. ruminiciens, R. primaciens and R. flavefaciens) is given for various subsampling read depths (5, 10, 20, 40 and 60 M). The maximal numbers of individuals in the cohorts are given for each host category at 5 M read depths. Sheet 1: prevalences are given in column D for a specific genome (column A) and read subsampling (column B); host categories are given in column C. Sheet 2: Chi-square test statistics, including p-value, significance and adjusted p-value, are given between the specific host categories and genomes. Sheet 4: additional metadata for the samples with a coverage above 20% are given, including the original study in column A.
* **Data S19 Supplementary Figure S2B.xlsx:** Prevalence of the cellulosomal strains for each host category at 10 M read depths. Sheet 1: prevalences are given in column D for a specific genome (column A) and read subsampling (column B); host categories are given in column C. Additional information includes country of origin (column E), host (column F), lifestyle (column G), host category (column H), depth (column I), host species (column J), host group (column K) and additional host category (column L). Sheet 2: Chi-square test statistics, including p-value, significance and adjusted p-value, are given between the specific host categories and genomes.
* **Data S20 Supplementary Figure S2C.xlsx:** Abundance of the cellulosomal strains (R. hominiciens, R. ruminiciens, R. primaciens and R. flavefaciens) for each category at 20 M read depths. Sheet 1: abundances are given in column D for a specific genome (column A) and read subsampling (column B); samples are given in column C. Sheet 2: Chi-square test statistics, including p-value, significance and adjusted p-value, are given between the specific host categories and genomes.
* **Data S21 Supplementary Figure S3.xlsx:** Prevalence of the fibrolytic strains in NHP gut samples. Prevalence of the cellulosomal strains (R. hominiciens, R. ruminiciens, R. primaciens and R. flavefaciens) is given for various subsampling read depths (5, 10, 20, 40 and 60 M). Sheet 1: prevalences are given in column D for a specific genome (column A) and read subsampling (column B); sample hosts are given in column C. Sheet 2: Chi-square test statistics, including p-value, significance and adjusted p-value, are given between the specific hosts and genomes.
* **Data S22 Supplementary Figure S4.xlsx:** Prevalence of the fibrolytic strains in industrialized countries. Prevalence of the cellulosomal strains (R. hominiciens, R. ruminiciens, R. primaciens and R. flavefaciens) is given for 10 M read depths. Sheet 1: prevalences are given in column D for a specific genome (column A) and read subsampling (column B); locations of the samples are given in column C. Sheet 2: Chi-square test statistics, including p-value, significance and adjusted p-value, are given between the specific locations and genomes.
* **Data S23 Supplementary Figure S5.xlsx:** Prevalence of the fibrolytic strains in rural societies. Prevalence of the cellulosomal strains (R. hominiciens, R. ruminiciens, R. primaciens and R. flavefaciens) is given for 10 M read depths. Sheet 1: prevalences are given in column D for a specific genome (column A) and read subsampling (column B); locations of the samples are given in column C. Sheet 2: Chi-square test statistics, including p-value, significance and adjusted p-value, are given between the specific locations and genomes.
* **Data S24 Supplementary Figure S6.xlsx:** Abundance of the MAGs in their ecosystem. Abundance of each MAG in each sample is given.
* **Data S25 Supplementary Figure S7.xlsx:** Prevalence of the fibrolytic strains in captive and wild NHPs. Prevalence of the cellulosomal strains (R. hominiciens, R. ruminiciens, R. primaciens and R. flavefaciens) is given for 10 M read depths in various animals (apes: chimpanzees, gorillas and orangutans; other NHPs: macaques, tamarins, baboons, mandrills, capuchins, colobus monkeys, guerezas and geladas; and ruminants: cows, sheep, camels, yaks and deer). Sheet 1: the number of positive (column E) and negative samples (column F) out of the total number of samples examined (column D) is given for each host group (column A) and genome (column B); lifestyle appears in column C. Sheet 2: Chi-square test statistics, including p-value, significance and adjusted p-value, are given for the specific host groups and genomes.
* **Data S26 Supplementary Figure S8.xlsx:** Prevalence of the fibrolytic strains in omnivore and folivore NHPs. Prevalence of the cellulosomal strains (R. hominiciens, R. ruminiciens, R. primaciens and R. flavefaciens) is given for 10 M read depths in various NHPs (not including apes, which are all omnivores). Sheet 1: prevalence (column D) of a specific genome (column A) is given; column B is the reading depth and column C the diet of the animal. Sheet 2: Pearson's Chi-squared test with Yates' continuity correction and p-value are given for the 3 genomes examined.
* **Data S27 Supplementary Figure S9.xlsx:** Identification of core proteins. Presence/absence (0 or 1) is given for each MAG and gene cluster (column 1).
* **Data S28 Supplementary Figure S10 tree.nwk:** Multilocus sequence analysis of the 30 MAGs. The file contains the Newick format used to create the phylogenetic tree in Figure S11.
* **Data S29 Supplementary Figure S11.xlsx:** Comparative cellulolytic activity of ruminococcal GH5 orthologs of either human (R. primaciens) or rumen origin (R. flavefaciens FD-1). Enzyme samples were examined at various concentrations using amorphous cellulose as the substrate at 37°C for 1 h incubation. Concentration of reducing sugars is given for the two enzymes at the different enzyme concentrations.
* **Data S30 Supplementary Figure S12.xlsx:** Transcriptomic analysis of R. flavefaciens, R. hominiciens and R. primaciens strains. Sheet 1 gives the total gene expression, as a percentage, of the specific MAGs in the three fecal samples of the three hosts (macaque, human and sheep rumen): overall transcripts for the 3 hosts, 3 individuals each. Sheets 2, 3 and 4 give the number of transcripts (in FPKM) of each of the indicated cellulosomal genes in three fecal samples from the sheep, human or macaque host: sheet 2, transcripts for the 3 sheep individuals for each gene; sheet 3, transcripts for the 3 human individuals for each gene; sheet 4, transcripts for the 3 macaque individuals for each gene.
* **Data S31 Supplementary Figure S13.xlsx:** Fibrolytic cellulosomal core enzymes of the 30 MAGs with respect to their sample of origin (human-, rumen- and NHP-assembled MAGs). In each column, the number of GH genes for each specific MAG.
* **Data S32 197 phylogenetic trees.pdf:** Compilation of the 197 phylogenetic trees used for evolutionary analysis.

Humans, like all mammals, depend on the gut microbiome for digestion of cellulose, the main component of plant fiber, but evidence for cellulose fermentation in the human gut is scarce.
We have identified ruminococcal species in the gut microbiota of human populations that assemble functional multi-enzymatic cellulosome systems capable of degrading plant cell wall polysaccharides. One of these species, which is strongly associated with humans, likely originated in the ruminant gut and was subsequently transferred to the human gut potentially during domestication, where it underwent diversification and diet-related adaptation through the acquisition of genes from other gut microbes. Collectively, these species are abundant and widespread among ancient humans, hunter-gatherers, and rural populations, but are extremely rare in populations from industrialized societies, suggesting potential disappearance in response to the westernized lifestyle.
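
Data S2 reports genomic dissimilarity as Mash distances. For readers unfamiliar with the measure, Mash estimates the distance from the Jaccard index j of two k-mer sketches as D = -(1/k) ln(2j / (1 + j)). The Python sketch below only illustrates that formula; the k-mer size used for this dataset is not stated in this README, so `k=21` (the Mash default) is an assumption, and the function name `mash_distance` is hypothetical.

```python
import math


def mash_distance(jaccard, k=21):
    """Mash distance from a Jaccard index of k-mer sketches.

    k=21 is the Mash default and an assumption here; the k-mer size
    used for Data S2 is not given in this README.
    """
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("Jaccard index must lie in [0, 1]")
    if jaccard == 0.0:
        return 1.0  # no shared k-mers: maximal distance
    distance = -math.log(2.0 * jaccard / (1.0 + jaccard)) / k
    # Clamp to [0, 1] so identical sketches give 0.0 and
    # very small Jaccard values do not exceed the maximum.
    return min(1.0, max(0.0, distance))
```

Identical sketches (j = 1) give a distance of 0, and the distance grows as the shared fraction of k-mers shrinks.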
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=10.5061/dryad.z08kprrkj&type=result"></script>');
-->
</script>
# Cranes soar on thermal updrafts behind cold fronts as they migrate across the sea

This dataset and accompanying R scripts pertain to the research study on crane migration above the sea. The focus of this study is to analyze the migratory patterns of cranes, particularly their utilization of thermal updrafts and soaring-gliding dynamics during sea crossing. The dataset includes high-resolution GPS and accelerometer data, annotated with environmental variables, and the R scripts are tailored for statistical analysis and visualization of these data.

## Description of the data and file structure

The data folder contains the following files:

**1HzGPS_and_ACC_annotated (multiple files):**

* 1Hz data for individual cranes, annotated with environmental variables.
* File names include individual crane IDs.

**ThermalStats.csv:**

* Summarizes 10-minute segments with and without thermal activity.
* Used in `ThermalSections_analysis&plot.R` and `time_scales_analysis.R`.

**SoarGlide.csv:**

* Contains data on coupled soaring-gliding events.
* Used in `soaring&gliding_plot&analyse.R`.

**AnnotatedTimePointsMedSea.csv:**

* Annotated time-scale data for Mediterranean sea crossing.
* Used in `time_scales_analysis.R`.

**TimesOfMigrationFall.csv:**

* Number of days each crane stayed at the last stopover site before crossing the Mediterranean sea.
* Used in `time_scales_analysis.R`.

## Code/Software

The code folder includes R scripts designed for the statistical analysis of the high-resolution datasets:

**ThermalSections_analysis&plot.R:**

* Performs statistical analysis of 10-minute thermal sections.
* Generates Figure 2B.

**soaring&gliding_plot&analyse.R:**

* Analyzes coupled soaring-gliding events.
* Generates Figure 3A.

**time_scales_analysis.R:**

* Conducts time-scale analysis.
* Generates Figure 4A.

### Metadata Description

### (1) Data 1HzGPS\_and\_ACC\_annotated Tables:

NAN indicates not available.

1. **Tag:** Identifier for the tracked animal.
2. **TimeCont:** Timestamps of observations (UTC).
3. **Latitude:** Geographic latitude where the observation was recorded.
4. **Longitude:** Geographic longitude where the observation was recorded.
5. **Altitude_m:** Altitude of the crane (m).
6. **Speed_km_h:** Speed of the crane (km/h).
7. **direction_deg:** Direction of crane movement (degrees clockwise from north).
8. **OverSeaOrLand:** Indicator of whether the crane is over the sea or land during observation (0=land, 1=Black sea, 2=Mediterranean sea).
9. **mag_x, mag_y, mag_z:** Components of the magnetic field measurement in the x, y, and z directions.
10. **acc_x, acc_y, acc_z:** Accelerometer readings in the x, y, and z directions (m/sec^2).
11. **Interpolated_Lat:** Interpolated latitude values.
12. **Interpolated_Lon:** Interpolated longitude values.
13. **InterpolatedElevation:** Interpolated elevation values (m).
14. **running_Flap_rate:** Estimated flap rate from 1Hz data based on a calibration dataset of 10Hz (flaps per second).
15. **AleAboveTerrain:** Altitude of the crane above the terrain (m).
16. **TerrainHeight:** Height of the terrain above sea level (m).
17. **quartile:** Statistical quartile information.
18. **Individual:** Tag number used for environmental annotation.
19. **t2m:** Temperature at 2 meters above surface level (°C).
20. **sst:** Sea surface temperature (°C).
21. **blh:** Boundary layer height (m).
22. **cbh:** Cloud base height (m).
23. **cape:** Convective available potential energy (J/kg).
24. **cin:** Convective inhibition (J/kg).
25. **msshf:** Mean sea sensible heat flux (W/m^2).
26. **tcc:** Total cloud cover (%).
27. **msl:** Mean sea level pressure (Pa).
28. **u850, u925:** Zonal wind component at 850 hPa and 925 hPa pressure levels.
29. **v850, v925:** Meridional wind component at 850 hPa and 925 hPa pressure levels.
30. **t850, t925:** Temperature at 850 hPa and 925 hPa pressure levels.
31. **sstDiff:** Difference between sea and air temperature (°C).
32. **windDirection850, windDirection925:** Wind direction at 850 hPa and 925 hPa pressure levels.
33. **windSpeed850, windSpeed925:** Wind speed at 850 hPa and 925 hPa pressure levels.

### **(2) ThermalStats:**

NAN indicates not available.

1. **Individual**: Identifier for the tracked animal.
2. **UniqueSectionCounter**: Sequential counter for a distinct 1 Hz GPS burst (minimum 10 min, but can be longer).
3. **Date**: Date and time of observation (DD-MMM-YYYY HH:MM:SS).
4. **lat_start**: Geographic latitude at the start of a 10-minute segment (Degrees).
5. **lon_start**: Geographic longitude at the start of a 10-minute segment (Degrees).
6. **OverSea**: Flag indicating whether the segment is over the sea (0=land, 1=Black sea, 2=Mediterranean sea).
7. **day_time**: Flag indicating whether the segment is during daytime or nighttime (0=night, 1=day).
8. **time_since_sunrise_h**: Time since sunrise at the start of the segment (Hours).
9. **time_since_sunset_h**: Time since sunset at the start of the segment (Hours).
10. **time_of_part_min**: Duration of the observed segment (Minutes).
11. **TotalDistance**: Total distance covered in the segment (meters).
12. **vg**: Ground speed during the segment (m/sec).
13. **va**: Airspeed during the segment (m/sec).
14. **tw**: Tailwind component during the segment (m/sec).
15. **sw**: Sidewind component during the segment (m/sec).
16. **meanFlapRate**: Average flap rate of the individual during the segment (Flaps/sec).
17. **percent_time_in_thermals**: Percentage of the segment time spent in thermals (%).
18. **time_in_thermals_sec**: Total time spent in thermals during the segment (Seconds).
19. **mean_thermal_length_sec**: Average duration of thermal events during the segment (Seconds).
20. **mean_climb_rate**: Average climb rate during thermal events (m/s).
21. **mean_max_elevation_above_ground**: Average maximum elevation above ground level for each thermal during the segment (Meters).
22. **Mean_blh**: Mean boundary layer height during the segment (Meters).
23. **Mean_msl**: Mean sea level pressure during the segment (Pa).
24. **Mean_DeltaT**: Mean temperature difference during the segment (Degrees Celsius).
25. **wvel**: Wind velocity during the segment (m/s).
26. **wang**: Wind angle during the segment (Degrees).
27. **number_of_thermals**: Number of thermal events encountered during the segment.
28. **age**: Categorical age of the individual (1 = Breeding adult, 2 = Adult with unknown breeding status, 3 = Subadult, 4 = Juvenile).
29. **age_w**: Age class of the individual.
30. **sex**: Sex of the individual (e.g., Male, Female, Unknown).

### **(3) SoarGlide:**

NAN indicates not available.

1. **individual**: Identifier for the tracked animal.
2. **datetime_start**: Date and time when the soaring-gliding segment started (DD-MMM-YYYY HH:MM:SS).
3. **lon_start**: Geographic longitude at the start of the observation (Degrees).
4. **lat_start**: Geographic latitude at the start of the observation (Degrees).
5. **time_soaring_sec**: Total time spent soaring during the soaring-gliding segment (Seconds).
6. **climb_rate_m_sec**: Average climb rate while soaring (m/s).
7. **falp_rate_climb**: Flap rate during the climb phase (Flaps/sec).
8. **falp_prop_climb**: Proportion of time flapping during the climb phase (%).
9. **type_thermal**: Type of thermal encountered (1 = classic soaring, 2 = spring-like soaring pattern).
10. **start_alt_above_terr**: Starting altitude above terrain at the beginning of the soaring phase (Meters).
11. **exit_alt_above_terr**: Exit altitude above terrain at the end of the soaring phase (Meters).
12. **time_gliding_sec**: Total time spent gliding during the soaring-gliding segment (Seconds).
13. **distance_gliding_m**: Total distance covered while gliding (Meters).
14. **sink_speed_m_sec**: Average sink speed while gliding (m/s).
15. **air_speed_m_sec**: Airspeed during the gliding phase (m/s).
16. **air_speed_calc_m_sec**: Airspeed during the gliding phase based on tailwind estimated from thermal drift (m/s).
17. **tail_wind**: Tailwind during the gliding phase (m/s).
18. **tail_wind_calc**: Estimated tailwind using bird drift in thermals from the horizontal displacement of thermals (m/s).
19. **side_wind**: Sidewind during the gliding phase (m/s).
20. **side_wind_calc**: Estimated sidewind using bird drift in thermals from the horizontal displacement of thermals (m/s).
21. **falp_rate_glide**: Flap rate during the gliding phase (Flaps/sec).
22. **falp_prop_glide**: Proportion of time flapping during the gliding phase (%).
23. **mean_blh**: Mean boundary layer height during the observation period (Meters).
24. **sea_land**: Indicator of whether the soaring-gliding segment was over land or sea (0=land, 1=Black sea, 2=Mediterranean sea).
25. **day_time**: Indicator of whether the soaring-gliding segment was during daytime or nighttime (0=night, 1=day).
26. **age**: Categorical age of the individual (1 = Breeding adult, 2 = Adult with unknown breeding status, 3 = Subadult, 4 = Juvenile).
27. **age_w**: Age class of the individual.
28. **sex**: Sex of the individual (Male, Female, Unknown).
29. **Vopt**: Optimal flight speed which maximizes bird cross-country speed (m/s).
30. **RAFI**: Risk-Averse Flight Index.
31. **RAFIcalc**: Risk-Averse Flight Index calculated based on estimated tailwind.
32. **Date**: Date of the soaring-gliding segment (DD-MMM-YY).

### **(4) AnnotatedTimePointsMedSea:**

NAN indicates not available.

1. **indev**: Identifier for the tracked animal.
2. **Date**: Date of the real sea crossing event (YYYY-MM-DD).
3. **fall**: Flag indicating the season of the sea crossing event (0 = spring, 1 = fall).
4. **thermap_presence**: Indicator of thermal presence during the observation (0 = no thermal soaring during crossing, 1 = at least 1 thermal soaring event).
5. **daysBack**: Number of timescales relative to the autumn real sea crossing event.
6. **lat**: Geographic latitude used for annotation (Degrees).
7. **lon**: Geographic longitude used for annotation (Degrees).
8. **mig_ang**: Migration angle at the time of observation (Degrees).
9. **datetime_utc**: Date and time at the location (DD-MMM-YYYY HH:MM:SS).
10. **datetime**: Date and time UTC used for annotation (DD-MMM-YYYY HH:MM:SS).
11. **t2m**: Temperature at 2 meters above the surface (Degrees Celsius).
12. **sst**: Sea surface temperature (Degrees Celsius).
13. **blh**: Boundary layer height (Meters).
14. **cbh**: Cloud base height (Meters).
15. **cape**: Convective Available Potential Energy (Joules per kilogram).
16. **cin**: Convective Inhibition (Joules per kilogram).
17. **msshf**: Mean Sea Surface Heat Flux (Watts per square meter).
18. **tcc**: Total Cloud Cover (fraction from 0 to 1).
19. **msl**: Mean Sea Level Pressure (Pascals).
20. **u850**: Zonal wind component at 850 hPa (Meters per second).
21. **u925**: Zonal wind component at 925 hPa (Meters per second).
22. **v850**: Meridional wind component at 850 hPa (Meters per second).
23. **v925**: Meridional wind component at 925 hPa (Meters per second).
24. **t850**: Temperature at 850 hPa (Degrees Celsius).
25. **t925**: Temperature at 925 hPa (Degrees Celsius).
26. **sstdiff**: Difference between sst and t2m (Degrees Celsius).
27. **windDir850**: Wind direction at 850 hPa (Degrees).
28. **windSpeed850**: Wind speed at 850 hPa (Meters per second).
29. **windDir925**: Wind direction at 925 hPa (Degrees).
30. **windSpeed925**: Wind speed at 925 hPa (Meters per second).
31. **GPCC**: Precipitation (Millimeters).
32. **time_point**: Location during the crossing used for annotation; "sea enter" was used for the analysis (1 = starting point, 2 = sea enter, 3 = in sea).
33. **tw**: Tailwind component (Meters per second).
34. **sw**: Sidewind component (Meters per second).

### **(5) TimesOfMigrationFall:**

NAN indicates not available.

1. **indev**: Identifier for the tracked animal.
2. **Date**: Date of the sea crossing event (YYYY-MM-DD).
3. **fall**: Flag indicating season (0 = spring, 1 = fall).
4. **Hz1_section_num**: Number of 10-minute 1Hz sections during the sea crossing event.
5. **mean_thermal_prop**: Mean proportion of time spent in thermals during the sea crossing event.
6. **prop_section_with_thermal**: Proportion of 10-minute sections containing thermal soaring during the sea crossing event.
7. **thermals**: Number of thermal events encountered during the sea crossing event.
8. **lat_start_migration**: Geographic latitude at the start of the sea crossing event (Degrees).
9. **lon_start_migration**: Geographic longitude at the start of the sea crossing event (Degrees).
10. **datetime_start_migration_utc**: Date and time at the start of the sea crossing event in UTC (DD-MMM-YYYY HH:MM:SS).
11. **datetime_enter_sea_utc**: Date and time when the individual enters the sea area in UTC (DD-MMM-YYYY HH:MM:SS).
12. **datetime_exit_sea_utc**: Date and time when the individual exits the sea area in UTC (DD-MMM-YYYY HH:MM:SS).
13. **time_at_sea**: Total time spent at sea during the sea crossing event (Hours).
14. **Days_at_stopover**: Number of days spent at the last stopover site before the sea crossing event.
15. **ang**: Angle of movement at the beginning of the sea crossing event (Degrees).

Thermal soaring conditions above the sea have long been assumed absent or too weak for terrestrial migrating birds, forcing large obligate soarers to take long detours and avoid sea crossing, and facultative soarers to cross exclusively by costly flapping flight. Thus, while atmospheric convection does develop at sea and is utilized by some seabirds, it has been largely ignored in avian migration research. Here we provide direct evidence for routine thermal soaring over open sea in the common crane, the heaviest facultative soarer known among terrestrial migrating birds.
Using high-resolution biologging from 44 cranes tracked across their transcontinental migration over 4 years, we show that soaring characteristics and performance were no different over sea than over land in mid-latitudes. Sea-soaring occurred predominantly in autumn when large water-air temperature difference followed mid-latitude cyclones. Our findings challenge a fundamental paradigm in avian migration research and suggest that large soaring migrants avoid sea crossing not due to absence or weakness of thermals but due to their uncertainty and the costs of prolonged flapping. Marine cold air outbreaks, imperative to the global energy budget and climate system, may also be important for bird migration, calling for more multidisciplinary research across biological and atmospheric sciences.
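
The annotated tables in this dataset store zonal (u) and meridional (v) wind components alongside derived wind speed, wind direction, tailwind and sidewind columns. The Python sketch below shows one common way such quantities relate to each other. It is an illustration only, not the authors' R code: the README does not state which direction convention or sign convention was used, so the meteorological "blowing from" convention and the "positive sidewind pushes the bird to its right" convention are both assumptions, and the function names are hypothetical.

```python
import math


def wind_speed_and_direction(u, v):
    """Wind speed (m/s) and direction (degrees clockwise from north).

    Assumes the meteorological convention: the direction the wind
    blows FROM. u is the eastward and v the northward component.
    """
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction


def tail_and_side_wind(u, v, track_deg):
    """Project the wind onto a flight track (degrees clockwise from north).

    Returns (tailwind, sidewind) in m/s. Positive tailwind blows along
    the track; positive sidewind pushes the bird to its right
    (sign convention assumed for this sketch).
    """
    theta = math.radians(track_deg)
    tailwind = u * math.sin(theta) + v * math.cos(theta)
    sidewind = u * math.cos(theta) - v * math.sin(theta)
    return tailwind, sidewind
```

For example, a 10 m/s westerly wind (u = 10, v = 0) is a pure tailwind for a bird tracking due east and a pure sidewind for a bird tracking due north.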
We (all authors of this work) caught active squamates in the field, in many sites across the world, and measured their body temperatures (Tb). We then measured substrate temperatures (Tsub) and/or air temperatures (Ta) at the specific location where each individual was found. The method of measurement varied among groups. Most of us took cloacal temperatures using either a digital thermocouple or an analogue thermometer, but in a few cases, body temperature was measured using an infrared thermometer (measuring skin temperature) or temperature-sensitive radio transmitters. Cloacal temperatures were taken immediately (no more than 1 minute) after the individual was caught. Note that these environmental temperature data are used here in the absence of measurements of other thermal properties of the environment. Thus, they do not enable to qualify thermal quality and thermoregulatory strategy and efficiency (Hertz et al., 1993). Protocols were consistent for each species and therefore could be corrected for in the statistical models. We filtered the data to include only species with records from at least 20 individuals per species. To account for phylogenetic non-independence in the subsequent statistical analyses, we used the full imputed phylogenetic tree of Tonini et al. (2016). Species absent from this phylogenetic tree were inserted into it manually when possible (in place of a sister species or into an existing polytomy) and otherwise were excluded from the analysis. Since the Tonini et al. tree contains several polytomies, which are known to affect phylogenetic analyses (Molina-Venegas & Rodríguez, 2017), we repeated all of the analyses using the tree from Zheng & Wiens (2016) which has 42 fewer species but is fully resolved. We divided species by diel activity and basking behaviour, according to the literature and our own observations. 
We did not base the partitioning of species on the temperature measurements to prevent circularity of the definitions (Vitt et al., 2008). We classified species according to these behavioural categories, rather than between thermoregulators versus thermoconformers, because the latter is unknown for many species, and because discerning between thermoconformers and actively regulating thigmotherms is difficult (Doan et al., 2022; Hertz et al., 1993). We categorized species that are not commonly observed exhibiting basking behaviour as “non-heliothermic” rather than “thigmotherms”, since we classified them by observable behaviour and not according to the sources of heat gain and loss, of which we cannot be sure without direct testing. That is, each researcher or group classified the behaviour of the species which they contributed to the database, according to the literature and their own observations and expertise. This classification, while qualitative and to an extent subjective, was carried out before any of the analyses to prevent them from being biased by the authors’ hypotheses. Diurnal snakes were placed in a separate category despite basking since their thermal biology is considered distinct from that of the more commonly studied lizards (Gibson & Falls, 1979; Avery, 1982; Whitaker & Shine, 2002). We did not have measurements of enough nocturnal snake species to include them as a separate category and grouped them with the nocturnal lizards. Species were classified into four categories: 1. “heliotherms” (heliothermic lizards), 2. “non-heliotherms” (diurnal non-heliothermic lizards), 3. “diurnal snakes”, and 4. “nocturnal species”. We derived the mean annual temperature, as a proxy for the macroclimatic conditions, at the site where each species was measured (1970-2000 average, data from BIO1 in WorldClim; Fick & Hijmans, 2017). 
When we had no body mass data for a species from measurements of the individuals used in the temperature measurements, we estimated it from mean species snout-vent length data (either from the individuals measured or from Meiri et al., 2021) using allometric equations from Feldman et al. (2016) and Meiri et al. (2021).

# A global analysis of field body temperatures of active squamates in relation to climate and behaviour

| Column title | Type | Description |
| :--------------- | :------ | :---------- |
| Species | factor | Binomial name, updated to fit the Reptile Database 2022 |
| Category | factor | \[Based on the data in rows 4-6]. Helio\_liz = heliothermic lizard. Non\_helio\_liz = non-heliothermic lizard. Snake\_diur = diurnal snakes. Nocturnal = nocturnal lizards and snakes. |
| Taxon | factor | Lizard or snake |
| Activity | factor | Diurnal or nocturnal. Cathemeral species were assigned to the time of day they had been documented |
| Behaviour | factor | Heliothermic or not. According to the literature and the researcher's \[see row 12] personal expertise |
| Tb | integer | Body temperature (degrees Celsius) |
| Tsub | integer | Substrate temperature (degrees Celsius, at the location where Tb was taken) |
| Ta | integer | Air temperature (degrees Celsius, at the location where Tb was taken) |
| Latitude | integer | Decimal degrees. If exact location could not be provided (e.g. in protected species where location is not publicly available), rounded to the nearest 0.1 degree |
| Longitude | integer | Decimal degrees. If exact location could not be provided (e.g. in protected species where location is not publicly available), rounded to the nearest 0.1 degree |
| Research group | factor | Initials of the researchers who measured this individual. People working together and using the same methodology were grouped together. |
| Tb device | text | Model of the device |
| Tb\_method | factor | Tb device separated into three categories: cloacal probe, skin (infrared), and radio transmitter |
| Ta device | text | Model of the device |
| Ta height | text | Height (in cm unless otherwise indicated) of the Ta measurement device above ground |
| Ta\_height | factor | Ta height separated into three categories: <5cm, 5-15cm, and >50cm |
| Measur radiation | text | Whether the location where the animal was caught was sunlit, shaded, etc. |
| Tsub device | text | Model of the device |
| Country | factor | Country where the animal was measured (no political statement is intended, in the case of disputed territories) |
| Date | text | When the measurement was taken. Exact dates, if known, are in dd/mm/yyyy format. |
| Time | text | Hour of the measurement, if known |
| Age | factor | Adult, subadult, juvenile, or unknown |
| Sex | factor | Male, female, or unknown |
| Locality | text | Name of the region or location |
| Weather | text | Weather observations at the time of measurement |
| log mean mass | integer | log10 of the mean species mass (in grams). Mass was calculated from our data if available, or from snout-vent length data using the allometric equations from Feldman et al. (2016) and Meiri et al. (2021) |
| Notes | text | Any further information |
| Active? | factor | Yes/No. Was the animal active, or not (e.g., sleeping, thermoregulating, resting under cover, etc.) |
| Tsub\_use | factor | Yes/No. Did the data in this row fit the criteria to be used in the Tsub analyses (n>20 active individuals, phylogenetic data present) |
| Ta\_use | factor | Yes/No. Did the data in this row fit the criteria to be used in the Ta analyses (n>20 active individuals, phylogenetic data present) |

NOTE: blank cells indicate that no data is available for that variable.
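
The per-species sample-size filter described in the methods can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: rows are assumed to be dictionaries keyed by the column titles above, only rows with `Active?` equal to `Yes` are counted, and the threshold follows the methods text ("at least 20 individuals"), whereas the `Tsub_use` and `Ta_use` descriptions say "n>20". The function name is hypothetical.

```python
from collections import Counter


def species_meeting_threshold(rows, min_individuals=20):
    """Return the sorted species names that have enough active individuals.

    rows: iterable of dicts keyed by column title ("Species", "Active?", ...).
    min_individuals: assumed threshold; the methods say "at least 20".
    """
    counts = Counter(
        row["Species"] for row in rows if row.get("Active?") == "Yes"
    )
    return sorted(sp for sp, n in counts.items() if n >= min_individuals)


# Hypothetical records: species A has 20 active rows, species B only 19.
example_rows = (
    [{"Species": "Species A", "Active?": "Yes"}] * 20
    + [{"Species": "Species B", "Active?": "Yes"}] * 19
    + [{"Species": "Species C", "Active?": "No"}] * 25
)
print(species_meeting_threshold(example_rows))
```

The precomputed `Tsub_use` and `Ta_use` columns additionally require phylogenetic data, so they remain the authoritative flags for reproducing the published analyses.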
Aim: Squamate fitness is affected by body temperature, which in turn is influenced by environmental temperatures and, in many species, by exposure to solar radiation. The biophysical drivers of body temperature have been widely studied, but we lack an integrative synthesis of actual body temperatures experienced in the field, and their relationships to environmental temperatures, across phylogeny, behaviour, and climate. Location: Global (25 countries on six continents) Taxa: Squamates (210 species, representing 25 families) Methods: We measured body temperatures during activity for 20,231 individuals, and examined how body temperatures vary with substrate and air temperatures across taxa, climates, and behaviours (basking and diel activity). Results: Heliothermic lizards had the highest body temperatures and those most weakly correlated with substrate and air temperatures. Body temperatures of non-heliothermic diurnal lizards were similar to heliotherms in relation to air temperature but to nocturnal species in relation to substrate temperatures. Diurnal snake and non-heliothermic lizard body temperatures were more strongly correlated to air and substrate temperatures than in heliotherms. Correlation parameters of all diurnal squamates vary with mean annual temperatures, especially in heliotherms, so that the thermal relations of the various categories are disparate in cold climate but convergent in warm climate. Non-heliotherms and nocturnal body temperatures are better explained by substrate temperature than by air temperature. Body temperature distributions become left-skewed in warmer-bodied species, especially in colder climate. Main conclusions: Differences in squamate body temperatures, their environmental relationships, and frequency distributions are globally influenced by behavioural and climatic factors. 
Differences between behavioural categories are smaller in warm climates where environmental temperatures are generally favourable, but heliotherm body temperature remained consistently higher than all others.
# Software for optimizing treatment to slow the spatial propagation of invasive species: code and results

## Description:

The complete code and simulation results for finding the optimal treatment of a population front, to slow its propagation to a speed v. The algorithm is described in the paper "Optimizing strategies for slowing the spread of invasive species" by Adam Lampert (PLOS Computational Biology, DOI: 10.1371/journal.pcbi.1011996). Some of the results are demonstrated in Figs. 2-5 in that paper.

## Authorship:

The code was written by Adam Lampert, Institute of Environmental Sciences, Robert H. Smith Faculty of Agriculture, Food and Environment, the Hebrew University of Jerusalem, Israel.

## Installation:

Running the Matlab code requires the installation of Matlab 2021b for Windows (or a similar version of Matlab).

## Running the code – general model:

1. Extract all files from "general_model_code.zip" into a single folder.
2. Open "main.m" and "calc_cost.m" using Matlab.
3. Change the parameter values, run "main.m", and wait until Matlab completes the execution.

## Running the code – spongy moth model:

1. Extract all files from "spongy_moth_model_code.zip" into a single folder.
2. Open "main_F2.m" using Matlab.
3. Change the parameter values, run the code, and wait until Matlab completes the execution.

## Description of the data files:

The results for the general model's simulations are given as raw data in the folder "general_model_simulation_results.zip". The data files can be accessed with Matlab. Some of these results are demonstrated in the main article, Fig. 4.

The results for the spongy moth model simulations are given as raw data in the folder "spongy_moth_model_simulation_results.zip". The data files can be accessed with Matlab. Some of these results are demonstrated in the main article, Fig. 5.

Each data file in "general_model_simulation_results.zip" and in "spongy_moth_model_simulation_results.zip" includes the simulation results for a given set of parameters. The name of the file specifies the parameter values used. Specifically, for the general model, the file name indicates the values of α and v used for the simulation. For the spongy moth model, the file name indicates first the value of (kλ₀) and then the value of v used for the simulation.

Each data file includes the following variables:

* *n_front:* an array that includes the value of the population front (n-opt) as a function of the location (x).
* *treatment:* an array that includes the value of the optimal treatment (A-opt) as a function of the location (x).
* *Nx:* size of the n_front and the treatment arrays.

The general model data files also include the following parameter value:

* *Dt:* time resolution (Δt).

Spongy moth model data files also include the following parameter values:

* *num_moves:* number of spatial steps the front moves per time unit (equivalent to v).
* *delta:* the spatial resolution (σ).
* *lambda:* the parameter λ₀.

Slowing the spread of invasive species is a major challenge. How can we achieve this goal in the most cost-effective manner? This package includes the complete code and simulation results that help find the optimal, most cost-effective treatment to slow the spread of a propagating species. This package accompanies the paper "Optimizing strategies for slowing the spread of invasive species" by Adam Lampert (PLOS Computational Biology, DOI: 10.1371/journal.pcbi.1011996). The file general_model_code.zip contains the code for the general model; the file spongy_moth_model_code.zip contains the code for the spongy moth model; the file general_model_simulation_results.zip contains the results for the general model; and the file spongy_moth_model_simulation_results.zip contains the results for the spongy moth model.
The code for the simulations was written in Matlab and the simulation results were obtained by running the code. Opening the code and results requires an installation of Matlab (2021b for Windows or a similar version).
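
For users who export the result variables to another environment, the following Python sketch shows one way to check that a loaded result is internally consistent with the variable descriptions above. It assumes the variables have already been read into a plain dictionary by some means (the loading step is deliberately left out), and the function name `check_result` is hypothetical; it is not part of the distributed Matlab code.

```python
def check_result(variables):
    """Validate one simulation result and pair each location index with its values.

    variables: dict with keys "n_front", "treatment", and "Nx", as named in the
    data file description. How the dict is obtained is an assumption left to the user.
    Returns a list of (index, n_front value, treatment value) tuples.
    """
    n_front = list(variables["n_front"])
    treatment = list(variables["treatment"])
    nx = int(variables["Nx"])
    # Nx is described as the size of both arrays, so all three must agree.
    if not (len(n_front) == len(treatment) == nx):
        raise ValueError(
            f"size mismatch: n_front={len(n_front)}, "
            f"treatment={len(treatment)}, Nx={nx}"
        )
    return list(zip(range(nx), n_front, treatment))


# Hypothetical toy values, not taken from the simulation results.
profile = check_result(
    {"n_front": [1.0, 0.5, 0.0], "treatment": [0.0, 0.2, 0.1], "Nx": 3}
)
print(profile[1])
```

Converting the index to a physical location requires the spatial resolution, which is stored as `delta` only in the spongy moth files.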
Rare-earth monopnictides are a family of materials simultaneously displaying complex magnetism, strong electronic correlation, and topological band structure. The recently discovered emergent arc-like surface states in these materials have been attributed to the multi-wave-vector antiferromagnetic order, yet the direct experimental evidence has been elusive. Here we report the observation of non-collinear antiferromagnetic order with multiple modulations using spin-polarized scanning tunneling microscopy. Moreover, we discover a hidden spin-rotation transition of single-to-multiple modulations 2 K below the Néel temperature. The hidden transition coincides with the onset of the surface state splitting observed by our angle-resolved photoemission spectroscopy measurements. Single modulation gives rise to a band inversion with induced topological surface states in a local momentum region while the full Brillouin zone carries trivial topological indices, and multiple modulation further splits the surface bands via non-collinear spin tilting, as revealed by our calculations. The direct evidence of the non-collinear spin order in NdSb not only clarifies the mechanism of the emergent topological surface states but also opens up a new paradigm of control and manipulation of band topology with magnetism.

# Data for: Hidden non-collinear spin-order induced topological surface states

## Description of the data and file structure

There are three files in the dataset: Dataset.zip, filter.ipf, and DriftCorrection.ipf.

* filter.ipf is the IgorPro procedure for filtering out high-frequency noise of topographic images.
* DriftCorrection.ipf is the IgorPro procedure for drift correction of topographic images by the Lawler-Fujita algorithm.
* Dataset.zip contains folders arranged by the figures in the article "Hidden non-collinear spin-order induced topological surface states" to be published in Nature Communications. Each folder contains the STM raw data for plotting the STM images in the corresponding figure.

Please refer to the article for a more detailed description of the data and methods.

The STM data were collected with an Omicron LT-STM at 4 K. MTRX files were exported to IgorPro files with the software Vernissage. Data analysis was performed in IgorPro. The figures were plotted with Origin and arranged in Adobe Illustrator.

Use Vernissage to open the head file ending with "_0001.mtrx" and inspect the data contained within. The data can be exported as IgorPro files and further analyzed in IgorPro. IPF files are IgorPro procedures for data analysis.
**Mesocosm core setup and sampling procedure**

Samples were obtained during the AQUACOSM VIMS-Ehux mesocosm experiment in Raunefjorden near Bergen, Norway (60°16′11″N; 5°13′07″E), in May 2018. Seven bags were filled with 11 m³ of water from the fjord, containing natural plankton communities. Algal blooms were induced by nutrient addition and monitored for 24 days, as previously described [23]. Ten samples were collected from four bags, as follows:

* From bag 3, on days 15 and 20 (named B3T15 and B3T20, respectively).
* From bag 4, on days 13, 15, 19, and 20 (named B4T13, B4T15, B4T19, and B4T20, respectively).
* From bag 6, on day 17 (named B6T17).
* From bag 7, on days 16, 17, and 18 (named B7T16, B7T17, and B7T18, respectively).

Samples were initially filtered as follows: 2 liters of water were filtered with a 20 µm mesh and collected in a glass bottle. The cells were then concentrated through gentle gravity filtration on a 3 µm polycarbonate filter (Whatman), mounted on a reusable bottle top filter holder (Thermo Fisher). The biomass on the filter was regularly resuspended by gentle pipetting. For samples B7T16, B7T18, B4T15, B3T15, B6T17, B7T17, and B4T19, the 2 liters of seawater were concentrated down to 100 ml, distributed in two 50 ml tubes, which corresponds to a 200 times concentration. For B4T13, the concentration factor was 140 times. For B4T20 and B3T20, the concentration factor was 100 times. The different concentration factors are explained by filter clogging and various field constraints, including processing time. For all samples except B3T20, the 50 ml tubes were centrifuged for 4 min at 2500g, after which the supernatant was discarded. Pellets corresponding to the same day and same bag were pooled and resuspended in a final volume of 200 µl of chilled PBS. 1800 µl of pre-chilled high-performance liquid chromatography (HPLC) grade 100% methanol was added drop by drop to the concentrated biomass.
For B3T20, the concentrated biomass was centrifuged for 4 min at 2500g and resuspended in 100 µl of chilled PBS, to which 900 µl of chilled HPLC grade 100% methanol was added. Then, samples were incubated for 15 minutes on ice and stored at -80°C until further analysis.

**Library preparation and RNA-seq sequencing using 10X Genomics**

For analysis by 10X Genomics, tubes were defrosted and gently mixed, and 1.7 ml of the samples were transferred into an Eppendorf Lowbind tube and centrifuged at 4°C for 3 min at 3000g. The PBS/methanol mix was discarded and replaced by 400 µl of PBS. Cell concentration was measured using an iCyt Eclipse flow cytometer (SONY) based on forward scatter. Cell concentration ranged from 1044 cells ml⁻¹ to 9855 cells ml⁻¹. All concentrations were brought to 1000 cells ml⁻¹ to target 7000 cells recovery, according to the 10X Genomics Cell Suspension Volume Calculator Table provided in the user guide. The cellular suspension was loaded onto a Next GEM Chip G targeting 7000 cells and then run on a Chromium Controller instrument to generate the GEM emulsion (10x Genomics). Single-cell 3' RNA-seq libraries were generated according to the manufacturer's protocol (10x Genomics Chromium Single Cell 3' Reagent Kit User Guide v3/v3.1 Chemistry) on different occasions: B4T19 and B7T17 in January 2020, and B3T15, B3T20, B4T13, B4T15, B4T20, B6T17, B7T16, and B7T18 in August 2020, with 12 cycles for cDNA amplification and 15 cycles for library amplification. Library concentrations and quality were measured using the Qubit dsDNA High Sensitivity Assay kit (Life Technologies, Carlsbad, CA). Libraries were pooled according to targeted cell number, aiming for a minimum of 20,000 reads per cell. Pooled libraries were sequenced using the NextSeq® 500 High Output kit (75 cycles).
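
As a worked illustration of the step in which all cell suspensions were brought to 1000 cells ml⁻¹, the following Python sketch applies the standard dilution relation C₁V₁ = C₂V₂. It is an assumption that dilution was done by adding PBS to the 400 µl suspension; the function name and the example numbers other than the target concentration are hypothetical.

```python
def pbs_to_add_ml(measured_cells_per_ml, volume_ml, target_cells_per_ml=1000.0):
    """Return the volume of diluent (ml) needed to reach the target concentration.

    Uses C1 * V1 = C2 * V2. If the sample is already at or below the target,
    no diluent is added (a sample cannot be concentrated by dilution).
    """
    if measured_cells_per_ml <= target_cells_per_ml:
        return 0.0
    final_volume_ml = measured_cells_per_ml * volume_ml / target_cells_per_ml
    return final_volume_ml - volume_ml


# Hypothetical example: 0.4 ml at 2000 cells/ml needs about 0.4 ml of diluent.
print(pbs_to_add_ml(2000.0, 0.4))
```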
**Bioinformatic pipeline**

A step-by-step description of the bioinformatic pipeline from this step onward, including all in-house scripts used, is detailed in the GitHub repository under github.com/vardilab/host-virus-pairing.

**Detection of infected cells in the single-cell RNA-seq data using a custom viral genes database**

To detect viral transcripts, a reference was built from a database of highly conserved genes [6] from all NCLDV in the Giant Virus Database [9], such as family B DNA polymerase, RNA polymerase subunits, and the major capsid protein. The genes were clustered using CD-HIT v. 4.6.6 at 90% nucleotide identity to remove redundancy [43]. From this database of 34866 genes, a reference was created using the 10X Genomics Cell Ranger mkref command. The Cell Ranger Software Suite (v. 5.0.0) was used to perform barcode processing (demultiplexing) and single-cell unique molecular identifier (UMI) counting on the raw reads from 47391 cells using the count script (default parameters), with the deduplicated NCLDV database as a reference. For downstream analysis, 972 cells that highly expressed multiple NCLDV genes and were considered "highly infected" were selected. These "highly infected" cells were selected based on the following criteria: (a) the cell expresses ≥10 viral UMIs in total [22,24], (b) expression of more than one viral gene (>1), (c) expression of at least one gene with a UMI count greater than one (>1). Cell selection was wrapped using an in-house script (choose_cells.py).

**Identifying the taxonomy of individual cells by sequence homology to ribosomal RNA**

Raw reads from each cell were pulled by the cell's unique barcode identifier using seqtk v. 1.2. Reads were then trimmed (command: trim_galore --phred33 -j 8 --length 36 -q 5 --stringency 1 --fastqc -e 0.1), and poly-A was removed (command: trim_galore --polyA -j 1 --length 36), using TrimGalore (v. 0.6.5), a Cutadapt wrapper [44]. Trimmed reads from each cell were assembled using rnaSPAdes v. 3.15 [45] with k-mer sizes 21,33.
Pulling, trimming, and assembly of the raw reads were wrapped using an in-house script (assemble_cells.sh). To identify the taxonomy of the cells, assembled contigs from each cell were matched against 18S rRNA sequences from the Protist Ribosomal Reference (PR2) [46] and metaPR2 [47]. To remove redundancy, the sequences in each database were clustered using CD-HIT v. 4.6.6 at 99% identity [43]. Contigs were filtered using SortMeRNA v. 4.3.6 [48] with default parameters against the PR2 database and then aligned to the PR2 and metaPR2 databases using Blastn [49], at 99% identity, E-value ≤ 10⁻¹⁰ and alignment length of at least 100 bp. Contigs were ranked by their bitscore, and only the best hit was kept for each contig. Each contig was assigned to one of the following taxonomic groups that were prevalent in the sample: the classes Bacillariophyta, Prymnesiophyceae, Chrysophyceae, MAST-3, and Katablepharidaceae, and the divisions Pseudofungi, Lobosa (Amoebozoa), Ciliophora (Ciliates), Dinoflagellata and Cercozoa. Contigs that matched other groups were assigned as "other eukaryotes". Contigs that matched more than one of these taxonomic groups were considered non-specific or chimeric and were therefore ignored. This downstream analysis of the BLAST results was wrapped using an in-house script (Sankey_wrapper_extended.ipynb). To avoid detection of doublets and predators, cells that transcribe 18S rRNA transcripts homologous to more than one taxonomic group were conservatively omitted. Of the 972 infected cells detected, 418 (43%) were omitted because we could not assemble specific 18S rRNA contigs from them or because their identity was ambiguous. None of the cells that were assigned "other eukaryotes" had contigs with conflicting annotations (contigs matching different classes).
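
The two filtering rules described above, the "highly infected" criteria (a)-(c) and the conservative one-taxon-per-cell assignment, can be sketched in Python as follows. This is a simplified illustration, not the authors' choose_cells.py or notebook: the input structures, field names, and function names are hypothetical, and contig-level chimera handling is reduced to keeping the best-bitscore hit per contig.

```python
def is_highly_infected(viral_umis, min_total_umis=10):
    """Apply criteria (a)-(c) to one cell.

    viral_umis: dict mapping viral gene name -> UMI count (hypothetical layout).
    """
    expressed = [n for n in viral_umis.values() if n > 0]
    return (
        sum(expressed) >= min_total_umis  # (a) at least 10 viral UMIs in total
        and len(expressed) > 1            # (b) more than one viral gene expressed
        and max(expressed) > 1            # (c) at least one gene with UMI count > 1
    )


def assign_taxon(hits, min_identity=99.0, max_evalue=1e-10, min_length=100):
    """Return the single taxonomic group supported by a cell's contigs, or None.

    hits: list of dicts with keys contig, taxon, identity, evalue, length,
    bitscore (hypothetical field names for tabular BLAST output).
    """
    best_per_contig = {}
    for hit in hits:
        if (hit["identity"] < min_identity
                or hit["evalue"] > max_evalue
                or hit["length"] < min_length):
            continue  # fails the identity / E-value / length thresholds
        current = best_per_contig.get(hit["contig"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best_per_contig[hit["contig"]] = hit  # keep only the best hit
    taxa = {hit["taxon"] for hit in best_per_contig.values()}
    # Cells with no usable contig, or with conflicting groups, are omitted.
    return taxa.pop() if len(taxa) == 1 else None


print(is_highly_infected({"polB": 8, "mcp": 3}))
```

Returning `None` for both unidentified and ambiguous cells mirrors the text, which reports the two cases together as omitted cells.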
**Identifying the infecting virus using a homology search against a custom protein database**

To identify transcripts derived from giant viruses, reads from the detected 972 infected cells were compared to a custom protein database using a translated alignment approach. To ensure that as many giant viruses as possible were represented, a database was constructed by combining RefSeq v. 207 [50] with all predicted proteins in the Giant Virus Database [9]. The proteins were then masked with tantan [51] (using the -p option), and the database was generated with the lastdb command (using parameters -c, -p). To identify the infecting virus, the raw sequencing reads in each of the 972 single-cell transcriptomes were compared to the constructed database using LASTAL v. 959 [52] (parameters -m 100, -F 15, -u 2) with best matches retained. The same procedure was done for the assembled transcripts from each cell to identify viral transcripts. The results were analyzed at different taxonomic levels, consistent with the Giant Virus Database (for giant viruses) or NCBI taxonomy [33] (everything else). The 754 cells whose best matching virus was coccolithovirus were omitted from the downstream analysis since EhV-infected cells were already reported to be abundant in the algal bloom [25], and our analysis aims to explore other host-virus pairs.

**Plotting host-virus pairs in a Sankey plot for host cells and their infecting giant viruses**

Of the 218 cells detected as infected by viruses other than EhV, 71 were selected that could be identified using assembled 18S rRNA transcripts and have at least 10 reads aligned to one of the virus families (Supplementary Data 1). Only links representing at least 10% of the aligned reads in each cell are shown in order to highlight the strong links. The Sankey plot was constructed using Holoviews v. 1.15.4; see sankey_wrapper.ipynb in the GitHub repository.
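
The link-selection rule for the Sankey plot can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the contents of sankey_wrapper.ipynb: the per-cell total is assumed to be the sum of reads over all virus families, and the function name, input layout, and family labels are hypothetical.

```python
def sankey_links(reads_by_family, min_reads=10, min_fraction=0.10):
    """Return the virus-family links to draw for one host cell.

    reads_by_family: dict mapping virus family -> number of aligned reads.
    A cell is used only if some family has at least `min_reads` reads; a link
    is kept only if it carries at least `min_fraction` of the cell's reads.
    """
    if not reads_by_family or max(reads_by_family.values()) < min_reads:
        return {}
    total = sum(reads_by_family.values())  # assumed denominator
    return {
        family: n
        for family, n in reads_by_family.items()
        if n / total >= min_fraction
    }


# Hypothetical cell: only the dominant family passes the 10% cut-off.
print(sankey_links({"family_A": 45, "family_B": 4, "family_C": 1}))
```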
**Phylogenetic trees of viral and host marker genes**

For phylogenetic analysis, 31 cells were chosen based on a strong correlation (≥90% of viral reads matched one virus family) between the host and a virus. To obtain reference 18S rRNA sequences to include in a phylogeny, all transcripts assembled from these cells were compared to the PR2 database [46] using BLASTN v. 2.9.0+ (parameters -perc_identity 95, -evalue 10⁻¹⁰, -max_target_seqs 20, -max_hsps 1). Sequences shorter than 1000 bp were removed from the reference, and the remainder of the sequences were de-replicated with CD-HIT v. 4.7 [43] (-c 0.99) to prevent the inclusion of excessive nearly identical references. Sequences were aligned with Muscle5 [53] (default parameters), and diagnostic trees were created with FastTree 2.1.10 [54] for quick visualization of trees and for pruning long branches. The final phylogenetic trees were constructed with IQ-TREE v. 2.1.2 [52] (parameters -m GTR+F+G4 -alrt 1000 -T AUTO --runs 10). To identify major capsid protein sequences in the single-cell transcriptomes, proteins were first predicted using FragGeneScanRs v. 1.1.0 [56] (parameters -t, illumina_10). The resulting protein sequences were compared to MCP proteins in the Giant Virus Database with BLASTP v. 2.12.0+ (parameters -evalue 10⁻³, -max_target_seqs 20, -max_hsps 1) as well as to a custom MCP HMM that was previously designed [6] using hmmsearch in the HMMER3 v. 3.3.2 package [57] (E-value ≤ 10⁻³). The results of these searches were manually inspected, and sequences were subsequently aligned with Muscle5 (default parameters). As with the 18S rRNA sequences, diagnostic trees were first made with FastTree 2.1.10 and long branches were pruned before making a final tree with IQ-TREE v. 2.1.2 (parameters -m LG+F+G4 -alrt 1000 -T AUTO --runs 10). Cells for which transcripts are present in both viral and host trees were denoted (Supplementary Data 4).
All the code used to produce the trees is wrapped in the folder "marker_gene_trees" in the GitHub repository.

**Single-cell RNA-seq data alignment to a custom reference**

A new host-virus reference database was curated from the transcriptome of the infected cells (Fig. 2). Repetitive sequences were removed using BBduk (BBtools 38.90) [58]. An additional long repetitive sequence was removed manually. A database of E. huxleyi and EhV genes, which were shown to be abundant in the samples [25], was also added to this reference to specifically detect E. huxleyi cells and to avoid a non-specific alignment of reads from these cells to other contigs. For EhV, the predicted CDSs in EhVM1 were used as a reference [59]. For the host, an integrated transcriptome reference of E. huxleyi was used as a reference [60]. Viral transcripts in the database were identified using a homology search against a custom protein database as described above. A reference was created from the database using the Cell Ranger mkref command. Raw reads were aligned to this reference database using 10X Genomics Cell Ranger v. 5.0.0 count analysis.

**Preprocessing of transcript abundance and dimensionality reduction**

A total of 28,656 cells from the 10 samples were initially aligned to the reference database. Cells with zero UMIs and cells with the lowest 1% number of UMIs, as compared with the distribution of transcripts per cell in the entire dataset, were removed for downstream analyses. To prevent cases of doublet or multiplet cells, which can be biological (cell digestion) or technical (fused cells), cells with the highest 1% number of UMIs were also removed. The raw UMIs of 28,015 cells were further preprocessed using the Python package scprep v. 1.0.10: low-expressing genes were filtered with filter.filter_rare_genes and min_cells=2.
This number was chosen because we did not want to include genes mapped to only one cell, but we also did not want to exclude low-expressed genes, as they might represent gene expression of low-abundant organisms. Expression was normalized by cell library size with normalize.library_size_normalize, and the data was scaled with transform.sqrt. Preprocessing was wrapped in an in-house script; see 00.01.filter_normalize_scale_single_cell_data.py in the GitHub repository. To represent the cells in two dimensions based on their gene expression profiles, dimensionality reduction was performed using the scprep v. 1.1.0 package PCA (method='svd', eps=0.1), and Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction was conducted using the UMAP method in the manifold package of the Python library scikit-learn v. 0.24.1 (minimum distance=0.4, spread=2, number of neighbors=7). Dimensionality reduction was wrapped in an in-house script (00.02.dimentionality_reduction_single_cell_data.py).

**Assigning taxonomy to each detected cell using rRNA homology search**

To identify the taxonomy of each detected cell, reads from each cell were assembled independently. The taxonomy of the cells was determined by 18S rRNA homology to one of the following groups, which were abundant in the population: the classes Bacillariophyta (diatoms), Prymnesiophyceae, Chrysophyceae, MAST-3 and Katablepharidaceae, and the divisions Ciliophora (Ciliates), Dinoflagellata and Cercozoa. Other taxonomic groups were clustered under "Other eukaryotes". 16,358 cells were identified this way, and 11,657 cells that could not be identified were excluded from the plot for convenience. Cells with 18S rRNA contigs homologous to more than one taxonomic group were also conservatively omitted. As described above, cells expressing at least 10 viral UMIs were considered infected [1,2]. This section was wrapped in a Jupyter notebook (Coexpression_wrapper_extended.ipynb).
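
The three preprocessing steps named earlier in this section (rare-gene filtering with min_cells=2, library-size normalization, and square-root scaling) can be illustrated with a dependency-free Python sketch. It does not call scprep and is not the authors' script; it only re-expresses the arithmetic on a small list-of-lists matrix. The `rescale` value of 10,000 is an assumption about the normalization target, and the function name is hypothetical.

```python
import math


def preprocess(counts, min_cells=2, rescale=10_000):
    """Filter rare genes, normalize by library size, and square-root transform.

    counts: list of per-cell lists of raw UMI counts (cells x genes).
    Returns (indices of kept genes, processed matrix restricted to those genes).
    """
    n_genes = len(counts[0])
    # Keep genes detected (count > 0) in at least `min_cells` cells.
    kept_genes = [
        g for g in range(n_genes)
        if sum(1 for cell in counts if cell[g] > 0) >= min_cells
    ]
    processed = []
    for cell in counts:
        kept = [cell[g] for g in kept_genes]
        library_size = sum(kept)
        # Scale each cell to the same total; empty cells stay at zero.
        scale = rescale / library_size if library_size else 0.0
        processed.append([math.sqrt(c * scale) for c in kept])
    return kept_genes, processed


# Hypothetical 3-cell x 3-gene matrix: the third gene occurs in one cell only.
genes, matrix = preprocess([[4, 0, 0], [1, 3, 0], [0, 1, 5]])
print(genes, matrix[0])
```

In this toy input the third gene is dropped because it is detected in a single cell, which is the situation the min_cells=2 choice is meant to exclude.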
Identifying the Leucocryptos host and its virus using homology search

To better identify the detected Katablepharidaceae cells and to identify their infecting virus, 26 infected Katablepharidaceae cells from bag #4, day 20, were selected. Reads from these cells were retrieved using the unique molecular identifier and then trimmed using TrimGalore v. 0.6.5, a Cutadapt wrapper [44]. Trimming was wrapped in an in-house script; see pull_trim_clean.sh in the GitHub repository. Trimmed read files from all these cells were concatenated into one file and assembled together using rnaSPAdes v. 3.15 [45]. To identify the specific Katablepharidaceae host, assembled contigs were matched against the PR2 rRNA database using blastn at 90% identity, E-value ≤ 10⁻¹⁰, and alignment length ≥ 100 bp. Contigs best matched an unidentified Katablepharidaceae (>99% nucleotide identity), but after removing unidentified genera, these contigs best matched (>95% nucleotide identity) the Katablepharidaceae species Leucocryptos marina. Transcripts that matched classes other than Katablepharidaceae were matched against the entire NCBI database using the NCBI web server [61]. They, too, mostly matched Katablepharidaceae genes, specifically 28S rRNA or internal transcribed spacer (ITS) sequences (Supplementary Data 3). To identify the specific infecting virus, transcripts were matched against an NCLDV gene marker database [6] at 90% identity, E-value ≤ 10⁻¹⁰, and alignment length ≥ 100 bp. After finding homology to Leucocryptos and the virus GVMAG-M-3300020187-27 [1], gene expression was calculated using RSEM v. 1.3.1 [62] (rsem-calculate-expression -p 10 --bowtie2 --fragment-length-mean 58). The genomic features of the virus were taken from Schulz (2020) [1], and the viral genome was plotted using ShinyCircos v. 2.0 [63]. Gene expression in the plot is measured in expected counts after log2 transformation. The relative abundance data in Fig.
4 were obtained from 18S rRNA amplicon sequencing of the 2-20 µm size fraction in bag #4 during the mesocosm experiment [23]. Days 19, 22, and 23 were sampled twice; all other days were sampled once. In Fig. 4c, relative abundance is calculated per taxon as a fraction of all amplicon sequence variants (ASVs), excluding metazoans. Fig. 4d shows the fraction of Katablepharidaceae out of all ASVs matching Katablepharidaceae (excluding metazoans). E. huxleyi abundance was measured by flow cytometry based on high side scatter and high chlorophyll signals. These data were obtained from the source data of the same study [23].

Phylogenetic tree of Katablepharidaceae ASVs and 18S rRNA genes

To verify the taxonomy of the ASVs, a phylogenetic tree was constructed from 89 ASVs identified as Katablepharidaceae, selected 18S rRNA sequences of Katablepharidaceae and other species from the PR2 database, and the longest single-cell assembled contig from the infected Katablepharidaceae cells. Sequences were aligned with ClustalOmega v. 1.2.4 (default parameters) [64]. A diagnostic tree was first made with FastTree 2.1.10 [54] for pruning long branches before making the final tree with IQ-TREE [55]. All but three ASVs and one PR2 sequence clustered together with the assembled Leucocryptos transcript, verifying the taxonomy of 97% of the ASVs used in the relative abundance analysis (Extended Data Fig. 4).

Phylogenetic trees of viral heat-shock proteins and metacaspase

To examine the evolutionary history of the heat-shock proteins encoded in GVMAG-M-3300020187-27, phylogenetic trees of these proteins were constructed together with homologs present in eukaryotes, bacteria, archaea, and other giant viruses. For this, a custom database of proteins from reference genomes was compiled for eukaryotes (EggNOG v. 5.0 [65]), bacteria and archaea (the Genome Taxonomy Database, GTDB, v. 95 [66]), and other giant viruses (the Giant Virus Database [9]).
For bacterial and archaeal genomes in the GTDB, proteins were first predicted with Prodigal v. 2.6.3 [67] using default parameters. Proteins were searched against the Pfam model for each protein using hmmsearch with the noise cutoff (--cut_nc), and the matching sequences were subsequently aligned with ClustalOmega v. 1.2.3 (default parameters). Phylogenetic trees were constructed using IQ-TREE v. 2.1.2 [55] (parameters -m TEST -bb 1000 -T 6 --runs 10) using ultrafast bootstraps and with the best model determined with ModelFinder [68]. Substitution models used for the phylogenetic trees: Bax-1 - VT+F+R7; Metacaspase - VT+R7; HSP90 - LG+F+R10; HSP70 - LG+F+R10.

# Single-cell RNA-seq of the rare virosphere reveals the native hosts of giant viruses in the marine environment

Supplementary files used in the project. These are the main intermediate files that can help reproduce the data. For a detailed description of how to reproduce and use these files, go to the GitHub site: [https://github.com/vardilab/host-virus-pairing](https://github.com/vardilab/host-virus-pairing)

## Description of the data and file structure

Fromm_2023_Data_Availability

* Sequences
  * GVDB.markergenes.90.fna # De-duplicated database of NCLDV marker genes
  * Transcripts_Cells_Ehux-EhV.95.fasta # The host-virus reference, based on the single-cell transcriptomes of infected cells, to which we added genes from EhV and E. huxleyi.
  * Transcripts_Katablepharidacea.fasta # Transcripts assembled from a highly infected subpopulation of Katablepharidaceae
* Blast_results
  * first_cells.transcripts.edit.metaPR2.csv # Blastn results of single-cell transcripts assembled from highly infected cells (~970) against the metaPR2 database.
  * first_cells.transcripts.edit.PR2.csv # Blastn results of single-cell transcripts assembled from highly infected cells (~970) against the PR2 database.
  * all_cells.transcripts.edit.metaPR2.csv # Blastn results of single-cell transcripts assembled from all cells (~28,000) against the metaPR2 database.
  * all_cells.transcripts.edit.PR2.csv # Blastn results of single-cell transcripts assembled from all cells (~28,000) against the PR2 database.
  * cells.filtered.blastx.csv # Blastx results of single-cell assembled transcripts against the RefSeq database (to find viral transcripts).

DATA-SPECIFIC INFORMATION FOR: all_cells.transcripts.edit.metaPR2.csv

1. Number of variables: 12
2. Number of rows: 1409286
3. Variable List:
   * qseqid: query or source (gene) sequence id
   * sseqid: subject or target (reference genome) sequence id
   * pident: percentage of identical positions
   * length: alignment length (sequence overlap)
   * mismatch: number of mismatches
   * gapopen: number of gap openings
   * qstart: start of alignment in query
   * qend: end of alignment in query
   * sstart: start of alignment in subject
   * send: end of alignment in subject
   * evalue: expect value
   * bitscore: bit score

DATA-SPECIFIC INFORMATION FOR: all_cells.transcripts.edit.PR2.csv

1. Number of variables: 12
2. Number of rows: 1549087
3. Variable List:
   * qseqid: query or source (gene) sequence id
   * sseqid: subject or target (reference genome) sequence id
   * pident: percentage of identical positions
   * length: alignment length (sequence overlap)
   * mismatch: number of mismatches
   * gapopen: number of gap openings
   * qstart: start of alignment in query
   * qend: end of alignment in query
   * sstart: start of alignment in subject
   * send: end of alignment in subject
   * evalue: expect value
   * bitscore: bit score

DATA-SPECIFIC INFORMATION FOR: first_cells.transcripts.edit.metaPR2.csv

1. Number of variables: 12
2. Number of rows: 56763
3. Variable List:
   * qseqid: query or source (gene) sequence id
   * sseqid: subject or target (reference genome) sequence id
   * pident: percentage of identical positions
   * length: alignment length (sequence overlap)
   * mismatch: number of mismatches
   * gapopen: number of gap openings
   * qstart: start of alignment in query
   * qend: end of alignment in query
   * sstart: start of alignment in subject
   * send: end of alignment in subject
   * evalue: expect value
   * bitscore: bit score

DATA-SPECIFIC INFORMATION FOR: first_cells.transcripts.edit.PR2.csv

1. Number of variables: 12
2. Number of rows: 67864
3. Variable List:
   * qseqid: query or source (gene) sequence id
   * sseqid: subject or target (reference genome) sequence id
   * pident: percentage of identical positions
   * length: alignment length (sequence overlap)
   * mismatch: number of mismatches
   * gapopen: number of gap openings
   * qstart: start of alignment in query
   * qend: end of alignment in query
   * sstart: start of alignment in subject
   * send: end of alignment in subject
   * evalue: expect value
   * bitscore: bit score

DATA-SPECIFIC INFORMATION FOR: cells.filtered.blastx.csv

1. Number of variables: 10
2. Number of rows: 2021
3. Variable List:
   * qseqid: query or source (gene) sequence id
   * sseqid: subject or target (reference genome) sequence id
   * pident: percentage of identical positions
   * bitscore: bit score
   * domain: domain of life of the subject
   * phylum: phylum of the subject
   * family: family of the subject
   * genus: genus of the subject
   * species: species of the subject
   * cell barcode: cell barcode of the query

* UMI_tables_first_cells
  * data_raw.pickle.gz # Data combined from all UMI tables of fastq files mapped to the NCLDV marker gene database
* UMI_tables_scatterplot
  * data_raw.pickle.gz # Data combined from all UMI tables of fastq files mapped to the host-virus reference
  * data.pickle.gz # Preprocessed data combined from all UMI tables of fastq files mapped to the host-virus reference
  * metadata_dimentionality_reduction_1_1.2_.pickle.gz # Metadata of preprocessed data, after dimensionality reduction.

## Access information

The PR2 database can be accessed from here: [https://github.com/pr2database/pr2database](https://github.com/pr2database/pr2database)

Giant viruses (phylum Nucleocytoviricota) are globally distributed in aquatic ecosystems. They play significant roles as evolutionary drivers of eukaryotic plankton and regulators of global biogeochemical cycles. However, we lack knowledge about their native hosts, hindering our understanding of their lifecycle and ecological importance. Here, we used single-cell RNA-seq and samples from an induced E. huxleyi bloom during a mesocosm experiment to link giant viruses with their protist hosts. We observe active giant virus infections in multiple host lineages, including members of the algal groups Chrysophyceae and Prymnesiophyceae, as well as heterotrophic flagellates in the class Katablepharidaceae. Katablepharids were infected with a rare Imitevirales-07 giant virus lineage expressing cell fate regulation genes.
Analysis of the temporal dynamics of this host-virus interaction indicated a role for the Imitevirales-07 in the collapse of the host Katablepharid population. Our results demonstrate that single-cell RNA-seq can be used to identify previously undescribed host-virus interactions and study their ecological relevance.
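For readers working with the Blast_results tables listed in the file description, the following is a minimal sketch of applying the thresholds reported in the methods (identity ≥ 90%, E-value ≤ 10⁻¹⁰, alignment length ≥ 100 bp) and keeping the best-scoring hit per query. It assumes each CSV is comma-separated with a header row that uses the documented variable names; that layout, the helper names, and the sample rows are assumptions for illustration, not the authors' pipeline.

```python
import csv
import io


def filter_hits(handle, min_identity=90.0, max_evalue=1e-10, min_length=100):
    """Return rows passing the identity, E-value, and alignment-length cutoffs."""
    kept = []
    for row in csv.DictReader(handle):
        if (float(row["pident"]) >= min_identity
                and float(row["evalue"]) <= max_evalue
                and int(row["length"]) >= min_length):
            kept.append(row)
    return kept


def best_hit_per_query(hits):
    """Keep, for every qseqid, the hit with the highest bit score."""
    best = {}
    for row in hits:
        query = row["qseqid"]
        if query not in best or float(row["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = row
    return best


# Made-up rows in the documented 12-column layout (not taken from the dataset).
sample = io.StringIO(
    "qseqid,sseqid,pident,length,mismatch,gapopen,qstart,qend,sstart,send,evalue,bitscore\n"
    "contig_1,ref_A,99.2,250,2,0,1,250,10,259,1e-120,450\n"
    "contig_1,ref_B,95.0,240,12,0,1,240,5,244,1e-100,380\n"
    "contig_2,ref_C,88.0,300,36,0,1,300,1,300,1e-90,300\n"
    "contig_3,ref_D,97.0,80,2,0,1,80,1,80,1e-30,140\n"
    "contig_4,ref_E,92.0,150,12,0,1,150,1,150,1e-5,60\n"
)
hits = filter_hits(sample)
best = best_hit_per_query(hits)
```

To apply the same filter to one of the real tables, the `io.StringIO` object would be replaced by an opened file handle, for example `open("all_cells.transcripts.edit.PR2.csv", newline="")`, provided the header assumption holds for that file.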
The mere presence of predators causes prey organisms to display predation-avoidance strategies. Predator presence is often communicated through predator-released chemical signals. Ovipositing female mosquitoes of several species are repelled by unknown signals released from larvivorous fish. It was previously suggested that in many cases, a predator’s microbiota plays an important role in the release of these signals; however, this mechanism is still poorly understood. In this study, we looked into the effects of the microbiota originating from the larvivorous Gambusia affinis (Baird and Girard) on the oviposition behavior of gravid female mosquitoes. We used fish with altered microbiota and bacterial isolates in a set of outdoor mesocosm experiments to address this aim. We show that interference with the fish microbiota significantly reduces the fish’s repellent effect. We further show that the bacterium Pantoea pleuroti, isolated from the skin of the fish, repels oviposition of Culex laticinctus (Edwards) and Culiseta longiareolata (Macquart) mosquitoes similarly to the way in which live fish repel them. Our results highlight the importance of bacteria in the interspecies interactions of their hosts. Furthermore, this finding may lead to the development of an ecologically friendly mosquito repellent that may reduce the use of larvivorous fish for mosquito control.

# Fish microbiota repel ovipositing mosquitoes

[https://doi.org/10.5061/dryad.9p8cz8wqf](https://doi.org/10.5061/dryad.9p8cz8wqf)

The dataset includes data from 4 field experiments described in Figures 2-5. It includes the distribution of mosquito egg rafts from two species, *Culex laticinctus* and *Culiseta longiareolata*. Egg rafts were collected from pools as described in the methods section.

## Description of the data and file structure

The data describe the total number of egg rafts collected over all dates in each of the pools. Each pool is a combination of “block” and “treatment” variables.
The other columns present the dependent variables, i.e., total number of egg rafts for each mosquito species. Experimental duration is presented at the bottom and consists of beginning and ending dates plus the total day count.
# Rapid and chemically diverse C transfer from trees to mycorrhizal fruit bodies in the forest

[https://doi.org/10.5061/dryad.34tmpg4s5](https://doi.org/10.5061/dryad.34tmpg4s5)

## Description of the data and file structure

Ectomycorrhizal fungi (EMF) are common belowground tree symbionts, supplying trees with water and nutrients. In return, large amounts of C assimilated by trees can be allocated into EMF. However, the chemical forms in which the C is transferred from trees to fungi under field conditions are mostly unknown. In this study, we aimed to unravel the fate of tree-derived C in EMF. We conducted 13CO2 pulse labeling of *Pinus halepensis* trees in two forest sites with adjacent EMF sporocarps, combined with a non-targeted metabolomics profiling of root and sporocarp tissues. 13C was measured in sporocarps of *Tricholoma terreum* and *Suillus collinitus* up to 3 m from pine stems.

Here we provide a table with soil properties (pH, salinity, and mineral composition) under the study trees at the two forest sites, and three tables showing P-values of isotopes of labeled semi-polar metabolites identified in *Pinus* roots, *Suillus* fruit bodies, and *Tricholoma* fruit bodies samples comparing before and after 13C labeling. Below is a description for each of the data tables.

#### **Hosted via Dryad**

**Rapaport_et_al_Soil_properties.xlsx** Soil properties (pH, salinity, and mineral composition) under the study trees at the two forest sites. Measurements were done on soil at 0-10 cm and 10-20 cm depths (excluding trees 24 and 29 in Charuvit forest, where soil was too shallow). EC, electric conductivity (dS m⁻¹); SOC, soil organic carbon (%); all mineral values (Cl, Ca, Mg, N-NO3, N-NH4, Olsen P) are in mg kg⁻¹.

#### **Hosted via Zenodo**

**Table S2** Soil properties (pH, salinity, and mineral composition) under the study trees at the two forest sites.
Measurements were done on soil at 0-10 cm and 10-20 cm depths (excluding trees 24 and 29 in Charuvit forest, where soil was too shallow). EC, electric conductivity (dS m⁻¹); SOC, soil organic carbon (%); mineral values are in mg kg⁻¹.

**Table S6** P-values of isotopes of labeled semi-polar metabolites of root samples comparing before and after 13C labeling. The isotope shown is the most significant one from each metabolite. FDR correction was performed. TCA, tricarboxylic acid cycle.

**Table S7** P-values of isotopes of labeled semi-polar metabolites of samples from *Suillus* comparing before and after 13C labeling. The isotope shown is the most significant one from each metabolite. FDR correction was performed. TCA, tricarboxylic acid cycle.

**Table S8** P-values of isotopes of labeled semi-polar metabolites in *Tricholoma* samples comparing before and after 13C labeling. The isotope shown is the most significant one from each metabolite. FDR correction was performed. TCA, tricarboxylic acid cycle.

## Sharing/Access information

The data belong to an accepted paper in Functional Ecology. A DOI link will be added.

Ectomycorrhizal fungi (EMF) are common belowground tree symbionts, supplying trees with water and nutrients. In return, large amounts of C assimilated by trees can be allocated into EMF. However, the chemical forms in which the C is transferred from trees to fungi under field conditions are mostly unknown. In this study, we aimed to unravel the fate of tree-derived C in EMF. We conducted 13CO2 pulse labeling of Pinus halepensis trees in two forest sites with adjacent EMF sporocarps, combined with a non-targeted metabolomics profiling of root and sporocarp tissues. 13C was measured in sporocarps of Tricholoma terreum and Suillus collinitus up to 3 m from pine stems. C was assimilated in the labeled trees’ needles and transferred to their roots.
Starting from day 2 after labeling, the C was transferred to adjacent sporocarps, peaking on day 5. We identified more than 100 different labeled metabolites of different chemical groups present in roots and sporocarps. Of them, 17 were common to pine roots and both EMF species, and an additional 8 were common to roots and one of the two EMFs. The major labeled metabolites in the root tips were amino acids and tricarboxylic acid intermediates. The major labeled metabolites in sporocarps were amino acids, nucleotides, and fatty acids. We also identified labeled carbohydrates in all tissues. Labeling patterns diverged across different tissues, which can hint at how the C was transferred. Considering the young tree as a sole C source for these sporocarps, and with a diurnal assimilation of 5.4 g C, the total monthly C source is ~165 g C. On average, there were 10 sporocarps around each tree, each requiring ~1 g C. Therefore, a 10 g C investment would make 6% of total tree C allocation, and about 12% of net primary productivity. Overall, we found that this significant and ubiquitous transfer of metabolites from tree roots to EMF sporocarps is more rapid and chemically diverse than once thought.
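The carbon budget in the abstract can be retraced with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the month length of 30.5 days is an assumption chosen because it reproduces the stated ~165 g C, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
DAILY_ASSIMILATION_G = 5.4   # g C assimilated by the tree per day
SPOROCARPS_PER_TREE = 10     # average number of sporocarps around each tree
C_PER_SPOROCARP_G = 1.0      # approximate g C required per sporocarp

# Assumption: average month length, not stated in the source.
DAYS_PER_MONTH = 30.5

monthly_c = DAILY_ASSIMILATION_G * DAYS_PER_MONTH      # about 165 g C
investment = SPOROCARPS_PER_TREE * C_PER_SPOROCARP_G   # 10 g C
share_of_allocation = investment / monthly_c           # about 0.06

print(f"monthly C source: {monthly_c:.1f} g")
print(f"sporocarp investment: {investment:.0f} g "
      f"({100 * share_of_allocation:.1f}% of monthly allocation)")
```

The 12% of net primary productivity quoted in the abstract depends on a productivity estimate that is not given in this text, so it is not recomputed here.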