Platynereis dumerilii full-length transcriptome of developmental stages

To generate a high-quality full-length transcriptome for the annelid Platynereis dumerilii, we collected samples from representative developmental stages, from unfertilized eggs to 5 days post-fertilization. Each sample was a bulk mix of 1 to 5 batches of embryos fertilized by different parents. We incubated all batches at 18 °C until the desired time point, then collected the embryos into a clean tube and snap-froze them in liquid nitrogen with as little seawater as possible. Samples were stored at -80 °C until RNA extraction.

We extracted total RNA from each sample using a TRIzol protocol and measured RNA concentration with a NanoDrop spectrophotometer. We then created a bulk RNA mix by combining 1 µL of each sample in a new tube. The Sequencing and Genotyping facility of the Max Planck Institute of Molecular Cell Biology and Genetics ran aliquots of this bulk mix on a Bioanalyzer and by gel electrophoresis and found no evidence of RNA degradation. From this mix, the facility prepared PacBio Iso-Seq libraries using the Express Template Prep Kit 2.0 and sequenced full-length transcripts on a SMRT Cell 8M for 30 hours on a PacBio Sequel II System.

The facility processed the raw movie subreads with SMRT Analysis software following the Iso-Seq v3 workflow: generating representative circular consensus sequences (CCS), demultiplexing and removing primers, trimming poly(A) tails, and removing concatemers. After transcript clustering and merging, the resulting dataset contained 176,122 polished high-quality isoforms. Using Cogent, we removed redundant isoforms and obtained a dataset of 117,524 transcripts. From these, we generated a dataset containing only the longest isoform for each gene, with 70,003 sequences in total. We calculated descriptive metrics with Transrate and estimated transcriptome completeness with BUSCO using the metazoa lineage dataset, obtaining a score of 85%. Finally, we annotated the longest-isoform dataset with EnTAP.
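The longest-isoform step can be pictured with a minimal Python sketch. How transcripts were grouped into genes is not specified here, so the sketch assumes a ready-made transcript-to-gene mapping (`tx2gene`, a hypothetical name). The FASTA parsing, the handling of unmapped transcripts, and the tie-breaking rule are our own assumptions, not the procedure used to build 3-Pdum_co_longest.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record_id: sequence}; the ID is the first word of the header."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)
            current = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if current is not None:
        records[current] = "".join(chunks)
    return records


def longest_per_gene(
    seqs: Dict[str, str], tx2gene: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return {gene: (transcript_id, sequence)}, keeping the longest isoform per gene.

    Transcripts missing from tx2gene are treated as their own gene; on equal
    length, the first transcript seen is kept.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for tx_id, seq in seqs.items():
        gene = tx2gene.get(tx_id, tx_id)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (tx_id, seq)
    return best


# Toy example with hypothetical IDs (not taken from the dataset).
example = """>tx1 isoform a
ACGTACGT
>tx2 isoform b
ACGTAC
GTACGT
>tx3
ACG
""".splitlines()
seqs = read_fasta(example)
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
result = longest_per_gene(seqs, tx2gene)
print({gene: tx for gene, (tx, _) in result.items()})  # {'geneA': 'tx2', 'geneB': 'tx3'}
```

A single pass with a dictionary keeps memory proportional to the number of genes rather than transcripts, which matters little at 117,524 sequences but keeps the logic easy to check.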
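Transrate reports descriptive metrics such as sequence counts, mean length and N50. As a rough illustration of what these length-based metrics mean, here is a small Python function. It is not Transrate's implementation, and the field names are our own rather than Transrate's output columns; the input lengths are hypothetical.

```python
from typing import Dict, List


def length_stats(lengths: List[int]) -> Dict[str, float]:
    """Return count, total bases, mean length, and N50 for a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together contain at
    least half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = ordered[0]
    for length in ordered:
        running += length
        if 2 * running >= total:  # reached half of all bases
            n50 = length
            break
    return {
        "n_seqs": len(ordered),
        "n_bases": total,
        "mean_len": total / len(ordered),
        "n50": n50,
    }


# Hypothetical lengths for illustration only.
print(length_stats([100, 200, 300, 400, 500]))
# {'n_seqs': 5, 'n_bases': 1500, 'mean_len': 300.0, 'n50': 400}
```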
About 85% of the transcripts have a coding sequence. We obtained annotations for 67% of the sequences (46,635 of 70,003), while 33% remain unannotated.

Datasets

File name (type, zipped size, sequences): description
0-Pdum_workflow.zip (folder, 3.40 GB): entire pipeline with notebook entries and analyses
1-Pdum_hq_isoforms.zip (FASTA, 180.30 MB, 176,122 sequences): polished high-quality isoforms from CCS
2-Pdum_co_isoforms.zip (FASTA, 70.68 MB, 117,524 sequences): non-redundant polished high-quality isoforms
3-Pdum_co_longest.zip (FASTA, 54.85 MB, 70,003 sequences): longest of the non-redundant polished high-quality isoforms
4-Pdum_co_longest_annotations.zip (TSV, 34.37 MB, 70,003 sequences, 46,635 annotated): annotations for the longest-isoform dataset
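The 67% / 33% split between annotated and unannotated sequences follows directly from the counts listed for 4-Pdum_co_longest_annotations.zip; a short Python check:

```python
# Counts from the dataset listing: 70,003 longest isoforms, 46,635 of them annotated.
total = 70_003
annotated = 46_635
unannotated = total - annotated

pct_annotated = 100 * annotated / total
pct_unannotated = 100 * unannotated / total

print(f"annotated: {annotated:,} ({pct_annotated:.1f}%)")        # annotated: 46,635 (66.6%)
print(f"unannotated: {unannotated:,} ({pct_unannotated:.1f}%)")  # unannotated: 23,368 (33.4%)
```

Rounded to whole percentages, these give the 67% and 33% reported in the description.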

Related Organizations

Max Planck Institute of Molecular Cell Biology and Genetics
Germany
Max Planck Society
Germany

Keywords

PacBio, Iso-Seq, RNA-Seq, Transcriptome, Platynereis

Funded by

DFG (Deutsche Forschungsgemeinschaft)