Thousands of primer-free, high-quality, full-length SSU rRNA sequences from all domains of life

Soeren M Karst; Morten S Dueholm; Simon J McIlroy; Rasmus H Kirkegaard; Per H Nielsen; Mads Albertsen

doi:10.1101/070771

Thousands of primer-free, high-quality, full-length SSU rRNA sequences from all domains of life

Soeren M Karst, Morten S Dueholm, Simon J McIlroy, Rasmus H Kirkegaard, Per H Nielsen, Mads Albertsen

Publikation: Working paper/Preprint › Working paper › Forskning

Abstract

Ribosomal RNA (rRNA) genes are the consensus marker for determination of microbial diversity on the planet, invaluable in studies of evolution and, for the past decade, high-throughput sequencing of variable regions of ribosomal RNA genes has become the backbone of most microbial ecology studies. However, the underlying reference databases of full-length rRNA gene sequences are underpopulated, ecosystem skewed, and subject to primer bias, which hamper our ability to study the true diversity of ecosystems. Here we present an approach that combines reverse transcription of full-length small subunit (SSU) rRNA genes and synthetic long read sequencing by molecular tagging, to generate primer-free, full-length SSU rRNA gene sequences from all domains of life, with a median raw error rate of 0.17%. We generated thousands of full-length SSU rRNA sequences from five well-studied ecosystems (soil, human gut, fresh water, anaerobic digestion, and activated sludge) and obtained sequences covering all domains of life and the majority of all described phyla. Interestingly, 30% of all bacterial operational taxonomic units were novel, compared to the SILVA database (less than 97% similarity). For the Eukaryotes, the novelty was even larger with 63% of all OTUs representing novel taxa. In addition, 15% of the 18S rRNA OTUs were highly novel sequences with less than 80% similarity to the databases. The generation of primer-free full-length SSU rRNA sequences enabled eco-system specific estimation of primer-bias and, especially for eukaryotes, showed a dramatic discrepancy between the in-silico evaluation and primer-free data generated in this study. The large amount of novel sequences obtained here reaffirms that there is still vast, untapped microbial diversity lacking representatives in the SSU rRNA databases and that there might be more than millions after all. With our new approach, it is possible to readily expand the rRNA databases by orders of magnitude within a short timeframe. This will, for the first time, enable a broad census of the tree of life.

Originalsprog	Engelsk
DOI	https://doi.org/10.1101/070771
Status	Udgivet - 22 aug. 2016

Adgang til dokumentet

10.1101/070771

http://biorxiv.org/content/early/2016/08/22/070771.abstract

AUB Link

Søg efter materialet i Aalborg Universitetsbiblioteks søgemaskine

Citationsformater

@techreport{c2ce41dc74564af0a6d57f845d8ddc89,

title = "Thousands of primer-free, high-quality, full-length SSU rRNA sequences from all domains of life",

abstract = "Ribosomal RNA (rRNA) genes are the consensus marker for determination of microbial diversity on the planet, invaluable in studies of evolution and, for the past decade, high-throughput sequencing of variable regions of ribosomal RNA genes has become the backbone of most microbial ecology studies. However, the underlying reference databases of full-length rRNA gene sequences are underpopulated, ecosystem skewed, and subject to primer bias, which hamper our ability to study the true diversity of ecosystems. Here we present an approach that combines reverse transcription of full-length small subunit (SSU) rRNA genes and synthetic long read sequencing by molecular tagging, to generate primer-free, full-length SSU rRNA gene sequences from all domains of life, with a median raw error rate of 0.17%. We generated thousands of full-length SSU rRNA sequences from five well-studied ecosystems (soil, human gut, fresh water, anaerobic digestion, and activated sludge) and obtained sequences covering all domains of life and the majority of all described phyla. Interestingly, 30% of all bacterial operational taxonomic units were novel, compared to the SILVA database (less than 97% similarity). For the Eukaryotes, the novelty was even larger with 63% of all OTUs representing novel taxa. In addition, 15% of the 18S rRNA OTUs were highly novel sequences with less than 80% similarity to the databases. The generation of primer-free full-length SSU rRNA sequences enabled eco-system specific estimation of primer-bias and, especially for eukaryotes, showed a dramatic discrepancy between the in-silico evaluation and primer-free data generated in this study. The large amount of novel sequences obtained here reaffirms that there is still vast, untapped microbial diversity lacking representatives in the SSU rRNA databases and that there might be more than millions after all. With our new approach, it is possible to readily expand the rRNA databases by orders of magnitude within a short timeframe. This will, for the first time, enable a broad census of the tree of life.",

author = "Karst, {Soeren M} and Dueholm, {Morten S} and McIlroy, {Simon J} and Kirkegaard, {Rasmus H} and Nielsen, {Per H} and Mads Albertsen",

year = "2016",

month = aug,

day = "22",

doi = "10.1101/070771",

language = "English",

type = "WorkingPaper",

}

TY - UNPB

T1 - Thousands of primer-free, high-quality, full-length SSU rRNA sequences from all domains of life

AU - Karst, Soeren M

AU - Dueholm, Morten S

AU - McIlroy, Simon J

AU - Kirkegaard, Rasmus H

AU - Nielsen, Per H

AU - Albertsen, Mads

PY - 2016/8/22

Y1 - 2016/8/22

N2 - Ribosomal RNA (rRNA) genes are the consensus marker for determination of microbial diversity on the planet, invaluable in studies of evolution and, for the past decade, high-throughput sequencing of variable regions of ribosomal RNA genes has become the backbone of most microbial ecology studies. However, the underlying reference databases of full-length rRNA gene sequences are underpopulated, ecosystem skewed, and subject to primer bias, which hamper our ability to study the true diversity of ecosystems. Here we present an approach that combines reverse transcription of full-length small subunit (SSU) rRNA genes and synthetic long read sequencing by molecular tagging, to generate primer-free, full-length SSU rRNA gene sequences from all domains of life, with a median raw error rate of 0.17%. We generated thousands of full-length SSU rRNA sequences from five well-studied ecosystems (soil, human gut, fresh water, anaerobic digestion, and activated sludge) and obtained sequences covering all domains of life and the majority of all described phyla. Interestingly, 30% of all bacterial operational taxonomic units were novel, compared to the SILVA database (less than 97% similarity). For the Eukaryotes, the novelty was even larger with 63% of all OTUs representing novel taxa. In addition, 15% of the 18S rRNA OTUs were highly novel sequences with less than 80% similarity to the databases. The generation of primer-free full-length SSU rRNA sequences enabled eco-system specific estimation of primer-bias and, especially for eukaryotes, showed a dramatic discrepancy between the in-silico evaluation and primer-free data generated in this study. The large amount of novel sequences obtained here reaffirms that there is still vast, untapped microbial diversity lacking representatives in the SSU rRNA databases and that there might be more than millions after all. With our new approach, it is possible to readily expand the rRNA databases by orders of magnitude within a short timeframe. This will, for the first time, enable a broad census of the tree of life.

AB - Ribosomal RNA (rRNA) genes are the consensus marker for determination of microbial diversity on the planet, invaluable in studies of evolution and, for the past decade, high-throughput sequencing of variable regions of ribosomal RNA genes has become the backbone of most microbial ecology studies. However, the underlying reference databases of full-length rRNA gene sequences are underpopulated, ecosystem skewed, and subject to primer bias, which hamper our ability to study the true diversity of ecosystems. Here we present an approach that combines reverse transcription of full-length small subunit (SSU) rRNA genes and synthetic long read sequencing by molecular tagging, to generate primer-free, full-length SSU rRNA gene sequences from all domains of life, with a median raw error rate of 0.17%. We generated thousands of full-length SSU rRNA sequences from five well-studied ecosystems (soil, human gut, fresh water, anaerobic digestion, and activated sludge) and obtained sequences covering all domains of life and the majority of all described phyla. Interestingly, 30% of all bacterial operational taxonomic units were novel, compared to the SILVA database (less than 97% similarity). For the Eukaryotes, the novelty was even larger with 63% of all OTUs representing novel taxa. In addition, 15% of the 18S rRNA OTUs were highly novel sequences with less than 80% similarity to the databases. The generation of primer-free full-length SSU rRNA sequences enabled eco-system specific estimation of primer-bias and, especially for eukaryotes, showed a dramatic discrepancy between the in-silico evaluation and primer-free data generated in this study. The large amount of novel sequences obtained here reaffirms that there is still vast, untapped microbial diversity lacking representatives in the SSU rRNA databases and that there might be more than millions after all. With our new approach, it is possible to readily expand the rRNA databases by orders of magnitude within a short timeframe. This will, for the first time, enable a broad census of the tree of life.

U2 - 10.1101/070771

DO - 10.1101/070771

M3 - Working paper

BT - Thousands of primer-free, high-quality, full-length SSU rRNA sequences from all domains of life

ER -