TY - JOUR
T1 - The identification and characterization of novel transcripts from RNA-seq data
AU - Weirick, Tyler
AU - Militello, Giuseppe
AU - Müller, Raphael
AU - John, David
AU - Dimmeler, Stefanie
AU - Uchida, Shizuka
PY - 2016/7/1
Y1 - 2016/7/1
N2 - Owing largely to advances in next-generation sequencing (NGS), the amount of NGS data is increasing rapidly. Although there are many NGS applications, one of the most commonly used techniques, RNA sequencing (RNA-seq), is rapidly replacing microarray-based techniques in laboratories around the world. As more of these techniques are standardized, allowing technicians to perform the experiments with minimal hands-on time and reduced experimental and operator-dependent biases, the bottleneck becomes clearly visible: data analysis. Further complicating the matter, increasing evidence suggests that most of the genome is transcribed into RNA, yet the majority of these RNAs are not translated into proteins. RNAs that are not translated into proteins are called noncoding RNAs (ncRNAs). Although some time has passed since the discovery of ncRNAs, their annotations remain poor, making analysis of RNA-seq data challenging. Here, we examine the current limitations of RNA-seq analysis using case studies focused on the detection of novel transcripts and the examination of their characteristics. Finally, we validate the presence of novel transcripts using biological experiments, showing that novel transcripts can be accurately identified when a series of filters is applied. In conclusion, novel transcripts identified from RNA-seq must be examined carefully before proceeding to biological experiments.
KW - Gene expression
KW - lncRNA
KW - Novel transcripts
KW - RNA-seq
KW - Transcriptome assembly
UR - http://www.scopus.com/inward/record.url?scp=84991401479&partnerID=8YFLogxK
U2 - 10.1093/bib/bbv067
DO - 10.1093/bib/bbv067
M3 - Journal article
C2 - 26283677
AN - SCOPUS:84991401479
SN - 1467-5463
VL - 17
SP - 678
EP - 685
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 4
ER -