Resolving the problem of multiple accessions of the same transcript deposited across various public databases

Tyler Weirick, David John, Shizuka Uchida*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review

7 Citations (Scopus)

Abstract

Maintaining the consistency of genomic annotations is an increasingly complex task because of the iterative and dynamic nature of assembly and annotation, growing numbers of biological databases and insufficient integration of annotations across databases. As information exchange among databases is poor, a 'novel' sequence from one reference annotation could be annotated in another. Furthermore, relationships to nearby or overlapping annotated transcripts are even more complicated when using different genome assemblies. To better understand these problems, we surveyed current and previous versions of genomic assemblies and annotations across a number of public databases containing long noncoding RNA. We identified numerous discrepancies of transcripts regarding their genomic locations, transcript lengths and identifiers. Further investigation showed that the positional differences between reference annotations of essentially the same transcript could lead to differences in its measured expression at the RNA level. To aid in resolving these problems, we present the algorithm 'Universal Genomic Accession Hash (UGAHash)' and created an open source web tool to encourage the usage of the UGAHash algorithm. The UGAHash web tool (http://ugahash.uni-frankfurt.de) can be accessed freely without registration. The web tool allows researchers to generate Universal Genomic Accessions for genomic features or to explore annotations deposited in the public databases of the past and present versions. We anticipate that the UGAHash web tool will be a valuable tool to check for the existence of transcripts before judging the newly discovered transcripts as novel.
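
The abstract does not specify how a Universal Genomic Accession is computed, so the following is only an illustrative sketch of the general idea of deriving an identifier from genomic coordinates rather than from a database-specific name. It is not the UGAHash algorithm. The function name `coordinate_accession`, the `ACC` prefix, the field order, and the use of SHA-256 are all assumptions made for this example.

```python
import hashlib


def coordinate_accession(assembly, chrom, strand, exons, prefix="ACC"):
    """Hypothetical helper (not the published UGAHash algorithm).

    Builds a deterministic identifier from the genome assembly name,
    chromosome, strand, and exon intervals of a transcript, so that the
    same coordinates always give the same identifier regardless of which
    database the record came from.
    """
    # Sort the exon intervals so that input order does not affect the result.
    normalized = sorted((int(start), int(end)) for start, end in exons)
    exon_part = ",".join(f"{start}-{end}" for start, end in normalized)

    # Assumed canonical string; the field order and separators are arbitrary.
    key = f"{assembly}|{chrom}|{strand}|{exon_part}"

    # Hash the canonical string and keep a short, fixed-length prefix.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{prefix}_{digest[:16]}"


# Two records of the same transcript with exons listed in a different order.
a = coordinate_accession("GRCh38", "chr1", "+", [(100, 200), (300, 450)])
b = coordinate_accession("GRCh38", "chr1", "+", [(300, 450), (100, 200)])

# A record whose last exon ends ten bases later.
c = coordinate_accession("GRCh38", "chr1", "+", [(100, 200), (300, 460)])

print(a == b)  # identical coordinates give the same identifier
print(a == c)  # a positional difference gives a different identifier
```

In this sketch the assembly name is part of the hashed string, so the same transcript placed on two different genome assemblies would receive two different identifiers; whether and how UGAHash handles that case is not stated in the abstract.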

Original language: English
Journal: Briefings in Bioinformatics
Volume: 18
Issue number: 2
Pages (from-to): 226-235
Number of pages: 10
ISSN: 1467-5463
DOIs
Publication status: Published - 2017
Externally published: Yes

Keywords

  • Accession numbers
  • Accession system
  • Annotation scheme
  • Databases
  • Hashing algorithm
  • LncRNA
  • Novel transcripts
