Tag-based sequencing – transcriptome analysis beyond microarrays

Project details

Description

All living organisms carry their genetic information in genes, usually in the form of DNA. The activity of these genes is regulated to meet the organism's own requirements and in response to external abiotic factors such as light and temperature, as well as biotic factors such as infection by pathogens. Genes are transcribed into mRNAs, which in turn are translated into proteins, including catalytically active enzymes. The system is regulated primarily by controlling the amount of mRNA produced from each gene and the turnover of the corresponding protein. The mRNA population is often referred to as the transcriptome. The complexity of the system is enormous: the genomes of animals and higher plants are believed to contain ca. 25,000 genes each. To understand the genetics underlying biological change, such as development, disease, crop yield or resistance, it has proven very informative to perform comparative transcriptomics, i.e. to compare how genes are regulated in response to these changes.

Several methods for gene expression profiling exist, such as Northern blotting, Differential Display, EST sequencing, Massively Parallel Signature Sequencing (MPSS), DNA microarrays and Serial Analysis of Gene Expression (SAGE). The choice of method depends on the required sensitivity and specificity, on whether the method is restricted to previously characterized genes, and, most importantly, on cost. The dominant method for global gene expression profiling today is the DNA microarray (Lockhart et al. 1996). An array may consist of up to 100,000 unique single-stranded DNA molecules attached to a glass slide in an ordered fashion. Two mRNA samples are prepared, labeled with two different fluorescent labels, mixed and hybridized to the array. At positions where the amount of mRNA differs between the two samples, one of the two fluorescence signals is in excess. This excess is quantified, and because the DNA sequence at each position is known to be unique to one mRNA, it provides a measure of the relative amount of that mRNA in the two samples. An advantage of DNA microarrays is that once the array has been made, at a very high cost, many measurements can be made at a relatively low cost. However, the “analog” nature of the signal limits the dynamic range of measurement. More importantly, only known genes can be spotted on the array, so the method requires detailed knowledge of the genetic background, and quantitative hybridization experiments are very difficult to carry out reproducibly in practice.
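To make the two-colour readout concrete, here is a minimal sketch of how the two fluorescence intensities at one spot become a relative measure, log2(A/B), and of why a bounded analog signal limits the dynamic range. The 16-bit saturation ceiling and the intensity floor are illustrative assumptions, not values from this project.

```python
import math

# Assumption: a 16-bit scanner, so intensities saturate at 2**16 - 1.
SATURATION = 2**16 - 1

def log2_ratio(intensity_a: float, intensity_b: float, floor: float = 1.0) -> float:
    """Relative abundance of one spot's mRNA in samples A and B as log2(A/B).

    Intensities are clipped to the measurable range: raised to `floor`
    (avoids division by zero) and capped at SATURATION (detector limit).
    """
    a = min(max(intensity_a, floor), SATURATION)
    b = min(max(intensity_b, floor), SATURATION)
    return math.log2(a / b)

# A four-fold difference inside the linear range is recovered ...
print(log2_ratio(1200, 300))          # 2.0
# ... but the same four-fold difference above saturation is lost.
print(log2_ratio(400_000, 100_000))   # 0.0
```

The second call shows the dynamic-range problem in miniature: once both channels hit the ceiling, a real difference in expression reads as no difference at all.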
SAGE, on the other hand, is a digital method that can measure the expression of both known and unknown genes. It relies on the extraction of a unique 20 bp sequence (tag) from each mRNA. These tags are usually ligated end to end into concatemers and sequenced (Velculescu et al. 1995). With traditional high-throughput DNA sequencing equipment, a typical run of 96 samples detects ca. 1,600 tags and therefore ca. 1,600 mRNAs. Typically, determining 50,000 tags provides detailed knowledge of the 2,000 most highly expressed genes in the analyzed tissue, and this knowledge is not limited to previously known genes. SAGE can therefore be used to discover new genes: unknown tags obtained by SAGE analysis of a sample can be used efficiently as gene-specific primers in Rapid Amplification of cDNA Ends (RACE) reactions to generate full-length transcripts, which can then be cloned and sequenced (Nielsen et al. 2005).
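The “digital” readout amounts to counting tags. The sketch below assumes the classic SAGE layout, in which ditags (two tags ligated tail to tail) are separated by the CATG site of the NlaIII anchoring enzyme; the right-hand tag of each ditag is then read on the opposite strand and must be reverse-complemented. The function names, the toy 4-nt tags and the simple parsing rule are assumptions for illustration; the project's 20 bp tags and actual pipeline may differ.

```python
from collections import Counter

# Complement table for A/C/G/T; other symbols pass through unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def count_tags(concatemer: str, tag_len: int, anchor: str = "CATG") -> Counter:
    """Count SAGE tags in one sequenced concatemer (simplified sketch).

    Assumes ditags are separated by the anchoring-enzyme site and that each
    ditag is two tags joined tail to tail: the left tag reads in sense
    orientation, the right tag on the opposite strand.
    """
    counts: Counter = Counter()
    for ditag in concatemer.split(anchor):
        if len(ditag) < 2 * tag_len:
            continue  # empty flank or a ditag truncated at the end of the read
        counts[ditag[:tag_len]] += 1
        counts[reverse_complement(ditag[-tag_len:])] += 1
    return counts

# Toy concatemer with 4-nt tags: two ditags, AAAC+rc(GGTA) and
# AAAC+rc(TTTG), each flanked by CATG.
toy = "CATGAAACTACCCATGAAACCAAACATG"
print(count_tags(toy, tag_len=4).most_common())
# [('AAAC', 2), ('GGTA', 1), ('TTTG', 1)]
```

Summing these counts over all sequenced concatemers gives the expression profile: each tag's count is proportional to the abundance of its mRNA, whether or not that gene was known beforehand.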
Furthermore, because SAGE is a “digital” method, its sensitivity is limited only by the number of tags sampled and can therefore be extended beyond that of microarrays. This is an important feature in the exploration of transcriptomes, because it enables reliable quantification of low-abundance transcripts such as those encoding transcription factors, the “master gene regulators”. While SAGE for these reasons constitutes an attractive alternative to microarrays, it has two major drawbacks: it is slow compared to microarrays, and it is expensive due to the high cost of sequencing and the manual labor or robotics needed for colony picking, library construction and sample preparation.
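Why sampling depth sets the sensitivity can be sketched with a simple model, assuming each tag is an independent draw from the mRNA population, so that the number of tags from a transcript with relative abundance f among N tags is approximately Poisson with mean N·f. The model and the example abundance (1 in 100,000 mRNAs) are illustrative assumptions, not figures from the project.

```python
import math

def detection_probability(n_tags: int, abundance: float) -> float:
    """P(a transcript is sampled at least once) among n_tags tags.

    Poisson approximation: the transcript's tag count has mean
    n_tags * abundance, so P(count >= 1) = 1 - exp(-n_tags * abundance).
    """
    return 1.0 - math.exp(-n_tags * abundance)

def tags_needed(abundance: float, target: float = 0.95) -> int:
    """Smallest tag count giving at least `target` detection probability."""
    return math.ceil(-math.log(1.0 - target) / abundance)

rare = 1e-5  # hypothetical: 1 copy per 100,000 mRNAs
print(round(detection_probability(50_000, rare), 3))     # 0.393
print(round(detection_probability(1_000_000, rare), 3))  # 1.0
print(tags_needed(rare))  # about 300,000 tags for 95 % detection
```

Under this model, detection of a rare transcript is governed only by the number of tags sequenced, which is why cheaper, deeper sequencing translates directly into higher sensitivity.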
In 2005, 454 Life Sciences Corp. described an integrated system linking an emulsion-based method for isolating and amplifying DNA fragments in vitro with an instrument that performs highly parallel pyrophosphate-based sequencing (pyrosequencing) in picolitre-sized wells. A typical 4-hour run yields more than 25 million bases (Margulies et al. 2005). Utilizing the ability of 454 sequencing to produce approximately 225,000 sequences of approximately 110 nt in a single run, we have developed a novel transcriptome analysis method called DeepSAGE (Nielsen et al. 2006), which is capable of producing approximately 340,000 SAGE tags per run. Furthermore, we have included nucleotide identification keys in the experimental design that allow multiplexing of up to 64 samples (a number that can, in principle, be extended indefinitely). An added benefit is that, because only ca. 60 nucleotides need to be sequenced to analyze a SAGE ditag, 454 sequencing was in fact more accurate than Sanger sequencing for this application. Together, these technological developments have reduced both the cost of SAGE analysis and the sample preparation time by an order of magnitude.
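The text does not specify how the identification keys are built. Since 64 = 4^3, one plausible reading is a 3-nt key at the start of each read; the sketch below assumes exactly that, and the function names and sample names are hypothetical. It enumerates all keys of a given length and assigns reads to samples by their leading key.

```python
from itertools import product

def make_keys(key_len: int = 3) -> list[str]:
    """All nucleotide keys of length key_len: 4**key_len distinct keys."""
    return ["".join(p) for p in product("ACGT", repeat=key_len)]

def demultiplex(reads: list[str], key_to_sample: dict[str, str], key_len: int = 3):
    """Assign each read to a sample by its leading key and strip the key.

    Reads whose first key_len bases match no registered key are returned
    separately as unassigned.
    """
    by_sample: dict[str, list[str]] = {s: [] for s in key_to_sample.values()}
    unassigned: list[str] = []
    for read in reads:
        sample = key_to_sample.get(read[:key_len])
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(read[key_len:])
    return by_sample, unassigned

print(len(make_keys()), len(make_keys(4)))  # 64 256
by_sample, unassigned = demultiplex(
    ["AAACATGTTT", "AACCATGGGG", "GGGCATGCCC"],
    {"AAA": "leaf", "AAC": "root"},  # hypothetical sample names
)
print(by_sample, unassigned)
```

Under this assumption, each extra key position multiplies the number of distinguishable samples by four, which is one way the multiplexing could be extended further.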
Very recently (November 2006), Solexa, Inc. began marketing its 1G genetic analyzer. It is based on a sequencing principle similar to that of 454, but the sequence clusters are much smaller, so many more sequences (up to 40 million) can be obtained simultaneously. However, the reads are much shorter (25-35 nt). This read length is insufficient for SAGE ditag analysis, but we have recently initiated a collaboration with Solexa to adapt the DeepSAGE technology to the Solexa sequencing platform for the analysis of SAGE monotags. We now believe that sample preparation can be much simpler on this platform, essentially involving only two experimental steps (KLN, pending patent investigation). The throughput of this technology is astonishing. The sensitivity of a microarray experiment is equivalent to a SAGE analysis of 120,000 tags (Lu et al. 2004). Therefore, 40 million tags are equivalent to ca. 333 microarray experiments. The running cost of one sequencing run is $2,500, or $7.50 per microarray-equivalent of sensitivity. In comparison, the raw cost of analysis for a single microarray experiment is roughly $500.
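The cost comparison above can be reproduced directly from the quoted figures. This is only a sketch of the arithmetic: the constants are the proposal's own estimates, and the function names are illustrative.

```python
# Figures quoted in the text above (proposal estimates, not measurements).
TAGS_PER_RUN = 40_000_000        # Solexa 1G, upper figure per run
TAGS_PER_ARRAY_EQUIV = 120_000   # microarray sensitivity in SAGE tags (Lu et al. 2004)
RUN_COST_USD = 2_500             # running cost of one sequencing run
ARRAY_COST_USD = 500             # raw analysis cost of one microarray experiment

def array_equivalents(tags: int) -> float:
    """Number of microarray experiments matched in sensitivity by `tags` tags."""
    return tags / TAGS_PER_ARRAY_EQUIV

def cost_per_array_equivalent(run_cost: float, tags: int) -> float:
    """Sequencing cost per microarray-equivalent of sensitivity."""
    # Multiply before dividing so the example figures give an exact result.
    return run_cost * TAGS_PER_ARRAY_EQUIV / tags

print(round(array_equivalents(TAGS_PER_RUN), 1))  # 333.3
cost = cost_per_array_equivalent(RUN_COST_USD, TAGS_PER_RUN)
print(f"${cost:.2f} vs ${ARRAY_COST_USD} per microarray")  # $7.50 vs $500 per microarray
```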

The DeepSAGE technology is generic and will work with equal efficiency in all eukaryotes, for profiling of both mRNA and miRNA. With some further development, we believe it can be used to profile RNA from prokaryotes as well. At AAU we have expertise in tag-based transcriptomics and in developing the custom bioinformatic tools needed to analyze the data produced. Acquisition of a Solexa 1G genetic analyzer will enable us to conduct much more comprehensive, accurate and statistically sound gene expression profiling at a fraction (ca. 1/50) of the cost of current microarray-based profiling. The throughput of the instrument is so high that, provided enough relevant samples are available, we will be able to run it three times a week for 40 weeks annually, equivalent to ca. 40,000 microarray experiments. Upon acquisition of the sequencer, our ambition is to function as a “national center for tag-based transcriptome analysis” and to collaborate with all disciplines across the Life Sciences.
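The annual capacity figure follows from the same per-run numbers. The sketch below simply multiplies them out; the usage plan (3 runs a week, 40 weeks a year) is the stated target, not measured throughput, and the function name is illustrative.

```python
# Figures quoted in the text (planned usage; not measured throughput).
RUNS_PER_WEEK = 3
WEEKS_PER_YEAR = 40
TAGS_PER_RUN = 40_000_000        # Solexa 1G, upper figure per run
TAGS_PER_ARRAY_EQUIV = 120_000   # microarray sensitivity in SAGE tags (Lu et al. 2004)

def annual_array_equivalents(runs_per_week: int, weeks: int,
                             tags_per_run: int = TAGS_PER_RUN) -> float:
    """Microarray experiments matched in sensitivity by one year of runs."""
    total_tags = runs_per_week * weeks * tags_per_run
    return total_tags / TAGS_PER_ARRAY_EQUIV

print(annual_array_equivalents(RUNS_PER_WEEK, WEEKS_PER_YEAR))  # 40000.0
```

That is 120 runs of ca. 333 microarray-equivalents each, which gives the ca. 40,000 quoted, provided every run reaches the upper tag figure.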
Status: Completed
Effective start/end date: 01/01/2008 to 31/12/2008

Funding

  • Forskningsrådet for Teknologi og Produktion: DKK 3,480,852.00