RNA-Seq: Basics, Applications and Protocol

A printer outputting RNA bases from an RNA sequence.
Credit: Technology Networks

What is RNA-seq?


RNA-seq (RNA-sequencing) is a technique that can examine the quantity and sequences of RNA in a sample using next-generation sequencing (NGS). It analyzes the transcriptome, indicating which of the genes encoded in our DNA are turned on or off and to what extent. Here, we look at why RNA-seq is useful, how the technique works and the basic protocol that is commonly used today.1


What are the applications of RNA-seq?


RNA-seq lets us investigate and discover the transcriptome, the total cellular content of RNAs including mRNA, rRNA and tRNA. Understanding the transcriptome is key if we are to connect the information in our genome with its functional protein expression. RNA-seq can tell us which genes are turned on in a cell, what their level of transcription is, and at what times they are activated or shut off.2 This allows scientists to understand the biology of a cell more deeply and assess changes that may indicate disease. Some of the most popular techniques that use RNA-seq are transcriptional profiling, single nucleotide polymorphism (SNP) identification,3 RNA editing and differential gene expression analysis.4


This can give researchers vital information about the function of genes. For example, the transcriptome can highlight all the tissues in which a gene of unknown function is turned on, which might indicate what its role is. It also captures information about alternative splicing events (Figure 1), which produce different transcripts from one single gene sequence. These events would not be picked up by DNA sequencing. It can also identify post-transcriptional modifications that occur during mRNA processing such as polyadenylation and 5’ capping.2

An image explains how RNA short reads are split by intron when aligning to a reference genome.

Figure 1: RNA-seq produces short reads from mRNA, which is free of intronic non-coding sequence. Reads that span an exon-exon junction are therefore split across the intron when they are aligned back to the reference genome. Credit: Technology Networks.
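
The idea in Figure 1 can be illustrated with a deliberately naive sketch. It assumes exact matching, at most one intron per read and made-up toy sequences; the function name `align_read` and the `min_anchor` parameter are hypothetical. Real spliced aligners such as STAR use far more sophisticated indexing and scoring, so this shows only why a junction-spanning read ends up as two blocks on the genome.

```python
def align_read(read, genome, min_anchor=3):
    """Place a read on the genome, allowing at most one intron-like gap.

    Returns a list of (start, end) blocks in 0-based, end-exclusive genome
    coordinates, or None if the read cannot be placed.
    """
    # Case 1: the read lies entirely within one exon.
    pos = genome.find(read)
    if pos != -1:
        return [(pos, pos + len(read))]

    # Case 2: try every split point, keeping at least `min_anchor` bases
    # on each side, and look for the two halves in order on the genome.
    for k in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:k], read[k:]
        left_pos = genome.find(left)
        while left_pos != -1:
            left_end = left_pos + k
            # The right half must start after a gap of at least one base.
            right_pos = genome.find(right, left_end + 1)
            if right_pos != -1:
                return [(left_pos, left_end), (right_pos, right_pos + len(right))]
            left_pos = genome.find(left, left_pos + 1)
    return None


# Hypothetical toy sequences (not real genes).
exon1 = "ATGGCCTTAC"
intron = "GTAAGTCCCCTTTTCAG"
exon2 = "GACTGACCTA"
genome = exon1 + intron + exon2
mrna = exon1 + exon2          # splicing removes the intron

read = mrna[5:15]             # a 10-base read spanning the exon-exon junction
print(align_read(read, genome))
```

For this toy input the read is reported as two blocks, one at the end of the first exon and one at the start of the second. With real sequences the split point is often ambiguous by a few bases, which is one reason production aligners also consider splice-site signals and annotation.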


How does RNA-seq work?


Early RNA-seq techniques used Sanger sequencing technology, which, although innovative at the time, was low-throughput and costly. Only recently, with the advent and proliferation of NGS technology, have we been able to take full advantage of RNA-seq’s potential.5


An RNA-seq workflow has several steps, which can be broadly summarized as:

  1. RNA extraction
  2. Reverse transcription into cDNA
  3. Adapter ligation
  4. Amplification
  5. Sequencing

Once you have obtained your RNA sample for analysis, the first step is to convert the population of RNA to be sequenced into complementary DNA (cDNA) fragments (a cDNA library). This is done by reverse transcription and allows the RNA to be put into an NGS workflow. The cDNA is then fragmented, and adapters are added to each end of the fragments. These adapters contain functional elements that permit sequencing, for example the amplification element (which facilitates clonal amplification of the fragments) and the primary sequencing priming site. Following amplification, size selection, clean-up and quality checking, the cDNA library is analyzed by NGS, producing short sequences that each correspond to all or part of the fragment from which they were derived.

The depth to which the library is sequenced varies depending on the purpose for which the output data will be used. Sequencing may follow either single-end or paired-end methods. Single-end sequencing is a cheaper and faster technique (for reference, about 1% of the cost of Sanger sequencing) that sequences the cDNA fragments from just one end. Paired-end methods sequence from both ends and are therefore more expensive6,7 but offer advantages in post-sequencing data reconstruction.



A further choice must be made between strand-specific and non-strand-specific protocols. Strand-specific protocols retain the information about which DNA strand was transcribed, and the value of this extra information makes them the favorable option.


These reads, of which there will be many millions by the end of the workflow, can then be aligned to a reference genome if available or assembled de novo to produce an RNA sequence map that spans the transcriptome.8

RNA-seq vs microarrays: Why RNA-seq is considered superior 


RNA-seq is widely regarded as superior to other technologies, such as microarray hybridization. There are several reasons for RNA-seq’s well-regarded status:


Not limited to genomic sequences – unlike hybridization-based approaches, which may require species-specific probes, RNA-seq can detect transcripts from organisms with previously undetermined genomic sequences. This makes it fundamentally superior for the detection of novel transcripts, SNPs or other alterations.9,10


Low background signal – the cDNA sequences used in RNA-seq can be mapped to targeted regions on the genome, which makes it easy to remove experimental noise. Furthermore, cross-hybridization and sub-standard hybridization, which can plague microarray experiments, do not arise in RNA-seq experiments.


More quantifiable – microarray data is only ever displayed as values relative to other signals detected on the array, whilst RNA-seq data is quantifiable. RNA-seq also avoids the issues microarrays have in detecting very high or very low transcription levels.


Figure 2: A workflow for RNA-seq. Credit: Technology Networks.


An RNA-seq protocol


Experiment planning 


Preparation prior to starting your RNA-seq experiment is essential. Questions to answer before starting include:11

  • What method of RNA purification are you using?
  • What read depth will you need?
  • Which platform will you use? 
  • Is there a reference genome available and which will you use?
  • How are you assessing the quality of your RNA?
  • Do you need to enrich your target RNA?
  • Will you barcode your RNA?
  • Do you have enough biological and technical replicates?
  • Will you use single-end or paired-end sequencing?
  • What read length will you use?
  • Do you want to retain strand-specific information?


cDNA library preparation 


After these points have been considered, you can start preparing your cDNA library. This will require fragmentation of the cDNA, addition of the platform-specific “adapter sequences” and amplification of the cDNA, but the exact procedure will be very specific to the platform used at this stage. For strand-specific protocols, the amplification of the cDNA involves a reverse transcriptase-mediated first strand synthesis followed by a DNA polymerase-mediated second strand synthesis.11,12 Barcodes may also be added that enable multiplexing, so numerous samples can be sequenced in a single run. It can be beneficial to quantify your library at the end of the library preparation stage to ensure the protocol has been successful and check the quality and concentration of your library to enable optimal sequencing performance.
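
As an illustration of how barcodes make multiplexing possible, here is a minimal sketch of demultiplexing. It assumes in-line barcodes of fixed length at the start of each read and exact matching with no error tolerance; the barcode sequences, sample names and the `demultiplex` function are hypothetical. Many platforms read the index separately and use vendor software for this step, so treat it as a conceptual example only.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by the sample their leading barcode belongs to.

    Assumes every barcode has the same length and sits at the 5' end of
    the read. Reads with an unknown barcode go to "undetermined".
    """
    barcode_len = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)  # keep the read without its barcode
    return dict(by_sample)


# Hypothetical barcodes and reads from one pooled sequencing run.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
pooled_reads = ["ACGTTTGACC", "TGCAGGATCA", "ACGTCCATGG", "NNNNAAAAAA"]

demuxed = demultiplex(pooled_reads, barcodes)
for sample, sample_reads in demuxed.items():
    print(sample, sample_reads)
```

Exact matching keeps the sketch short; in practice a small number of mismatches is usually tolerated so that sequencing errors in the barcode do not discard otherwise good reads.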


cDNA sequencing


Once the library is prepared, you can use your chosen sequencing platform to sequence your cDNA library to your desired depth and requirements. Once your transcript data has been produced, you can map the data to your reference genome or assemble it de novo if no reference is available. The alignment process can be complicated by the presence of splice variants and modifications, and the choice of reference genome will also affect how difficult this stage is. Software packages such as STAR are useful at this stage, as are quality control tools like Picard or Qualimap.13 De novo assembly will allow for the discovery of novel transcripts in addition to those already known.


RNA-seq data analysis


After the alignment stage, you can focus on analyzing your data. Tools like Sailfish, RSEM and BitSeq13 will help you quantify your transcription levels, whilst tools like MISO, which quantifies alternatively spliced genes, are available for more specialized analysis.14 A large number of these tools are available, and reading reviews and roundups is the best way to find the right tool for your research.
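
To give a feel for what "quantifying transcription levels" involves, the sketch below computes transcripts per million (TPM), one widely used normalized expression unit. It is not how any of the tools named above work internally (they fit statistical models to handle ambiguous reads); it shows only the final arithmetic. The gene names, counts and lengths are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts:     mapping of gene -> number of reads assigned to it
    lengths_bp: mapping of gene -> transcript length in base pairs
    """
    # Step 1: correct for transcript length (reads per kilobase).
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Step 2: correct for sequencing depth so that values sum to one million.
    scale = sum(rpk.values()) / 1_000_000
    return {gene: value / scale for gene, value in rpk.items()}


# Hypothetical counts and transcript lengths for three genes.
counts = {"geneA": 100, "geneB": 200, "geneC": 300}
lengths = {"geneA": 1000, "geneB": 4000, "geneC": 2000}

expression = tpm(counts, lengths)
for gene, value in expression.items():
    print(f"{gene}: {value:.1f} TPM")
```

Note that geneB has twice the raw count of geneA but ends up with half its TPM, because its transcript is four times longer. Normalizing for length before depth is what makes TPM values comparable between genes within a sample.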


To sum up, modern-day RNA-seq is well established as the superior option to microarrays and will likely remain the preferred option for the time being.  

Challenges of RNA-seq


Significant progress has been made in the field of RNA-seq over the last decade or so. The associated costs have fallen significantly while throughput has increased, sequence fidelity is far superior to that of earlier iterations of NGS technologies, and the availability of data analysis tools and pipelines has improved tremendously. However, there remain a number of challenges for scientists to bear in mind when considering RNA-seq experiments. These include:


Isolating sufficient, high-quality RNA – while the sample quantity requirements for RNA-seq analysis have reduced drastically, it is still important to ensure you are able to obtain sufficient RNA to fulfill all your analysis requirements, including repeats if necessary. It is also important to bear in mind that, while you may isolate total RNA, depending upon your experimental question, you are likely only to be sequencing a fraction of this (typically messenger RNA (mRNA)), further reducing your sample quantity. The RNA must also be of high quality and purity, as poor samples are likely to lead to poor results or, in some cases, failure within the library preparation protocol. The quality and concentration of RNA can be determined using UV-visible spectroscopy. Unlike DNA, RNA degrades rapidly, so it is important to treat samples with care at all stages of isolation and purification. Degradation may not be uniform, hindering the comparison of transcription levels between genes. Low-level transcripts may be lost from the sequenced population altogether.
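
The UV-visible check mentioned above comes down to simple arithmetic. The sketch below assumes the commonly used conventions that an absorbance of 1.0 at 260 nm corresponds to roughly 40 µg/mL of RNA (1 cm path length) and that an A260/A280 ratio of around 2.0 suggests relatively pure RNA. These conventions and the example readings are assumptions for illustration and are not taken from this article; follow your instrument's guidance in practice.

```python
# Conventional conversion factor: A260 of 1.0 ~ 40 ug/mL RNA (1 cm path length).
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Estimate RNA concentration of the undiluted sample from its A260."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 2.0 are usually taken to indicate pure RNA."""
    return a260 / a280


# Hypothetical readings from a sample diluted 1:10 before measurement.
a260, a280, dilution = 0.5, 0.25, 10

concentration = rna_concentration_ug_per_ml(a260, dilution)
ratio = purity_ratio(a260, a280)
print(f"Concentration: {concentration:.0f} ug/mL, A260/A280: {ratio:.2f}")
```

Absorbance gives concentration and a rough purity estimate but says nothing about whether the RNA is intact, so degradation has to be assessed separately.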


The impact of sample pooling – pooling samples prior to library preparation (without the use of barcoding) can reduce sequencing effort and costs or enable sequencing in cases where sample quantities are very limited. However, it is important to account for this during data analysis, with one such pool considered to be one biological replicate, not however many samples went into the pool. Variations between the pooled samples can lead to misleading results and statistical issues, so the possible implications should be considered during the experimental design process.


Trading off sequencing depth against sample number – it may seem appealing to get as many samples done in a single sequencing run as possible to reduce costs and machine time. However, this comes at a cost. The more samples that are multiplexed, the fewer reads will be obtained for each of those samples. With reducing read depth comes mounting uncertainty as to the reliability of the sequences obtained. Sequencing technologies are still far from perfect, and mistakes are made in reads. It is therefore important to find the sweet spot between obtaining sufficient read depth to give confidence in the quality and fidelity of the sequencing data obtained and maximizing sequencing capacity to ensure sufficient biological replicates can be analyzed to give meaningful data.
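
The trade-off can be made concrete with a back-of-the-envelope calculation. The sketch below assumes reads are shared evenly between multiplexed samples, which is an idealization. The run output and target depth are hypothetical figures chosen for illustration, not recommendations.

```python
def reads_per_sample(total_reads, n_samples):
    """Expected reads per sample if a run is shared evenly between samples."""
    return total_reads // n_samples


def max_samples(total_reads, min_reads_per_sample):
    """Largest number of samples that still meets the target depth."""
    return total_reads // min_reads_per_sample


# Hypothetical figures: a run yielding 400 million reads and a target
# depth of 20 million reads per sample.
run_output = 400_000_000
target_depth = 20_000_000

print("Samples at target depth:", max_samples(run_output, target_depth))
print("Reads per sample with 48 samples:", reads_per_sample(run_output, 48))
```

With these numbers, 20 samples fit in the run at the target depth, whereas loading 48 samples would leave each with well under half of it. Real runs never distribute reads perfectly evenly, so it is sensible to leave some margin.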