Exon location

Ask questions about annotation of D. erecta, D. mojavensis, and D. grimshawi projects here.
drevie
Posts: 67
Joined: Sun Feb 04, 2007 10:23 pm
Location: California Lutheran University, Thousand Oaks, CA

Exon location

Post by drevie » Wed Apr 09, 2014 6:56 pm

One gene that students are annotating either has a non-canonical splice site or it is missing one nucleotide near the end of an exon or the beginning of another exon. Between the third and fourth exons the reading frame is off if they use the predicted splice sites. I know there are some errors in the D. biarmipes assembly, so is it possible that it is due to a sequencing error? If so, how can we check the assembly?

wleung
Posts: 185
Joined: Sun Feb 04, 2007 7:41 pm
Location: Washington University in St. Louis

Re: Exon location

Post by wleung » Thu Apr 10, 2014 12:40 am

Both proposed hypotheses are possible, but I cannot determine which one is more likely without examining the region in the genome browser and gathering more evidence.

In general, non-canonical GC donor sites are found in ~1% of Drosophila genes. Consequently, we will annotate a GC donor site if the GC donor is also found in D. melanogaster or if the splice junction is supported by RNA-Seq data (i.e., TopHat junctions). To justify a non-canonical splice donor site that is not present in D. melanogaster, the site should be supported by RNA-Seq data and also be found in another Drosophila species that is more closely related to D. biarmipes. In some cases, you can use the "Conservation" track (set display mode to full) to see if the non-canonical splice donor site is conserved in the other Drosophila species.

Most of the errors in the D. biarmipes consensus sequence are found within mononucleotide runs because of the known weaknesses of the 454 sequencers. If the region is covered by RNA-Seq reads, you can compare the aligned RNA-Seq reads (available through the D. biarmipes RNA-Seq track) with the consensus to identify potential consensus errors. The Sequence Updater User Guide contains an example that illustrates how you can use the RNA-Seq reads to identify and document a consensus error.
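
As a rough illustration of where to look first, the sketch below lists the mononucleotide runs in a stretch of sequence so they can be compared against the RNA-Seq reads. The function name, the minimum run length of 5, and the example sequence are all made up for illustration; this is not part of any GEP tool.

```python
import re

def mononucleotide_runs(seq, min_len=5):
    """Return (start, end, base, length) for each run of a single base.

    Coordinates are 1-based and inclusive, relative to the start of `seq`.
    `min_len` is an arbitrary cutoff chosen for this sketch.
    """
    runs = []
    # Each match is a maximal run of one base (A, C, G, or T).
    for m in re.finditer(r"A+|C+|G+|T+", seq.upper()):
        length = m.end() - m.start()
        if length >= min_len:
            runs.append((m.start() + 1, m.end(), m.group()[0], length))
    return runs

# Hypothetical sequence, not taken from the D. biarmipes assembly.
example = "ATGCAAAAAAGTCCGTTTTTAC"
for start, end, base, length in mononucleotide_runs(example):
    print(f"{base} x {length} at {start}-{end}")
```

Runs that fall near the end of one exon or the start of the next are the places where a 454 insertion or deletion would most plausibly shift the reading frame.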

Alternatively, please let us know the project and the region you are working on and we can examine the assembly in Consed to verify the consensus sequence.

drevie

Re: Exon location

Post by drevie » Thu Apr 10, 2014 10:27 pm

The project is D. biarmipes contig56. The 3rd exon of the toy gene on the contig is likely at 12089-12330. Near the end of it is a run of 6 A's. The next exon is probably at 17545-17721. However, this causes a reading frame problem. The students found a solution by using an alternative splice site, but the solution doesn't look likely to me.

wleung

Re: Exon location

Post by wleung » Thu Apr 10, 2014 11:22 pm

The discrepancy is likely caused by a non-canonical GC splice donor site at the end of CDS 3_1762_0.

According to the Gene Record Finder, both CDS 3_1762_0 and CDS 5_1762_2 have a non-canonical GC donor site in D. melanogaster. Consequently, if this CDS is conserved in D. biarmipes, we would look for a GC donor site at the end of CDS 3_1762_0 rather than a GT donor site.
toy_GC_donor_Gene_Record.png
As you have mentioned, blastx alignment of CDS 3_1762_0 against contig56 shows that the CDS is in frame +2 and the alignment spans from 12089-12325. The blastx alignment of CDS 4_1762_2 against contig56 is in frame +3 and spans from 17547-17720. Both alignments cover the entire length of the CDS in D. melanogaster so we would expect to find the splice sites near the ends of the alignments.
blastx_toy_CDS.png
Examination of the region near the beginning of the blastx alignment to CDS 4_1762_2 shows that the best supported splice acceptor site is at 17543-17544 and the acceptor is in phase 2. Because the phases of the donor and the acceptor must add up to a whole codon (a sum of 0 or 3), the donor site for CDS 3_1762_0 must be in phase 1.
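
The phase arithmetic for the acceptor can be checked with a few lines of Python. This is only a sketch under stated assumptions: coordinates are 1-based on the forward strand, a blastx frame of +f means codons begin at positions where (position - f) is divisible by 3, and the acceptor phase is the number of bases between the acceptor site and the first complete codon. The function names are made up for illustration.

```python
def acceptor_phase(exon_start, frame):
    """Bases between the acceptor site and the first complete codon.

    Assumes 1-based forward-strand coordinates and a blastx frame of
    +1, +2, or +3, where codons start at positions p with (p - frame) % 3 == 0.
    """
    return (frame - exon_start) % 3

def required_donor_phase(acc_phase):
    """Donor phase that pairs with the acceptor so the two sum to 0 or 3."""
    return (3 - acc_phase) % 3

# The acceptor AG is at 17543-17544, so the exon begins at 17545.
# The blastx alignment of CDS 4_1762_2 is in frame +3.
acc = acceptor_phase(exon_start=17545, frame=3)
print("acceptor phase:", acc)
print("donor phase needed:", required_donor_phase(acc))
```

With the coordinates from this thread, the two bases at 17545-17546 precede the first complete codon at 17547, which is what makes the acceptor phase 2.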

Examination of the region near the end of the blastx alignment to CDS 3_1762_0 shows a phase 1 GC donor site (relative to frame +2) that is supported by RNA-Seq coverage and Cufflinks transcripts. As you have mentioned, the only GT (12331-12332) before the first in-frame stop codon in frame +2 is in phase 2, which is incompatible with the phase 2 acceptor identified above. To further support the hypothesis of the non-canonical splice donor site, you can examine the "Conservation" track to determine if this non-canonical splice donor site is conserved in other Drosophila species. The Conservation track shows that the GC donor site is conserved in at least 7 different Drosophila species.
toy_GC_donor_site_evidence.png
Collectively, the evidence indicates that CDS 3_1762_0 has a non-canonical GC donor site at 12327-12328.
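
For anyone who wants to double-check the donor phases, here is a small sketch that compares the two candidate donor sites against the phase 2 acceptor. It assumes 1-based forward-strand coordinates, that frame +2 means codons begin at positions where (position - 2) is divisible by 3, and that the donor phase is the number of bases between the last complete codon and the donor site. The function names are hypothetical.

```python
def donor_phase(exon_end, frame):
    """Bases left over after the last complete codon at the end of the exon.

    Assumes 1-based forward-strand coordinates and a blastx frame of
    +1, +2, or +3, where codons start at positions p with (p - frame) % 3 == 0.
    """
    return (exon_end - frame + 1) % 3

def compatible(donor, acceptor):
    """A donor/acceptor pair keeps the reading frame if the phases sum to 0 or 3."""
    return (donor + acceptor) % 3 == 0

ACCEPTOR_PHASE = 2  # acceptor at 17543-17544, as described in the post
FRAME = 2           # blastx alignment of CDS 3_1762_0 is in frame +2

# First base of each candidate donor dinucleotide; the exon ends one base earlier.
candidates = {"GC": 12327, "GT": 12331}
for name, donor_start in candidates.items():
    phase = donor_phase(donor_start - 1, FRAME)
    print(name, "donor: phase", phase, "compatible:", compatible(phase, ACCEPTOR_PHASE))
```

Under these assumptions the GC donor comes out in phase 1 and the GT donor in phase 2, so only the GC donor pairs with the phase 2 acceptor.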

drevie

Re: Exon location

Post by drevie » Sat Apr 12, 2014 12:58 pm

Thanks for the help. I didn't realize there was a track (Conservation) that showed the splice sites for several other Drosophila species.
