
Exon location

Posted: Wed Apr 09, 2014 6:56 pm
by drevie
One gene that students are annotating either has a non-canonical splice site or is missing one nucleotide near the end of one exon or the beginning of the next. If they use the predicted splice sites, the reading frame between the third and fourth exons is shifted. I know the D. biarmipes assembly contains some errors, so could this be caused by a sequencing error? If so, how can we check the assembly?

Re: Exon location

Posted: Thu Apr 10, 2014 12:40 am
by wleung
Both proposed hypotheses are possible, but I cannot determine which one is more likely without examining the region in the genome browser and gathering more evidence.

In general, non-canonical GC donor sites are found in only ~1% of Drosophila genes. Consequently, we annotate a GC donor site only if the GC donor is also found in D. melanogaster or if the splice junction is supported by RNA-Seq data (i.e., TopHat junctions). To justify a non-canonical splice donor site that is not present in D. melanogaster, it should be supported by RNA-Seq data and should also be found in another Drosophila species that is more closely related to D. biarmipes. In some cases, you can use the "Conservation" track (with the display mode set to "full") to check whether the non-canonical splice donor site is conserved in the other Drosophila species.
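The two sentences above can be read as a small decision rule. The sketch below encodes one reading of it, assuming that a GC donor absent from D. melanogaster needs both RNA-Seq junction support and conservation in a species more closely related to D. biarmipes. The function and parameter names are hypothetical and the rule is an interpretation, not an official GEP checklist.

```python
def annotate_gc_donor(in_d_melanogaster: bool,
                      rnaseq_junction: bool,
                      in_closer_species: bool) -> bool:
    """One reading of the GC donor guideline (assumption, not an official rule).

    in_d_melanogaster: the orthologous D. melanogaster CDS also uses a GC donor
    rnaseq_junction:   the splice junction is supported by RNA-Seq (e.g., TopHat junctions)
    in_closer_species: the GC donor is also present in a species closer to D. biarmipes
    """
    if in_d_melanogaster:
        # Conserved from D. melanogaster: accepted on that basis.
        return True
    # Not in D. melanogaster: require both kinds of support.
    return rnaseq_junction and in_closer_species


print(annotate_gc_donor(True, False, False))   # True
print(annotate_gc_donor(False, True, False))   # False
print(annotate_gc_donor(False, True, True))    # True
```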

Most errors in the D. biarmipes consensus sequence occur within mononucleotide runs (homopolymers), a known weakness of 454 sequencing. If the region is covered by RNA-Seq reads, you can compare the aligned RNA-Seq reads (available through the D. biarmipes RNA-Seq track) with the consensus to identify potential consensus errors. The Sequence Updater User Guide contains an example that illustrates how you can use the RNA-Seq reads to identify and document a consensus error.
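Because 454 errors concentrate in mononucleotide runs, a quick first pass is to list the runs in the region of interest and inspect the RNA-Seq reads over each one. The minimal Python sketch below does that with `itertools.groupby`; the function name is hypothetical, and the default threshold of 5 bases is an arbitrary choice for illustration, not a value from the GEP documentation.

```python
from itertools import groupby


def find_homopolymers(seq: str, min_len: int = 5, offset: int = 1):
    """Return (start, end, base, length) for every mononucleotide run of at least
    min_len bases. Coordinates are 1-based and inclusive; offset is the coordinate
    of seq[0] (e.g., the contig position where the extracted region begins)."""
    runs = []
    pos = offset
    for base, group in groupby(seq.upper()):
        n = sum(1 for _ in group)  # length of this run of identical bases
        if n >= min_len:
            runs.append((pos, pos + n - 1, base, n))
        pos += n
    return runs


print(find_homopolymers("ACGAAAAAAT"))  # [(4, 9, 'A', 6)]
```

A run flagged this way is only a candidate; the decision still rests on whether the aligned RNA-Seq reads disagree with the consensus at that run.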

Alternatively, let us know the project and the region you are working on, and we can examine the assembly in Consed to verify the consensus sequence.

Re: Exon location

Posted: Thu Apr 10, 2014 10:27 pm
by drevie
The region is on D. biarmipes contig56. The third exon of the toy gene is probably at 12089-12330 on this contig, and there is a run of six A's near its end. The next exon is probably at 17545-17721. However, this model shifts the reading frame. The students resolved the problem by using an alternative splice site, but that solution does not look likely to me.

Re: Exon location

Posted: Thu Apr 10, 2014 11:22 pm
by wleung
The discrepancy is likely caused by a non-canonical GC splice donor site at the end of CDS 3_1762_0.

According to the Gene Record Finder, both CDS 3_1762_0 and CDS 5_1762_2 have non-canonical GC donor sites in D. melanogaster. Consequently, if this CDS is conserved in D. biarmipes, we would expect a GC donor site rather than a GT donor site at the end of CDS 3_1762_0.
[Image: toy_GC_donor_Gene_Record.png]
As you mentioned, the blastx alignment of CDS 3_1762_0 against contig56 is in frame +2 and spans 12089-12325. The blastx alignment of CDS 4_1762_2 against contig56 is in frame +3 and spans 17547-17720. Both alignments cover the entire length of the corresponding D. melanogaster CDS, so we would expect to find the splice sites near the ends of the alignments.
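The frame numbers and coordinates above can be cross-checked with a little modular arithmetic. On the forward strand, a codon in frame +f starts at 1-based positions p with (p - f) divisible by 3. The minimal sketch below, with hypothetical helper names, confirms that both reported alignments start and end on codon boundaries of their stated frames; it assumes the coordinates are 1-based positions on the full contig.

```python
def forward_frame(codon_start: int) -> int:
    """Forward-strand frame (1, 2 or 3) of a codon whose first base is at codon_start (1-based)."""
    return (codon_start - 1) % 3 + 1


def on_codon_boundaries(aln_start: int, aln_end: int, frame: int) -> bool:
    """True if the alignment starts on the first base and ends on the last base of a codon in frame."""
    starts_ok = (aln_start - frame) % 3 == 0
    ends_ok = (aln_end - frame + 1) % 3 == 0
    return starts_ok and ends_ok


# blastx alignments on contig56 as reported in this thread: (start, end, frame)
alignments = {
    "CDS 3_1762_0": (12089, 12325, 2),
    "CDS 4_1762_2": (17547, 17720, 3),
}

for name, (start, end, frame) in alignments.items():
    print(name, forward_frame(start), on_codon_boundaries(start, end, frame))
# CDS 3_1762_0 2 True
# CDS 4_1762_2 3 True
```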
[Image: blastx_toy_CDS.png]
Examining the region near the start of the blastx alignment to CDS 4_1762_2 shows that the best-supported splice acceptor site is at 17543-17544 and that this acceptor is in phase 2. This means that the donor site for CDS 3_1762_0 must be in phase 1, so that the bases on the two sides of the intron together make up one complete codon.
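The phase argument can be reproduced from the coordinates. The sketch below assumes a convention inferred from this thread's numbers: the acceptor phase is the number of bases at the start of the exon that come before the first complete codon (the same quantity as the GFF3 CDS phase column), the donor phase is the number of bases left after the last complete codon, and a donor/acceptor pair is compatible when the two phases sum to a multiple of 3. The function names are hypothetical.

```python
def acceptor_phase(exon_start: int, frame: int) -> int:
    """Bases at the 5' end of an exon (first base at 1-based exon_start) that come
    before the first complete codon in forward frame (1, 2 or 3)."""
    return (frame - exon_start) % 3


def required_donor_phase(acc_phase: int) -> int:
    """Donor phase needed so that the split codon is completed across the intron
    (donor phase + acceptor phase is a multiple of 3)."""
    return (3 - acc_phase) % 3


# AG acceptor at 17543-17544, so the downstream exon starts at 17545 (blastx frame +3).
exon_start = 17544 + 1
phase = acceptor_phase(exon_start, frame=3)
print(phase, required_donor_phase(phase))  # prints: 2 1
```

The two bases at 17545-17546 precede the first full frame +3 codon at 17547, which is why the acceptor comes out as phase 2 and the upstream donor has to supply exactly one base.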

Examining the region near the end of the blastx alignment to CDS 3_1762_0 reveals a phase 1 GC donor site (relative to frame +2) that is supported by RNA-Seq coverage and Cufflinks transcripts. As you mentioned, the only GT (12331-12332) before the first in-frame stop codon in frame +2 is in phase 2, so it is incompatible with the phase 2 acceptor identified above. To further test the non-canonical donor hypothesis, you can use the "Conservation" track to check whether this GC donor site is conserved in other Drosophila species. The Conservation track shows that the GC donor site is conserved in at least seven Drosophila species.
[Image: toy_GC_donor_site_evidence.png]
Taken together, the evidence indicates that CDS 3_1762_0 has a non-canonical GC donor site at 12327-12328.

Re: Exon location

Posted: Sat Apr 12, 2014 12:58 pm
by drevie
Thanks for the help. I didn't realize there was a track (Conservation) that shows the splice sites in several other Drosophila species.